Static analysis

Main concepts

Static analysis is the automated analysis of source code, performed without executing it.

Analysis performed during program execution is known as dynamic analysis.

Static analysis is often used to detect:

Syntax errors
Violations of coding conventions
Bugs and issues
Security vulnerabilities
Performance issues
Non-compliance with standards
Use of deprecated programming constructs

Types of static analysis

From (Kluban, Mannan, et al., 2024):

Textual similarity approaches: The main principle of textual similarity methods is to find matches between the examined code instances and a dataset of known vulnerable code. The instances are usually represented in a generalized form, e.g., a bag of tokens or an AST (Abstract Syntax Tree), and then compared:

directly
using transformations such as cryptographic hashing or vectorization (e.g., with Word2Vec)
using other abstract representations, such as a CFG (Control Flow Graph) or a PDG (Program Dependence Graph)
context-sensitive hashing may be used to allow limited alterations in the examined code
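
The textual similarity idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the cited paper: the helper names (`normalized_tokens`, `fingerprint`, `bag_similarity`) and the sample snippets are hypothetical, the generalization step simply replaces every identifier and literal with a placeholder, and SHA-256 stands in for the cryptographic hashing step.

```python
import hashlib
import io
import keyword
import tokenize
from collections import Counter

# Token types that carry no information for this comparison.
SKIPPED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source: str) -> list[str]:
    """Tokenize Python source and generalize identifiers and literals."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIPPED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")    # any identifier
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")   # any numeric literal
        elif tok.type == tokenize.STRING:
            tokens.append("STR")   # any string literal
        else:
            tokens.append(tok.string)  # keywords and operators stay as-is
    return tokens


def fingerprint(source: str) -> str:
    """Hash the generalized token sequence (exact-match comparison)."""
    joined = " ".join(normalized_tokens(source))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def bag_similarity(a: str, b: str) -> float:
    """Jaccard similarity of the two bags of tokens (tolerates small edits)."""
    bag_a, bag_b = Counter(normalized_tokens(a)), Counter(normalized_tokens(b))
    union = sum((bag_a | bag_b).values())
    return sum((bag_a & bag_b).values()) / union if union else 1.0


# Hypothetical "known vulnerable" snippet and two candidates.
KNOWN_VULNERABLE = '''\
def run(cmd):
    os.system("sh -c " + cmd)
'''
RENAMED_COPY = '''\
def execute(command):
    os.system("sh -c " + command)  # same code, renamed
'''
UNRELATED = '''\
def add(a, b):
    return a + b
'''

print(fingerprint(KNOWN_VULNERABLE) == fingerprint(RENAMED_COPY))
print(bag_similarity(KNOWN_VULNERABLE, UNRELATED))
```

Because every identifier is collapsed to `ID`, the renamed copy hashes to the same fingerprint as the known snippet, but so would any code with the same token shape. A real tool would need a less aggressive generalization to keep false matches down.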

Semantic similarity approaches: These methods detect semantic (functional) similarities by searching for vulnerability patterns, and can be divided into two categories based on how the patterns are developed:

manual: developed by researchers for ad hoc situations or specific attacks, e.g., XSS (cross-site scripting), SQLIA (SQL injection attack), or DoS (denial of service)
automated: usually the result of machine learning algorithms. Features that are assumed to make a function vulnerable are extracted by analyzing a large set of representations of known vulnerable functions. This kind of approach requires a large, trusted dataset
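
As an illustration of a manually developed pattern, the sketch below looks for one hand-written SQL injection pattern in Python code: a call to a method named `execute` whose first argument is a string built at runtime. The function names and the sample are hypothetical, and the rule is a deliberately simple heuristic that assumes any dynamically built query is suspicious.

```python
import ast


def is_dynamic_string(node: ast.expr) -> bool:
    """True if the expression builds a string at runtime."""
    if isinstance(node, ast.JoinedStr):  # f-string
        return True
    if isinstance(node, ast.BinOp) and isinstance(node.op, (ast.Add, ast.Mod)):
        return True                      # "..." + x  or  "...%s" % x
    if (isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "format"):
        return True                      # "...{}".format(x)
    return False


def find_sql_injection_candidates(source: str) -> list[int]:
    """Return line numbers of `.execute(...)` calls with a dynamic query."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "execute"
                and node.args
                and is_dynamic_string(node.args[0])):
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical code under analysis; it is parsed, never executed.
SAMPLE = '''\
def unsafe(cursor, name):
    cursor.execute("SELECT * FROM users WHERE name = '" + name + "'")

def safe(cursor, name):
    cursor.execute("SELECT * FROM users WHERE name = ?", (name,))
'''

print(find_sql_injection_candidates(SAMPLE))  # [2]
```

Only the concatenated query on line 2 is reported; the parameterized query is left alone. The pattern knows nothing about where `name` comes from, so it would also flag a query concatenated from trusted constants.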

Methods

Source code parsing to build an AST (Abstract Syntax Tree)

AST matching treats the source code as program code rather than as files filled with text. This allows more specific, contextual matching and can reduce the number of false positives reported against the code.
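
A minimal sketch of AST matching, using Python's standard `ast` module to find calls to the built-in `eval`. The rule, the function name `find_eval_calls`, and the sample are assumptions chosen for illustration.

```python
import ast


def find_eval_calls(source: str) -> list[int]:
    """Return line numbers where the name `eval` is actually called."""
    tree = ast.parse(source)  # parses the code, does not run it
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "eval"
    )


# Hypothetical code under analysis.
SAMPLE = '''\
# never call eval(user_input) here
message = "eval(x) is dangerous"
result = eval(expression)
'''

print(find_eval_calls(SAMPLE))  # [3]
```

The comment on line 1 and the string literal on line 2 both contain the text `eval(`, but neither is a call node in the tree, so only line 3 is reported.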

Text RegEx (Regular Expression) matching

This is a very flexible method, and matching rules are easy to write, but it often produces many false positives because the rules are unaware of the surrounding code context.
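
A minimal sketch of a regex rule, again looking for uses of `eval` in Python code. The pattern, the function name `find_eval_matches`, and the sample are assumptions chosen for illustration.

```python
import re

# One-line rule: the word "eval" followed by an opening parenthesis.
EVAL_PATTERN = re.compile(r"\beval\s*\(")


def find_eval_matches(source: str) -> list[int]:
    """Return line numbers whose text matches the rule."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if EVAL_PATTERN.search(line)
    ]


# Hypothetical code under analysis.
SAMPLE = '''\
# never call eval(user_input) here
message = "eval(x) is dangerous"
result = eval(expression)
'''

print(find_eval_matches(SAMPLE))  # [1, 2, 3]
```

Only line 3 is a real call. Lines 1 and 2 are a comment and a string literal, which the rule cannot tell apart from code, so they are false positives.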

Advantages

Fast error detection, especially of syntax errors (see static invariants)
Automation of the process
More efficient than dynamic analysis, and therefore better suited to large-scale analysis

Limits

False positives: code that is actually correct can be flagged as an error or issue, because the analysis does not follow the actual program logic
Blindness: errors that only appear at runtime cannot be detected
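
Both limits can be seen in a toy division-by-zero checker. This is a sketch under stated assumptions: the function `flag_divisions`, its `strict` switch, and the sample are hypothetical and do not reflect how any particular tool works.

```python
import ast


def flag_divisions(source: str, strict: bool) -> list[int]:
    """Report line numbers of suspicious divisions.

    strict=False: flag only division by a literal zero.
    strict=True:  additionally flag every division by a non-constant.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not (isinstance(node, ast.BinOp)
                and isinstance(node.op, (ast.Div, ast.FloorDiv))):
            continue
        divisor = node.right
        if isinstance(divisor, ast.Constant):
            if isinstance(divisor.value, (int, float)) and divisor.value == 0:
                hits.append(node.lineno)
        elif strict:
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical code under analysis.
SAMPLE = '''\
def ratio(total, count):
    if count == 0:
        return 0.0
    return total / count

def average(values):
    return sum(values) / len(values)

broken = 1 / 0
'''

print(flag_divisions(SAMPLE, strict=False))  # [9]
print(flag_divisions(SAMPLE, strict=True))   # [4, 7, 9]
```

The lenient rule is blind to line 7, which only fails at runtime when `values` is empty. The strict rule catches it, but it also reports line 4, where the preceding `if` already rules out a zero divisor: a false positive caused by not following the program logic.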

Tools

SonarQube: a static analysis tool to detect bugs, vulnerabilities and code smells
ESLint: a JavaScript linting tool that identifies problematic patterns and style issues in code and can automatically fix some of them
Coverity: a defect detector available for several programming languages

References

(Kluban, Mannan, et al., 2024)
