On Detecting and Measuring Exploitable JavaScript Functions in Real-world Applications

The paper assesses the presence of vulnerable JS functions in active, real-world JavaScript projects. More than 9 million JS functions were tested against a dataset of vulnerable JS functions built from VulnCode-DB and Snyk. The authors focus on prototype pollution and ReDoS (Regular Expression Denial of Service), because vulnerable implementation patterns for these issues can be detected automatically. The core idea is that a real project that includes a vulnerable library is not necessarily vulnerable itself: it is exploitable only if the vulnerable function is (i) actually used in the project and (ii) reachable from user input. Pattern-based and textual-similarity-based approaches, together with multi-file static taint analysis, were used to assess the prevalence of prototype pollution and ReDoS.
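To make the "used and reachable" condition concrete, here is a minimal sketch of a reachability check over a toy call graph. The graph, the function names (`app.handleRequest`, `lib.merge`, and so on) and the helper `reachableFrom` are all hypothetical, and the check is far simpler than the taint analysis described in the paper; it only illustrates why a vulnerable function in a dependency does not automatically make the project exploitable.

```javascript
// Hypothetical call graph: each key is a function, each value lists the
// functions it calls. None of these names come from the paper.
const callGraph = new Map([
  ["app.handleRequest", ["app.parseBody", "lib.merge"]],
  ["app.parseBody", []],
  ["lib.merge", []],
  ["lib.unusedTemplate", ["lib.vulnerableRegexCheck"]],
  ["lib.vulnerableRegexCheck", []],
]);

// Breadth-first search: collect every function reachable from the given
// entry points (here, the functions that receive user input).
function reachableFrom(graph, entryPoints) {
  const seen = new Set(entryPoints);
  const queue = [...entryPoints];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const callee of graph.get(current) || []) {
      if (!seen.has(callee)) {
        seen.add(callee);
        queue.push(callee);
      }
    }
  }
  return seen;
}

const reachable = reachableFrom(callGraph, ["app.handleRequest"]);
console.log(reachable.has("lib.merge")); // true: called from the request handler
console.log(reachable.has("lib.vulnerableRegexCheck")); // false: shipped but never called
```

In this toy setup only `lib.merge` would be worth reporting, even though both library functions are present in the project.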

Context

The study of vulnerabilities in JS dependencies has led to a number of projects being flagged as vulnerable because they include vulnerable dependencies. In reality, many such projects (73%) are not actually vulnerable, because they never use the vulnerable functions.

Reliable datasets of vulnerable code are scarce, which makes it difficult to train models or to properly test other mitigation solutions.

The objective of our study is to assess the presence of vulnerable JS functions in active, real-world JavaScript projects. We gather JS from three sources:

NPM packages
Chrome web extensions
top popular websites

We then test more than 9 million JS functions against a dataset of vulnerable JS functions that we build.

Approach

Creation of dataset of vulnerable functions

We automatically collect vulnerable JavaScript functions from Snyk and VulnCode-DB (vulnerability databases) to compose an up-to-date dataset. Only entries that link to the source code were taken into consideration
Test files, empty functions, and cases where the vulnerable and fixed functions were identical were ruled out
Almost 5000 functions were found (895 entries), but only ReDoS (Regular Expression Denial of Service, 121 entries) and prototype pollution (101 entries) were considered
150 entries were manually verified (using a web application developed ad hoc to simplify the process) and studied to identify patterns
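As an illustration of the kind of pattern a ReDoS entry can exhibit, the sketch below compares a textbook backtracking-prone regular expression with an equivalent safe one. The two patterns and the helper `timeMatch` are illustrative assumptions, not entries taken from the dataset, and the timings depend on the regex engine and the machine.

```javascript
// Textbook ReDoS-prone pattern: a quantified group that itself contains a
// quantifier. On a non-matching input, a backtracking engine may try a very
// large number of ways to split the run of "a" characters.
const vulnerable = /^(a+)+$/;

// Accepts the same strings, but has no nested quantifier.
const fixed = /^a+$/;

// Hypothetical helper: wall-clock time of a single match attempt, in ms.
function timeMatch(regex, input) {
  const start = performance.now();
  regex.test(input);
  return performance.now() - start;
}

// The trailing "!" forces the match to fail, which triggers the backtracking.
// Input sizes are kept small on purpose so the demo finishes quickly.
for (const n of [10, 15, 20]) {
  const input = "a".repeat(n) + "!";
  console.log(
    `n=${n}`,
    `vulnerable: ${timeMatch(vulnerable, input).toFixed(2)} ms`,
    `fixed: ${timeMatch(fixed, input).toFixed(2)} ms`
  );
}
```

On a backtracking engine the time for the first pattern typically grows much faster than the input length, while the second stays roughly flat.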

Identification and formalization of vulnerability patterns in ReDoS and prototype pollution (PP)

New rules to detect ReDoS and prototype pollution were created with Semgrep through an iterative process: a rule was drafted, the Semgrep script was run on some entries, some functions were flagged, the rule was then refined to flag more functions, and so on
All rules are available at github.com/Marynk/JavaScript-vulnerability-detection/tree/main/semgrep (e.g., object[key] = value for prototype pollution)
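To show why an assignment of the form object[key] = value is worth flagging, here is a minimal sketch of a recursive merge that is open to prototype pollution, next to a guarded variant. Both `unsafeMerge` and `safeMerge` are hypothetical helpers written for this note; they are not taken from the dataset or from the Semgrep rules.

```javascript
// Hypothetical vulnerable helper: copies every key of `source` into `target`
// without filtering special keys such as "__proto__".
function unsafeMerge(target, source) {
  for (const key of Object.keys(source)) {
    const value = source[key];
    if (value !== null && typeof value === "object") {
      if (typeof target[key] !== "object" || target[key] === null) {
        target[key] = {};
      }
      // With key === "__proto__", target[key] is Object.prototype itself.
      unsafeMerge(target[key], value);
    } else {
      target[key] = value; // the object[key] = value pattern
    }
  }
  return target;
}

// Guarded variant: same logic, but keys that lead to the prototype are skipped.
const BLOCKED_KEYS = new Set(["__proto__", "constructor", "prototype"]);

function safeMerge(target, source) {
  for (const key of Object.keys(source)) {
    if (BLOCKED_KEYS.has(key)) continue;
    const value = source[key];
    if (value !== null && typeof value === "object") {
      if (typeof target[key] !== "object" || target[key] === null) {
        target[key] = {};
      }
      safeMerge(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

// JSON.parse creates "__proto__" as an ordinary own property, which is how
// attacker-controlled input usually carries the payload.
const payload = JSON.parse('{"__proto__": {"polluted": true}}');

unsafeMerge({}, payload);
console.log(({}).polluted); // true: every object now inherits `polluted`
delete Object.prototype.polluted; // undo the pollution before the next test

safeMerge({}, payload);
console.log(({}).polluted); // undefined
```

A key blocklist is only one possible mitigation; creating targets with Object.create(null) or using a Map are common alternatives.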

Finding new vulnerabilities in the wild

We gather a large dataset of 9,205,654 JavaScript functions from active real-world projects from three different application types (NPM packages, Chrome extensions, and top websites). This collection process is also fully automated.
A combination of pattern-based and textual-similarity-based approaches, plus static taint analysis (STA) with a novel representation of file dependency graphs, was used to identify matches between real-world functions and the previously created dataset of vulnerable functions. The taint analysis determines whether a flagged real-world function is exploitable through malicious user input
For details on the matching techniques, see section 4.2 of the paper: content-sensitive hash comparison was used to transform the functions being compared into fixed-size strings. A similarity threshold can be set to decide whether two functions match
We detect 124,934 vulnerable functions in this real-world dataset. The estimated average precision is 94.5%, based on manual verification of a small subset
With our taint analysis, we identify 301 cases from 134 NPM packages (5.7% of all findings in NPM packages), which are exploitable in the project context. Manual verification of 100 cases detected no false positives produced by the taint analysis mechanism
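The idea of matching functions against a similarity threshold can be sketched as follows. This is not the hash-based comparison used in the paper: as a simpler stand-in, it compares sets of token 3-grams with Jaccard similarity. The helpers `tokenize`, `shingles`, `jaccard` and `isLikelyMatch`, and the default threshold of 0.8, are assumptions made for illustration.

```javascript
// Split source text into identifiers, numbers and single punctuation
// characters, so that whitespace and layout differences are ignored.
function tokenize(source) {
  return source.match(/[A-Za-z_$][\w$]*|\d+|[^\s\w]/g) || [];
}

// Build the set of overlapping k-token windows ("shingles").
function shingles(tokens, k = 3) {
  const out = new Set();
  for (let i = 0; i + k <= tokens.length; i++) {
    out.add(tokens.slice(i, i + k).join(" "));
  }
  return out;
}

// Jaccard similarity: size of the intersection divided by size of the union.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  for (const item of a) {
    if (b.has(item)) shared++;
  }
  return shared / (a.size + b.size - shared);
}

// Flag `candidate` when it is similar enough to a known vulnerable function.
// The threshold value is an arbitrary assumption, not the paper's setting.
function isLikelyMatch(candidate, knownVulnerable, threshold = 0.8) {
  const score = jaccard(
    shingles(tokenize(candidate)),
    shingles(tokenize(knownVulnerable))
  );
  return { score, match: score >= threshold };
}

const known = "function set(o,k,v){o[k]=v;}";
const reformatted = "function set(o, k, v) {\n  o[k] = v;\n}";
const unrelated = "function add(x, y) { return x + y; }";

console.log(isLikelyMatch(reformatted, known)); // { score: 1, match: true }
console.log(isLikelyMatch(unrelated, known)); // { score: 0, match: false }
```

Raising the threshold trades recall for precision: fewer near-duplicates are flagged, but those that are flagged are closer to the known vulnerable code.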

Semi-automated reporting approach

To deal with a large number of disclosure notices, we develop a semi-automated technique to report our findings. We first search for duplicates of our findings in the CVE database (identifying 19 cases), then automatically compose readable vulnerability reports for the remaining 290 findings and send them to the 112 responsible project developers
25 new public CVEs and 169 reserved CVEs were obtained

All the framework code and datasets are available at: https://github.com/Marynk/JavaScript-vulnerability-detection.

References

(Kluban, Mannan, et al., 2024)
