Commit 1b73cee

JS: add js/exploitable-polynomial-redos
1 parent 091c6c0 commit 1b73cee

15 files changed: +775 -63 lines changed

change-notes/1.24/analysis-javascript.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -37,6 +37,7 @@
3737
| Cross-site scripting through exception (`js/xss-through-exception`) | security, external/cwe/cwe-079, external/cwe/cwe-116 | Highlights potential XSS vulnerabilities where an exception is written to the DOM. Results are not shown on LGTM by default. |
3838
| Regular expression always matches (`js/regex/always-matches`) | correctness, regular-expressions | Highlights regular expression checks that trivially succeed by matching an empty substring. Results are shown on LGTM by default. |
3939
| Missing await (`js/missing-await`) | correctness | Highlights expressions that operate directly on a promise object in a nonsensical way, instead of awaiting its result. Results are shown on LGTM by default. |
40+
| Polynomial regular expression used on uncontrolled data (`js/polynomial-redos`) | security, external/cwe/cwe-730, external/cwe/cwe-400 | Highlights expensive regular expressions that may be used on malicious input. Results are shown on LGTM by default. |
4041
| Prototype pollution in utility function (`js/prototype-pollution-utility`) | security, external/cwe/cwe-400, external/cwe/cwe-471 | Highlights recursive copying operations that are susceptible to prototype pollution. Results are shown on LGTM by default. |
4142
| Unsafe jQuery plugin (`js/unsafe-jquery-plugin`) | Highlights potential XSS vulnerabilities in unsafely designed jQuery plugins. Results are shown on LGTM by default. |
4243

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
1+
<!DOCTYPE qhelp PUBLIC
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
4+
5+
<qhelp>
6+
7+
<include src="ReDoSIntroduction.qhelp" />
8+
9+
<example>
10+
<p>
11+
12+
Consider this use of a regular expression, which removes
13+
all leading and trailing whitespace in a string:
14+
15+
</p>
16+
17+
<sample language="javascript">
18+
text.replace(/^\s+|\s+$/g, ''); // BAD
19+
</sample>
20+
21+
<p>
22+
23+
The sub-expression <code>"\s+$"</code> will match the
24+
whitespace characters in <code>text</code> from left to right, but it
25+
can start matching anywhere within a whitespace sequence. This is
26+
problematic for strings that do <strong>not</strong> end with a whitespace
27+
character. Such a string will force the regular expression engine to
28+
process each whitespace sequence once per whitespace character in the
29+
sequence.
30+
31+
</p>
32+
33+
<p>
34+
35+
This ultimately means that the time cost of trimming a
36+
string is quadratic in the length of the string. So a string like
37+
<code>"a b"</code> will take milliseconds to process, but a similar
38+
string with a million spaces instead of just one will take several
39+
minutes.
40+
41+
</p>
42+
43+
<p>
44+
45+
Avoid this problem by rewriting the regular expression to
46+
not contain the ambiguity about when to start matching whitespace
47+
sequences. For instance, by using a negative look-behind
48+
(<code>/^\s+|(?&lt;!\s)\s+$/g</code>), or just by using the built-in trim
49+
method (<code>text.trim()</code>).
50+
51+
</p>
52+
53+
<p>
54+
55+
Note that the sub-expression <code>"^\s+"</code> is
56+
<strong>not</strong> problematic as the <code>^</code> anchor restricts
57+
when that sub-expression can start matching, and as the regular
58+
expression engine matches from left to right.
59+
60+
</p>
61+
62+
</example>
63+
64+
<example>
65+
66+
<p>
67+
68+
As a similar, but slightly subtler problem, consider the
69+
regular expression that matches lines with numbers, possibly written
70+
using scientific notation:
71+
</p>
72+
73+
<sample language="javascript">
74+
/^0\.\d+E?\d+$/ // BAD
75+
</sample>
76+
77+
<p>
78+
79+
The problem with this regular expression is in the
80+
sub-expression <code>\d+E?\d+</code> because the second
81+
<code>\d+</code> can start matching digits anywhere after the first
82+
match of the first <code>\d+</code> if there is no <code>E</code> in
83+
the input string.
84+
85+
</p>
86+
87+
<p>
88+
89+
This is problematic for strings that do <strong>not</strong>
90+
end with a digit. Such a string will force the regular expression
91+
engine to process each digit sequence once per digit in the sequence,
92+
again leading to a quadratic time complexity.
93+
94+
</p>
95+
96+
<p>
97+
98+
To make the processing faster, the regular expression
99+
should be rewritten such that the two <code>\d+</code> sub-expressions
100+
do not have overlapping matches: <code>^0\.\d+(E\d+)?$</code>.
101+
102+
</p>
103+
104+
</example>
105+
106+
<include src="ReDoSReferences.qhelp"/>
107+
108+
</qhelp>
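The two rewrites described in this help file can be sketched in JavaScript as follows. This is a minimal illustration, not part of the commit: the regular expressions are taken from the help text, while the constant and function names (`TRIM_GOOD`, `trimWithRegex`, `isNumber`, and so on) are hypothetical. It assumes a JavaScript engine that supports look-behind assertions.

```javascript
// Hypothetical names; only the regular expressions come from the help text.

// Quadratic on inputs such as "a" + " ".repeat(n) + "b": `\s+$` is retried
// from every position inside the whitespace run.
const TRIM_BAD = /^\s+|\s+$/g;

// The negative look-behind only lets `\s+$` start at the beginning of a
// whitespace run, so each run is scanned once.
const TRIM_GOOD = /^\s+|(?<!\s)\s+$/g;

function trimWithRegex(text) {
  return text.replace(TRIM_GOOD, '');
}

// `(E\d+)?` makes the exponent optional as a unit, so the two `\d+`
// sub-expressions can no longer compete for the same digits.
const NUMBER_BAD = /^0\.\d+E?\d+$/;
const NUMBER_GOOD = /^0\.\d+(E\d+)?$/;

function isNumber(line) {
  return NUMBER_GOOD.test(line);
}
```

One detail worth checking before adopting the second rewrite: it is slightly more permissive than the original, since `NUMBER_GOOD` accepts a single fractional digit such as `0.5`, which `NUMBER_BAD` rejects because it requires at least two digits after the decimal point.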
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
/**
 * @name Polynomial regular expression used on uncontrolled data
 * @description A regular expression that can require polynomial time
 *              to match user-provided values may be
 *              vulnerable to denial-of-service attacks.
 * @kind path-problem
 * @problem.severity warning
 * @precision high
 * @id js/polynomial-redos
 * @tags security
 *       external/cwe/cwe-730
 *       external/cwe/cwe-400
 */

import javascript
import semmle.javascript.security.performance.PolynomialReDoS::PolynomialReDoS
import DataFlow::PathGraph

from Configuration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "This expensive $@ use depends on $@.",
  sink.getNode().(Sink).getRegExp(), "regular expression", source.getNode(), "a user-provided value"
Lines changed: 27 additions & 63 deletions
@@ -1,70 +1,34 @@
11
<!DOCTYPE qhelp PUBLIC
2-
"-//Semmle//qhelp//EN"
3-
"qhelp.dtd">
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
4+
45
<qhelp>
56

6-
<overview>
7-
<p>
8-
Some regular expressions take a very long time to match certain input strings to the point where
9-
the time it takes to match a string of length <i>n</i> is proportional to <i>2<sup>n</sup></i>.
10-
Such regular expressions can negatively affect performance, or even allow a malicious user to
11-
perform a Denial of Service ("DoS") attack by crafting an expensive input string for the regular
12-
expression to match.
13-
</p>
14-
<p>
15-
The regular expression engines provided by many popular JavaScript platforms use backtracking
16-
non-deterministic finite automata to implement regular expression matching. While this approach
17-
is space-efficient and allows supporting advanced features like capture groups, it is not
18-
time-efficient in general. The worst-case time complexity of such an automaton can be exponential,
19-
meaning that for strings of a certain shape, increasing the input length by ten characters may
20-
make the automaton about 1000 times slower.
21-
</p>
22-
<p>
23-
Typically, a regular expression is affected by this problem if it contains a repetition of the
24-
form <code>r*</code> or <code>r+</code> where the sub-expression <code>r</code> is ambiguous in
25-
the sense that it can match some string in multiple ways. More information about the precise
26-
circumstances can be found in the references.
27-
</p>
28-
</overview>
7+
<include src="ReDoSIntroduction.qhelp" />
298

30-
<recommendation>
31-
<p>
32-
Modify the regular expression to remove the ambiguity.
33-
</p>
34-
</recommendation>
9+
<example>
10+
<p>
11+
Consider this regular expression:
12+
</p>
13+
<sample language="javascript">
14+
/^_(__|.)+_$/
15+
</sample>
16+
<p>
17+
Its sub-expression <code>"(__|.)+"</code> can match the string <code>"__"</code> either by the
18+
first alternative <code>"__"</code> to the left of the <code>"|"</code> operator, or by two
19+
repetitions of the second alternative <code>"."</code> to the right. Thus, a string consisting
20+
of an odd number of underscores followed by some other character will cause the regular
21+
expression engine to run for an exponential amount of time before rejecting the input.
22+
</p>
23+
<p>
24+
This problem can be avoided by rewriting the regular expression to remove the ambiguity between
25+
the two branches of the alternative inside the repetition:
26+
</p>
27+
<sample language="javascript">
28+
/^_(__|[^_])+_$/
29+
</sample>
30+
</example>
3531

36-
<example>
37-
<p>
38-
Consider this regular expression:
39-
</p>
40-
<sample language="javascript">
41-
/^_(__|.)+_$/
42-
</sample>
43-
<p>
44-
Its sub-expression <code>"(__|.)+?"</code> can match the string <code>"__"</code> either by the
45-
first alternative <code>"__"</code> to the left of the <code>"|"</code> operator, or by two
46-
repetitions of the second alternative <code>"."</code> to the right. Thus, a string consisting
47-
of an odd number of underscores followed by some other character will cause the regular
48-
expression engine to run for an exponential amount of time before rejecting the input.
49-
</p>
50-
<p>
51-
This problem can be avoided by rewriting the regular expression to remove the ambiguity between
52-
the two branches of the alternative inside the repetition:
53-
</p>
54-
<sample language="javascript">
55-
/^_(__|[^_])+_$/
56-
</sample>
57-
</example>
32+
<include src="ReDoSReferences.qhelp"/>
5833

59-
<references>
60-
<li>
61-
OWASP:
62-
<a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>.
63-
</li>
64-
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/ReDoS">ReDoS</a>.</li>
65-
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Time_complexity">Time complexity</a>.</li>
66-
<li>James Kirrage, Asiri Rathnayake, Hayo Thielecke:
67-
<a href="http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf">Static Analysis for Regular Expression Denial-of-Service Attack</a>.
68-
</li>
69-
</references>
7034
</qhelp>
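The ambiguity described in this help file can be observed with a small JavaScript sketch. It is an illustration only, not part of the commit: the two regular expressions are copied from the help text, and the names `AMBIGUOUS`, `UNAMBIGUOUS`, and `attackString` are hypothetical. The probe is kept short on purpose, because the work done by the ambiguous expression grows roughly geometrically with each extra underscore.

```javascript
// Regular expressions copied from the help text; all names are hypothetical.
const AMBIGUOUS = /^_(__|.)+_$/;      // "__" can match as `__` or as `.` twice
const UNAMBIGUOUS = /^_(__|[^_])+_$/; // each run of underscores has one parse

// Builds the input shape described in the help text: a run of underscores
// followed by some other character.
function attackString(underscores) {
  return '_'.repeat(underscores) + 'x';
}

// Both expressions reject the probe, but AMBIGUOUS explores every way of
// splitting the underscores into `__` and `.` before giving up.
const probe = attackString(21);
console.log(AMBIGUOUS.test(probe), UNAMBIGUOUS.test(probe)); // prints: false false
```

Note that the rewrite is not a drop-in equivalent: `"___"` matches the ambiguous expression (the middle underscore is consumed by `.`) but not the unambiguous one, so it is worth confirming that the stricter behavior is the intended one.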
Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
1+
<!DOCTYPE qhelp PUBLIC
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
4+
<qhelp>
5+
<overview>
6+
<p>
7+
8+
Some regular expressions take a long time to match certain
9+
input strings to the point where the time it takes to match a string
10+
of length <i>n</i> is proportional to <i>n<sup>k</sup></i> or even
11+
<i>2<sup>n</sup></i>. Such regular expressions can negatively affect
12+
performance, or even allow a malicious user to perform a Denial of
13+
Service ("DoS") attack by crafting an expensive input string for the
14+
regular expression to match.
15+
16+
</p>
17+
18+
<p>
19+
20+
The regular expression engines provided by many popular
21+
JavaScript platforms use backtracking non-deterministic finite
22+
automata to implement regular expression matching. While this approach
23+
is space-efficient and allows supporting advanced features like
24+
capture groups, it is not time-efficient in general. The worst-case
25+
time complexity of such an automaton can be polynomial or even
26+
exponential. In the exponential case, for strings of a certain shape,
27+
increasing the input length by ten characters may make the automaton
28+
about 1000 times slower.
29+
30+
</p>
31+
32+
<p>
33+
34+
Typically, a regular expression is affected by this
35+
problem if it contains a repetition of the form <code>r*</code> or
36+
<code>r+</code> where the sub-expression <code>r</code> is ambiguous
37+
in the sense that it can match some string in multiple ways. More
38+
information about the precise circumstances can be found in the
39+
references.
40+
41+
</p>
42+
</overview>
43+
44+
<recommendation>
45+
46+
<p>
47+
48+
Modify the regular expression to remove the ambiguity, or
49+
ensure that the strings matched with the regular expression are short
50+
enough that the time-complexity does not matter.
51+
52+
</p>
53+
54+
</recommendation>
55+
</qhelp>
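The second recommendation, keeping matched strings short, might look like the following JavaScript sketch. The limit `MAX_LENGTH` and the helper `trimBounded` are hypothetical and not part of the commit; whether the query's `LengthGuard` recognizes this exact check is not shown in this diff, so treat that as an open assumption.

```javascript
// Hypothetical limit and helper name; pick a bound that suits the application.
const MAX_LENGTH = 1000;

function trimBounded(text) {
  // Reject oversized input before it reaches the regular expression.
  if (text.length > MAX_LENGTH) {
    throw new RangeError(`input longer than ${MAX_LENGTH} characters`);
  }
  // Still quadratic in the worst case, but the bound above caps the cost.
  return text.replace(/^\s+|\s+$/g, '');
}
```

The design choice here is to bound the input rather than change the expression, which is useful when the regular expression is hard to rewrite or comes from a third party.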
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
<!DOCTYPE qhelp PUBLIC
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
4+
<qhelp>
5+
<references>
6+
<li>
7+
OWASP:
8+
<a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>.
9+
</li>
10+
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/ReDoS">ReDoS</a>.</li>
11+
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Time_complexity">Time complexity</a>.</li>
12+
<li>James Kirrage, Asiri Rathnayake, Hayo Thielecke:
13+
<a href="http://www.cs.bham.ac.uk/~hxt/research/reg-exp-sec.pdf">Static Analysis for Regular Expression Denial-of-Service Attack</a>.
14+
</li>
15+
</references>
16+
</qhelp>
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
/**
 * Provides a taint tracking configuration for reasoning about
 * polynomial regular expression denial-of-service attacks.
 *
 * Note, for performance reasons: only import this file if
 * `PolynomialReDoS::Configuration` is needed, otherwise
 * `PolynomialReDoSCustomizations` should be imported instead.
 */

import javascript

module PolynomialReDoS {
  import PolynomialReDoSCustomizations::PolynomialReDoS

  class Configuration extends TaintTracking::Configuration {
    Configuration() { this = "PolynomialReDoS" }

    override predicate isSource(DataFlow::Node source) { source instanceof Source }

    override predicate isSink(DataFlow::Node sink) { sink instanceof Sink }

    override predicate isSanitizerGuard(TaintTracking::SanitizerGuardNode node) {
      super.isSanitizerGuard(node) or
      node instanceof LengthGuard
    }

    override predicate isSanitizer(DataFlow::Node node) {
      super.isSanitizer(node) or
      node instanceof Sanitizer
    }
  }
}
