Commit 1c8297b

Merge pull request github#13548 from geoffw0/redos
Swift: Query for REDOS (Regular Expression Denial Of Service)
2 parents 80a799d + 962c16d commit 1c8297b

8 files changed: +196 -0 lines changed
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
---
category: newQuery
---
* Added new query "Inefficient regular expression" (`swift/redos`). This query finds regular expressions that require exponential time to match certain inputs and may make an application vulnerable to denial-of-service attacks.
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
<qhelp>
<include src="ReDoSIntroduction.inc.qhelp" />
<example>
<p>Consider the following regular expression:</p>
<sample language="swift">
/^_(__|.)+_$/</sample>
<p>
Its sub-expression <code>"(__|.)+"</code> can match the string
<code>"__"</code> either by the first alternative <code>"__"</code> to the
left of the <code>"|"</code> operator, or by two repetitions of the second
alternative <code>"."</code> to the right. Therefore, a string consisting of an
odd number of underscores followed by some other character will cause the
regular expression engine to run for an exponential amount of time before
rejecting the input.
</p>
<p>
This problem can be avoided by rewriting the regular expression to remove
the ambiguity between the two branches of the alternative inside the
repetition:
</p>
<sample language="swift">
/^_(__|[^_])+_$/</sample>
</example>
<include src="ReDoSReferences.inc.qhelp"/>
</qhelp>
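The ambiguity described in the example above can be made concrete by counting. The following is a minimal sketch, not part of the commit: the hypothetical function `waysToMatchUnderscores` counts how many different ways the sub-expression `(__|.)+` can consume a run of `n` underscores, assuming each step takes either one character (via `.`) or two (via `__`). A backtracking matcher may have to try many of these alternatives before it can reject a non-matching input, which is where the rapid growth in running time comes from.

```swift
// Hypothetical helper for illustration only (not part of the commit).
// Counts the distinct ways (__|.)+ can consume a run of `n` underscores:
// the final step consumes either one character (".") or two ("__"), so
// ways(n) = ways(n - 1) + ways(n - 2).
func waysToMatchUnderscores(_ n: Int) -> Int {
    if n < 2 { return 1 }            // base cases of the recurrence
    var previous = 1                 // ways(n - 2)
    var current = 1                  // ways(n - 1)
    for _ in 2...n {
        (previous, current) = (current, previous + current)
    }
    return current
}

for n in [2, 10, 20, 30] {
    print(n, waysToMatchUnderscores(n))
}
// prints:
// 2 2
// 10 89
// 20 10946
// 30 1346269
```

The value for `n = 2` agrees with the two matches of `"__"` described in the query help. In the rewritten expression `(__|[^_])+`, a run of underscores can only be consumed in pairs, so there is at most one way to do it.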
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
/**
 * @name Inefficient regular expression
 * @description A regular expression that requires exponential time to match certain inputs
 *              can be a performance bottleneck, and may be vulnerable to denial-of-service
 *              attacks.
 * @kind problem
 * @problem.severity error
 * @security-severity 7.5
 * @precision high
 * @id swift/redos
 * @tags security
 *       external/cwe/cwe-1333
 *       external/cwe/cwe-730
 *       external/cwe/cwe-400
 */

import codeql.swift.regex.Regex
private import codeql.swift.regex.RegexTreeView::RegexTreeView as TreeView
import codeql.regex.nfa.ExponentialBackTracking::Make<TreeView>

from TreeView::RegExpTerm t, string pump, State s, string prefixMsg
where hasReDoSResult(t, pump, s, prefixMsg)
select t,
  "This part of the regular expression may cause exponential backtracking on strings " + prefixMsg +
    "containing many repetitions of '" + pump + "'."
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
<qhelp>
<overview>
<p>
Some regular expressions take a long time to match certain input strings
to the point where the time it takes to match a string of length <i>n</i>
is proportional to <i>n<sup>k</sup></i> or even <i>2<sup>n</sup></i>.
Such regular expressions can negatively affect performance, and potentially allow
a malicious user to perform a Denial of Service ("DoS") attack by crafting
an expensive input string for the regular expression to match.
</p>
<p>
The regular expression engine used by Swift uses
backtracking non-deterministic finite automata to implement regular
expression matching. While this approach is space-efficient and allows
supporting advanced features like capture groups, it is not time-efficient
in general. The worst-case time complexity of such an automaton can be
polynomial or exponential, meaning that for strings of a certain
shape, increasing the input length by ten characters may make the
automaton about 1000 times slower.
</p>
<p>
Typically, a regular expression is affected by this problem if it contains
a repetition of the form <code>r*</code> or <code>r+</code> where the
sub-expression <code>r</code> is ambiguous in the sense that it can match
some string in multiple ways. More information about the precise
circumstances can be found in the references.
</p>
</overview>
<recommendation>
<p>
Modify the regular expression to remove the ambiguity, or ensure that the
strings matched with the regular expression are short enough that the
time complexity does not matter.
</p>
</recommendation>
</qhelp>
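The second half of the recommendation above (keeping matched strings short) could be sketched as follows. This is an illustrative assumption, not code from the commit: the helper name `boundedHasMatch`, the `maxLength` parameter, and the limit of 64 are all hypothetical, and the sketch uses Foundation's `NSRegularExpression` rather than the Swift `Regex` type.

```swift
import Foundation

// Hypothetical helper (not part of the commit): refuses to run `pattern`
// on inputs longer than `maxLength` UTF-16 code units.
// Returns nil when the input is too long and no match was attempted.
func boundedHasMatch(pattern: String, input: String, maxLength: Int) throws -> Bool? {
    guard input.utf16.count <= maxLength else { return nil }
    let regex = try NSRegularExpression(pattern: pattern)
    let range = NSRange(location: 0, length: input.utf16.count)
    return regex.firstMatch(in: input, options: [], range: range) != nil
}

// A short input is matched as usual.
let shortResult = try boundedHasMatch(pattern: "^_(__|.)+_$", input: "_abc_", maxLength: 64)
// An over-long input is rejected before the regex engine ever sees it.
let longInput = String(repeating: "_", count: 101) + "x"
let longResult = try boundedHasMatch(pattern: "^_(__|.)+_$", input: longInput, maxLength: 64)
print(shortResult as Any, longResult as Any)
```

A length cap only bounds the cost; a suitable limit depends on the pattern and the application, so removing the ambiguity is usually the more robust fix.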
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
<qhelp>
<references>
<li> OWASP:
<a href="https://www.owasp.org/index.php/Regular_expression_Denial_of_Service_-_ReDoS">Regular expression Denial of Service - ReDoS</a>.
</li>
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/ReDoS">ReDoS</a>.</li>
<li>Wikipedia: <a href="https://en.wikipedia.org/wiki/Time_complexity">Time complexity</a>.</li>
<li>James Kirrage, Asiri Rathnayake, Hayo Thielecke:
<a href="https://arxiv.org/abs/1301.0849">Static Analysis for Regular Expression Denial-of-Service Attack</a>.
</li>
</references>
</qhelp>
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
| ReDoS.swift:65:22:65:22 | a* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. |
| ReDoS.swift:66:22:66:22 | a* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. |
| ReDoS.swift:69:18:69:18 | a* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. |
| ReDoS.swift:77:57:77:57 | a* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. |
| ReDoS.swift:80:57:80:57 | a* | This part of the regular expression may cause exponential backtracking on strings containing many repetitions of 'a'. |
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
queries/Security/CWE-1333/ReDoS.ql
Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@
1+
2+
// --- stubs ---
3+
4+
struct URL {
5+
init?(string: String) {}
6+
}
7+
8+
struct AnyRegexOutput {
9+
}
10+
11+
protocol RegexComponent {
12+
}
13+
14+
struct Regex<Output> : RegexComponent {
15+
struct Match {
16+
}
17+
18+
init(_ pattern: String) throws where Output == AnyRegexOutput { }
19+
20+
func firstMatch(in string: String) throws -> Regex<Output>.Match? { return nil}
21+
22+
typealias RegexOutput = Output
23+
}
24+
25+
extension String {
26+
init(contentsOf: URL) {
27+
let data = ""
28+
self.init(data)
29+
}
30+
}
31+
32+
class NSObject {
33+
}
34+
35+
struct _NSRange {
36+
init(location: Int, length: Int) { }
37+
}
38+
39+
typealias NSRange = _NSRange
40+
41+
class NSRegularExpression : NSObject {
42+
struct Options : OptionSet {
43+
var rawValue: UInt
44+
}
45+
46+
struct MatchingOptions : OptionSet {
47+
var rawValue: UInt
48+
}
49+
50+
init(pattern: String, options: NSRegularExpression.Options = []) throws { }
51+
52+
func stringByReplacingMatches(in string: String, options: NSRegularExpression.MatchingOptions = [], range: NSRange, withTemplate templ: String) -> String { return "" }
53+
}
54+
55+
// --- tests ---
56+
57+
func myRegexpTests(myUrl: URL) throws {
58+
let tainted = String(contentsOf: myUrl) // tainted
59+
let untainted = "abcdef"
60+
61+
// Regex
62+
63+
_ = "((a*)*b)" // GOOD (never used)
64+
_ = try Regex("((a*)*b)") // DUBIOUS (never used)
65+
_ = try Regex("((a*)*b)").firstMatch(in: untainted) // DUBIOUS (never used on tainted input) [FLAGGED]
66+
_ = try Regex("((a*)*b)").firstMatch(in: tainted) // BAD
67+
_ = try Regex(".*").firstMatch(in: tainted) // GOOD (safe regex)
68+
69+
let str = "((a*)*b)" // BAD
70+
let regex = try Regex(str)
71+
_ = try regex.firstMatch(in: tainted)
72+
73+
// NSRegularExpression
74+
75+
_ = try? NSRegularExpression(pattern: "((a*)*b)") // DUBIOUS (never used)
76+
77+
let nsregex1 = try? NSRegularExpression(pattern: "((a*)*b)") // DUBIOUS (never used on tainted input) [FLAGGED]
78+
_ = nsregex1?.stringByReplacingMatches(in: untainted, range: NSRange(location: 0, length: untainted.utf16.count), withTemplate: "")
79+
80+
let nsregex2 = try? NSRegularExpression(pattern: "((a*)*b)") // BAD
81+
_ = nsregex2?.stringByReplacingMatches(in: tainted, range: NSRange(location: 0, length: tainted.utf16.count), withTemplate: "")
82+
83+
let nsregex3 = try? NSRegularExpression(pattern: ".*") // GOOD (safe regex)
84+
_ = nsregex3?.stringByReplacingMatches(in: tainted, range: NSRange(location: 0, length: tainted.utf16.count), withTemplate: "")
85+
}
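The test cases above all use the nested repetition `((a*)*b)`. One possible rewrite, offered here only as a sketch and not taken from the commit, is `(a*b)`: since `(a*)*` and `a*` accept the same strings, the two patterns should agree on whether an input contains a match, although the number of capture groups differs. The helper `matches` and the sample inputs are hypothetical, and the check below runs against the real Foundation `NSRegularExpression`, not the stubs in the test file.

```swift
import Foundation

// Hypothetical helper (not part of the commit): true when `pattern`
// matches anywhere in `input`.
func matches(_ pattern: String, _ input: String) throws -> Bool {
    let regex = try NSRegularExpression(pattern: pattern)
    let range = NSRange(location: 0, length: input.utf16.count)
    return regex.firstMatch(in: input, options: [], range: range) != nil
}

let flagged = "((a*)*b)"     // pattern flagged in the tests above
let rewritten = "(a*b)"      // assumed unambiguous alternative

// Only short inputs are used, so the flagged pattern stays cheap to run.
for sample in ["b", "ab", "aaab", "aaa", "", "xyz"] {
    let same = try matches(flagged, sample) == matches(rewritten, sample)
    print("\"\(sample)\" agrees:", same)
}
```

Agreement on a handful of short samples is a sanity check rather than a proof; code that reads the inner capture group of the original pattern would need to be adjusted after such a rewrite.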
