Skip to content

Commit e3c3c00

Browse files
committed
Ruby: Add MissingRegExpAnchor query
1 parent debc57b commit e3c3c00

File tree

7 files changed

+229
-0
lines changed

7 files changed

+229
-0
lines changed
Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,85 @@
1+
<!DOCTYPE qhelp PUBLIC
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
4+
<qhelp>
5+
6+
<overview>
7+
<p>
8+
9+
Sanitizing untrusted input with regular expressions is a
10+
common technique. However, it is error-prone to match untrusted input
11+
against regular expressions without anchors such as <code>^</code> or
12+
<code>$</code>. Malicious input can bypass such security checks by
13+
embedding one of the allowed patterns in an unexpected location.
14+
15+
</p>
16+
17+
<p>
18+
19+
Even if the matching is not done in a security-critical
20+
context, it may still cause undesirable behavior when the regular
21+
expression accidentally matches.
22+
23+
</p>
24+
</overview>
25+
26+
<recommendation>
27+
<p>
28+
29+
Use anchors to ensure that regular expressions match at
30+
the expected locations.
31+
32+
</p>
33+
</recommendation>
34+
35+
<example>
36+
37+
<p>
38+
39+
The following example code checks that a URL redirection
40+
will reach the <code>example.com</code> domain, or one of its
41+
subdomains, and not some malicious site.
42+
43+
</p>
44+
45+
<sample src="examples/missing_regexp_anchor_bad.rb"/>
46+
47+
<p>
48+
49+
The check with the regular expression match is, however, easy to bypass. For example
50+
by embedding <code>http://example.com/</code> in the query
51+
string component: <code>http://evil-example.net/?x=http://example.com/</code>.
52+
53+
Address these shortcomings by using anchors in the regular expression instead:
54+
55+
</p>
56+
57+
<sample src="examples/missing_regexp_anchor_good.rb"/>
58+
59+
<p>
60+
61+
A related mistake is to write a regular expression with
62+
multiple alternatives, but to only include an anchor for one of the
63+
alternatives. As an example, the regular expression
64+
<code>/^www\.example\.com|beta\.example\.com/</code> will match the host
65+
<code>evil.beta.example.com</code> because the regular expression is parsed
66+
as <code>/(^www\.example\.com)|(beta\.example\.com)/</code>
67+
68+
TODO: implement this part of the query
69+
70+
</p>
71+
72+
<p>
73+
74+
TODO: describe the danger of using line anchors like <code>^</code>
75+
or <code>$</code>.
76+
77+
</p>
78+
79+
</example>
80+
81+
<references>
82+
<li>OWASP: <a href="https://www.owasp.org/index.php/Server_Side_Request_Forgery">SSRF</a></li>
83+
<li>OWASP: <a href="https://cheatsheetseries.owasp.org/cheatsheets/Unvalidated_Redirects_and_Forwards_Cheat_Sheet.html">XSS Unvalidated Redirects and Forwards Cheat Sheet</a>.</li>
84+
</references>
85+
</qhelp>
Lines changed: 105 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,105 @@
1+
/**
2+
* @name Missing regular expression anchor
3+
* @description Regular expressions without anchors can be vulnerable to bypassing.
4+
* @kind problem
5+
* @problem.severity warning
6+
* @security-severity 7.8
7+
* @precision medium
8+
* @id rb/regex/missing-regexp-anchor
9+
* @tags correctness
10+
* security
11+
* external/cwe/cwe-020
12+
*/
13+
14+
import HostnameRegexpShared
15+
import codeql.ruby.DataFlow
16+
import codeql.ruby.security.performance.RegExpTreeView
17+
18+
/**
19+
* Holds if `term` is a final term, that is, no term will match anything after this one.
20+
*/
21+
predicate isFinalRegExpTerm(RegExpTerm term) {
22+
term.isRootTerm()
23+
or
24+
exists(RegExpSequence seq |
25+
isFinalRegExpTerm(seq) and
26+
term = seq.getLastChild()
27+
)
28+
or
29+
exists(RegExpTerm parent |
30+
isFinalRegExpTerm(parent) and
31+
term = parent.getAChild() and
32+
not parent instanceof RegExpSequence and
33+
not parent instanceof RegExpQuantifier
34+
)
35+
}
36+
37+
/**
38+
* Holds if `src` contains a hostname pattern that uses the `^/$` line anchors
39+
* rather than `\A/\z` which match the start/end of the whole string.
40+
*/
41+
predicate isLineAnchoredHostnameRegExp(RegExpPatternSource src, string msg) {
42+
not isSemiAnchoredHostnameRegExp(src, msg) and // avoid double reporting
43+
exists(RegExpTerm term, RegExpSequence tld, int i | term = src.getRegExpTerm() |
44+
not isConstantInvalidInsideOrigin(term.getAChild*()) and
45+
tld = term.getAChild*() and
46+
hasTopLevelDomainEnding(tld, i) and
47+
isFinalRegExpTerm(tld.getChild(i)) and // nothing is matched after the TLD
48+
(
49+
tld.getChild(0).(RegExpCaret).getChar() = "^" or
50+
tld.getLastChild().(RegExpDollar).getChar() = "$"
51+
) and
52+
msg =
53+
"This hostname pattern uses anchors such as '^' and '$', which match the start and end of a line, not the whole string. Use '\\A' and '\\z' instead."
54+
)
55+
}
56+
57+
/**
58+
* Holds if `src` contains a hostname pattern that is missing a `$` anchor.
59+
*/
60+
predicate isSemiAnchoredHostnameRegExp(RegExpPatternSource src, string msg) {
61+
// not hasMisleadingAnchorPrecedence(src, _) and // avoid double reporting
62+
exists(RegExpTerm term, RegExpSequence tld, int i | term = src.getRegExpTerm() |
63+
not isConstantInvalidInsideOrigin(term.getAChild*()) and
64+
tld = term.getAChild*() and
65+
hasTopLevelDomainEnding(tld, i) and
66+
isFinalRegExpTerm(tld.getChild(i)) and // nothing is matched after the TLD
67+
tld.getChild(0) instanceof RegExpCaret and
68+
msg =
69+
"This hostname pattern may match any domain name, as it is missing a '\\z' or '/' at the end."
70+
)
71+
}
72+
73+
/**
74+
* Holds if `src` is an unanchored pattern for a URL, indicating a
75+
* mistake explained by `msg`.
76+
*/
77+
predicate isUnanchoredHostnameRegExp(RegExpPatternSource src, string msg) {
78+
exists(RegExpTerm term, RegExpSequence tld | term = src.getRegExpTerm() |
79+
alwaysMatchesHostname(term) and
80+
tld = term.getAChild*() and
81+
hasTopLevelDomainEnding(tld) and
82+
not isConstantInvalidInsideOrigin(term.getAChild*()) and
83+
not term.getAChild*() instanceof RegExpAnchor and
84+
// that is not used for capture or replace
85+
not exists(DataFlow::CallNode mcn, DataFlow::Node arg, string name |
86+
name = mcn.getMethodName() and
87+
arg = mcn.getArgument(0)
88+
|
89+
(
90+
src.getAParse().(DataFlow::LocalSourceNode).flowsTo(arg) or
91+
src.getAParse() = arg
92+
) and
93+
name = ["sub", "sub!", "gsub", "gsub!"]
94+
) and
95+
msg =
96+
"When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it."
97+
)
98+
}
99+
100+
from DataFlow::Node nd, string msg
101+
where
102+
isUnanchoredHostnameRegExp(nd, msg) or
103+
isSemiAnchoredHostnameRegExp(nd, msg) or
104+
isLineAnchoredHostnameRegExp(nd, msg)
105+
select nd, msg
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
class UsersController < ActionController::Base
2+
def index
3+
# BAD: the host of `params[:url]` may be controlled by an attacker
4+
if params[:url].match? /https?:\/\/www\.example\.com\//
5+
redirect_to params[:url]
6+
end
7+
end
8+
end
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
class UsersController < ActionController::Base
2+
def index
3+
# GOOD: the host of `params[:url]` can not be controlled by an attacker
4+
if params[:url].match? /\Ahttps?:\/\/www\.example\.com\//
5+
redirect_to params[:url]
6+
end
7+
end
8+
end
Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,4 @@
1+
| missing_regexp_anchor.rb:1:1:1:17 | /www.example.com/ | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
2+
| missing_regexp_anchor.rb:7:1:7:21 | /https?:\\/\\/good.com/ | When this is used as a regular expression on a URL, it may match anywhere, and arbitrary hosts may come before or after it. |
3+
| missing_regexp_anchor.rb:8:1:8:22 | /^https?:\\/\\/good.com/ | This hostname pattern may match any domain name, as it is missing a '\\z' or '/' at the end. |
4+
| missing_regexp_anchor.rb:8:1:8:22 | /^https?:\\/\\/good.com/ | This hostname pattern uses anchors such as '^' and '$', which match the start and end of a line, not the whole string. Use '\\A' and '\\z' instead. |
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
queries/security/cwe-020/MissingRegExpAnchor.ql
Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
/www.example.com/ # BAD
2+
/^www.example.com$/ # BAD: uses end-of-line anchors rather than end-of-string anchors
3+
/\Awww.example.com\z/ # GOOD
4+
5+
/foo.bar/ # GOOD
6+
7+
/https?:\/\/good.com/ # BAD
8+
/^https?:\/\/good.com/ # BAD: missing end-of-string anchor
9+
/(^https?:\/\/good1.com)|(^https?://good2.com)/ # BAD: missing end-of-string anchor
10+
11+
/bar/ # GOOD
12+
13+
foo.gsub(/www.example.com/, "bar") # GOOD
14+
foo.sub(/www.example.com/, "bar") # GOOD
15+
foo.gsub!(/www.example.com/, "bar") # GOOD
16+
foo.sub!(/www.example.com/, "bar") # GOOD
17+
18+

0 commit comments

Comments
 (0)