Skip to content

Commit 7e7852e

Browse files
authored
Merge pull request #13641 from erik-krogh/multi-char
JS/RB: write qhelp for `incomplete-multi-character-sanitization`
2 parents b9acf1a + 6631e83 commit 7e7852e

File tree

4 files changed

+266
-30
lines changed

4 files changed

+266
-30
lines changed
Lines changed: 132 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,137 @@
11
<!DOCTYPE qhelp PUBLIC
2-
"-//Semmle//qhelp//EN"
3-
"qhelp.dtd">
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
44
<qhelp>
55

6-
<include src="IncompleteSanitization.qhelp" />
6+
<overview>
7+
<p>
8+
Sanitizing untrusted input is a common technique for preventing injection attacks and other security
9+
vulnerabilities. Regular expressions are often used to perform this sanitization. However, when the
10+
regular expression matches multiple consecutive characters, replacing it just once
11+
can result in the unsafe text reappearing in the sanitized input.
12+
</p>
13+
<p>
14+
Attackers can exploit this issue by crafting inputs that, when sanitized with an ineffective regular
15+
expression, still contain malicious code or content. This can lead to code execution, data exposure,
16+
or other vulnerabilities.
17+
</p>
18+
</overview>
719

20+
<recommendation>
21+
<p>
22+
To prevent this issue, it is highly recommended to use a well-tested sanitization library whenever
23+
possible. These libraries are more likely to handle corner cases and ensure effective sanitization.
24+
</p>
25+
26+
<p>
27+
If a library is not an option, you can consider alternative strategies to fix the issue. For example,
28+
applying the regular expression replacement repeatedly until no more replacements can be performed, or rewriting the regular
29+
expression to match single characters instead of the entire unsafe text.
30+
</p>
31+
</recommendation>
32+
33+
<example>
34+
<p>
35+
Consider the following JavaScript code that aims to remove all HTML comment start and end tags:
36+
</p>
37+
38+
<sample language="javascript">
39+
str.replace(/&lt;!--|--!?&gt;/g, "");
40+
</sample>
41+
42+
<p>
43+
Given the input string "&lt;!&lt;!--- comment ---&gt;&gt;", the output will be "&lt;!-- comment --&gt;",
44+
which still contains an HTML comment.
45+
</p>
46+
47+
<p>
48+
One possible fix for this issue is to apply the regular expression replacement repeatedly until no
49+
more replacements can be performed. This ensures that the unsafe text does not re-appear in the sanitized input, effectively
50+
removing all instances of the targeted pattern:
51+
</p>
52+
53+
<sample language="javascript">
54+
function removeHtmlComments(input) {
55+
let previous;
56+
do {
57+
previous = input;
58+
input = input.replace(/&lt;!--|--!?&gt;/g, "");
59+
} while (input !== previous);
60+
return input;
61+
}
62+
</sample>
63+
</example>
64+
65+
<example>
66+
<p>
67+
Another example is the following regular expression intended to remove script tags:
68+
</p>
69+
70+
<sample language="javascript">
71+
str.replace(/&lt;script\b[^&lt;]*(?:(?!&lt;\/script&gt;)&lt;[^&lt;]*)*&lt;\/script&gt;/g, "");
72+
</sample>
73+
74+
<p>
75+
If the input string is "&lt;scrip&lt;script&gt;is removed&lt;/script&gt;t&gt;alert(123)&lt;/script&gt;",
76+
the output will be "&lt;script&gt;alert(123)&lt;/script&gt;", which still contains a script tag.
77+
</p>
78+
<p>
79+
A fix for this issue is to rewrite the regular expression to match single characters
80+
("&lt;" and "&gt;") instead of the entire unsafe text. This simplifies the sanitization process
81+
and ensures that all potentially unsafe characters are removed:
82+
</p>
83+
<sample language="javascript">
84+
function removeAllHtmlTags(input) {
85+
return input.replace(/&lt;|&gt;/g, "");
86+
}
87+
</sample>
88+
<p>
89+
Another potential fix is to use the popular <code>sanitize-html</code> npm library.
90+
It keeps most of the safe HTML tags while removing all unsafe tags and attributes.
91+
</p>
92+
<sample language="javascript">
93+
const sanitizeHtml = require("sanitize-html");
94+
function removeAllHtmlTags(input) {
95+
return sanitizeHtml(input);
96+
}
97+
</sample>
98+
99+
</example>
100+
101+
<example>
102+
<p>
103+
Lastly, consider a path sanitizer using the regular expression <code>/\.\.\//</code>:
104+
</p>
105+
106+
<sample language="javascript">
107+
str.replace(/\.\.\//g, "");
108+
</sample>
109+
110+
<p>
111+
The regular expression attempts to strip out all occurences of <code>/../</code> from <code>str</code>.
112+
This will not work as expected: for the string <code>/./.././</code>, for example, it will remove the single
113+
occurrence of <code>/../</code> in the middle, but the remainder of the string then becomes
114+
<code>/../</code>, which is another instance of the substring we were trying to remove.
115+
</p>
116+
117+
<p>
118+
A possible fix for this issue is to use the "sanitize-filename" npm library for path sanitization.
119+
This library is specifically designed to handle path sanitization, and should handle all corner cases
120+
and ensure effective sanitization:
121+
</p>
122+
123+
<sample language="javascript">
124+
const sanitize = require("sanitize-filename");
125+
126+
function sanitizePath(input) {
127+
return sanitize(input);
128+
}
129+
</sample>
130+
131+
</example>
132+
133+
<references>
134+
<li>OWASP Top 10: <a href="https://www.owasp.org/index.php/Top_10-2017_A1-Injection">A1 Injection</a>.</li>
135+
<li>Stack Overflow: <a href="https://stackoverflow.com/questions/6659351/removing-all-script-tags-from-html-with-js-regular-expression">Removing all script tags from HTML with JS regular expression</a>.</li>
136+
</references>
8137
</qhelp>

javascript/ql/src/Security/CWE-116/IncompleteSanitization.qhelp

Lines changed: 0 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -43,18 +43,6 @@ needed, for instance by using prepared statements for SQL queries.
4343
Otherwise, make sure to use a regular expression with the <code>g</code> flag to ensure that
4444
all occurrences are replaced, and remember to escape backslashes if applicable.
4545
</p>
46-
<p>
47-
Note, however, that this is generally <i>not</i> sufficient for replacing multi-character strings:
48-
the <code>String.prototype.replace</code> method only performs one pass over the input string,
49-
and will not replace further instances of the string that result from earlier replacements.
50-
</p>
51-
<p>
52-
For example, consider the code snippet <code>s.replace(/\/\.\.\//g, "")</code>, which attempts
53-
to strip out all occurences of <code>/../</code> from <code>s</code>. This will not work as
54-
expected: for the string <code>/./.././</code>, for example, it will remove the single
55-
occurrence of <code>/../</code> in the middle, but the remainder of the string then becomes
56-
<code>/../</code>, which is another instance of the substring we were trying to remove.
57-
</p>
5846
</recommendation>
5947

6048
<example>
Lines changed: 134 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,139 @@
11
<!DOCTYPE qhelp PUBLIC
2-
"-//Semmle//qhelp//EN"
3-
"qhelp.dtd">
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
44
<qhelp>
55

6-
<include src="IncompleteSanitization.qhelp" />
6+
<overview>
7+
<p>
8+
Sanitizing untrusted input is a common technique for preventing injection attacks and other security
9+
vulnerabilities. Regular expressions are often used to perform this sanitization. However, when the
10+
regular expression matches multiple consecutive characters, replacing it just once
11+
can result in the unsafe text re-appearing in the sanitized input.
12+
</p>
13+
<p>
14+
Attackers can exploit this issue by crafting inputs that, when sanitized with an ineffective regular
15+
expression, still contain malicious code or content. This can lead to code execution, data exposure,
16+
or other vulnerabilities.
17+
</p>
18+
</overview>
719

20+
<recommendation>
21+
<p>
22+
To prevent this issue, it is highly recommended to use a well-tested sanitization library whenever
23+
possible. These libraries are more likely to handle corner cases and ensure effective sanitization.
24+
</p>
25+
26+
<p>
27+
If a library is not an option, you can consider alternative strategies to fix the issue. For example,
28+
applying the regular expression replacement repeatedly until no more replacements can be performed, or rewriting the regular
29+
expression to match single characters instead of the entire unsafe text.
30+
</p>
31+
</recommendation>
32+
33+
<example>
34+
<p>
35+
Consider the following Ruby code that aims to remove all HTML comment start and end tags:
36+
</p>
37+
38+
<sample language="ruby">
39+
str.gsub(/&lt;!--|--!?&gt;/, "")
40+
</sample>
41+
42+
<p>
43+
Given the input string "&lt;!&lt;!--- comment ---&gt;&gt;", the output will be "&lt;!-- comment --&gt;",
44+
which still contains an HTML comment.
45+
</p>
46+
47+
<p>
48+
One possible fix for this issue is to apply the regular expression replacement repeatedly until no
49+
more replacements can be performed. This ensures that the unsafe text does not re-appear in the sanitized input, effectively
50+
removing all instances of the targeted pattern:
51+
</p>
52+
53+
<sample language="ruby">
54+
def remove_html_comments(input)
55+
previous = nil
56+
while input != previous
57+
previous = input
58+
input = input.gsub(/&lt;!--|--!?&gt;/, "")
59+
end
60+
input
61+
end
62+
</sample>
63+
</example>
64+
65+
<example>
66+
<p>
67+
Another example is the following regular expression intended to remove script tags:
68+
</p>
69+
70+
<sample language="ruby">
71+
str.gsub(/&lt;script\b[^&lt;]*(?:(?!&lt;\/script&gt;)&lt;[^&lt;]*)*&lt;\/script&gt;/, "")
72+
</sample>
73+
74+
<p>
75+
If the input string is "&lt;scrip&lt;script&gt;is removed&lt;/script&gt;t&gt;alert(123)&lt;/script&gt;",
76+
the output will be "&lt;script&gt;alert(123)&lt;/script&gt;", which still contains a script tag.
77+
</p>
78+
<p>
79+
A fix for this issue is to rewrite the regular expression to match single characters
80+
("&lt;" and "&gt;") instead of the entire unsafe text. This simplifies the sanitization process
81+
and ensures that all potentially unsafe characters are removed:
82+
</p>
83+
<sample language="ruby">
84+
def remove_all_html_tags(input)
85+
input.gsub(/&lt;|&gt;/, "")
86+
end
87+
</sample>
88+
89+
<p>
90+
Another potential fix is to use the popular <code>sanitize</code> gem.
91+
It keeps most of the safe HTML tags while removing all unsafe tags and attributes.
92+
</p>
93+
<sample language="javascript">
94+
require 'sanitize'
95+
96+
def sanitize_html(input)
97+
Sanitize.fragment(input)
98+
end
99+
</sample>
100+
101+
</example>
102+
103+
<example>
104+
<p>
105+
Lastly, consider a path sanitizer using the regular expression <code>/\.\.\//</code>:
106+
</p>
107+
108+
<sample language="ruby">
109+
str.gsub(/\.\.\//, "")
110+
</sample>
111+
112+
<p>
113+
The regular expression attempts to strip out all occurences of <code>/../</code> from <code>str</code>.
114+
This will not work as expected: for the string <code>/./.././</code>, for example, it will remove the single
115+
occurrence of <code>/../</code> in the middle, but the remainder of the string then becomes
116+
<code>/../</code>, which is another instance of the substring we were trying to remove.
117+
</p>
118+
119+
<p>
120+
A possible fix for this issue is to use the <code>File.sanitize</code> function from the Ruby Facets gem.
121+
This library is specifically designed to handle path sanitization, and should handle all corner cases
122+
and ensure effective sanitization:
123+
</p>
124+
125+
<sample language="ruby">
126+
require 'facets'
127+
128+
def sanitize_path(input)
129+
File.sanitize(input)
130+
end
131+
</sample>
132+
133+
</example>
134+
135+
<references>
136+
<li>OWASP Top 10: <a href="https://www.owasp.org/index.php/Top_10-2017_A1-Injection">A1 Injection</a>.</li>
137+
<li>Stack Overflow: <a href="https://stackoverflow.com/questions/6659351/removing-all-script-tags-from-html-with-js-regular-expression">Removing all script tags from HTML with JS regular expression</a>.</li>
138+
</references>
8139
</qhelp>

ruby/ql/src/queries/security/cwe-116/IncompleteSanitization.qhelp

Lines changed: 0 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -37,18 +37,6 @@ An even safer alternative is to design the application so that sanitization is n
3737
Otherwise, make sure to use <code>String#gsub</code> rather than <code>String#sub</code>, to ensure
3838
that all occurrences are replaced, and remember to escape backslashes if applicable.
3939
</p>
40-
<p>
41-
Note, however, that this is generally <i>not</i> sufficient for replacing multi-character strings:
42-
the <code>String#gsub</code> method performs only one pass over the input string, and will not
43-
replace further instances of the string that result from earlier replacements.
44-
</p>
45-
<p>
46-
For example, consider the code snippet <code>s.gsub /\/\.\.\//, ""</code>, which attempts to strip
47-
out all occurrences of <code>/../</code> from <code>s</code>. This will not work as expected: for the
48-
string <code>/./.././</code>, for example, it will remove the single occurrence of <code>/../</code>
49-
in the middle, but the remainder of the string then becomes <code>/../</code>, which is another
50-
instance of the substring we were trying to remove.
51-
</p>
5240
</recommendation>
5341

5442
<example>

0 commit comments

Comments
 (0)