Skip to content

Commit a60478b

Browse files
committed
write qhelp for js/incomplete-multi-character-sanitization
1 parent 95ddc01 commit a60478b

File tree

2 files changed

+238
-6
lines changed

2 files changed

+238
-6
lines changed
Lines changed: 119 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,124 @@
11
<!DOCTYPE qhelp PUBLIC
2-
"-//Semmle//qhelp//EN"
3-
"qhelp.dtd">
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
44
<qhelp>
55

6-
<include src="IncompleteSanitization.qhelp" />
6+
<overview>
7+
<p>
8+
Sanitizing untrusted input is a common technique for preventing injection attacks and other security
9+
vulnerabilities. Regular expressions are often used to perform this sanitization. However, when the
10+
regex matches multiple consecutive characters, applying a regular expression replacement just once
11+
can result in the unsafe text re-appearing in the sanitized input.
12+
</p>
13+
<p>
14+
Attackers can exploit this issue by crafting inputs that, when sanitized with an ineffective regular
15+
expression, still contain malicious code or content. This can lead to code execution, data exposure,
16+
or other vulnerabilities.
17+
</p>
18+
</overview>
719

20+
<recommendation>
21+
<p>
22+
To prevent this issue, it is highly recommended to use a well-tested sanitization library whenever
23+
possible. These libraries are more likely to handle corner cases and ensure effective sanitization.
24+
</p>
25+
26+
<p>
27+
If a library is not an option, you can consider alternative strategies to fix the issue. For example,
28+
applying the regular expression replacement recursively until a fixpoint is reached or to rewrite the regular
29+
expression to match single characters instead of the entire unsafe text.
30+
</p>
31+
</recommendation>
32+
33+
<example>
34+
<p>
35+
Consider the following JavaScript code that aims to remove all HTML comment start and end tags:
36+
</p>
37+
38+
<sample language="javascript">
39+
str.replace(/&lt;!--|--!?&gt;/g, "");
40+
</sample>
41+
42+
<p>
43+
Given the input string "&lt;!&lt;!--- comment ---&gt;&gt;", the output will be "&lt;!-- comment --&gt;",
44+
which still contains an HTML comment.
45+
</p>
46+
47+
<p>
48+
One possible fix for this issue is to apply the regular expression replacement recursively until a fixpoint
49+
is reached. This ensures that the unsafe text does not re-appear in the sanitized input, effectively
50+
removing all instances of the targeted pattern:
51+
</p>
52+
53+
<sample language="javascript">
54+
function removeHtmlComments(input) {
55+
let previous;
56+
do {
57+
previous = input;
58+
input = input.replace(/&lt;!--|--!?&gt;/g, "");
59+
} while (input !== previous);
60+
return input;
61+
}
62+
</sample>
63+
</example>
64+
65+
<example>
66+
<p>
67+
Another example is the following regular expression intended to remove script tags:
68+
</p>
69+
70+
<sample language="javascript">
71+
str.replace(/&lt;script\b[^&lt;]*(?:(?!&lt;\/script&gt;)&lt;[^&lt;]*)*&lt;\/script&gt;/g, "");
72+
</sample>
73+
74+
<p>
75+
If the input string is "&lt;scrip&lt;script&gt;is removed&lt;/script&gt;t&gt;alert(123)&lt;/script&gt;",
76+
the output will be "&lt;script&gt;alert(123)&lt;/script&gt;", which still contains a script tag.
77+
</p>
78+
<p>
79+
A fix for this issue is to rewrite the regular expression to match single characters
80+
("&lt;" and "&gt;") instead of the entire unsafe text. This simplifies the sanitization process
81+
and ensures that all potentially unsafe characters are removed:
82+
</p>
83+
<sample language="javascript">
84+
function removeAllHtmlTags(input) {
85+
return input.replace(/&lt;|&gt;/g, "");
86+
}
87+
</sample>
88+
89+
</example>
90+
91+
<example>
92+
<p>
93+
Lastly, consider a path sanitizer using the regex <code>/\.\.\//</code>:
94+
</p>
95+
96+
<sample language="javascript">
97+
str.replace(/\.\.\//g, "");
98+
</sample>
99+
100+
<p>
101+
This can result in an unsafe path being generated if only a single replacement is done.
102+
</p>
103+
104+
<p>
105+
A possible fix for this issue is to use the "sanitize-filename" npm library for path sanitization.
106+
This library is specifically designed to handle path sanitization, and should handle all corner cases
107+
and ensure effective sanitization:
108+
</p>
109+
110+
<sample language="javascript">
111+
const sanitize = require("sanitize-filename");
112+
113+
function sanitizePath(input) {
114+
return sanitize(input);
115+
}
116+
</sample>
117+
118+
</example>
119+
120+
<references>
121+
<li>OWASP Top 10: <a href="https://www.owasp.org/index.php/Top_10-2017_A1-Injection">A1 Injection</a>.</li>
122+
<li>Stackoverflow: <a href="https://stackoverflow.com/questions/6659351/removing-all-script-tags-from-html-with-js-regular-expression">Removing all script tags from HTML with JS regular expression</a>.</li>
123+
</references>
8124
</qhelp>
Lines changed: 119 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,8 +1,124 @@
11
<!DOCTYPE qhelp PUBLIC
2-
"-//Semmle//qhelp//EN"
3-
"qhelp.dtd">
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
44
<qhelp>
55

6-
<include src="IncompleteSanitization.qhelp" />
6+
<overview>
7+
<p>
8+
Sanitizing untrusted input is a common technique for preventing injection attacks and other security
9+
vulnerabilities. Regular expressions are often used to perform this sanitization. However, when the
10+
regex matches multiple consecutive characters, applying a regular expression replacement just once
11+
can result in the unsafe text re-appearing in the sanitized input.
12+
</p>
13+
<p>
14+
Attackers can exploit this issue by crafting inputs that, when sanitized with an ineffective regular
15+
expression, still contain malicious code or content. This can lead to code execution, data exposure,
16+
or other vulnerabilities.
17+
</p>
18+
</overview>
719

20+
<recommendation>
21+
<p>
22+
To prevent this issue, it is highly recommended to use a well-tested sanitization library whenever
23+
possible. These libraries are more likely to handle corner cases and ensure effective sanitization.
24+
</p>
25+
26+
<p>
27+
If a library is not an option, you can consider alternative strategies to fix the issue. For example,
28+
applying the regular expression replacement recursively until a fixpoint is reached or to rewrite the regular
29+
expression to match single characters instead of the entire unsafe text.
30+
</p>
31+
</recommendation>
32+
33+
<example>
34+
<p>
35+
Consider the following Ruby code that aims to remove all HTML comment start and end tags:
36+
</p>
37+
38+
<sample language="ruby">
39+
str.gsub(/&lt;!--|--!?&gt;/, "")
40+
</sample>
41+
42+
<p>
43+
Given the input string "&lt;!&lt;!--- comment ---&gt;&gt;", the output will be "&lt;!-- comment --&gt;",
44+
which still contains an HTML comment.
45+
</p>
46+
47+
<p>
48+
One possible fix for this issue is to apply the regular expression replacement recursively until a fixpoint
49+
is reached. This ensures that the unsafe text does not re-appear in the sanitized input, effectively
50+
removing all instances of the targeted pattern:
51+
</p>
52+
53+
<sample language="ruby">
54+
def remove_html_comments(input)
55+
previous = nil
56+
while input != previous
57+
previous = input
58+
input = input.gsub(/&lt;!--|--!?&gt;/, "")
59+
end
60+
input
61+
end
62+
</sample>
63+
</example>
64+
65+
<example>
66+
<p>
67+
Another example is the following regular expression intended to remove script tags:
68+
</p>
69+
70+
<sample language="ruby">
71+
str.gsub(/&lt;script\b[^&lt;]*(?:(?!&lt;\/script&gt;)&lt;[^&lt;]*)*&lt;\/script&gt;/, "")
72+
</sample>
73+
74+
<p>
75+
If the input string is "&lt;scrip&lt;script&gt;is removed&lt;/script&gt;t&gt;alert(123)&lt;/script&gt;",
76+
the output will be "&lt;script&gt;alert(123)&lt;/script&gt;", which still contains a script tag.
77+
</p>
78+
<p>
79+
A fix for this issue is to rewrite the regular expression to match single characters
80+
("&lt;" and "&gt;") instead of the entire unsafe text. This simplifies the sanitization process
81+
and ensures that all potentially unsafe characters are removed:
82+
</p>
83+
<sample language="ruby">
84+
def remove_all_html_tags(input)
85+
input.gsub(/&lt;|&gt;/, "")
86+
end
87+
</sample>
88+
89+
</example>
90+
91+
<example>
92+
<p>
93+
Lastly, consider a path sanitizer using the regex <code>/\.\.\//</code>:
94+
</p>
95+
96+
<sample language="ruby">
97+
str.gsub(/\.\.\//, "")
98+
</sample>
99+
100+
<p>
101+
This can result in an unsafe path being generated if only a single replacement is done.
102+
</p>
103+
104+
<p>
105+
A possible fix for this issue is to use the <code>File.sanitize</code> function from the Ruby Facets gem.
106+
This library is specifically designed to handle path sanitization, and should handle all corner cases
107+
and ensure effective sanitization:
108+
</p>
109+
110+
<sample language="ruby">
111+
require 'facets'
112+
113+
def sanitize_path(input)
114+
File.sanitize(input)
115+
end
116+
</sample>
117+
118+
</example>
119+
120+
<references>
121+
<li>OWASP Top 10: <a href="https://www.owasp.org/index.php/Top_10-2017_A1-Injection">A1 Injection</a>.</li>
122+
<li>Stackoverflow: <a href="https://stackoverflow.com/questions/6659351/removing-all-script-tags-from-html-with-js-regular-expression">Removing all script tags from HTML with JS regular expression</a>.</li>
123+
</references>
8124
</qhelp>

0 commit comments

Comments
 (0)