Commit 38ecfd8

Swift: Copy the query, qhelp, example from Ruby.
1 parent df58902 commit 38ecfd8

3 files changed: 84 additions, 0 deletions

Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
<!DOCTYPE qhelp PUBLIC
  "-//Semmle//qhelp//EN"
  "qhelp.dtd">
<qhelp>

  <overview>
    <p>
      It is possible to match some single HTML tags using regular expressions (parsing general HTML using
      regular expressions is impossible). However, if the regular expression is not written well, it might
      be possible to circumvent it, which can lead to cross-site scripting or other security issues.
    </p>
    <p>
      Some of these mistakes are caused by browsers having very forgiving HTML parsers, which
      will often render invalid HTML containing syntax errors.
      Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors.
    </p>
  </overview>

  <recommendation>
    <p>
      Use a well-tested sanitization or parser library if at all possible. These libraries are much more
      likely to handle corner cases correctly than a custom implementation.
    </p>
  </recommendation>

  <example>
    <p>
      The following example attempts to filter out all <code>&lt;script&gt;</code> tags.
    </p>

    <sample src="examples/BadTagFilter.rb" />

    <p>
      The above sanitizer does not filter out all <code>&lt;script&gt;</code> tags.
      Browsers will not only accept <code>&lt;/script&gt;</code> as script end tags, but also tags such as <code>&lt;/script foo="bar"&gt;</code>, even though such a tag is a parse error.
      This means that an attack string such as <code>&lt;script&gt;alert(1)&lt;/script foo="bar"&gt;</code> will not be filtered by
      the function, and <code>alert(1)</code> will be executed by a browser if the string is rendered as HTML.
    </p>

    <p>
      Other corner cases include that HTML comments can end with <code>--!&gt;</code>,
      and that HTML tag names can contain upper case characters.
    </p>
  </example>

  <references>
    <li>Securitum: <a href="https://research.securitum.com/the-curious-case-of-copy-paste/">The Curious Case of Copy &amp; Paste</a>.</li>
    <li>stackoverflow.com: <a href="https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454">You can't parse [X]HTML with regex</a>.</li>
    <li>HTML Standard: <a href="https://html.spec.whatwg.org/multipage/parsing.html#comment-end-bang-state">Comment end bang state</a>.</li>
    <li>stackoverflow.com: <a href="https://stackoverflow.com/questions/25559999/why-arent-browsers-strict-about-html">Why aren't browsers strict about HTML?</a>.</li>
  </references>
</qhelp>
Lines changed: 22 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
/**
 * @name Bad HTML filtering regexp
 * @description Matching HTML tags using regular expressions is hard to do right, and can easily lead to security issues.
 * @kind problem
 * @problem.severity warning
 * @security-severity 7.8
 * @precision high
 * @id rb/bad-tag-filter
 * @tags correctness
 *       security
 *       external/cwe/cwe-116
 *       external/cwe/cwe-020
 *       external/cwe/cwe-185
 *       external/cwe/cwe-186
 */

private import codeql.ruby.regexp.RegExpTreeView::RegexTreeView as TreeView
import codeql.regex.nfa.BadTagFilterQuery::Make<TreeView>

from HtmlMatchingRegExp regexp, string msg
where msg = min(string m | isBadRegexpFilter(regexp, m) | m order by m.length(), m) // there might be multiple, we arbitrarily pick the shortest one
select regexp, msg
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
def filter_script_tags(html)
  old_html = ""
  while (html != old_html)
    old_html = html
    html = html.gsub(/<script[^>]*>.*<\/script>/m, "")
  end
  html
end
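
Reviewer note, not part of the commit: the bypass that the qhelp text describes can be checked with a small Ruby sketch. It reuses `filter_script_tags` from the example file as-is; the input strings and the variable names `benign`, `end_tag_attrs`, and `upper_case` are illustrative assumptions, with the second input taken from the qhelp text.

```ruby
# Copy of the filter from the example file in this commit.
def filter_script_tags(html)
  old_html = ""
  while html != old_html
    old_html = html
    html = html.gsub(/<script[^>]*>.*<\/script>/m, "")
  end
  html
end

# A well-formed script element is removed as intended.
benign = filter_script_tags("<p>hi</p><script>alert(1)</script>")

# End tag carrying an attribute: the regex requires a literal "</script>",
# so nothing matches and the input passes through unchanged.
end_tag_attrs = filter_script_tags('<script>alert(1)</script foo="bar">')

# Upper-case tag name: the regex is case-sensitive, so this also passes through.
upper_case = filter_script_tags("<SCRIPT>alert(1)</SCRIPT>")

puts benign
puts end_tag_attrs
puts upper_case
```

Only the first input is actually sanitized. The other two come back unchanged, which is the kind of gap the query is meant to flag.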
