Commit 27c12de

Merge pull request github#13549 from geoffw0/badfilter
Swift: Query for bad HTML filtering regexps
2 parents 2582b08 + 26d4f9f commit 27c12de

7 files changed

+348
-0
lines changed

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
1+
---
2+
category: newQuery
3+
---
4+
* Added new query "Bad HTML filtering regexp" (`swift/bad-tag-filter`). This query finds regular expressions that match HTML tags in a way that is not robust and can easily lead to security issues.
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
1+
<!DOCTYPE qhelp PUBLIC
2+
"-//Semmle//qhelp//EN"
3+
"qhelp.dtd">
4+
<qhelp>
5+
6+
<overview>
7+
<p>
8+
It is possible to match some single HTML tags using regular expressions (parsing general HTML using
9+
regular expressions is impossible). However, if the regular expression is not written well, it might
10+
be possible to circumvent it. This can lead to cross-site scripting or other security issues.
11+
</p>
12+
<p>
13+
Some of these mistakes are caused by browsers having very forgiving HTML parsers, which
14+
will often render invalid HTML containing syntax errors.
15+
Regular expressions that attempt to match HTML should also recognize tags containing such syntax errors.
16+
</p>
17+
</overview>
18+
19+
<recommendation>
20+
<p>
21+
Use a well-tested sanitization or parser library if at all possible. These libraries are much more
22+
likely to handle corner cases correctly than a custom implementation.
23+
</p>
24+
</recommendation>
25+
26+
<example>
27+
<p>
28+
The following example attempts to filter out all <code>&lt;script&gt;</code> tags.
29+
</p>
30+
31+
<sample src="BadTagFilterBad.swift" />
32+
33+
<p>
34+
The above sanitizer does not filter out all <code>&lt;script&gt;</code> tags.
35+
Browsers accept not only <code>&lt;/script&gt;</code> as a script end tag, but also tags such as <code>&lt;/script foo="bar"&gt;</code>, even though such a tag is a parse error.
36+
This means that an attack string such as <code>&lt;script&gt;alert(1)&lt;/script foo="bar"&gt;</code> will not be filtered by
37+
the function, and <code>alert(1)</code> will be executed by a browser if the string is rendered as HTML.
38+
</p>
39+
40+
<p>
41+
Other corner cases include HTML comments ending with <code>--!&gt;</code>,
42+
and HTML tag names containing uppercase characters.
43+
</p>
44+
</example>
45+
46+
<references>
47+
<li>Securitum: <a href="https://research.securitum.com/the-curious-case-of-copy-paste/">The Curious Case of Copy &amp; Paste</a>.</li>
48+
<li>stackoverflow.com: <a href="https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#answer-1732454">You can't parse [X]HTML with regex</a>.</li>
49+
<li>HTML Standard: <a href="https://html.spec.whatwg.org/multipage/parsing.html#comment-end-bang-state">Comment end bang state</a>.</li>
50+
<li>stackoverflow.com: <a href="https://stackoverflow.com/questions/25559999/why-arent-browsers-strict-about-html">Why aren't browsers strict about HTML?</a>.</li>
51+
</references>
52+
</qhelp>
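
The corner cases described in the help text above can be checked directly. The following is a minimal sketch, not part of the commit: the helper `matches` and the variable names are hypothetical, and the "tolerant" pattern is simply the one the test file in this commit marks as GOOD when used with the case-insensitive option. It illustrates the corner cases only; the recommendation to prefer a well-tested sanitization or parser library still stands.

```swift
import Foundation

// Hypothetical helper (not from the commit): true if `pattern` matches anywhere in `input`.
func matches(_ pattern: String, options: NSRegularExpression.Options = [], in input: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else { return false }
    let range = NSRange(input.startIndex..., in: input)
    return regex.firstMatch(in: input, options: [], range: range) != nil
}

// The naive pattern from the bad example, and a more tolerant pattern that the
// test file marks as GOOD when combined with `.caseInsensitive`.
let naive = #"<script[^>]*>.*<\/script>"#
let tolerant = #"<script[^>]*?>[\s\S]*?<\/script[^>]*?>"#

let cornerCases = [
    "<script>alert(1)</script foo=\"bar\">", // attributes on the end tag
    "<SCRIPT>alert(1)</SCRIPT>",             // uppercase tag name
    "<script>\nalert(1)\n</script>",         // newlines inside the content
]

let naiveResults = cornerCases.map { matches(naive, in: $0) }
let tolerantResults = cornerCases.map { matches(tolerant, options: .caseInsensitive, in: $0) }

print(naiveResults)
print(tolerantResults)
```

The naive pattern misses every corner case here, while the tolerant pattern catches all three. That does not make the tolerant pattern a complete sanitizer; it only passes the specific checks this query performs.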
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
1+
/**
2+
* @name Bad HTML filtering regexp
3+
* @description Matching HTML tags using regular expressions is hard to do right, and can lead to security issues.
4+
* @kind problem
5+
* @problem.severity warning
6+
* @security-severity 7.8
7+
* @precision high
8+
* @id swift/bad-tag-filter
9+
* @tags correctness
10+
* security
11+
* external/cwe/cwe-116
12+
* external/cwe/cwe-020
13+
* external/cwe/cwe-185
14+
* external/cwe/cwe-186
15+
*/
16+
17+
import codeql.swift.regex.Regex
18+
private import codeql.swift.regex.RegexTreeView::RegexTreeView as TreeView
19+
import codeql.regex.nfa.BadTagFilterQuery::Make<TreeView>
20+
21+
from HtmlMatchingRegExp regexp, string msg
22+
where
23+
// there might be multiple messages, we arbitrarily pick the shortest one
24+
msg = min(string m | isBadRegexpFilter(regexp, m) | m order by m.length(), m)
25+
select regexp, msg
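
The `min(... order by m.length(), m)` aggregate in the query above picks the shortest message and breaks ties by the message text itself. A rough Swift analogue is sketched below under stated assumptions: `shortestMessage` is a hypothetical name, and Swift's `count` and string ordering are not guaranteed to agree with QL's `length()` and string ordering for non-ASCII text.

```swift
// Hypothetical helper (not from the commit): picks the shortest message,
// breaking ties by comparing the messages themselves.
func shortestMessage(_ messages: [String]) -> String? {
    // Tuples of Comparable elements compare element by element, so this
    // orders by length first and by text second.
    messages.min { lhs, rhs in (lhs.count, lhs) < (rhs.count, rhs) }
}

// Two messages taken from the expected test output in this commit.
let candidates = [
    "This regular expression does not match script end tags like </script >.",
    "This regular expression does not match upper case <SCRIPT> tags.",
]

let chosen = shortestMessage(candidates)
print(chosen ?? "no message")
```

Reporting a single message per regular expression keeps the result list free of near-duplicate alerts for the same pattern.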
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
let script_tag_regex = /<script[^>]*>.*<\/script>/
2+
3+
var old_html = ""
4+
while (html != old_html) {
5+
old_html = html
6+
html.replace(script_tag_regex, with: "")
7+
}
8+
9+
...
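
To see why the sample above is flagged, the same loop can be run against the attack string from the help text. This is a minimal sketch, not the commit's code: it uses `NSRegularExpression` rather than a regex literal so that it runs with Foundation alone, and `stripAll` is a hypothetical helper name.

```swift
import Foundation

// Hypothetical helper (not from the commit): removes matches of `pattern`
// repeatedly until the string stops changing, like the loop in the sample.
func stripAll(_ pattern: String, from input: String) -> String {
    // The pattern is a fixed literal in this sketch, so `try!` is acceptable here.
    let regex = try! NSRegularExpression(pattern: pattern)
    var html = input
    var oldHtml = ""
    while html != oldHtml {
        oldHtml = html
        let range = NSRange(html.startIndex..., in: html)
        html = regex.stringByReplacingMatches(in: html, options: [], range: range, withTemplate: "")
    }
    return html
}

let badPattern = #"<script[^>]*>.*<\/script>"#
let benign = "<p>hi</p><script>alert(1)</script>"
let attack = "<p>hi</p><script>alert(1)</script foo=\"bar\">"

let cleanedBenign = stripAll(badPattern, from: benign)
let cleanedAttack = stripAll(badPattern, from: attack)

print(cleanedBenign)
print(cleanedAttack)
```

The well-formed script element is removed, but the attack string passes through unchanged because the pattern requires the end tag to be exactly `</script>`.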
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
1+
| test.swift:79:26:79:48 | <script.*?>.*?<\\/script> | This regular expression does not match script end tags like </script >. |
2+
| test.swift:86:27:86:49 | <script.*?>.*?<\\/script> | This regular expression does not match script end tags like </script >. |
3+
| test.swift:90:50:90:72 | <script.*?>.*?<\\/script> | This regular expression does not match script end tags like </script >. |
4+
| test.swift:113:26:113:35 | <!--.*--!?> | This regular expression does not match comments containing newlines. |
5+
| test.swift:117:26:117:58 | <script.*?>(.\|\\s)*?<\\/script[^>]*> | This regular expression matches <script></script>, but not <script \\n></script> |
6+
| test.swift:121:26:121:56 | <script[^>]*?>.*?<\\/script[^>]*> | This regular expression matches <script>...</script>, but not <script >...\\n</script> |
7+
| test.swift:125:26:125:63 | <script(\\s\|\\w\|=\|")*?>.*?<\\/script[^>]*> | This regular expression does not match script tags where the attribute uses single-quotes. |
8+
| test.swift:132:28:132:65 | <script(\\s\|\\w\|=\|')*?>.*?<\\/script[^>]*> | This regular expression does not match script tags where the attribute uses double-quotes. |
9+
| test.swift:136:50:136:87 | <script(\\s\|\\w\|=\|')*?>.*?<\\/script[^>]*> | This regular expression does not match script tags where the attribute uses double-quotes. |
10+
| test.swift:143:28:143:69 | <script( \|\\n\|\\w\|=\|'\|")*?>.*?<\\/script[^>]*> | This regular expression does not match script tags where tabs are used between attributes. |
11+
| test.swift:147:50:147:91 | <script( \|\\n\|\\w\|=\|'\|")*?>.*?<\\/script[^>]*> | This regular expression does not match script tags where tabs are used between attributes. |
12+
| test.swift:154:28:154:55 | <script.*?>.*?<\\/script[^>]*> | This regular expression does not match upper case <SCRIPT> tags. |
13+
| test.swift:157:50:157:77 | <script.*?>.*?<\\/script[^>]*> | This regular expression does not match upper case <SCRIPT> tags. |
14+
| test.swift:164:28:164:73 | <(script\|SCRIPT).*?>.*?<\\/(script\|SCRIPT)[^>]*> | This regular expression does not match mixed case <sCrIpT> tags. |
15+
| test.swift:167:50:167:95 | <(script\|SCRIPT).*?>.*?<\\/(script\|SCRIPT)[^>]*> | This regular expression does not match mixed case <sCrIpT> tags. |
16+
| test.swift:174:28:174:60 | <script[^>]*?>[\\s\\S]*?<\\/script.*> | This regular expression does not match script end tags like </script\\t\\n bar>. |
17+
| test.swift:177:50:177:82 | <script[^>]*?>[\\s\\S]*?<\\/script.*> | This regular expression does not match script end tags like </script\\t\\n bar>. |
18+
| test.swift:191:27:191:68 | <(?:!--([\\S\|\\s]*?)-->)\|([^\\/\\s>]+)[\\S\\s]*?> | Comments ending with --> are matched differently from comments ending with --!>. The first is matched with capture group 1 and comments ending with --!> are matched with capture group 2. |
19+
| test.swift:194:50:194:91 | <(?:!--([\\S\|\\s]*?)-->)\|([^\\/\\s>]+)[\\S\\s]*?> | Comments ending with --> are matched differently from comments ending with --!>. The first is matched with capture group 1 and comments ending with --!> are matched with capture group 2. |
20+
| test.swift:198:27:198:167 | <(?:(?:\\/([^>]+)>)\|(?:!--([\\S\|\\s]*?)-->)\|(?:([^\\/\\s>]+)((?:\\s+[\\w\\-:.]+(?:\\s*=\\s*?(?:(?:"[^"]*")\|(?:'[^']*')\|[^\\s"'\\/>]+))?)*)[\\S\\s]*?(\\/?)>)) | Comments ending with --> are matched differently from comments ending with --!>. The first is matched with capture group 2 and comments ending with --!> are matched with capture group 3, 4. |
21+
| test.swift:201:50:201:190 | <(?:(?:\\/([^>]+)>)\|(?:!--([\\S\|\\s]*?)-->)\|(?:([^\\/\\s>]+)((?:\\s+[\\w\\-:.]+(?:\\s*=\\s*?(?:(?:"[^"]*")\|(?:'[^']*')\|[^\\s"'\\/>]+))?)*)[\\S\\s]*?(\\/?)>)) | Comments ending with --> are matched differently from comments ending with --!>. The first is matched with capture group 2 and comments ending with --!> are matched with capture group 3, 4. |
22+
| test.swift:205:51:205:84 | <script\\b[^>]*>([\\s\\S]*?)<\\/script> | This regular expression does not match script end tags like </script >. |
23+
| test.swift:209:51:209:104 | (<[a-z\\/!$]("[^"]*"\|'[^']*'\|[^'">])*>\|<!(--.*?--\\s*)+>) | Comments ending with --> are matched differently from comments ending with --!>. The first is matched with capture group 3 and comments ending with --!> are matched with capture group 1. |
24+
| test.swift:213:51:213:293 | <(?:(?:!--([\\w\\W]*?)-->)\|(?:!\\[CDATA\\[([\\w\\W]*?)\\]\\]>)\|(?:!DOCTYPE([\\w\\W]*?)>)\|(?:\\?([^\\s\\/<>]+) ?([\\w\\W]*?)[?/]>)\|(?:\\/([A-Za-z][A-Za-z0-9\\-_\\:\\.]*)>)\|(?:([A-Za-z][A-Za-z0-9\\-_\\:\\.]*)((?:\\s+[^"'>]+(?:(?:"[^"]*")\|(?:'[^']*')\|[^>]*))*\|\\/\|\\s+)>)) | This regular expression only parses --> (capture group 1) and not --!> as an HTML comment end tag. |
25+
| test.swift:217:51:217:77 | <!--([\\w\\W]*?)-->\|<([^>]*?)> | Comments ending with --> are matched differently from comments ending with --!>. The first is matched with capture group 1 and comments ending with --!> are matched with capture group 2. |
26+
| test.swift:225:51:225:52 | --> | This regular expression only parses --> and not --!> as a HTML comment end tag. |
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
queries/Security/CWE-116/BadTagFilter.ql
Lines changed: 231 additions & 0 deletions
@@ -0,0 +1,231 @@
1+
2+
// --- stubs ---
3+
4+
struct URL {
5+
init?(string: String) {}
6+
}
7+
8+
extension String {
9+
init(contentsOf: URL) {
10+
let data = ""
11+
self.init(data)
12+
}
13+
}
14+
15+
struct AnyRegexOutput {
16+
}
17+
18+
protocol RegexComponent<RegexOutput> {
19+
associatedtype RegexOutput
20+
}
21+
22+
struct Regex<Output> : RegexComponent {
23+
struct Match {
24+
}
25+
26+
init(_ pattern: String) throws where Output == AnyRegexOutput { }
27+
28+
func ignoresCase(_ ignoresCase: Bool = true) -> Regex<Regex<Output>.RegexOutput> { return self }
29+
func dotMatchesNewlines(_ dotMatchesNewlines: Bool = true) -> Regex<Regex<Output>.RegexOutput> { return self }
30+
31+
func firstMatch(in string: String) throws -> Regex<Output>.Match? { return nil}
32+
33+
typealias RegexOutput = Output
34+
}
35+
36+
extension String : RegexComponent {
37+
typealias Output = Substring
38+
typealias RegexOutput = String.Output
39+
}
40+
41+
class NSObject {
42+
}
43+
44+
struct _NSRange {
45+
init(location: Int, length: Int) { }
46+
}
47+
48+
typealias NSRange = _NSRange
49+
50+
func NSMakeRange(_ loc: Int, _ len: Int) -> NSRange { return NSRange(location: loc, length: len) }
51+
52+
class NSTextCheckingResult : NSObject {
53+
}
54+
55+
class NSRegularExpression : NSObject {
56+
struct Options : OptionSet {
57+
var rawValue: UInt
58+
59+
static var caseInsensitive: NSRegularExpression.Options { get { return Options(rawValue: 1 << 0) } }
60+
static var dotMatchesLineSeparators: NSRegularExpression.Options { get { return Options(rawValue: 1 << 1) } }
61+
}
62+
63+
struct MatchingOptions : OptionSet {
64+
var rawValue: UInt
65+
}
66+
67+
init(pattern: String, options: NSRegularExpression.Options = []) throws { }
68+
69+
func matches(in string: String, options: NSRegularExpression.MatchingOptions = [], range: NSRange) -> [NSTextCheckingResult] { return [] }
70+
func firstMatch(in string: String, options: NSRegularExpression.MatchingOptions = [], range: NSRange) -> NSTextCheckingResult? { return nil }
71+
}
72+
73+
// --- tests ---
74+
75+
func myRegexpVariantsTests(myUrl: URL) throws {
76+
let tainted = String(contentsOf: myUrl) // tainted
77+
78+
// BAD - doesn't match newlines or `</script >`
79+
let re1 = try Regex(#"<script.*?>.*?<\/script>"#).ignoresCase(true)
80+
_ = try re1.firstMatch(in: tainted)
81+
82+
// BAD - doesn't match `</script >` [NOT DETECTED - all regexes with mode flags are currently missed by the query]
83+
let re2a = try Regex(#"(?is)<script.*?>.*?<\/script>"#)
84+
_ = try re2a.firstMatch(in: tainted)
85+
// BAD - doesn't match `</script >`
86+
let re2b = try Regex(#"<script.*?>.*?<\/script>"#).ignoresCase(true).dotMatchesNewlines(true)
87+
_ = try re2b.firstMatch(in: tainted)
88+
// BAD - doesn't match `</script >`
89+
let options2c: NSRegularExpression.Options = [.caseInsensitive, .dotMatchesLineSeparators]
90+
let ns2c = try NSRegularExpression(pattern: #"<script.*?>.*?<\/script>"#, options: options2c)
91+
_ = ns2c.firstMatch(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
92+
93+
// GOOD
94+
let re3a = try Regex(#"(?is)<script.*?>.*?<\/script[^>]*>"#)
95+
_ = try re3a.firstMatch(in: tainted)
96+
// GOOD
97+
let re3b = try Regex(#"<script.*?>.*?<\/script[^>]*>"#).ignoresCase(true).dotMatchesNewlines(true)
98+
_ = try re3b.firstMatch(in: tainted)
99+
// GOOD
100+
let options3b: NSRegularExpression.Options = [.caseInsensitive, .dotMatchesLineSeparators]
101+
let ns3b = try NSRegularExpression(pattern: #"<script.*?>.*?<\/script[^>]*>"#, options: options3b)
102+
_ = ns3b.firstMatch(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
103+
104+
// GOOD - we don't care about regexps that only match comments
105+
let re4 = try Regex(#"<!--.*-->"#).ignoresCase(true).dotMatchesNewlines(true)
106+
_ = try re4.firstMatch(in: tainted)
107+
108+
// GOOD
109+
let re5 = try Regex(#"<!--.*--!?>"#).ignoresCase(true).dotMatchesNewlines(true)
110+
_ = try re5.firstMatch(in: tainted)
111+
112+
// BAD, does not match newlines
113+
let re6 = try Regex(#"<!--.*--!?>"#).ignoresCase(true)
114+
_ = try re6.firstMatch(in: tainted)
115+
116+
// BAD - doesn't match newlines inside the script tag
117+
let re7 = try Regex(#"<script.*?>(.|\s)*?<\/script[^>]*>"#).ignoresCase(true)
118+
_ = try re7.firstMatch(in: tainted)
119+
120+
// BAD - doesn't match newlines inside the content
121+
let re8 = try Regex(#"<script[^>]*?>.*?<\/script[^>]*>"#).ignoresCase(true)
122+
_ = try re8.firstMatch(in: tainted)
123+
124+
// BAD - does not match single quotes for attribute values
125+
let re9 = try Regex(#"<script(\s|\w|=|")*?>.*?<\/script[^>]*>"#).ignoresCase(true).dotMatchesNewlines(true)
126+
_ = try re9.firstMatch(in: tainted)
127+
128+
// BAD - does not match double quotes for attribute values [NOT DETECTED]
129+
let re10a = try Regex(#"(?is)<script(\s|\w|=|')*?>.*?<\/script[^>]*>"#)
130+
_ = try re10a.firstMatch(in: tainted)
131+
// BAD - does not match double quotes for attribute values
132+
let re10b = try Regex(#"<script(\s|\w|=|')*?>.*?<\/script[^>]*>"#).ignoresCase(true).dotMatchesNewlines(true)
133+
_ = try re10b.firstMatch(in: tainted)
134+
// BAD - does not match double quotes for attribute values
135+
let options10: NSRegularExpression.Options = [.caseInsensitive, .dotMatchesLineSeparators]
136+
let ns10 = try NSRegularExpression(pattern: #"<script(\s|\w|=|')*?>.*?<\/script[^>]*>"#, options: options10)
137+
_ = ns10.firstMatch(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
138+
139+
// BAD - does not match tabs between attributes [NOT DETECTED]
140+
let re11a = try Regex(#"(?is)<script( |\n|\w|=|'|")*?>.*?<\/script[^>]*>"#)
141+
_ = try re11a.firstMatch(in: tainted)
142+
// BAD - does not match tabs between attributes
143+
let re11b = try Regex(#"<script( |\n|\w|=|'|")*?>.*?<\/script[^>]*>"#).ignoresCase(true).dotMatchesNewlines(true)
144+
_ = try re11b.firstMatch(in: tainted)
145+
// BAD - does not match tabs between attributes
146+
let options11: NSRegularExpression.Options = [.caseInsensitive, .dotMatchesLineSeparators]
147+
let ns11 = try NSRegularExpression(pattern: #"<script( |\n|\w|=|'|")*?>.*?<\/script[^>]*>"#, options: options11)
148+
_ = ns11.firstMatch(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
149+
150+
// BAD - does not match uppercase SCRIPT tags [NOT DETECTED]
151+
let re12a = try Regex(#"(?s)<script.*?>.*?<\/script[^>]*>"#)
152+
_ = try re12a.firstMatch(in: tainted)
153+
// BAD - does not match uppercase SCRIPT tags
154+
let re12b = try Regex(#"<script.*?>.*?<\/script[^>]*>"#).dotMatchesNewlines(true)
155+
_ = try re12b.firstMatch(in: tainted)
156+
// BAD - does not match uppercase SCRIPT tags
157+
let ns12 = try NSRegularExpression(pattern: #"<script.*?>.*?<\/script[^>]*>"#, options: .dotMatchesLineSeparators)
158+
_ = ns12.firstMatch(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
159+
160+
// BAD - does not match mixed case script tags [NOT DETECTED]
161+
let re13a = try Regex(#"(?s)<(script|SCRIPT).*?>.*?<\/(script|SCRIPT)[^>]*>"#)
162+
_ = try re13a.firstMatch(in: tainted)
163+
// BAD - does not match mixed case script tags
164+
let re13b = try Regex(#"<(script|SCRIPT).*?>.*?<\/(script|SCRIPT)[^>]*>"#).dotMatchesNewlines(true)
165+
_ = try re13b.firstMatch(in: tainted)
166+
// BAD - does not match mixed case script tags
167+
let ns13 = try NSRegularExpression(pattern: #"<(script|SCRIPT).*?>.*?<\/(script|SCRIPT)[^>]*>"#, options: .dotMatchesLineSeparators)
168+
_ = ns13.firstMatch(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
169+
170+
// BAD - doesn't match newlines in the end tag [NOT DETECTED]
171+
let re14a = try Regex(#"(?i)<script[^>]*?>[\s\S]*?<\/script.*>"#)
172+
_ = try re14a.firstMatch(in: tainted)
173+
// BAD - doesn't match newlines in the end tag
174+
let re14b = try Regex(#"<script[^>]*?>[\s\S]*?<\/script.*>"#).ignoresCase(true)
175+
_ = try re14b.firstMatch(in: tainted)
176+
// BAD - doesn't match newlines in the end tag
177+
let ns14 = try NSRegularExpression(pattern: #"<script[^>]*?>[\s\S]*?<\/script.*>"#, options: .caseInsensitive)
178+
_ = ns14.firstMatch(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
179+
180+
// GOOD
181+
let re15a = try Regex(#"(?i)<script[^>]*?>[\s\S]*?<\/script[^>]*?>"#)
182+
_ = try re15a.firstMatch(in: tainted)
183+
// GOOD
184+
let re15b = try Regex(#"<script[^>]*?>[\s\S]*?<\/script[^>]*?>"#).ignoresCase(true)
185+
_ = try re15b.firstMatch(in: tainted)
186+
// GOOD
187+
let ns15 = try NSRegularExpression(pattern: #"<script[^>]*?>[\s\S]*?<\/script[^>]*?>"#, options: .caseInsensitive)
188+
_ = ns15.firstMatch(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
189+
190+
// BAD - doesn't match comments with the right capture groups
191+
let re16 = try Regex(#"<(?:!--([\S|\s]*?)-->)|([^\/\s>]+)[\S\s]*?>"#)
192+
_ = try re16.firstMatch(in: tainted)
193+
// BAD - doesn't match comments with the right capture groups
194+
let ns16 = try NSRegularExpression(pattern: #"<(?:!--([\S|\s]*?)-->)|([^\/\s>]+)[\S\s]*?>"#)
195+
_ = ns16.firstMatch(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
196+
197+
// BAD - capture groups
198+
let re17 = try Regex(#"<(?:(?:\/([^>]+)>)|(?:!--([\S|\s]*?)-->)|(?:([^\/\s>]+)((?:\s+[\w\-:.]+(?:\s*=\s*?(?:(?:"[^"]*")|(?:'[^']*')|[^\s"'\/>]+))?)*)[\S\s]*?(\/?)>))"#)
199+
_ = try re17.firstMatch(in: tainted)
200+
// BAD - capture groups
201+
let ns17 = try NSRegularExpression(pattern: #"<(?:(?:\/([^>]+)>)|(?:!--([\S|\s]*?)-->)|(?:([^\/\s>]+)((?:\s+[\w\-:.]+(?:\s*=\s*?(?:(?:"[^"]*")|(?:'[^']*')|[^\s"'\/>]+))?)*)[\S\s]*?(\/?)>))"#, options: .caseInsensitive)
202+
_ = ns17.firstMatch(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
203+
204+
// BAD - too strict matching on the end tag
205+
let ns2_1 = try NSRegularExpression(pattern: #"<script\b[^>]*>([\s\S]*?)<\/script>"#, options: .caseInsensitive)
206+
_ = ns2_1.matches(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
207+
208+
// BAD - capture groups
209+
let ns2_2 = try NSRegularExpression(pattern: #"(<[a-z\/!$]("[^"]*"|'[^']*'|[^'">])*>|<!(--.*?--\s*)+>)"#, options: .caseInsensitive)
210+
_ = ns2_2.matches(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
211+
212+
// BAD - capture groups
213+
let ns2_3 = try NSRegularExpression(pattern: #"<(?:(?:!--([\w\W]*?)-->)|(?:!\[CDATA\[([\w\W]*?)\]\]>)|(?:!DOCTYPE([\w\W]*?)>)|(?:\?([^\s\/<>]+) ?([\w\W]*?)[?/]>)|(?:\/([A-Za-z][A-Za-z0-9\-_\:\.]*)>)|(?:([A-Za-z][A-Za-z0-9\-_\:\.]*)((?:\s+[^"'>]+(?:(?:"[^"]*")|(?:'[^']*')|[^>]*))*|\/|\s+)>))"#)
214+
_ = ns2_3.matches(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
215+
216+
// BAD - capture groups
217+
let ns2_4 = try NSRegularExpression(pattern: #"<!--([\w\W]*?)-->|<([^>]*?)>"#)
218+
_ = ns2_4.matches(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
219+
220+
// GOOD - it's used with the ignorecase flag
221+
let ns2_5 = try NSRegularExpression(pattern: #"<script([^>]*)>([\S\s]*?)<\/script([^>]*)>"#, options: .caseInsensitive)
222+
_ = ns2_5.matches(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
223+
224+
// BAD - doesn't match --!>
225+
let ns2_6 = try NSRegularExpression(pattern: #"-->"#)
226+
_ = ns2_6.matches(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
227+
228+
// GOOD
229+
let ns2_7 = try NSRegularExpression(pattern: #"^>|^->|<!--|-->|--!>|<!-$"#)
230+
_ = ns2_7.matches(in: tainted, range: NSMakeRange(0, tainted.utf16.count))
231+
}
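
Several tests above pair an inline mode-flag variant (for example `(?is)...`, marked NOT DETECTED) with an equivalent variant that sets the same modes through the API. The sketch below, which is not part of the commit, illustrates that the two forms behave the same at match time with `NSRegularExpression`; the helper `hasMatch` and the input string are hypothetical.

```swift
import Foundation

// Hypothetical helper (not from the commit): true if `pattern` matches anywhere in `input`.
func hasMatch(_ pattern: String, options: NSRegularExpression.Options = [], in input: String) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else { return false }
    let range = NSRange(input.startIndex..., in: input)
    return regex.firstMatch(in: input, options: [], range: range) != nil
}

// Uppercase tags, newlines in the content, and a space in the end tag.
let input = "<SCRIPT>\nalert(1)\n</SCRIPT >"
let base = #"<script.*?>.*?<\/script[^>]*>"#

// Modes set inline in the pattern string.
let withInlineFlags = hasMatch("(?is)" + base, in: input)
// The same modes set through the options parameter.
let withOptions = hasMatch(base, options: [.caseInsensitive, .dotMatchesLineSeparators], in: input)
// Neither mode set.
let withNeither = hasMatch(base, in: input)

print(withInlineFlags, withOptions, withNeither)
```

Because both forms accept the same inputs, the NOT DETECTED cases are a limitation of the query's analysis of inline flags at the time of this commit, as the test comments state, not a difference in how the patterns behave.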
