Commit 5c77ede

Merge pull request github#12991 from Sim4n6/python-UBV
[Python] Add Unicode Bypass Validation query tests and help
2 parents 0ff90df + e300816 commit 5c77ede

8 files changed

+230
-0
lines changed

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
<!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
<qhelp>
    <overview>
        <p>Security checks can be bypassed due to a Unicode transformation.</p>
        <p>If security checks or logical validation are performed before Unicode normalization,
            they could be bypassed due to a potential Unicode character collision: normalization
            can map a distinct Unicode character onto one the check was meant to catch. The validations
            considered are character escaping, regex validation, and string manipulation (such as <code>str.split</code>).</p>
    </overview>
    <recommendation>
        <p>Perform Unicode normalization before the logical validation.</p>
    </recommendation>
    <example>

        <p>The following example shows how the checks performed by <code>flask.escape()</code>
            are bypassed because Unicode normalization is applied after escaping.</p>
        <p>For instance, the character U+FE64 (<code>﹤</code>) is not filtered out by the Flask
            escape function. The subsequent NFKC normalization, however, transforms it into
            U+003C (<code>&lt;</code>).</p>

        <sample src="escape-bypass.py" />

    </example>
    <references>
        <li>Research study: <a
                href="https://gosecure.github.io/presentations/2021-02-unicode-owasp-toronto/philippe_arteau_owasp_unicode_v4.pdf">
                Unicode vulnerabilities that could bYte you
            </a> and <a
                href="https://gosecure.github.io/unicode-pentester-cheatsheet/">Unicode pentest
                cheatsheet</a>.</li>
    </references>
</qhelp>
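The collision described in the help file above can be reproduced with the Python standard library alone. The sketch below is illustrative only: it uses `html.escape` as a stand-in for `flask.escape()` (an assumption; they are not the same function), and the payload string is made up for the demonstration. It shows that U+FE64 and U+FE65 pass through escaping untouched and only turn into `<` and `>` once NFKC normalization is applied afterwards.

```python
import html
import unicodedata

# U+FE64 SMALL LESS-THAN SIGN and U+FE65 SMALL GREATER-THAN SIGN
# (hypothetical payload, chosen only for illustration).
payload = "\ufe64script\ufe65"

# html.escape is a stand-in for flask.escape() here (assumption).
escaped = html.escape(payload)

# Escaping leaves the lookalike characters unchanged ...
assert escaped == payload

# ... but NFKC normalization afterwards maps them to ASCII '<' and '>'.
normalized = unicodedata.normalize("NFKC", escaped)
print(normalized)
```

Because the escaping step never sees an ASCII `<`, it has nothing to escape; the dangerous character only appears after the check has already run.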
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
/**
 * @name Bypass Logical Validation Using Unicode Characters
 * @description A Unicode transformation is applied to remote user-controlled data. The transformation is a Unicode normalization using the algorithms "NFC" or "NFKC". Any security measures or logical validation performed before the Unicode transformation (escaping injection characters, validating against regex patterns, or performing string-based checks) are **bypassable** by special Unicode characters.
 * @kind path-problem
 * @id py/unicode-bypass-validation
 * @precision high
 * @problem.severity error
 * @tags security
 *       experimental
 *       external/cwe/cwe-176
 *       external/cwe/cwe-179
 *       external/cwe/cwe-180
 */

import python
import UnicodeBypassValidationQuery
import DataFlow::PathGraph

from Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink,
  "This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters.",
  sink.getNode(), "Unicode transformation (Unicode normalization)", source.getNode(),
  "remote user-controlled data"
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
/**
 * Provides default sources, sinks and sanitizers for detecting
 * "Unicode transformation"
 * vulnerabilities, as well as extension points for adding your own.
 */

private import python
private import semmle.python.dataflow.new.DataFlow

/**
 * Provides default sources, sinks and sanitizers for detecting
 * "Unicode transformation"
 * vulnerabilities, as well as extension points for adding your own.
 */
module UnicodeBypassValidation {
  /**
   * A data flow source for "Unicode transformation" vulnerabilities.
   */
  abstract class Source extends DataFlow::Node { }

  /**
   * A data flow sink for "Unicode transformation" vulnerabilities.
   */
  abstract class Sink extends DataFlow::Node { }

  /**
   * A sanitizer for "Unicode transformation" vulnerabilities.
   */
  abstract class Sanitizer extends DataFlow::Node { }
}
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
/**
 * Provides a taint-tracking configuration for detecting "Unicode transformation mishandling" vulnerabilities.
 */

private import python
import semmle.python.ApiGraphs
import semmle.python.Concepts
import semmle.python.dataflow.new.internal.DataFlowPublic
import semmle.python.dataflow.new.TaintTracking
import semmle.python.dataflow.new.internal.TaintTrackingPrivate
import semmle.python.dataflow.new.RemoteFlowSources
import UnicodeBypassValidationCustomizations::UnicodeBypassValidation

/** A state signifying that a logical validation has not been performed. */
class PreValidation extends DataFlow::FlowState {
  PreValidation() { this = "PreValidation" }
}

/** A state signifying that a logical validation has been performed. */
class PostValidation extends DataFlow::FlowState {
  PostValidation() { this = "PostValidation" }
}

/**
 * A taint-tracking configuration for detecting "Unicode transformation mishandling" vulnerabilities.
 *
 * This configuration uses two flow states, `PreValidation` and `PostValidation`,
 * to track the requirement that a logical validation has been performed before the Unicode transformation.
 */
class Configuration extends TaintTracking::Configuration {
  Configuration() { this = "UnicodeBypassValidation" }

  override predicate isSource(DataFlow::Node source, DataFlow::FlowState state) {
    source instanceof RemoteFlowSource and state instanceof PreValidation
  }

  override predicate isAdditionalTaintStep(
    DataFlow::Node nodeFrom, DataFlow::FlowState stateFrom, DataFlow::Node nodeTo,
    DataFlow::FlowState stateTo
  ) {
    (
      exists(Escaping escaping | nodeFrom = escaping.getAnInput() and nodeTo = escaping.getOutput())
      or
      exists(RegexExecution re | nodeFrom = re.getString() and nodeTo = re)
      or
      stringManipulation(nodeFrom, nodeTo) and
      not nodeTo.(DataFlow::MethodCallNode).getMethodName() in ["encode", "decode"]
    ) and
    stateFrom instanceof PreValidation and
    stateTo instanceof PostValidation
  }

  /* A Unicode transformation (Unicode normalization) is considered a sink. The intent is to cover the NFC and NFKC algorithms; note that the normalization form argument itself is not checked here. */
  override predicate isSink(DataFlow::Node sink, DataFlow::FlowState state) {
    exists(API::CallNode cn |
      cn = API::moduleImport("unicodedata").getMember("normalize").getACall() and
      sink = cn.getArg(1)
      or
      cn = API::moduleImport("unidecode").getMember("unidecode").getACall() and
      sink = cn.getArg(0)
      or
      cn = API::moduleImport("pyunormalize").getMember(["NFC", "NFD", "NFKC", "NFKD"]).getACall() and
      sink = cn.getArg(0)
      or
      cn = API::moduleImport("pyunormalize").getMember("normalize").getACall() and
      sink = cn.getArg(1)
      or
      cn = API::moduleImport("textnorm").getMember("normalize_unicode").getACall() and
      sink = cn.getArg(0)
    ) and
    state instanceof PostValidation
  }
}
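The two flow states used by the configuration above can be pictured with a toy model. The following sketch is not how CodeQL evaluates the configuration; it is a simplified, hypothetical simulation over a linear list of operations. The names `State`, `is_flagged`, and the operation labels are assumptions introduced purely for illustration.

```python
from enum import Enum


class State(Enum):
    PRE_VALIDATION = "PreValidation"
    POST_VALIDATION = "PostValidation"


# Hypothetical operation labels standing in for the modelled steps.
VALIDATION_STEPS = {"escape", "regex", "string_manipulation"}
NORMALIZATION_SINKS = {"normalize"}


def is_flagged(operations):
    """Return True if a normalization is reached after a validation step."""
    state = State.PRE_VALIDATION  # remote input starts out unvalidated
    for op in operations:
        if op in VALIDATION_STEPS and state is State.PRE_VALIDATION:
            # A validation step moves the data into the PostValidation state.
            state = State.POST_VALIDATION
        elif op in NORMALIZATION_SINKS and state is State.POST_VALIDATION:
            # Normalizing already-validated data is the reported pattern.
            return True
    return False


print(is_flagged(["escape", "normalize"]))  # validation, then normalization
print(is_flagged(["normalize", "escape"]))  # normalization first
```

In this model only the first sequence is flagged, which mirrors the idea that a sink is reported only in the `PostValidation` state.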
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
import unicodedata
from flask import Flask, request, escape, render_template

app = Flask(__name__)


@app.route("/unsafe1")
def unsafe1():
    user_input = escape(request.args.get("ui"))
    normalized_user_input = unicodedata.normalize("NFKC", user_input)
    return render_template("result.html", normalized_user_input=normalized_user_input)
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
edges
| samples.py:2:26:2:32 | ControlFlowNode for ImportMember | samples.py:2:26:2:32 | GSSA Variable request |
| samples.py:2:26:2:32 | GSSA Variable request | samples.py:9:25:9:31 | ControlFlowNode for request |
| samples.py:2:26:2:32 | GSSA Variable request | samples.py:16:25:16:31 | ControlFlowNode for request |
| samples.py:9:18:9:47 | ControlFlowNode for escape() | samples.py:10:59:10:68 | ControlFlowNode for user_input |
| samples.py:9:25:9:31 | ControlFlowNode for request | samples.py:9:25:9:36 | ControlFlowNode for Attribute |
| samples.py:9:25:9:36 | ControlFlowNode for Attribute | samples.py:9:25:9:46 | ControlFlowNode for Attribute() |
| samples.py:9:25:9:46 | ControlFlowNode for Attribute() | samples.py:9:18:9:47 | ControlFlowNode for escape() |
| samples.py:16:18:16:47 | ControlFlowNode for escape() | samples.py:20:62:20:71 | ControlFlowNode for user_input |
| samples.py:16:25:16:31 | ControlFlowNode for request | samples.py:16:25:16:36 | ControlFlowNode for Attribute |
| samples.py:16:25:16:36 | ControlFlowNode for Attribute | samples.py:16:25:16:46 | ControlFlowNode for Attribute() |
| samples.py:16:25:16:46 | ControlFlowNode for Attribute() | samples.py:16:18:16:47 | ControlFlowNode for escape() |
nodes
| samples.py:2:26:2:32 | ControlFlowNode for ImportMember | semmle.label | ControlFlowNode for ImportMember |
| samples.py:2:26:2:32 | GSSA Variable request | semmle.label | GSSA Variable request |
| samples.py:9:18:9:47 | ControlFlowNode for escape() | semmle.label | ControlFlowNode for escape() |
| samples.py:9:25:9:31 | ControlFlowNode for request | semmle.label | ControlFlowNode for request |
| samples.py:9:25:9:36 | ControlFlowNode for Attribute | semmle.label | ControlFlowNode for Attribute |
| samples.py:9:25:9:46 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| samples.py:10:59:10:68 | ControlFlowNode for user_input | semmle.label | ControlFlowNode for user_input |
| samples.py:16:18:16:47 | ControlFlowNode for escape() | semmle.label | ControlFlowNode for escape() |
| samples.py:16:25:16:31 | ControlFlowNode for request | semmle.label | ControlFlowNode for request |
| samples.py:16:25:16:36 | ControlFlowNode for Attribute | semmle.label | ControlFlowNode for Attribute |
| samples.py:16:25:16:46 | ControlFlowNode for Attribute() | semmle.label | ControlFlowNode for Attribute() |
| samples.py:20:62:20:71 | ControlFlowNode for user_input | semmle.label | ControlFlowNode for user_input |
subpaths
#select
| samples.py:10:59:10:68 | ControlFlowNode for user_input | samples.py:2:26:2:32 | ControlFlowNode for ImportMember | samples.py:10:59:10:68 | ControlFlowNode for user_input | This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters. | samples.py:10:59:10:68 | ControlFlowNode for user_input | Unicode transformation (Unicode normalization) | samples.py:2:26:2:32 | ControlFlowNode for ImportMember | remote user-controlled data |
| samples.py:20:62:20:71 | ControlFlowNode for user_input | samples.py:2:26:2:32 | ControlFlowNode for ImportMember | samples.py:20:62:20:71 | ControlFlowNode for user_input | This $@ processes unsafely $@ and any logical validation in-between could be bypassed using special Unicode characters. | samples.py:20:62:20:71 | ControlFlowNode for user_input | Unicode transformation (Unicode normalization) | samples.py:2:26:2:32 | ControlFlowNode for ImportMember | remote user-controlled data |
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
experimental/Security/CWE-176/UnicodeBypassValidation.ql
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
import unicodedata
from flask import Flask, request, escape, render_template

app = Flask(__name__)


@app.route("/unsafe1")
def unsafe1():
    user_input = escape(request.args.get("ui"))
    normalized_user_input = unicodedata.normalize("NFKC", user_input)  # $result=BAD
    return render_template("result.html", normalized_user_input=normalized_user_input)


@app.route("/unsafe2")
def unsafe1bis():
    user_input = escape(request.args.get("ui"))
    if user_input.isascii():
        normalized_user_input = user_input
    else:
        normalized_user_input = unicodedata.normalize("NFC", user_input)  # $result=BAD
    return render_template("result.html", normalized_user_input=normalized_user_input)


@app.route("/safe1")
def safe1():
    normalized_user_input = unicodedata.normalize(
        "NFKC", request.args.get("ui")
    )  # $result=OK
    user_input = escape(normalized_user_input)
    return render_template("result.html", normalized_user_input=user_input)
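
The ordering used in `safe1` can be captured in a small helper. This is a minimal sketch, assuming `html.escape` from the standard library in place of `flask.escape`; `normalize_then_escape` is a hypothetical name and is not part of this commit.

```python
import html
import unicodedata


def normalize_then_escape(value: str, form: str = "NFKC") -> str:
    """Normalize first, so that escaping sees the final characters."""
    normalized = unicodedata.normalize(form, value)
    return html.escape(normalized)


# U+FE64 / U+FE65 are first normalized to '<' / '>' and then escaped.
print(normalize_then_escape("\ufe64b\ufe65"))
```

Keeping both steps inside one function makes it harder to reorder them by accident, which is the mistake the query is designed to detect.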
