Commit dbf4053

myteron and s19110 authored
pySCG 707 180 2 GitHub (#846)
* pySCG adding CWE-180 as part of #531

---------

Signed-off-by: Helge Wehder <[email protected]>
Signed-off-by: myteron <[email protected]>
Co-authored-by: Hubert Daniszewski <[email protected]>
Co-authored-by: andrew-costello
1 parent 2238280 commit dbf4053

File tree

7 files changed

+200
-17
lines changed

docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md

Lines changed: 5 additions & 5 deletions
@@ -53,7 +53,7 @@ for name in names:
5353

5454
## Compliant Solution
5555

56-
The `compliant01.py` uses an allow list instead of a deny list and prevents the use of unwanted characters by raising an exception even without canonicalization. The missing canonicalization in `compliant01.py` according to [CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](https://github.com/ossf/wg-best-practices-os-developers/tree/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180) must be added in order to make logging or displaying them safe!
56+
The `compliant01.py` uses an allow list instead of a deny list and prevents the use of unwanted characters by raising an exception even without canonicalization. The missing canonicalization in `compliant01.py` according to [CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](../../CWE-707/CWE-180) must be added in order to make logging or displaying them safe!
5757

5858
*[compliant01.py](compliant01.py):*
5959

@@ -118,7 +118,7 @@ ValueError: Invalid input tag
118118

119119
```
120120

121-
According to *Unicode Technical Report #36, Unicode Security Considerations [Davis 2008b]*, `\uFFFD` is usually unproblematic, as a replacement for unwanted or dangerous characters. That is, `\uFFFD` will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available.
121+
According to *Unicode Technical Report #36, Unicode Security Considerations* [Davis 2008b](https://www.unicode.org/reports/tr36/), `\uFFFD` is usually unproblematic as a replacement for unwanted or dangerous characters. That is, `\uFFFD` will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available.
122122

123123
## Automated Detection
124124

@@ -139,6 +139,6 @@ According to *Unicode Technical Report #36, Unicode Security Considerations [Dav
139139

140140
|||
141141
|:---|:---|
142-
|[Unicode 2024]|Unicode 16.0.0 [online]. Available from: [https://www.unicode.org/versions/Unicode16.0.0/](https://www.unicode.org/versions/Unicode16.0.0/) [accessed 20 March 2025] |
143-
|[Davis 2008b]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" [online]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) [accessed 20 March 2025] |
144-
|[Davis 2008b]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" [online]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) [accessed 20 March 2025] |
142+
|\[Unicode 2024\]|Unicode 16.0.0 \[online\]. Available from: [https://www.unicode.org/versions/Unicode16.0.0/](https://www.unicode.org/versions/Unicode16.0.0/) \[accessed 20 March 2025\] |
143+
|\[Davis 2008b\]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" \[online\]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) \[accessed 20 March 2025\] |
144+
|\[Davis 2008b\]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" \[online\]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) \[accessed 20 March 2025\] |
Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
1+
# CWE-180: Incorrect Behavior Order: Validate Before Canonicalize
2+
3+
Normalize or canonicalize strings before validating them so that risky strings such as `../../../../passwd` cannot slip past validation and enable directory traversal attacks, and to reduce the risk of `XSS` attacks.
4+
5+
Supporting multiple languages requires a character encoding with an extended character repertoire, such as `UTF-8`, which can encode all __1,112,064__ valid Unicode code points.
6+
7+
Character encoding systems such as `ASCII`, `Windows-1252`, or `UTF-8` consist of an agreed mapping between byte values and human-readable characters known as code points. Each code point ties together a fixed number such as "`\u002e`", its graphical representation "`.`", and its name "`FULL STOP`" [[Batchelder 2022]](https://www.youtube.com/watch?v=sgHbC6udIqc). Using the same encoding and normalization form ensures that equivalent strings have a unique binary representation, as per Unicode Standard _Annex #15, Unicode Normalization Forms_ [[Davis 2008]](https://wiki.sei.cmu.edu/confluence/display/java/Rule+AA.+References#RuleAA.References-Davis08). Different or unexpected changes in encoding can allow attackers to work around validation or input sanitization efforts.
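
The relation between a code point's number, its glyph, and its name can be inspected with the standard `unicodedata` module. The following is a minimal sketch; the helper name `describe` is hypothetical and not part of the example files in this commit.

```python
import unicodedata


def describe(char: str) -> tuple[str, str, str]:
    """Return (escape sequence, glyph, Unicode name) for a single character."""
    code_point = ord(char)  # the fixed number assigned to the character
    return (f"\\u{code_point:04x}", char, unicodedata.name(char))


# FULL STOP, ONE DOT LEADER and FULLWIDTH SOLIDUS look similar to their
# ASCII counterparts but are distinct code points with distinct names.
for char in (".", "\u2024", "\uff0f"):
    print(describe(char))
```

Printing the name next to the glyph makes lookalike characters easier to tell apart when reviewing suspicious input.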
8+
9+
> [!WARNING]
10+
> Use allow lists to avoid having to maintain a deny list on a continuous basis (exclusion lists are a moving target), as per [CWE-184: Incomplete List of Disallowed Input - Development Environment](../../CWE-693/CWE-184/README.md).
11+
12+
<table>
13+
<tr>
14+
<th colspan="3">NFKC normalized</th>
15+
<th colspan="3">UTF-16 (hex)</th>
16+
</tr>
17+
<tr>
18+
<th>Print</th>
19+
<th>Hex</th>
20+
<th>Name</th>
21+
<th>Print</th>
22+
<th>Hex</th>
23+
<th>Name</th>
24+
</tr>
25+
<tr>
26+
<td >.</td>
27+
<td>\u002e</td>
28+
<td>FULL STOP</td>
29+
<td>․</td>
30+
<td>\u2024</td>
31+
<td>ONE DOT LEADER</td>
32+
</tr>
33+
<tr>
34+
<td >..</td>
35+
<td>\u002e\u002e</td>
36+
<td>FULL STOP FULL STOP</td>
37+
<td>‥</td>
38+
<td>\u2025</td>
39+
<td>TWO DOT LEADER</td>
40+
</tr>
41+
<tr>
42+
<td >/</td>
43+
<td>\u002f</td>
44+
<td>SOLIDUS</td>
45+
<td>／</td>
46+
<td>\uff0f</td>
47+
<td>FULLWIDTH SOLIDUS</td>
48+
</tr>
49+
</table>
50+
51+
The `NFKC` and `NFKD` compatibility modes cause a `ONE DOT LEADER` to become a `FULL STOP`, as demonstrated in `example01.py` [[python.org 2023]](https://docs.python.org/3/library/unicodedata.html).
52+
53+
__[example01.py](example01.py):__
54+
55+
```py
56+
""" Code Example """
57+
58+
# SPDX-FileCopyrightText: OpenSSF project contributors
59+
# SPDX-License-Identifier: MIT
60+
import unicodedata
61+
62+
print("\N{FULL STOP}" * 10)
63+
print("." == unicodedata.normalize("NFC", "\u2024") == "\N{FULL STOP}" == "\u002e")
64+
print("." == unicodedata.normalize("NFD", "\u2024") == "\N{FULL STOP}" == "\u002e")
65+
print("." == unicodedata.normalize("NFKC", "\u2024") == "\N{FULL STOP}" == "\u002e")
66+
print("." == unicodedata.normalize("NFKD", "\u2024") == "\N{FULL STOP}" == "\u002e")
67+
print("\N{FULL STOP}" * 10)
68+
```
69+
70+
The first two comparisons in `example01.py` print `False` because `NFC` and `NFD` do not apply compatibility mappings, while the last two comparisons print `True`. The issue depends on whether normalization is used, its mode, and when it is applied.
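
The same comparison can be extended to every character in the table above. The following is a minimal sketch, assuming the names `FORMS`, `SAMPLES`, and `normalize_all`, which are illustrative and not part of `example01.py`.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")
# ONE DOT LEADER, TWO DOT LEADER, FULLWIDTH SOLIDUS
SAMPLES = ("\u2024", "\u2025", "\uff0f")


def normalize_all(text: str) -> dict[str, str]:
    """Normalize text with each of the four Unicode normalization forms."""
    return {form: unicodedata.normalize(form, text) for form in FORMS}


for sample in SAMPLES:
    for form, result in normalize_all(sample).items():
        status = "changed" if result != sample else "unchanged"
        print(f"{form:4} {unicodedata.name(sample):18} -> {result!r} ({status})")
```

Only the compatibility forms `NFKC` and `NFKD` map these characters to their `ASCII` counterparts; `NFC` and `NFD` leave them untouched.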
71+
72+
The compatibility modes `NFKC` and `NFKD` can allow attackers to disguise malicious strings with characters beyond the `ASCII` range of `0-127`, because normalization turns, for example, a `ONE DOT LEADER` (`\u2024`) into a `FULL STOP` (`\u002e`).
73+
74+
Using the non-compatibility modes `NFC` and `NFD`, or stripping characters, can turn a harmless string such as `<script生>` into `<script>`, as per _CWE-182: Collapse of Data into Unsafe Value (4.16)_ [[MITRE CWE-182 2024]](https://cwe.mitre.org/data/definitions/182.html).
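
The stripping case can be sketched as follows. The helpers `is_safe` and `strip_non_ascii` are hypothetical and deliberately naive; they only illustrate why the order of validation and transformation matters.

```python
def is_safe(text: str) -> bool:
    """Naive deny-list check, for illustration only."""
    return "<script>" not in text


def strip_non_ascii(text: str) -> str:
    """Drop every character outside the ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")


user_input = "<script\u751f>"  # "<script生>"

# Wrong order: the check runs on the original string and passes ...
print(is_safe(user_input))
# ... but stripping afterwards collapses the string into an unsafe value.
print(strip_non_ascii(user_input))

# Safer order: transform first, then validate the final form.
print(is_safe(strip_non_ascii(user_input)))
```

Validation must always run on the final form of the string, after every normalization or stripping step.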
75+
76+
## Non-Compliant Code Example - Compatibility mode
77+
78+
Reducing the list of allowed characters or switching between different encodings can be required by design in order to stay compatible between different systems.
79+
80+
The `noncompliant01.py` code attempts to detect a directory traversal attack but calls `unicodedata.normalize()` only after the check, for logging.
81+
82+
__[noncompliant01.py](noncompliant01.py):__
83+
84+
```python
85+
# SPDX-FileCopyrightText: OpenSSF project contributors
86+
# SPDX-License-Identifier: MIT
87+
"""Non-compliant Code Example"""
88+
89+
import re
90+
import unicodedata
91+
92+
93+
def api_with_ids(suspicious_string: str):
94+
"""Fancy intrusion detection system(IDS)"""
95+
if re.search("./", suspicious_string):
96+
normalized_string = unicodedata.normalize("NFKC", suspicious_string)
97+
print(f"detected an attack sequence {normalized_string}")
98+
else:
99+
print("Nothing suspicious")
100+
101+
102+
#####################
103+
# attempting to exploit above code example
104+
#####################
105+
# The MALICIOUS_INPUT is using:
106+
# \u2024 or "ONE DOT LEADER"
107+
# \uFF0F or 'FULLWIDTH SOLIDUS'
108+
api_with_ids("\u2024\u2024\uff0f" * 10 + "passwd")
109+
```
110+
111+
The `re.search("./", ...)` call cannot detect the "`ONE DOT LEADER`" or "`FULLWIDTH SOLIDUS`" because the string is not normalized before it is checked, which allows a directory traversal attack.
112+
113+
## Compliant Solution - Compatibility mode
114+
115+
This compliant solution normalizes the string before testing it. According to _Annex #15_ [[Davis 2008]](https://wiki.sei.cmu.edu/confluence/display/java/Rule+AA.+References#RuleAA.References-Davis08) and [[Batchelder 2022]](https://www.youtube.com/watch?v=sgHbC6udIqc), we want to ensure that strings have a unique binary representation within our code.
116+
117+
__[compliant01.py](compliant01.py):__
118+
119+
```python
120+
# SPDX-FileCopyrightText: OpenSSF project contributors
121+
# SPDX-License-Identifier: MIT
122+
"""Compliant Code Example"""
123+
124+
import re
125+
import unicodedata
126+
127+
128+
def api_with_ids(suspicious_string: str):
129+
"""Fancy intrusion detection system(IDS)"""
130+
normalized_string = unicodedata.normalize("NFKC", suspicious_string)
131+
if re.search("./", normalized_string):
132+
print("detected an attack sequence with . or /")
133+
else:
134+
print("Nothing suspicious")
135+
136+
137+
#####################
138+
# attempting to exploit above code example
139+
#####################
140+
# The MALICIOUS_INPUT is using:
141+
# \u2024 or "ONE DOT LEADER"
142+
# \uFF0F or 'FULLWIDTH SOLIDUS'
143+
api_with_ids("\u2024\u2024\uff0f" * 10 + "passwd")
144+
145+
```
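
Note that in the pattern `"./"` the unescaped `.` matches any character, so the check also fires on harmless input such as `reports/2025`. A stricter variant is sketched below, under the assumption that only the literal `../` sequence should be flagged. The names `TRAVERSAL_PATTERN` and `contains_traversal` are illustrative and not part of `compliant01.py`.

```python
import re
import unicodedata

# Literal "../": the dots are escaped so they no longer match arbitrary characters.
TRAVERSAL_PATTERN = re.compile(r"\.\./")


def contains_traversal(untrusted: str) -> bool:
    """Normalize first, then validate the normalized form."""
    normalized = unicodedata.normalize("NFKC", untrusted)
    return TRAVERSAL_PATTERN.search(normalized) is not None


# Disguised traversal using ONE DOT LEADER and FULLWIDTH SOLIDUS
print(contains_traversal("\u2024\u2024\uff0f" * 10 + "passwd"))
# Ordinary relative path without any traversal sequence
print(contains_traversal("reports/2025/summary.txt"))
```

Returning a boolean instead of printing keeps the check easy to unit test. A production system would combine this with an allow list rather than rely on a single pattern.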
146+
147+
Developers should be aware of the encoding of data printed to `HTML`. For example, the string `숍訊昱穿刷奄剔㏆穽侘㈊섞昌侄從쒜` was an `XSS` vulnerability in Chrome [[issues.chromium.org 2025]](https://issues.chromium.org/issues/40076480) if the charset of the webpage was set to `ISO-2022-KR` or another unknown charset.
148+
At the time, the Korean charset was unsupported, so the browser attempted to fall back to the Windows OS default encoding `Windows-1252` and executed the code [[Taylor 2009]](https://zaynar.co.uk/posts/charset-encoding-xss/).
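
The effect of such a fallback can be illustrated in general terms: the same bytes decode to different characters depending on the assumed charset. The following sketch does not reproduce the Chrome issue; it only shows a `UTF-8` byte sequence being misread as `Windows-1252` (Python codec name `cp1252`).

```python
text = "caf\u00e9"  # "café"
utf8_bytes = text.encode("utf-8")

# Decoding with the charset the bytes were written in round-trips cleanly.
print(utf8_bytes.decode("utf-8"))

# Decoding the same bytes as Windows-1252 raises no error but silently
# yields different characters, so checks made on the original text no
# longer describe what the consumer actually sees.
print(utf8_bytes.decode("cp1252"))
```

Declaring the charset explicitly on both the producing and the consuming side avoids this kind of silent reinterpretation.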
149+
150+
Note that some operating systems (Windows, Mac) have system encodings for various characters which do get executed on a webpage regardless of charset. These should be avoided as they can cause issues with devices that don't support that charset. Other character sets, such as `ASCII`, should be avoided too, because mobile phones and old SMS systems generally have a very limited charset and can behave unexpectedly.
151+
152+
## Automated Detection
153+
154+
None known
155+
156+
|Tool|Version|Checker|Description|
157+
|:---|:---|:---|:---|
158+
|||||
159+
160+
## Related Guidelines
161+
162+
|||
163+
|:---|:---|
164+
|[ISO/IEC TR 24772:2013](https://wiki.sei.cmu.edu/confluence/display/java/Rule+AA.+References#RuleAA.References-ISO/IECTR24772-2013)|Cross-site Scripting \[XYT\] \[online\], available from: <https://wiki.sei.cmu.edu/confluence/display/java/Rule+AA.+References#RuleAA.References-ISO/IECTR24772-2013>, \[Accessed April 2025\]|
165+
|[MITRE CWE](http://cwe.mitre.org/)|Pillar CWE - CWE-707: Improper Neutralization \[online\], available from: <https://cwe.mitre.org/data/definitions/707.html> \[Accessed April 2025\]|
166+
|[MITRE CWE](http://cwe.mitre.org/)|Variant: CWE-180, Incorrect behavior order: Validate before canonicalize \[online\], available from: <http://cwe.mitre.org/data/definitions/180.html>|
167+
|[MITRE CWE](http://cwe.mitre.org/)|Base: CWE-182: Collapse of Data into Unsafe Value (4.16) \[online\], available from: <http://cwe.mitre.org/data/definitions/182.html>|
168+
|[MITRE CWE](http://cwe.mitre.org/)|Base: CWE-184: Incomplete List of Disallowed Input - Development Environment. \[online\], available from: <http://cwe.mitre.org/data/definitions/184.html>|
169+
|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|IDS01-J. Normalize strings before validating them \[online\], available from: <https://wiki.sei.cmu.edu/confluence/display/java/IDS01-J.+Normalize+strings+before+validating+them>|
170+
171+
## Bibliography
172+
173+
|||
174+
|:---|:---|
175+
|\[Davis 2008\]|Mark Davis and Ken Whistler, Unicode Standard Annex #15, Unicode Normalization Forms, 2008. \[online\], available from: <http://unicode.org/reports/tr15/> \[Accessed April 2025\]<br>Mark Davis and Michel Suignard, Unicode Technical Report #36, Unicode Security Considerations, 2008. \[online\], available from: <http://www.unicode.org/reports/tr36/> \[Accessed 4 April 2025\] |
176+
|\[Weber 2009\]|Unraveling Unicode: A Bag of Tricks for Bug Hunting \[online\], available from: <http://www.lookout.net/wp-content/uploads/2009/03/chris_weber_exploiting-unicode-enabled-software-v15.pdf> \[Accessed April 2025\]|
177+
|\[Batchelder 2022\]|Ned Batchelder, Pragmatic Unicode, or, How do I stop the pain? - YouTube \[online\], available from: <https://www.youtube.com/watch?v=sgHbC6udIqc> \[Accessed April 2025\]|
178+
|\[Kuchling 2022\]|Unicode HOWTO \[online\], available from: <https://docs.python.org/3/howto/unicode.html> \[Accessed April 2025\]|
179+
|\[python.org 2023\]|unicodedata — Unicode Database — Python 3.12.0 documentation \[online\], available from: <https://docs.python.org/3/library/unicodedata.html> \[Accessed April 2025\]|
180+
|\[issues.chromium.org 2025\]|XSS issue due to the lack of support for ISO-2022-KR \[online\], available from: <https://issues.chromium.org/issues/40076480> \[Accessed April 2025\]|
181+
|\[Taylor 2009\]|XSS vulnerabilities with unusual character encodings \[online\], available from: <https://zaynar.co.uk/posts/charset-encoding-xss/> \[Accessed April 2025\]|

docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180/compliant01.py

Lines changed: 4 additions & 3 deletions
@@ -1,11 +1,12 @@
11
# SPDX-FileCopyrightText: OpenSSF project contributors
22
# SPDX-License-Identifier: MIT
3-
""" Compliant Code Example """
3+
"""Compliant Code Example"""
4+
45
import re
56
import unicodedata
67

78

8-
def api_with_ids(suspicious_string):
9+
def api_with_ids(suspicious_string: str):
910
"""Fancy intrusion detection system(IDS)"""
1011
normalized_string = unicodedata.normalize("NFKC", suspicious_string)
1112
if re.search("./", normalized_string):
@@ -20,4 +21,4 @@ def api_with_ids(suspicious_string):
2021
# The MALICIOUS_INPUT is using:
2122
# \u2024 or "ONE DOT LEADER"
2223
# \uFF0F or 'FULLWIDTH SOLIDUS'
23-
api_with_ids("\u2024\u2024\uFF0F" * 10 + "passwd")
24+
api_with_ids("\u2024\u2024\uff0f" * 10 + "passwd")

docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180/example01.py

Lines changed: 4 additions & 4 deletions
@@ -3,8 +3,8 @@
33
import unicodedata
44

55
print("\N{FULL STOP}" * 10)
6-
print("." == unicodedata.normalize("NFC", "\u2024") == "\N{FULL STOP}" == "\u002E")
7-
print("." == unicodedata.normalize("NFD", "\u2024") == "\N{FULL STOP}" == "\u002E")
8-
print("." == unicodedata.normalize("NFKC", "\u2024") == "\N{FULL STOP}" == "\u002E")
9-
print("." == unicodedata.normalize("NFKD", "\u2024") == "\N{FULL STOP}" == "\u002E")
6+
print("." == unicodedata.normalize("NFC", "\u2024") == "\N{FULL STOP}" == "\u002e")
7+
print("." == unicodedata.normalize("NFD", "\u2024") == "\N{FULL STOP}" == "\u002e")
8+
print("." == unicodedata.normalize("NFKC", "\u2024") == "\N{FULL STOP}" == "\u002e")
9+
print("." == unicodedata.normalize("NFKD", "\u2024") == "\N{FULL STOP}" == "\u002e")
1010
print("\N{FULL STOP}" * 10)

docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180/noncompliant01.py

Lines changed: 4 additions & 3 deletions
@@ -1,11 +1,12 @@
11
# SPDX-FileCopyrightText: OpenSSF project contributors
22
# SPDX-License-Identifier: MIT
3-
""" Non-compliant Code Example """
3+
"""Non-compliant Code Example"""
4+
45
import re
56
import unicodedata
67

78

8-
def api_with_ids(suspicious_string):
9+
def api_with_ids(suspicious_string: str):
910
"""Fancy intrusion detection system(IDS)"""
1011
if re.search("./", suspicious_string):
1112
normalized_string = unicodedata.normalize("NFKC", suspicious_string)
@@ -20,4 +21,4 @@ def api_with_ids(suspicious_string):
2021
# The MALICIOUS_INPUT is using:
2122
# \u2024 or "ONE DOT LEADER"
2223
# \uFF0F or 'FULLWIDTH SOLIDUS'
23-
api_with_ids("\u2024\u2024\uFF0F" * 10 + "passwd")
24+
api_with_ids("\u2024\u2024\uff0f" * 10 + "passwd")

docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-78/README.md

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ The `FileOperations().list_dir()` method allows an attacker to add commands via
7575

7676
The attack surface increases if a user is also allowed to upload or create files or folders.
7777

78-
The `noncompliant02.py` example demonstrates the injection via file or folder name that is created prior to using the `list_dir()` method. We assume here that an untrusted user is allowed to create files or folders named `& calc.exe or ;ps aux` as part of another service such as upload area, submit form, or as a result of a zip-bomb as per *CWE-409: Improper Handling of Highly Compressed Data (Data Amplification)*. Encoding issues as described in *CWE-180: Incorrect Behavior Order: Validate Before Canonicalize* must also be considered.
78+
The `noncompliant02.py` example demonstrates the injection via file or folder name that is created prior to using the `list_dir()` method. We assume here that an untrusted user is allowed to create files or folders named `& calc.exe or ;ps aux` as part of another service such as upload area, submit form, or as a result of a zip-bomb as per *CWE-409: Improper Handling of Highly Compressed Data (Data Amplification)*. Encoding issues as described in *[CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](../CWE-180/README.md)* must also be considered.
7979

8080
The issue occurs when mixing shell commands with data from a lesser trusted source.
8181

docs/Secure-Coding-Guide-for-Python/readme.md

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ It is __not production code__ and requires code-style or python best practices t
9999
|[CWE-89: Improper Neutralization of Special Elements used in an SQL Command ('SQL Injection')](CWE-707/CWE-89/README.md)|[CVE-2019-8600](https://www.cvedetails.com/cve/CVE-2019-8600/),<br/>CVSSv3.1: __9.8__,<br/>EPSS: __01.43__ (18.02.2024)|
100100
|[CWE-117: Improper Output Neutralization for Logs](CWE-707/CWE-117/.)||
101101
|[CWE-175: Improper Handling of Mixed Encoding](CWE-707/CWE-175/README.md)||
102-
|[CWE-180: Incorrect behavior order: Validate before Canonicalize](CWE-707/CWE-180/.)||
102+
|[CWE-180: Incorrect behavior order: Validate before Canonicalize](CWE-707/CWE-180/README.md)||
103103

104104
|[CWE-710: Improper Adherence to Coding Standards](https://cwe.mitre.org/data/definitions/710.html)|Prominent CVE|
105105
|:----------------------------------------------------------------|:----|
