pySCG: adding documentation to CWE-184 as part of #531 #820
Merged
144 changes: 144 additions & 0 deletions
docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,144 @@ | ||
# CWE-184: Incomplete List of Disallowed Input | ||
|
||
Avoid incomplete deny lists, which can lead to security vulnerabilities such as cross-site scripting (XSS), by using allow lists instead. | ||
|
||
## Non-Compliant Code Example | ||
|
||
The `noncompliant01.py` code demonstrates how hard exclusion lists are to maintain when multiple languages must be supported. `UTF-8` can encode all __1,112,064__ valid Unicode "code points", such as `生`, using one to four bytes (`8-32` bits) each, and not all of them are printable characters. | ||
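
The figure of 1,112,064 can be checked with a little arithmetic. The following is a minimal sketch, not part of the guide's example files, and the names `SCALAR_VALUES` and `utf8_length` are made up for illustration. It subtracts the surrogate range, which UTF-8 cannot encode, from the full Unicode code space and shows how many bytes a few sample characters need.

```python
# Illustrative sketch only; the constant and function names are assumptions.
TOTAL_CODE_POINTS = 0x10FFFF + 1       # U+0000 .. U+10FFFF
SURROGATES = 0xDFFF - 0xD800 + 1       # U+D800 .. U+DFFF, not encodable in UTF-8
SCALAR_VALUES = TOTAL_CODE_POINTS - SURROGATES


def utf8_length(char: str) -> int:
    """Return the number of bytes UTF-8 needs for a single character."""
    return len(char.encode("utf-8"))


if __name__ == "__main__":
    print(f"{SCALAR_VALUES:,} encodable code points")
    # One sample character for each possible UTF-8 length (1 to 4 bytes)
    for char in ["a", "\u00f1", "\u751f", "\U0001F600"]:
        print(f"U+{ord(char):04X} -> {utf8_length(char)} byte(s)")
```

A deny list would have to anticipate every dangerous combination drawn from this space, which is why enumerating what is allowed tends to be the more tractable approach.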
|
||
The `filter_string()` function in `noncompliant01.py` searches for disallowed inputs but fails to find the `script` tag in `<script生>`, because the non-English character `生` keeps the tag from matching the deny-list entry `<script>` exactly. | ||
|
||
*[noncompliant01.py](noncompliant01.py):* | ||
|
||
```python | ||
# SPDX-FileCopyrightText: OpenSSF project contributors | ||
# SPDX-License-Identifier: MIT | ||
"""Non-compliant Code Example""" | ||
|
||
import re | ||
import sys | ||
|
||
if sys.stdout.encoding.lower() != "utf-8": | ||
sys.stdout.reconfigure(encoding="UTF-8") | ||
|
||
|
||
def filter_string(input_string: str): | ||
"""Normalize and validate untrusted string | ||
|
||
Parameters: | ||
input_string(string): String to validate | ||
""" | ||
# TODO Canonicalize (normalize) before Validating | ||
|
||
# validate, exclude dangerous tags: | ||
for tag in re.findall("<[^>]*>", input_string): | ||
if tag in ["<script>", "<img", "<a href"]: | ||
raise ValueError("Invalid input tag") | ||
|
||
|
||
##################### | ||
# attempting to exploit above code example | ||
##################### | ||
names = [ | ||
"YES 毛泽东先生", | ||
"YES dash-", | ||
"NOK <script" + "\ufdef" + ">", | ||
"NOK <script生>", | ||
] | ||
for name in names: | ||
print(name) | ||
filter_string(name) | ||
|
||
``` | ||
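
To make the gap in the deny list visible without relying on an exception, the check can be wrapped in a function that returns a boolean. This is a sketch under assumptions: `passes_deny_list` and the extra payloads are hypothetical and not part of `noncompliant01.py`; only the regular expression and the deny-list entries are taken from it.

```python
import re

# Same entries as the deny list in noncompliant01.py
DENY_LIST = ["<script>", "<img", "<a href"]


def passes_deny_list(input_string: str) -> bool:
    """Return True when no extracted tag is an exact deny-list entry."""
    tags = re.findall("<[^>]*>", input_string)
    return all(tag not in DENY_LIST for tag in tags)


# Hypothetical payloads chosen for illustration
payloads = ["<script>", "<SCRIPT>", "<script >", "<script\u751f>", "<img src=x>"]
results = {payload: passes_deny_list(payload) for payload in payloads}

if __name__ == "__main__":
    for payload, accepted in results.items():
        print(f"{payload!a}: {'accepted' if accepted else 'rejected'}")
```

Only the literal `<script>` is rejected. Every variant slips through, and the entries `<img` and `<a href` can never match because the regular expression only returns strings that end in `>`.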
|
||
## Compliant Solution | ||
|
||
`compliant01.py` uses an allow list instead of a deny list and rejects unwanted tags by raising an exception, even without canonicalization. Canonicalization, as described in [CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](https://github.com/ossf/wg-best-practices-os-developers/tree/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180), is still missing from `compliant01.py` and must be added before the strings can be logged or displayed safely. | ||
|
||
*[compliant01.py](compliant01.py):* | ||
|
||
```python | ||
# SPDX-FileCopyrightText: OpenSSF project contributors | ||
# SPDX-License-Identifier: MIT | ||
"""Compliant Code Example""" | ||
|
||
import re | ||
import sys | ||
|
||
if sys.stdout.encoding.lower() != "utf-8": | ||
sys.stdout.reconfigure(encoding="UTF-8") | ||
|
||
|
||
def filter_string(input_string: str): | ||
"""Normalize and validate untrusted string | ||
|
||
Parameters: | ||
input_string(string): String to validate | ||
""" | ||
# TODO Canonicalize (normalize) before Validating | ||
|
||
# validate, only allow harmless tags | ||
for tag in re.findall("<[^>]*>", input_string): | ||
if tag not in ["<b>", "<p>", "</p>"]: | ||
raise ValueError("Invalid input tag") | ||
# TODO handle exception | ||
|
||
|
||
##################### | ||
# attempting to exploit above code example | ||
##################### | ||
names = [ | ||
"YES 毛泽东先生", | ||
"YES dash-", | ||
"NOK <script" + "\ufdef" + ">", | ||
"NOK <script生>", | ||
] | ||
for name in names: | ||
print(name) | ||
filter_string(name) | ||
|
||
``` | ||
|
||
`compliant01.py` correctly detects the disallowed tag and raises a `ValueError` exception. A production solution would also need to canonicalize the input and handle the exception. | ||
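
One way the two open TODO items might be filled in is sketched below. It assumes that NFKC is the appropriate normalization form for the application and adds a hypothetical helper, `is_valid`, to show the exception being handled. Neither is confirmed as the project's final solution.

```python
import re
import unicodedata

ALLOWED_TAGS = frozenset({"<b>", "<p>", "</p>"})


def filter_string(input_string: str) -> str:
    """Canonicalize first, then validate tags against the allow list."""
    # Assumption: NFKC is the right normalization form for this use case
    normalized = unicodedata.normalize("NFKC", input_string)
    for tag in re.findall("<[^>]*>", normalized):
        if tag not in ALLOWED_TAGS:
            raise ValueError("Invalid input tag")
    return normalized


def is_valid(input_string: str) -> bool:
    """Hypothetical helper: report a rejection instead of crashing the caller."""
    try:
        filter_string(input_string)
    except ValueError:
        return False
    return True
```

Normalizing first matters for input such as `"\uff1cscript\uff1e"`, which uses fullwidth angle brackets. Without normalization the regular expression finds no tag at all and the string passes. After NFKC it becomes `<script>` and is rejected.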
|
||
__Example compliant01.py output:__ | ||
|
||
```bash | ||
/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py | ||
$ python3 compliant01.py | ||
YES 毛泽东先生 | ||
YES dash- | ||
NOK <script> | ||
Traceback (most recent call last): | ||
File "/workspace/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py", line 38, in <module> | ||
filter_string(name) | ||
File "/workspace/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py", line 23, in filter_string | ||
raise ValueError("Invalid input tag") | ||
ValueError: Invalid input tag | ||
|
||
``` | ||
|
||
According to *Unicode Technical Report #36, Unicode Security Considerations [Davis 2008b]*, `\uFFFD` is usually an unproblematic replacement for unwanted or dangerous characters, because it typically just causes a failure in parsing. Where the output character set is not Unicode, though, this character may not be available. | ||
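
The difference between deleting and replacing unwanted code points can be sketched as follows. The function names are hypothetical, and the example treats only Unicode noncharacters (such as `\uFDEF`) as unwanted. A real filter would need its own definition of which characters to replace.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def is_noncharacter(char: str) -> bool:
    """True for U+FDD0..U+FDEF and for code points ending in FFFE or FFFF."""
    code_point = ord(char)
    return 0xFDD0 <= code_point <= 0xFDEF or (code_point & 0xFFFE) == 0xFFFE


def delete_noncharacters(text: str) -> str:
    """Risky: deletion can join the surrounding characters into a new token."""
    return "".join(char for char in text if not is_noncharacter(char))


def replace_noncharacters(text: str) -> str:
    """Safer: substitute U+FFFD so the surrounding characters stay separated."""
    return "".join(
        REPLACEMENT_CHARACTER if is_noncharacter(char) else char for char in text
    )
```

For the input `"<scr\ufdefipt>"`, deletion yields `<script>`, which recreates the very tag a filter run earlier may have missed, while replacement yields `<scr\ufffdipt>`, which is not a valid tag name.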
|
||
## Automated Detection | ||
|
||
|Tool|Version|Checker|Description| | ||
|:---|:---|:---|:---| | ||
|Bandit|1.7.4 on Python 3.10.4|Not Available|| | ||
|Flake8|8-4.0.1 on Python 3.10.4|Not Available|| | ||
|
||
## Related Guidelines | ||
|
||
||| | ||
|:---|:---| | ||
|[MITRE CWE](http://cwe.mitre.org/)|Pillar: [CWE-693: Protection Mechanism Failure (mitre.org)](https://cwe.mitre.org/data/definitions/693.html)| | ||
|[MITRE CWE](http://cwe.mitre.org/)|Base: [CWE-184: Incomplete List of Disallowed Inputs (4.13) (mitre.org)](https://cwe.mitre.org/data/definitions/184.html)| | ||
|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[IDS11-J. Perform any string modifications before validation](https://wiki.sei.cmu.edu/confluence/display/java/IDS11-J.+Perform+any+string+modifications+before+validation)| | ||
|
||
## Bibliography | ||
|
||
||| | ||
|:---|:---| | ||
|[Unicode 2024]|Unicode 16.0.0 [online]. Available from: [https://www.unicode.org/versions/Unicode16.0.0/](https://www.unicode.org/versions/Unicode16.0.0/) [accessed 20 March 2025] | | ||
|[Davis 2008b]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" [online]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) [accessed 20 March 2025] | | ||
60 changes: 18 additions & 42 deletions
docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,62 +1,38 @@ | ||
# SPDX-FileCopyrightText: OpenSSF project contributors | ||
# SPDX-License-Identifier: MIT | ||
""" Compliant Code Example """ | ||
"""Compliant Code Example""" | ||
|
||
import re | ||
import unicodedata | ||
import sys | ||
|
||
sys.stdout.reconfigure(encoding="UTF-8") | ||
|
||
|
||
class TagFilter: | ||
"""Input validation for human language""" | ||
if sys.stdout.encoding.lower() != "utf-8": | ||
sys.stdout.reconfigure(encoding="UTF-8") | ||
|
||
def filter_string(self, input_string: str) -> str: | ||
"""Normalize and validate untrusted string | ||
|
||
Parameters: | ||
input_string(string): String to validate | ||
""" | ||
# normalize | ||
_str = unicodedata.normalize("NFKC", input_string) | ||
def filter_string(input_string: str): | ||
"""Normalize and validate untrusted string | ||
|
||
# modify, keep only trusted human words | ||
_filtered_str = "".join(re.findall(r"[/\w<>\s-]+", _str)) | ||
if len(_str) - len(_filtered_str) != 0: | ||
raise ValueError("Invalid input string") | ||
Parameters: | ||
input_string(string): String to validate | ||
""" | ||
# TODO Canonicalize (normalize) before Validating | ||
|
||
# validate, only allow harmless tags | ||
for tag in re.findall("<[^>]*>", _str): | ||
if tag not in ["<b>", "<p>", "</p>"]: | ||
raise ValueError("Invalid input tag") | ||
return _str | ||
# validate, only allow harmless tags | ||
for tag in re.findall("<[^>]*>", input_string): | ||
if tag not in ["<b>", "<p>", "</p>"]: | ||
raise ValueError("Invalid input tag") | ||
# TODO handle exception | ||
|
||
|
||
##################### | ||
# attempting to exploit above code example | ||
##################### | ||
names = [ | ||
"YES 毛泽东先生", | ||
"YES María Quiñones Marqués", | ||
"YES Борис Николаевич Ельцин", | ||
"YES Björk Guðmundsdóttir", | ||
"YES 0123456789", | ||
"YES <b>", | ||
"YES <p>foo</p>", | ||
"YES underscore_", | ||
"YES dash-", | ||
"NOK semicolon;", | ||
"NOK noprint " + "\uFDD0", | ||
"NOK noprint " + "\uFDEF", | ||
"NOK <script" + "\uFDEF" + ">", | ||
"NOK <script" + "\ufdef" + ">", | ||
"NOK <script生>", | ||
"NOK and &", | ||
] | ||
for name in names: | ||
print(f"{name}", end=" ") | ||
try: | ||
TagFilter().filter_string(name) | ||
except ValueError as e: | ||
print(" Error: " + str(e)) | ||
else: | ||
print(" OK") | ||
print(name) | ||
filter_string(name) |
60 changes: 17 additions & 43 deletions
docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/noncompliant01.py
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,63 +1,37 @@ | ||
# SPDX-FileCopyrightText: OpenSSF project contributors | ||
# SPDX-License-Identifier: MIT | ||
""" Non-compliant Code Example """ | ||
"""Non-compliant Code Example""" | ||
|
||
import re | ||
import unicodedata | ||
import sys | ||
|
||
sys.stdout.reconfigure(encoding="UTF-8") | ||
|
||
|
||
class TagFilter: | ||
"""Input validation for human language""" | ||
if sys.stdout.encoding.lower() != "utf-8": | ||
sys.stdout.reconfigure(encoding="UTF-8") | ||
|
||
def filter_string(self, input_string: str) -> str: | ||
"""Normalize and validate untrusted string | ||
|
||
Parameters: | ||
input_string(string): String to validate | ||
""" | ||
# normalize | ||
_str = unicodedata.normalize("NFKC", input_string) | ||
def filter_string(input_string: str): | ||
"""Normalize and validate untrusted string | ||
|
||
# validate, exclude dangerous tags | ||
for tag in re.findall("<[^>]*>", _str): | ||
if tag in ["<script>", "<img", "<a href"]: | ||
raise ValueError("Invalid input tag") | ||
Parameters: | ||
input_string(string): String to validate | ||
""" | ||
# TODO Canonicalize (normalize) before Validating | ||
|
||
# modify, keep only trusted human words | ||
# _filtered_str = "".join(re.findall(r'([\//\w<>\s_-]+)', _str)) | ||
_filtered_str = "".join(re.findall(r"[/\w<>\s-]+", _str)) | ||
if len(_str) - len(_filtered_str) != 0: | ||
raise ValueError("Invalid input string") | ||
return _filtered_str | ||
# validate, exclude dangerous tags: | ||
for tag in re.findall("<[^>]*>", input_string): | ||
if tag in ["<script>", "<img", "<a href"]: | ||
raise ValueError("Invalid input tag") | ||
|
||
|
||
##################### | ||
# attempting to exploit above code example | ||
##################### | ||
names = [ | ||
"YES 毛泽东先生", | ||
"YES María Quiñones Marqués", | ||
"YES Борис Николаевич Ельцин", | ||
"YES Björk Guðmundsdóttir", | ||
"YES 0123456789", | ||
"YES <b>", | ||
"YES <p>foo</p>", | ||
"YES underscore_", | ||
"YES dash-", | ||
"NOK semicolon;", | ||
"NOK noprint " + "\uFDD0", | ||
"NOK noprint " + "\uFDEF", | ||
"NOK <script" + "\uFDEF" + ">", | ||
"NOK <script" + "\ufdef" + ">", | ||
"NOK <script生>", | ||
"NOK and &", | ||
] | ||
for name in names: | ||
print(name, end=" ") | ||
try: | ||
TagFilter().filter_string(name) | ||
except ValueError as e: | ||
print(" Error: " + str(e)) | ||
else: | ||
print(" OK") | ||
print(name) | ||
filter_string(name) |