Merged
144 changes: 144 additions & 0 deletions docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md
@@ -0,0 +1,144 @@
# CWE-184: Incomplete List of Disallowed Input

Avoid incomplete 'deny lists', which can lead to security vulnerabilities such as cross-site scripting (XSS), by using 'allow lists' instead.

## Non-Compliant Code Example

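The `noncompliant01.py` code demonstrates how difficult exclusion lists are to maintain in a use case with multi-language support. Unicode defines __1,112,064__ valid "code points", such as `生`, and `UTF-8` encodes each of them in one to four bytes (`8-32` bits). Not all of them are printable, and a deny list cannot realistically enumerate every way they can be combined into dangerous input.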
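The figure of 1,112,064 can be reproduced with a little arithmetic. The following is a minimal sketch, not part of `noncompliant01.py`; the names `encodable_code_points` and `utf8_length` are hypothetical and chosen only for illustration.

```python
# Unicode code space is U+0000..U+10FFFF; the surrogate range
# U+D800..U+DFFF cannot be encoded in UTF-8 and is excluded.
TOTAL_CODE_POINTS = 0x10FFFF + 1
SURROGATES = 0xDFFF - 0xD800 + 1
encodable_code_points = TOTAL_CODE_POINTS - SURROGATES


def utf8_length(char: str) -> int:
    """Return the number of bytes UTF-8 needs for a single character."""
    return len(char.encode("utf-8"))


print(encodable_code_points)  # 1112064
for char in ["a", "é", "生", "😀"]:
    # Only the code point and byte count are printed, so the output is ASCII
    print(f"U+{ord(char):04X} -> {utf8_length(char)} byte(s)")
```

The loop shows the one to four byte range: `a` needs one byte, `é` two, `生` three, and `😀` four.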

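The `filter_string()` function in `noncompliant01.py` searches for disallowed tags but fails to detect the `script` tag in `<script生>`. The regular expression extracts the whole tag `<script生>`, which is not an exact match for the deny-list entry `<script>`, so the non-English character `生` is enough to bypass the check. The entries `<img` and `<a href` can never match at all, because every tag extracted by the regular expression ends in `>`.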

*[noncompliant01.py](noncompliant01.py):*

```python
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Non-compliant Code Example"""

import re
import sys

if sys.stdout.encoding.lower() != "utf-8":
    sys.stdout.reconfigure(encoding="UTF-8")


def filter_string(input_string: str):
    """Normalize and validate untrusted string

    Parameters:
        input_string(string): String to validate
    """
    # TODO Canonicalize (normalize) before Validating

    # validate, exclude dangerous tags:
    for tag in re.findall("<[^>]*>", input_string):
        if tag in ["<script>", "<img", "<a href"]:
            raise ValueError("Invalid input tag")


#####################
# attempting to exploit above code example
#####################
names = [
    "YES 毛泽东先生",
    "YES dash-",
    "NOK <script" + "\ufdef" + ">",
    "NOK <script生>",
]
for name in names:
    print(name)
    filter_string(name)

```
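The bypass can be isolated in a few lines. The sketch below is an illustration rather than part of the example files: `DENY_LIST` and `find_denied_tags` are hypothetical names, and the third input is an assumed attack string used only to show that the `<img` entry never matches.

```python
import re

# Same entries as the deny list in noncompliant01.py
DENY_LIST = ["<script>", "<img", "<a href"]


def find_denied_tags(input_string: str) -> list[str]:
    """Return the extracted tags that exactly match a deny-list entry."""
    tags = re.findall("<[^>]*>", input_string)
    return [tag for tag in tags if tag in DENY_LIST]


print(find_denied_tags("<script>"))  # ['<script>']
# The extracted tag is '<script生>', which is not equal to '<script>'
print(find_denied_tags("<script生>"))  # []
# Extracted tags always end in '>', so the entry '<img' can never match
print(find_denied_tags("<img src=x onerror=alert(1)>"))  # []
```

Only the exact string `<script>` is caught; any variation of the tag passes the check unnoticed.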

## Compliant Solution

The `compliant01.py` code uses an allow list instead of a deny list and rejects any unexpected tag by raising an exception, even without canonicalization. Canonicalization is still missing from `compliant01.py` and must be added, as described in [CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](https://github.com/ossf/wg-best-practices-os-developers/tree/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180), before the input can be safely logged or displayed.

*[compliant01.py](compliant01.py):*

```python
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Compliant Code Example"""

import re
import sys

if sys.stdout.encoding.lower() != "utf-8":
    sys.stdout.reconfigure(encoding="UTF-8")


def filter_string(input_string: str):
    """Normalize and validate untrusted string

    Parameters:
        input_string(string): String to validate
    """
    # TODO Canonicalize (normalize) before Validating

    # validate, only allow harmless tags
    for tag in re.findall("<[^>]*>", input_string):
        if tag not in ["<b>", "<p>", "</p>"]:
            raise ValueError("Invalid input tag")
            # TODO handle exception


#####################
# attempting to exploit above code example
#####################
names = [
    "YES 毛泽东先生",
    "YES dash-",
    "NOK <script" + "\ufdef" + ">",
    "NOK <script生>",
]
for name in names:
    print(name)
    filter_string(name)

```

The `compliant01.py` code correctly rejects the disallowed tag and raises a `ValueError` exception on the first `NOK` input, so the remaining input is never reached. A production solution would also need to canonicalize the input and handle the exception.

__Example compliant01.py output:__

```bash
/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py
$ python3 compliant01.py
YES 毛泽东先生
YES dash-
NOK <script﷯>
Traceback (most recent call last):
  File "/workspace/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py", line 38, in <module>
    filter_string(name)
  File "/workspace/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py", line 23, in filter_string
    raise ValueError("Invalid input tag")
ValueError: Invalid input tag

```
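One way to address the two open `TODO` items could look like the sketch below. It assumes `NFKC` normalization via `unicodedata.normalize()` as the canonicalization step; the helper `is_accepted` and the fullwidth test input are hypothetical and exist only for illustration. It is not a production-ready filter.

```python
import re
import unicodedata

ALLOWED_TAGS = ["<b>", "<p>", "</p>"]


def filter_string(input_string: str) -> str:
    """Canonicalize first, then validate tags against the allow list."""
    # NFKC folds compatibility characters, for example the fullwidth
    # less-than sign U+FF1C becomes the ASCII '<'
    normalized = unicodedata.normalize("NFKC", input_string)
    for tag in re.findall("<[^>]*>", normalized):
        if tag not in ALLOWED_TAGS:
            raise ValueError("Invalid input tag")
    return normalized


def is_accepted(input_string: str) -> bool:
    """Hypothetical helper: handle the exception instead of crashing."""
    try:
        filter_string(input_string)
    except ValueError:
        return False
    return True


for name in ["YES <p>foo</p>", "NOK <script生>", "NOK \uff1cscript\uff1e"]:
    # !a prints an ASCII-safe representation regardless of stdout encoding
    print(f"{name!a} -> {'accepted' if is_accepted(name) else 'rejected'}")
```

The last input uses fullwidth brackets that the regular expression would not recognize as a tag before normalization. After `NFKC` normalization it becomes `<script>` and is rejected by the allow list.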

According to *Unicode Technical Report #36, Unicode Security Considerations [Davis 2008b]*, `\uFFFD` is usually an unproblematic replacement for unwanted or dangerous characters, because it typically just causes a parsing failure. Where the output character set is not Unicode, however, this character may not be available.
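The difference between deleting and replacing unwanted characters can be sketched as follows. The function names are hypothetical, and treating every code point in the Unicode general category `Cn` (unassigned, which includes noncharacters such as `\uFDEF`) as unwanted is an assumption made for this illustration; a real policy may differ.

```python
import unicodedata

REPLACEMENT_CHARACTER = "\uFFFD"


def is_unwanted(char: str) -> bool:
    """Assumed policy: code points in category Cn are unwanted."""
    return unicodedata.category(char) == "Cn"


def delete_unwanted(text: str) -> str:
    """Risky: deletion can join the surrounding characters into a new tag."""
    return "".join(char for char in text if not is_unwanted(char))


def replace_unwanted(text: str) -> str:
    """Safer: substitute U+FFFD so the surrounding characters stay apart."""
    return "".join(
        REPLACEMENT_CHARACTER if is_unwanted(char) else char for char in text
    )


payload = "<scr\ufdefipt>"
print(ascii(delete_unwanted(payload)))  # '<script>'
print(ascii(replace_unwanted(payload)))  # '<scr\ufffdipt>'
```

Deleting the noncharacter turns the harmless looking input into `<script>` after validation has already happened, whereas the replaced version still fails to parse as a `script` tag.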

## Automated Detection

|Tool|Version|Checker|Description|
|:---|:---|:---|:---|
|Bandit|1.7.4 on Python 3.10.4|Not Available||
|Flake8|4.0.1 on Python 3.10.4|Not Available||

## Related Guidelines

|||
|:---|:---|
|[MITRE CWE](http://cwe.mitre.org/)|Pillar: [CWE-693: Protection Mechanism Failure (mitre.org)](https://cwe.mitre.org/data/definitions/693.html)|
|[MITRE CWE](http://cwe.mitre.org/)|Base: [CWE-184: Incomplete List of Disallowed Inputs (4.13) (mitre.org)](https://cwe.mitre.org/data/definitions/184.html)|
|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[IDS11-J. Perform any string modifications before validation](https://wiki.sei.cmu.edu/confluence/display/java/IDS11-J.+Perform+any+string+modifications+before+validation)|

## Bibliography

|||
|:---|:---|
|[Unicode 2024]|Unicode 16.0.0 [online]. Available from: [https://www.unicode.org/versions/Unicode16.0.0/](https://www.unicode.org/versions/Unicode16.0.0/) [accessed 20 March 2025] |
|[Davis 2008b]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" [online]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) [accessed 20 March 2025] |
60 changes: 18 additions & 42 deletions docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py
@@ -1,62 +1,38 @@
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
""" Compliant Code Example """
"""Compliant Code Example"""

import re
import unicodedata
import sys

sys.stdout.reconfigure(encoding="UTF-8")


class TagFilter:
    """Input validation for human language"""
if sys.stdout.encoding.lower() != "utf-8":
    sys.stdout.reconfigure(encoding="UTF-8")

    def filter_string(self, input_string: str) -> str:
        """Normalize and validate untrusted string

        Parameters:
            input_string(string): String to validate
        """
        # normalize
        _str = unicodedata.normalize("NFKC", input_string)
def filter_string(input_string: str):
    """Normalize and validate untrusted string

        # modify, keep only trusted human words
        _filtered_str = "".join(re.findall(r"[/\w<>\s-]+", _str))
        if len(_str) - len(_filtered_str) != 0:
            raise ValueError("Invalid input string")
    Parameters:
        input_string(string): String to validate
    """
    # TODO Canonicalize (normalize) before Validating

        # validate, only allow harmless tags
        for tag in re.findall("<[^>]*>", _str):
            if tag not in ["<b>", "<p>", "</p>"]:
                raise ValueError("Invalid input tag")
        return _str
    # validate, only allow harmless tags
    for tag in re.findall("<[^>]*>", input_string):
        if tag not in ["<b>", "<p>", "</p>"]:
            raise ValueError("Invalid input tag")
            # TODO handle exception


#####################
# attempting to exploit above code example
#####################
names = [
    "YES 毛泽东先生",
    "YES María Quiñones Marqués",
    "YES Борис Николаевич Ельцин",
    "YES Björk Guðmundsdóttir",
    "YES 0123456789",
    "YES <b>",
    "YES <p>foo</p>",
    "YES underscore_",
    "YES dash-",
    "NOK semicolon;",
    "NOK noprint " + "\uFDD0",
    "NOK noprint " + "\uFDEF",
    "NOK <script" + "\uFDEF" + ">",
    "NOK <script" + "\ufdef" + ">",
    "NOK <script生>",
    "NOK and &",
]
for name in names:
    print(f"{name}", end=" ")
    try:
        TagFilter().filter_string(name)
    except ValueError as e:
        print(" Error: " + str(e))
    else:
        print(" OK")
    print(name)
    filter_string(name)
@@ -1,63 +1,37 @@
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
""" Non-compliant Code Example """
"""Non-compliant Code Example"""

import re
import unicodedata
import sys

sys.stdout.reconfigure(encoding="UTF-8")


class TagFilter:
    """Input validation for human language"""
if sys.stdout.encoding.lower() != "utf-8":
    sys.stdout.reconfigure(encoding="UTF-8")

    def filter_string(self, input_string: str) -> str:
        """Normalize and validate untrusted string

        Parameters:
            input_string(string): String to validate
        """
        # normalize
        _str = unicodedata.normalize("NFKC", input_string)
def filter_string(input_string: str):
    """Normalize and validate untrusted string

        # validate, exclude dangerous tags
        for tag in re.findall("<[^>]*>", _str):
            if tag in ["<script>", "<img", "<a href"]:
                raise ValueError("Invalid input tag")
    Parameters:
        input_string(string): String to validate
    """
    # TODO Canonicalize (normalize) before Validating

        # modify, keep only trusted human words
        # _filtered_str = "".join(re.findall(r'([\//\w<>\s_-]+)', _str))
        _filtered_str = "".join(re.findall(r"[/\w<>\s-]+", _str))
        if len(_str) - len(_filtered_str) != 0:
            raise ValueError("Invalid input string")
        return _filtered_str
    # validate, exclude dangerous tags:
    for tag in re.findall("<[^>]*>", input_string):
        if tag in ["<script>", "<img", "<a href"]:
            raise ValueError("Invalid input tag")


#####################
# attempting to exploit above code example
#####################
names = [
    "YES 毛泽东先生",
    "YES María Quiñones Marqués",
    "YES Борис Николаевич Ельцин",
    "YES Björk Guðmundsdóttir",
    "YES 0123456789",
    "YES <b>",
    "YES <p>foo</p>",
    "YES underscore_",
    "YES dash-",
    "NOK semicolon;",
    "NOK noprint " + "\uFDD0",
    "NOK noprint " + "\uFDEF",
    "NOK <script" + "\uFDEF" + ">",
    "NOK <script" + "\ufdef" + ">",
    "NOK <script生>",
    "NOK and &",
]
for name in names:
    print(name, end=" ")
    try:
        TagFilter().filter_string(name)
    except ValueError as e:
        print(" Error: " + str(e))
    else:
        print(" OK")
    print(name)
    filter_string(name)
3 changes: 1 addition & 2 deletions docs/Secure-Coding-Guide-for-Python/readme.md
@@ -74,8 +74,7 @@ It is __not production code__ and requires code-style or python best practices t
|[CWE-617: Reachable Assertion](CWE-691/CWE-617/README.md)||

|[CWE-693: Protection Mechanism Failure](https://cwe.mitre.org/data/definitions/693.html)|Prominent CVE|
|:----------------------------------------------------------------|:----|
|[CWE-184: Incomplete List of Disallowed Input](CWE-693/CWE-184/.)||
|[CWE-184: Incomplete List of Disallowed Input](CWE-693/CWE-184/README.md)||
|[CWE-330: Use of Insufficiently Random Values](CWE-693/CWE-330/README.md)|[CVE-2020-7548](https://www.cvedetails.com/cve/CVE-2020-7548),<br/>CVSSv3.1: __9.8__,<br/>EPSS: __0.22__ (12.12.2024)|
|[CWE-798: Use of hardcoded credentials](CWE-693/CWE-798/.)||
