diff --git a/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md b/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md index dd44fe7c..4d11e1a2 100644 --- a/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md +++ b/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md @@ -53,7 +53,7 @@ for name in names: ## Compliant Solution -The `compliant01.py` uses an allow list instead of a deny list and prevents the use of unwanted characters by raising an exception even without canonicalization. The missing canonicalization in `compliant01.py` according to [CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](https://github.com/ossf/wg-best-practices-os-developers/tree/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180) must be added in order to make logging or displaying them safe! +The `compliant01.py` uses an allow list instead of a deny list and prevents the use of unwanted characters by raising an exception even without canonicalization. The missing canonicalization in `compliant01.py` according to [CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](../../CWE-707/CWE-180) must be added in order to make logging or displaying them safe! *[compliant01.py](compliant01.py):* @@ -118,7 +118,7 @@ ValueError: Invalid input tag ``` -According to *Unicode Technical Report #36, Unicode Security Considerations [Davis 2008b]*, `\uFFFD` is usually unproblematic, as a replacement for unwanted or dangerous characters. That is, `\uFFFD` will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available. +According to *Unicode Technical Report #36, Unicode Security Considerations* [[Davis 2008b]](https://www.unicode.org/reports/tr36/), `\uFFFD` is usually unproblematic as a replacement for unwanted or dangerous characters. That is, `\uFFFD` will typically just cause a failure in parsing.
Where the output character set is not Unicode, though, this character may not be available. ## Automated Detection @@ -139,6 +139,5 @@ According to *Unicode Technical Report #36, Unicode Security Considerations [Dav ||| |:---|:---| -|[Unicode 2024]|Unicode 16.0.0 [online]. Available from: [https://www.unicode.org/versions/Unicode16.0.0/](https://www.unicode.org/versions/Unicode16.0.0/) [accessed 20 March 2025] | -|[Davis 2008b]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" [online]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) [accessed 20 March 2025] | -|[Davis 2008b]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" [online]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) [accessed 20 March 2025] | +|\[Unicode 2024\]|Unicode 16.0.0 \[online\]. Available from: [https://www.unicode.org/versions/Unicode16.0.0/](https://www.unicode.org/versions/Unicode16.0.0/) \[accessed 20 March 2025\] | +|\[Davis 2008b\]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" \[online\]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) \[accessed 20 March 2025\] | diff --git a/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180/README.md b/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180/README.md new file mode 100644 index 00000000..9e62ed0f --- /dev/null +++ b/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180/README.md @@ -0,0 +1,181 @@ +# CWE-180: Incorrect Behavior Order: Validate Before Canonicalize + +Normalize (canonicalize) strings before validating them to prevent risky strings such as `../../../../passwd` from enabling directory traversal attacks, and to reduce the risk of `XSS` attacks. + +Supporting multiple languages requires a character encoding with an extended character set, such as `UTF-8`, which can encode __1,112,064__ valid code points. + +Character encoding systems such as `ASCII`, `Windows-1252`, or `UTF-8` map each human-readable character to a numeric value known as a code point, which is stored as one or more bytes. Each code point ties together a fixed number such as `\u002e`, its graphical representation `.`, and its name `FULL STOP` [[Batchelder 2022]](https://www.youtube.com/watch?v=sgHbC6udIqc). Keeping strings in a normalized form ensures that equivalent strings have a unique binary representation, as described in Unicode Standard _Annex #15, Unicode Normalization Forms_ [[Davis 2008]](https://wiki.sei.cmu.edu/confluence/display/java/Rule+AA.+References#RuleAA.References-Davis08). Different or unexpected changes in encoding can allow attackers to work around validation or input sanitization efforts. + +> [!WARNING] +> Use allow lists instead of deny lists. Deny lists are a moving target and must be maintained continuously, as described in [CWE-184: Incomplete List of Disallowed Input - Development Environment](../../CWE-693/CWE-184/README.md). +
|NFKC normalized|||UTF-16 (hex)|||
|:---|:---|:---|:---|:---|:---|
|Print|Hex|Name|Print|Hex|Name|
|`.`|`\u002e`|FULL STOP|`․`|`\u2024`|ONE DOT LEADER|
|`..`|`\u002e\u002e`|FULL STOP, FULL STOP|`‥`|`\u2025`|TWO DOT LEADER|
|`/`|`\u002f`|SOLIDUS|`／`|`\uff0f`|FULLWIDTH SOLIDUS|
+ +The `NFKC` and `NFKD` compatibility modes cause a `ONE DOT LEADER` to become a `FULL STOP`, as demonstrated in `example01.py` [[python.org 2023]](https://docs.python.org/3/library/unicodedata.html). + +__[example01.py](example01.py):__ + +```py +""" Code Example """ + +# SPDX-FileCopyrightText: OpenSSF project contributors +# SPDX-License-Identifier: MIT +import unicodedata + +print("\N{FULL STOP}" * 10) +print("." == unicodedata.normalize("NFC", "\u2024") == "\N{FULL STOP}" == "\u002e") +print("." == unicodedata.normalize("NFD", "\u2024") == "\N{FULL STOP}" == "\u002e") +print("." == unicodedata.normalize("NFKC", "\u2024") == "\N{FULL STOP}" == "\u002e") +print("." == unicodedata.normalize("NFKD", "\u2024") == "\N{FULL STOP}" == "\u002e") +print("\N{FULL STOP}" * 10) +``` + +The first two comparisons in `example01.py` evaluate to `False` because `NFC` and `NFD` do not apply compatibility mappings, while the last two evaluate to `True`. The risk depends on whether normalization is used, which mode is used, and when it is applied. + +Applying the compatibility modes `NFKC` or `NFKD` after validation can allow attackers to disguise malicious strings with characters beyond the `ASCII` range of `0-127`, for example a `ONE DOT LEADER` `\u2024` that normalization turns into a `FULL STOP` `\u002e`. + +Using non-compatibility `NFC` and `NFD` or stripping of characters can lead to a harmless string such as `` turn into `