diff --git a/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md b/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md
index dd44fe7c..4d11e1a2 100644
--- a/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md
+++ b/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/README.md
@@ -53,7 +53,7 @@ for name in names:
 
 ## Compliant Solution
 
-The `compliant01.py` uses an allow list instead of a deny list and prevents the use of unwanted characters by raising an exception even without canonicalization. The missing canonicalization in `compliant01.py` according to [CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](https://github.com/ossf/wg-best-practices-os-developers/tree/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180) must be added in order to make logging or displaying them safe!
+The `compliant01.py` uses an allow list instead of a deny list and prevents the use of unwanted characters by raising an exception even without canonicalization. The missing canonicalization in `compliant01.py` according to [CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](../../CWE-707/CWE-180) must be added in order to make logging or displaying them safe!
 
 *[compliant01.py](compliant01.py):*
 
@@ -118,7 +118,7 @@ ValueError: Invalid input tag
 ```
 
-According to *Unicode Technical Report #36, Unicode Security Considerations [Davis 2008b]*, `\uFFFD` is usually unproblematic, as a replacement for unwanted or dangerous characters. That is, `\uFFFD` will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available.
+According to *Unicode Technical Report #36, Unicode Security Considerations* [Davis 2008b](https://www.unicode.org/reports/tr36/), `\uFFFD` is usually unproblematic as a replacement for unwanted or dangerous characters. That is, `\uFFFD` will typically just cause a failure in parsing. Where the output character set is not Unicode, though, this character may not be available.
 
 ## Automated Detection
 
@@ -139,6 +139,6 @@ According to *Unicode Technical Report #36, Unicode Security Considerations [Dav
 
 |||
 |:---|:---|
-|[Unicode 2024]|Unicode 16.0.0 [online]. Available from: [https://www.unicode.org/versions/Unicode16.0.0/](https://www.unicode.org/versions/Unicode16.0.0/) [accessed 20 March 2025] |
-|[Davis 2008b]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" [online]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) [accessed 20 March 2025] |
-|[Davis 2008b]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" [online]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) [accessed 20 March 2025] |
+|\[Unicode 2024\]|Unicode 16.0.0 \[online\]. Available from: [https://www.unicode.org/versions/Unicode16.0.0/](https://www.unicode.org/versions/Unicode16.0.0/) \[accessed 20 March 2025\] |
+|\[Davis 2008b\]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" \[online\]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) \[accessed 20 March 2025\] |
+|\[Davis 2008b\]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" \[online\]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) \[accessed 20 March 2025\] |
diff --git a/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180/README.md b/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180/README.md
new file mode 100644
index 00000000..9e62ed0f
--- /dev/null
+++ b/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180/README.md
@@ -0,0 +1,181 @@
+# CWE-180: Incorrect Behavior Order: Validate Before Canonicalize
+
+Normalize (canonicalize) strings before validating them so that risky strings such as `../../../../passwd` cannot slip past validation and enable directory traversal attacks, and to reduce the risk of `XSS` attacks.
+
+Supporting multiple languages requires an extended character encoding such as `UTF-8`, which can encode __1,112,064__ valid Unicode code points.
+
+Character encoding systems such as `ASCII`, `Windows-1252`, or `UTF-8` consist of an agreed mapping between byte values and human-readable characters, known as code points. Each code point ties together a fixed number such as "`\u002e`", its graphical representation "`.`", and its name "`FULL STOP`" [[Batchelder 2022]](https://www.youtube.com/watch?v=sgHbC6udIqc). Keeping strings in the same normalized form ensures that equivalent strings have a unique binary representation, as described in Unicode Standard _Annex #15, Unicode Normalization Forms_ [[Davis 2008]](https://wiki.sei.cmu.edu/confluence/display/java/Rule+AA.+References#RuleAA.References-Davis08). Different or unexpected changes in encoding can allow attackers to work around validation or input sanitization efforts.
+
+> [!WARNING]
+> Use allow lists to avoid having to maintain a deny list on a continuous basis (exclusion lists are a moving target), as per [CWE-184: Incomplete List of Disallowed Input - Development Environment](../../CWE-693/CWE-184/README.md).
+
+| NFKC normalized | Hex | Name | Original | Hex (UTF-16) | Name |
+|---|---|---|---|---|---|
+| . | `\u002e` | FULL STOP | ․ | `\u2024` | ONE DOT LEADER |
+| .. | `\u002e\u002e` | FULL STOP FULL STOP | ‥ | `\u2025` | TWO DOT LEADER |
+| / | `\u002f` | SOLIDUS | ／ | `\uff0f` | FULLWIDTH SOLIDUS |
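
The mappings in the table can be checked with Python's standard `unicodedata` module. The following is a minimal sketch rather than the guide's own example: the `ALLOWED` pattern and the function name `canonicalize_then_validate` are assumptions made for illustration only. It normalizes to NFKC first and only then applies an allow list, so look-alike characters such as `\u2025` and `\uff0f` are judged in their canonical form.

```python
import re
import unicodedata

# Hypothetical allow list for this sketch: ASCII letters, digits,
# underscore and hyphen. A real application would define its own.
ALLOWED = re.compile(r"[A-Za-z0-9_-]+")


def canonicalize_then_validate(value: str) -> str:
    """Normalize to NFKC first, then validate the normalized result."""
    normalized = unicodedata.normalize("NFKC", value)
    if not ALLOWED.fullmatch(normalized):
        raise ValueError("Invalid input")
    return normalized


# The look-alike characters from the table collapse to ASCII under NFKC.
dot = unicodedata.normalize("NFKC", "\u2024")       # ONE DOT LEADER
two_dots = unicodedata.normalize("NFKC", "\u2025")  # TWO DOT LEADER
slash = unicodedata.normalize("NFKC", "\uff0f")     # FULLWIDTH SOLIDUS

# A check for "../" that runs before normalization misses this input,
# although it becomes "../passwd" once it is normalized.
sneaky = "\u2025\uff0fpasswd"
found_before = "../" in sneaky
found_after = "../" in unicodedata.normalize("NFKC", sneaky)
```

Validating the normalized string, and returning that same normalized string to the caller, keeps the value that was checked and the value that is used identical.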