Commit b74f635

pySCG: adding 838 as part of #531 (#868)
Reviewed by Hubert and Dean. Signed-off-by: Helge Wehder <[email protected]> Signed-off-by: myteron <[email protected]>
1 parent 44b3ff5 commit b74f635

File tree

4 files changed

+110
-5
lines changed

Lines changed: 102 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,102 @@
# CWE-838: Inappropriate Encoding for Output Context

Inappropriate handling of an encoding from untrusted sources, or of an unexpected encoding, can lead to unexpected values, data loss, or become the root cause of an attack.

Mixed encoding can lead to unexpected results and become a root cause for attacks, as showcased in [CWE-180: Incorrect behavior order: Validate before Canonicalize](https://github.com/ossf/wg-best-practices-os-developers/blob/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180) and [CWE-175: Improper Handling of Mixed Encoding](https://github.com/ossf/wg-best-practices-os-developers/blob/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-175/README.md). This rule showcases how to capture data from an untrusted source in its original binary form for forensics without compromising the logging system.

> [!CAUTION]
> Processing any type of forensic data requires an environment that is sealed off to an extent that prevents any exploit from reaching other systems, including hardware!

## Non-Compliant Code Example - Forensic Logging

The `noncompliant01.py` code tries to decode data that contains a byte that is not valid in UTF-8, which results in an unhandled exception.

*[noncompliant01.py](noncompliant01.py):*

```python
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Non-compliant Code Example"""


def report_record_attack(stream: bytearray):
    print("important text:", stream.decode("utf-8"))


#####################
# attempting to exploit above code example
#####################
payload = bytearray("user: 毛泽东先生 attempted a directory traversal".encode("utf-8"))
# Introducing an error in the encoded text, a byte
payload[3] = 128
report_record_attack(payload)
```

Trying to decode the modified byte sequence as UTF-8 results in the following exception:

```bash
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3: invalid start byte
```

Python is expected to use `UTF-8` by default, which is backward compatible with `ASCII` [Python docs - unicode](https://docs.python.org/3/howto/unicode.html). Depending on the Python installation and the OS locale, it may also be configured with a different default encoding. It is recommended to always use `UTF-8` explicitly inside your program instead of relying on, or changing, the configuration of the OS [Batchelder 2022](https://www.youtube.com/watch?v=sgHbC6udIqc).
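
The difference between the fixed and the environment-dependent defaults can be inspected with a minimal sketch like the following. It only reports values and assumes nothing beyond the standard library; the variable names are illustrative and not part of the original examples.

```python
import locale
import sys

# In Python 3 this is "utf-8": the default for str.encode() and bytes.decode()
print("str/bytes default:", sys.getdefaultencoding())
# This value depends on the OS locale and the Python configuration
print("locale preferred:", locale.getpreferredencoding(False))

# Passing the encoding explicitly keeps the result independent of the environment
text = "毛泽东先生"
encoded = text.encode("utf-8")
assert encoded.decode("utf-8") == text
```

Naming the encoding at every `encode()` and `decode()` call keeps the behavior the same across differently configured machines.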

## Compliant Solution - Forensic Logging

`Base64` encoding allows for a lossless conversion of binary data to a string and back. `Base64`, alongside `Base32` and `Base16`, is specified in [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648.html). Data encoded with one of these encodings can be safely sent by email, used as part of a URL, or included in an `HTTP POST` request [Python docs - base64](https://docs.python.org/3/library/base64.html).
Python's `base64` module provides functions to encode and decode bytes-like objects using the `RFC 4648` encodings.
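
As a minimal sketch of the lossless round trip described above, the following encodes a short byte sequence that is not valid UTF-8 and restores it unchanged. The sample bytes and variable names are chosen for illustration only.

```python
import base64

# Raw bytes that are not valid UTF-8 (0x80 cannot start a character)
raw = bytes([0x75, 0x73, 0x65, 0x80])

# Base64 output is ASCII-only, so it converts to str without errors
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)

# Decoding restores the original bytes exactly
restored = base64.b64decode(encoded)
assert restored == raw
```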

In the `compliant01.py` code example, the same error is introduced in the encoded text. This time, if a `UnicodeDecodeError` is raised, the stream is encoded using `Base64` and logged for forensic analysis. No data is lost, and the attempted attack with a potentially dangerous payload is highlighted.

*[compliant01.py](compliant01.py):*

```python
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Compliant Code Example"""

import base64


def report_record_attack(stream: bytearray):
    try:
        decoded_text = stream.decode("utf-8")
    except UnicodeDecodeError as e:
        # Encode the stream using Base64 if there is an exception
        encoded_payload = base64.b64encode(stream).decode("utf-8")
        # Logging encoded payload for forensic analysis
        print("Base64 Encoded Payload for Forensic Analysis:", encoded_payload)
        print("Error decoding payload:", e)
    else:
        print("Important text:", decoded_text)


#####################
# attempting to exploit above code example
#####################
payload = bytearray("user: 毛泽东先生 attempted a directory traversal".encode("utf-8"))
# Introducing an error in the encoded text, a byte
payload[3] = 128
report_record_attack(payload)
```
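
On the analysis side, the logged `Base64` string can be turned back into the original bytes. The following is a sketch only: `recover_payload` is a hypothetical helper that is not part of the original examples, and the shortened ASCII payload is an assumption made to keep the example compact.

```python
import base64


def recover_payload(logged: str) -> bytes:
    """Hypothetical helper: turn a logged Base64 string back into raw bytes."""
    # validate=True rejects characters outside the Base64 alphabet
    return base64.b64decode(logged, validate=True)


original = bytearray("user: attempted a directory traversal".encode("utf-8"))
original[3] = 128  # same corruption as in the code examples
logged = base64.b64encode(original).decode("ascii")

recovered = recover_payload(logged)
assert recovered == bytes(original)

# For display only: undecodable bytes are shown as escape sequences
print(recovered.decode("utf-8", errors="backslashreplace"))
```

As the caution at the top of this rule states, recovered payloads should only be handled in a sealed-off environment.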

## Automated Detection

No detection.

## Related Guidelines

|||
|:---|:---|
|[MITRE CWE](http://cwe.mitre.org/)|Pillar: [CWE-707: Improper Neutralization](https://cwe.mitre.org/data/definitions/707.html)|
|[MITRE CWE](http://cwe.mitre.org/)|Base: [CWE-838: Inappropriate Encoding for Output Context](https://cwe.mitre.org/data/definitions/838.html)|
|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[STR03-J. Do not encode noncharacter data as a string](https://wiki.sei.cmu.edu/confluence/display/java/STR03-J.+Do+not+encode+noncharacter+data+as+a+string)|

## Bibliography

|||
|:---|:---|
|\[Python docs - unicode\]|Python Software Foundation. (2023). Unicode HOWTO. \[online\]. Available from: <https://docs.python.org/3/howto/unicode.html> \[accessed 28 April 2025\]|
|\[RFC 4648\]|Josefsson, S. Internet Engineering Task Force (2006). The Base16, Base32, and Base64 Data Encodings. \[online\]. Available from: <https://datatracker.ietf.org/doc/html/rfc4648.html> \[accessed 28 April 2025\]|
|\[Python docs - base64\]|Python Software Foundation. (2023). base64 - Base16, Base32, Base64, Base85 Data Encodings. \[online\]. Available from: <https://docs.python.org/3/library/base64.html> \[accessed 28 April 2025\]|
|\[Batchelder 2022\]|Ned Batchelder, Pragmatic Unicode, or, How do I stop the pain? \[online\]. Available from: <https://www.youtube.com/watch?v=sgHbC6udIqc> \[accessed 28 April 2025\]|

docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-838/compliant01.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,4 +24,4 @@ def report_record_attack(stream: bytearray):
2424
payload = bytearray("user: 毛泽东先生 attempted a directory traversal".encode("utf-8"))
2525
# Introducing an error in the encoded text, a byte
2626
payload[3] = 128
27-
report_record_attack(payload)
27+
report_record_attack(payload)
Lines changed: 6 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,14 +1,16 @@
11
# SPDX-FileCopyrightText: OpenSSF project contributors
22
# SPDX-License-Identifier: MIT
3-
""" Non-compliant Code Example """
4-
3+
"""Non-compliant Code Example"""
4+
5+
56
def report_record_attack(stream: bytearray):
67
print("important text:", stream.decode("utf-8"))
7-
8+
9+
810
#####################
911
# attempting to exploit above code example
1012
#####################
1113
payload = bytearray("user: 毛泽东先生 attempted a directory traversal".encode("utf-8"))
1214
# Introducing an error in the encoded text, a byte
1315
payload[3] = 128
14-
report_record_attack(payload)
16+
report_record_attack(payload)

docs/Secure-Coding-Guide-for-Python/readme.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -100,6 +100,7 @@ It is __not production code__ and requires code-style or python best practices t
100100
|[CWE-117: Improper Output Neutralization for Logs](CWE-707/CWE-117/.)||
101101
|[CWE-175: Improper Handling of Mixed Encoding](CWE-707/CWE-175/README.md)||
102102
|[CWE-180: Incorrect behavior order: Validate before Canonicalize](CWE-707/CWE-180/README.md)|[CVE-2022-26136](https://www.cvedetails.com/cve/CVE-2022-26136/),<br/>CVSSv3.1: __9.8__,<br/>EPSS: __00.18__ (24.04.2025)|
103+
|[CWE-838: Inappropriate Encoding for Output Context](CWE-707/CWE-838/README.md)||
103104

104105
|[CWE-710: Improper Adherence to Coding Standards](https://cwe.mitre.org/data/definitions/710.html)|Prominent CVE|
105106
|:----------------------------------------------------------------|:----|
