|
| 1 | +# CWE-838: Inappropriate Encoding for Output Context |
| 2 | + |
| 3 | +Mishandling data in an unexpected encoding, or encoded data from untrusted sources, can produce unexpected values, cause data loss, or become the root cause of an attack. |
| 4 | + |
| 5 | +Mixed encoding can lead to unexpected results and become a root cause for attacks, as showcased in [CWE-180: Incorrect behavior order: Validate before Canonicalize](https://github.com/ossf/wg-best-practices-os-developers/blob/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180) and [CWE-175: Improper Handling of Mixed Encoding](https://github.com/ossf/wg-best-practices-os-developers/blob/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-175/README.md). This rule shows how to capture data from an untrusted source in its original binary form for forensics, without compromising the logging system. |
| 6 | + |
| 7 | +> [!CAUTION] |
| 8 | +> Processing any type of forensic data requires an environment that is sealed off to an extent that prevents any exploit from reaching other systems, including hardware! |
| 9 | +
|
| 10 | +## Non-Compliant Code Example - Forensic logging |
| 11 | + |
| 12 | +The `noncompliant01.py` code tries to decode data containing a byte that is not valid `UTF-8` at its position, and leaves the resulting exception unhandled. |
| 13 | + |
| 14 | +*[noncompliant01.py](noncompliant01.py):* |
| 15 | + |
| 16 | +```python |
| 17 | +# SPDX-FileCopyrightText: OpenSSF project contributors |
| 18 | +# SPDX-License-Identifier: MIT |
| 19 | +"""Non-compliant Code Example""" |
| 20 | + |
| 21 | + |
| 22 | +def report_record_attack(stream: bytearray): |
| 23 | +    print("important text:", stream.decode("utf-8")) |
| 24 | + |
| 25 | + |
| 26 | +##################### |
| 27 | +# attempting to exploit above code example |
| 28 | +##################### |
| 29 | +payload = bytearray("user: 毛泽东先生 attempted a directory traversal".encode("utf-8")) |
| 30 | +# Introducing an error in the encoded text: byte 0x80 is not a valid start byte |
| 31 | +payload[3] = 128 |
| 32 | +report_record_attack(payload) |
| 33 | + |
| 34 | +``` |
| 35 | + |
| 36 | +Decoding the modified bytes as `UTF-8` raises the following exception: |
| 37 | + |
| 38 | +```bash |
| 39 | +UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3: invalid start byte |
| 40 | +``` |
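The exception object itself carries the details needed to locate the offending byte. The following is a minimal sketch, not part of `noncompliant01.py`, that reads the standard `start`, `object`, and `reason` attributes of `UnicodeDecodeError`. The shortened ASCII-only payload is an assumption made so the output prints on any terminal.

```python
# Shortened, ASCII-only stand-in for the payload used in the example above
payload = bytearray("user: attempted a directory traversal".encode("utf-8"))
payload[3] = 128  # 0x80 is a continuation byte with no leading byte before it

try:
    payload.decode("utf-8")
except UnicodeDecodeError as error:
    position = error.start  # index of the first byte that could not be decoded
    bad_byte = error.object[error.start]  # the offending byte as an int
    reason = error.reason  # short description supplied by the codec
    print(f"invalid byte 0x{bad_byte:02x} at position {position}: {reason}")
```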
| 41 | +
|
| 42 | +Python is expected to use the `UTF-8` charset by default, which is backward compatible with `ASCII` [Python docs - unicode](https://docs.python.org/3/howto/unicode.html). Depending on the Python installation and platform, a different encoding may be configured. It is recommended to always use `UTF-8` inside your program and to leave the OS configuration unchanged [Batchelder 2022](https://www.youtube.com/watch?v=sgHbC6udIqc). |
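One way to stay on `UTF-8` regardless of the installation is to pass the encoding explicitly wherever text is converted or a file is opened. The sketch below illustrates this under stated assumptions: the helper name `roundtrip_utf8_file` and the temporary file are hypothetical and exist only for this demonstration.

```python
import locale
import os
import sys
import tempfile


def roundtrip_utf8_file(text: str) -> str:
    """Write text to a temporary file and read it back, always as UTF-8."""
    # Hypothetical helper: the explicit encoding avoids relying on the locale
    with tempfile.NamedTemporaryFile(
        "w", encoding="utf-8", delete=False, suffix=".txt"
    ) as handle:
        handle.write(text)
        path = handle.name
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    finally:
        os.remove(path)


sample = "user: 毛泽东先生"
# Encoding reported by the interpreter for str/bytes conversions
print("default string encoding:", sys.getdefaultencoding())
# Locale-dependent encoding that open() typically falls back to when none is given
print("locale preferred encoding:", locale.getpreferredencoding(False))
print("round trip intact:", roundtrip_utf8_file(sample) == sample)
```

If the two reported encodings differ on a given machine, only the calls with an explicit `encoding="utf-8"` behave the same everywhere.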
| 43 | +
|
| 44 | +## Compliant Solution - Forensic Logging |
| 45 | +
|
| 46 | +We can use the `Base64` encoding for a lossless conversion of binary data to a string and back. `Base64` is specified, alongside `Base32` and `Base16`, in [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648.html). Data encoded with one of these encodings can be safely sent by email, used as parts of URLs, or included as part of an `HTTP POST` request [Python docs - base64](https://docs.python.org/3/library/base64.html). |
| 47 | +Python's `base64` module offers functions to encode and decode bytes-like objects using the `RFC 4648` encodings. |
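To make the lossless property concrete, here is a minimal sketch that round-trips bytes which are not valid `UTF-8` through `base64.b64encode` and `base64.b64decode`, and contrasts that with decoding under `errors="replace"`, which discards the original byte values. The sample bytes are arbitrary and chosen only for illustration.

```python
import base64

# Arbitrary sample: 0x80 and 0xFF cannot appear at these positions in UTF-8
raw = bytes([0x75, 0x73, 0x65, 0x80, 0x3A, 0xFF, 0x00])

# b64encode returns ASCII-only bytes, so decoding them to str cannot fail
encoded = base64.b64encode(raw).decode("ascii")
# b64decode restores the exact original bytes
restored = base64.b64decode(encoded)

# Lossy alternative: invalid bytes are replaced with U+FFFD
lossy = raw.decode("utf-8", errors="replace")

print("Base64 text:", encoded)
print("Base64 round trip intact:", restored == raw)
print("errors='replace' round trip intact:", lossy.encode("utf-8") == raw)
```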
| 48 | +
|
| 49 | +The `compliant01.py` code example introduces the same error in the encoded text. This time, if a `UnicodeDecodeError` is raised, we encode the stream using `Base64` and log it for forensic analysis. No data is lost, and the log highlights an attempted attack with a potentially dangerous payload. |
| 50 | +
|
| 51 | +*[compliant01.py](compliant01.py):* |
| 52 | +
|
| 53 | +```python |
| 54 | +# SPDX-FileCopyrightText: OpenSSF project contributors |
| 55 | +# SPDX-License-Identifier: MIT |
| 56 | +"""Compliant Code Example""" |
| 57 | +
|
| 58 | +import base64 |
| 59 | +
|
| 60 | +
|
| 61 | +def report_record_attack(stream: bytearray): |
| 62 | +    try: |
| 63 | +        decoded_text = stream.decode("utf-8") |
| 64 | +    except UnicodeDecodeError as e: |
| 65 | +        # Encode the stream using Base64 if there is an exception |
| 66 | +        encoded_payload = base64.b64encode(stream).decode("utf-8") |
| 67 | +        # Logging encoded payload for forensic analysis |
| 68 | +        print("Base64 Encoded Payload for Forensic Analysis:", encoded_payload) |
| 69 | +        print("Error decoding payload:", e) |
| 70 | +    else: |
| 71 | +        print("Important text:", decoded_text) |
| 72 | +
|
| 73 | +
|
| 74 | +##################### |
| 75 | +# attempting to exploit above code example |
| 76 | +##################### |
| 77 | +payload = bytearray("user: 毛泽东先生 attempted a directory traversal".encode("utf-8")) |
| 78 | +# Introducing an error in the encoded text: byte 0x80 is not a valid start byte |
| 79 | +payload[3] = 128 |
| 80 | +report_record_attack(payload) |
| 81 | +``` |
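On the analysis side, the logged `Base64` string can be turned back into the original bytes. The following is a minimal sketch, assuming the analyst has copied the `Base64` value from the log. The function name `inspect_logged_payload` is hypothetical, and the payload is a shortened ASCII-only stand-in for the one above. The `backslashreplace` error handler shows invalid bytes as escape sequences instead of raising an exception or dropping them.

```python
import base64


def inspect_logged_payload(encoded_payload: str) -> tuple[bytes, str]:
    """Recover the original bytes and a printable view of them."""
    # validate=True rejects characters outside the Base64 alphabet
    original = base64.b64decode(encoded_payload, validate=True)
    # Invalid bytes are rendered as escapes such as \x80
    printable = original.decode("utf-8", errors="backslashreplace")
    return original, printable


# Simulate the value the compliant example would have written to the log
payload = bytearray("user: attempted a directory traversal".encode("utf-8"))
payload[3] = 128
logged = base64.b64encode(payload).decode("ascii")

original, printable = inspect_logged_payload(logged)
print("bytes recovered intact:", original == bytes(payload))
print("printable view:", printable)
```

As the caution at the top of this page states, this kind of inspection belongs in an isolated environment.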
| 82 | +
|
| 83 | +## Automated Detection |
| 84 | +
|
| 85 | +No detection. |
| 86 | +
|
| 87 | +## Related Guidelines |
| 88 | +
|
| 89 | +||| |
| 90 | +|:---|:---| |
| 91 | +|[MITRE CWE](http://cwe.mitre.org/)|Pillar: [CWE-707: Improper Neutralization](https://cwe.mitre.org/data/definitions/707.html)| |
| 92 | +|[MITRE CWE](http://cwe.mitre.org/)|Base: [CWE-838: Inappropriate Encoding for Output Context](https://cwe.mitre.org/data/definitions/838.html)| |
| 93 | +|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[STR03-J. Do not encode noncharacter data as a string](https://wiki.sei.cmu.edu/confluence/display/java/STR03-J.+Do+not+encode+noncharacter+data+as+a+string)| |
| 94 | +
|
| 95 | +## Bibliography |
| 96 | +
|
| 97 | +||| |
| 98 | +|:---|:---| |
| 99 | +|\[Python docs - unicode\]|Python Software Foundation. (2023). Unicode HOWTO. \[online\]. Available from: <https://docs.python.org/3/howto/unicode.html> \[accessed 28 April 2025\]| |
| 100 | +|\[RFC 4648\]|Josefsson, S. Internet Engineering Task Force (2006). The Base16, Base32, and Base64 Data Encodings. \[online\]. Available from: <https://datatracker.ietf.org/doc/html/rfc4648.html> \[accessed 28 April 2025\]| |
| 101 | +|\[Python docs - base64\]|Python Software Foundation. (2023). base64 - Base16, Base32, Base64, Base85 Data Encodings. \[online\]. Available from: <https://docs.python.org/3/library/base64.html> \[accessed 28 April 2025\]| |
| 102 | +|\[Batchelder 2022\]|Ned Batchelder, Pragmatic Unicode, or, How do I stop the pain? \[online\]. Available from: <https://www.youtube.com/watch?v=sgHbC6udIqc> \[accessed 28 April 2025\]| |