# CWE-838: Inappropriate Encoding for Output Context

Inappropriate handling of encoded data from untrusted sources, or of data in an unexpected encoding, can lead to unexpected values, data loss, or become the root cause of an attack.

Mixed encoding can lead to unexpected results and become a root cause for attacks, as showcased in [CWE-180: Incorrect behavior order: Validate before Canonicalize](https://github.com/ossf/wg-best-practices-os-developers/blob/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180) and [CWE-175: Improper Handling of Mixed Encoding](https://github.com/ossf/wg-best-practices-os-developers/blob/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-175/README.md). This rule shows how to capture data from an untrusted source in its original binary form for forensics, without compromising the logging system.

> [!CAUTION]
> Processing any type of forensic data requires an environment that is sealed off to an extent that prevents any exploit from reaching other systems, including hardware!

## Non-Compliant Code Example - Forensic Logging

The `noncompliant01.py` code tries to decode data containing a byte that is not valid at its position in UTF-8. Decoding raises an unhandled exception, so the record is never logged.

*[noncompliant01.py](noncompliant01.py):*

```python
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Non-compliant Code Example"""


def report_record_attack(stream: bytearray):
    print("important text:", stream.decode("utf-8"))


#####################
# attempting to exploit above code example
#####################
payload = bytearray("user: 毛泽东先生 attempted a directory traversal".encode("utf-8"))
# Introducing an error in the encoded text: 0x80 is not a valid UTF-8 start byte
payload[3] = 128
report_record_attack(payload)
```

Trying to decode the modified encoded text in UTF-8 will result in the following exception:

```bash
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3: invalid start byte
```

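
The exception object itself carries the offset and the reason shown in the message above. The following is a minimal sketch of reading them; the helper name `locate_decode_error` is hypothetical and not part of `noncompliant01.py`.

```python
from typing import Optional, Tuple


def locate_decode_error(data: bytes) -> Optional[Tuple[int, str]]:
    """Return (offset, reason) of the first UTF-8 decoding error, or None.

    Hypothetical helper for illustration only.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as error:
        # start is the index of the first offending byte,
        # reason is the short explanation used in the error message
        return error.start, error.reason
    return None


payload = bytearray("user: 毛泽东先生 attempted a directory traversal".encode("utf-8"))
payload[3] = 128  # same corruption as in the example above
print(locate_decode_error(bytes(payload)))
```

Knowing the offset can help an analyst find the corrupted region, but it does not preserve the payload itself.
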
Python 3 uses `UTF-8` by default for source code and for `str.encode()` and `bytes.decode()`; `UTF-8` is backward compatible with `ASCII` [Python docs - unicode](https://docs.python.org/3/howto/unicode.html). Other defaults, such as the encoding used by `open()` when no `encoding` argument is given, can depend on the Python installation and the locale of the operating system. It is recommended to always use `UTF-8` inside your program rather than bending the configuration of the OS [Batchelder 2022](https://www.youtube.com/watch?v=sgHbC6udIqc).

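
One way to avoid depending on the platform configuration is to pass the encoding explicitly whenever text crosses an I/O boundary. The sketch below is an illustration under that assumption; the helper `write_and_read_utf8` and the file name `example.txt` are hypothetical.

```python
import locale
import sys
import tempfile
from pathlib import Path


def write_and_read_utf8(text: str) -> str:
    """Round-trip text through a temporary file with an explicit UTF-8 encoding."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / "example.txt"  # hypothetical file name
        path.write_text(text, encoding="utf-8")
        return path.read_text(encoding="utf-8")


# Default used by str.encode() and bytes.decode()
print("str/bytes default:", sys.getdefaultencoding())
# Locale-dependent encoding that may be used when none is specified for files
print("locale preferred:", locale.getpreferredencoding(False))

sample = "user: 毛泽东先生"
print("round trip intact:", write_and_read_utf8(sample) == sample)
```

The two reported encodings can differ between machines, which is why the file operations name `utf-8` explicitly.
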
## Compliant Solution - Forensic Logging

`Base64` encoding allows a lossless conversion of binary data to a string and back. `Base64`, alongside `Base32` and `Base16`, is specified in [RFC 4648](https://datatracker.ietf.org/doc/html/rfc4648.html). Data encoded with one of these encodings can be safely sent by email, used as parts of URLs, or included as part of an `HTTP POST` request [Python docs - base64](https://docs.python.org/3/library/base64.html).
Python's `base64` module provides functions to encode and decode bytes-like objects using the `RFC 4648` encodings.

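
The lossless round trip can be illustrated with a short sketch. The function names `to_log_safe_text` and `from_log_safe_text` are hypothetical and used only for this illustration.

```python
import base64


def to_log_safe_text(raw: bytes) -> str:
    """Encode arbitrary bytes as an ASCII-only Base64 string."""
    return base64.b64encode(raw).decode("ascii")


def from_log_safe_text(encoded: str) -> bytes:
    """Recover the original bytes; validate=True rejects non-Base64 characters."""
    return base64.b64decode(encoded, validate=True)


original = bytearray("user: 毛泽东先生".encode("utf-8"))
original[3] = 128  # no longer valid UTF-8

encoded = to_log_safe_text(bytes(original))
restored = from_log_safe_text(encoded)
print("round trip intact:", restored == original)
```

Because the Base64 alphabet is plain ASCII, the encoded string can be written to a text log regardless of what the original bytes contained.
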
In the `compliant01.py` code example, the same error is introduced in the encoded text. This time, if decoding raises a `UnicodeDecodeError`, we encode the stream using `Base64` and log it for forensic analysis. This results in no loss of data while highlighting an attempted attack with a potentially dangerous payload.

*[compliant01.py](compliant01.py):*

```python
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Compliant Code Example"""

import base64


def report_record_attack(stream: bytearray):
    try:
        decoded_text = stream.decode("utf-8")
    except UnicodeDecodeError as e:
        # Encode the stream using Base64 if there is an exception
        encoded_payload = base64.b64encode(stream).decode("utf-8")
        # Logging encoded payload for forensic analysis
        print("Base64 Encoded Payload for Forensic Analysis:", encoded_payload)
        print("Error decoding payload:", e)
    else:
        print("Important text:", decoded_text)


#####################
# attempting to exploit above code example
#####################
payload = bytearray("user: 毛泽东先生 attempted a directory traversal".encode("utf-8"))
# Introducing an error in the encoded text: 0x80 is not a valid UTF-8 start byte
payload[3] = 128
report_record_attack(payload)
```

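
The example above uses `print` for brevity. As a hedged variation, the same idea could be routed through Python's standard `logging` module. In this sketch the logger name `forensics` and the in-memory `log_buffer` are assumptions made for illustration; a real deployment would choose its own handler and destination.

```python
import base64
import io
import logging


def report_record_attack(stream: bytearray, logger: logging.Logger) -> None:
    try:
        decoded_text = stream.decode("utf-8")
    except UnicodeDecodeError as error:
        # Keep the raw bytes recoverable by logging them as Base64
        encoded_payload = base64.b64encode(stream).decode("ascii")
        logger.warning("Undecodable payload (Base64): %s", encoded_payload)
        logger.warning("Error decoding payload: %s", error)
    else:
        logger.info("Important text: %s", decoded_text)


# Hypothetical logger setup writing to an in-memory buffer
log_buffer = io.StringIO()
forensics_logger = logging.getLogger("forensics")
forensics_logger.setLevel(logging.INFO)
forensics_logger.addHandler(logging.StreamHandler(log_buffer))
forensics_logger.propagate = False

payload = bytearray("user: 毛泽东先生 attempted a directory traversal".encode("utf-8"))
payload[3] = 128
report_record_attack(payload, forensics_logger)
```

An analyst working in a sealed-off environment can later pass the logged string to `base64.b64decode` to recover the exact bytes that were received.
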
## Automated Detection

No detection.

## Related Guidelines

|||
|:---|:---|
|[MITRE CWE](http://cwe.mitre.org/)|Pillar: [CWE-707: Improper Neutralization](https://cwe.mitre.org/data/definitions/707.html)|
|[MITRE CWE](http://cwe.mitre.org/)|Base: [CWE-838: Inappropriate Encoding for Output Context](https://cwe.mitre.org/data/definitions/838.html)|
|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[STR03-J. Do not encode noncharacter data as a string](https://wiki.sei.cmu.edu/confluence/display/java/STR03-J.+Do+not+encode+noncharacter+data+as+a+string)|

## Bibliography

|||
|:---|:---|
|\[Python docs - unicode\]|Python Software Foundation. (2023). Unicode HOWTO. \[online\]. Available from: <https://docs.python.org/3/howto/unicode.html> \[accessed 28 April 2025\]|
|\[RFC 4648\]|Josefsson, S. Internet Engineering Task Force (2006). The Base16, Base32, and Base64 Data Encodings. \[online\]. Available from: <https://datatracker.ietf.org/doc/html/rfc4648.html> \[accessed 28 April 2025\]|
|\[Python docs - base64\]|Python Software Foundation. (2023). base64 - Base16, Base32, Base64, Base85 Data Encodings. \[online\]. Available from: <https://docs.python.org/3/library/base64.html> \[accessed 28 April 2025\]|
|\[Batchelder 2022\]|Batchelder, N. Pragmatic Unicode, or, How do I stop the pain? \[online\]. Available from: <https://www.youtube.com/watch?v=sgHbC6udIqc> \[accessed 28 April 2025\]|