UnicodeDecodeError during mime header parsing is unhandled in _header_value_parser.py #132794

@aclemons

Bug report

Bug description:

My underlying issue is related to #79728 (comment): the parameter of a MIME header has been encoded, but the bytes of a multi-byte character are split across multiple lines, and Python is unable to parse this.
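To make the failure mode concrete, here is a minimal sketch, independent of the email package, of what happens when a percent-encoded ISO-2022-JP value is cut inside a two-byte character. The cut position and the helper name decodes_alone are assumptions chosen for illustration; the sketch only shows that a section ending mid-character cannot be decoded on its own, while decoding the joined bytes works.

```python
from urllib.parse import unquote_to_bytes

CHARSET = "iso-2022-jp"
filename = "作業報告書【子】.pdf"

# Percent-encode every byte, as an RFC 2231 extended value may do.
encoded = "".join(f"%{b:02X}" for b in filename.encode(CHARSET))

# ISO-2022-JP output starts with a 3-byte escape sequence followed by
# 2-byte characters, so cutting after 4 bytes (12 characters of "%XX"
# text) leaves the first character with only one of its two bytes.
section0, section1 = encoded[:12], encoded[12:]


def decodes_alone(section: str) -> bool:
    """Return True if a single section can be decoded without the others."""
    try:
        unquote_to_bytes(section).decode(CHARSET)
    except UnicodeDecodeError:
        return False
    return True


# Joining the raw bytes of all sections before decoding recovers the name.
joined = unquote_to_bytes(section0 + section1).decode(CHARSET)

print("section 0 decodes alone:", decodes_alone(section0))
print("joined value round-trips:", joined == filename)
```

Decoding per section is the step that trips over the dangling byte; concatenating the unquoted bytes first avoids it.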

Of course, ideally, the client would handle this properly, but I don't have any control over that.

In e91dee8, handling was added for UnicodeEncodeError, but UnicodeDecodeError is still unhandled and the entire parse raises at this point.

Here is a minimal reproduction of an email a client is sending. It would be nice if the filename itself could be correctly parsed, but at a minimum, I would like to be able to get the file contents:

import email.policy
import secrets
from datetime import datetime, timezone
import sys
from email import encoders, message_from_bytes, utils
from email.message import EmailMessage
from email.mime.application import MIMEApplication
from email.utils import make_msgid


def main() -> None:
    if sys.version_info[:2] < (3, 11):
        now = datetime.now(timezone.utc)
    else:
        from datetime import UTC

        now = datetime.now(UTC)

    from_ = "sender@example.com"  # placeholder address

    msg = EmailMessage(policy=email.policy.SMTP)
    msg["date"] = utils.format_datetime(now)

    msg["subject"] = "Mime param split bytes"

    msg["from"] = from_
    msg["message-id"] = make_msgid(domain=msg["from"].addresses[0].domain)

    msg["to"] = "recipient@example.com"  # placeholder address

    msg.set_content("This is a test", "plain")

    if not msg.is_multipart():
        msg.make_mixed()

    attachment = MIMEApplication(
        secrets.token_bytes(256),
        "pdf",
        encoders.encode_base64,
        policy=email.policy.SMTP,
    )
    attachment.add_header("Content-Disposition", "attachment", filename="test")
    msg.attach(attachment)

    msg_bytes = msg.as_bytes()

    # split the bytes of a multi-byte character across lines.
    filename = "作業報告書【子】.pdf"
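    encoded = ("%" + filename.encode("iso-2022-jp").hex("%").upper()).encode("ascii")
    # partition (unlike split) keeps the %52 byte, so it starts the second section
    head, sep, tail = encoded.partition(b"%52")
    filename_bytes = [head, sep + tail]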
    msg_bytes = msg_bytes.replace(
        b'attachment; filename="test"',
        b"attachment;\r\n filename*0*=ISO-2022-JP''"
        + b"; \r\n filename*1*=".join(filename_bytes),
    )

    print(msg_bytes.decode())

    # trigger parsing the mime-part with the filename
    message_from_bytes(msg_bytes, policy=email.policy.SMTP).get_body("html")


if __name__ == "__main__":
    main()

I ran this with Python 3.8 through 3.14, all with the same result.

$ uv run --no-config --managed-python --python 3.13 run.py
Traceback (most recent call last):
  File "/tmp/run.py", line 65, in <module>
    main()
    ~~~~^^
  File "/tmp/run.py", line 62, in main
    message_from_bytes(msg_bytes, policy=email.policy.SMTP).get_body("html")
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/message.py", line 1054, in get_body
    for prio, part in self._find_body(self, preferencelist):
                      ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/message.py", line 1025, in _find_body
    yield from self._find_body(subpart, preferencelist)
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/message.py", line 1014, in _find_body
    if part.is_attachment():
       ~~~~~~~~~~~~~~~~~~^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/message.py", line 1010, in is_attachment
    c_d = self.get('content-disposition')
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/message.py", line 507, in get
    return self.policy.header_fetch_parse(k, v)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/headerregistry.py", line 604, in __call__
    return self[name](name, value)
           ~~~~~~~~~~^^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/headerregistry.py", line 192, in __new__
    cls.parse(value, kwds)
    ~~~~~~~~~^^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/headerregistry.py", line 449, in parse
    kwds['decoded'] = str(parse_tree)
                      ~~~^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/_header_value_parser.py", line 136, in __str__
    return ''.join(str(x) for x in self)
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/_header_value_parser.py", line 136, in <genexpr>
    return ''.join(str(x) for x in self)
                   ~~~^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/_header_value_parser.py", line 814, in __str__
    for name, value in self.params:
                       ^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/_header_value_parser.py", line 799, in params
    value = value.decode(charset, 'surrogateescape')
UnicodeDecodeError: 'iso2022_jp' codec can't decode byte 0x3b in position 15: incomplete multibyte sequence
decoding with 'ISO-2022-JP' codec failed

If I patch my Python and add UnicodeDecodeError to the except clause on line 800 of _header_value_parser.py, which currently reads

except (LookupError, UnicodeEncodeError):

then I can at least interact with the email, even if the attachment filename from the parameter is garbled.
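The effect of that one-line change can be sketched outside the standard library. The helper below, decode_section, is a hypothetical stand-in and not the actual code in _header_value_parser.py; it assumes the same two-step strategy seen in the traceback (decode with the declared charset using surrogateescape, then fall back to us-ascii), with UnicodeDecodeError added to the caught exceptions.

```python
def decode_section(raw: bytes, charset: str) -> str:
    """Hypothetical helper: decode one parameter section without raising
    on an unknown charset or a truncated multi-byte sequence."""
    try:
        return raw.decode(charset, "surrogateescape")
    except (LookupError, UnicodeEncodeError, UnicodeDecodeError):
        # Fall back to ASCII; the text may be garbled, but parsing continues.
        return raw.decode("us-ascii", "surrogateescape")


whole = "子".encode("iso-2022-jp")  # escape, one 2-byte character, escape
truncated = whole[:4]               # escape plus the first byte only

print(repr(decode_section(whole, "iso-2022-jp")))
print(repr(decode_section(truncated, "iso-2022-jp")))
print(repr(decode_section(b"abc", "no-such-charset")))
```

With this fallback the truncated section comes back as raw escape-sequence text rather than the intended character, which matches the garbled-but-usable filename described above.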

I checked other tickets to see if this had already been reported. I mentioned the root problem here with the handling of parameters with split bytes (#79728), but I also saw #116705 which asked about handling of UnicodeDecodeError at exactly this point too.
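As a possible stopgap for reaching the attachment bytes on an unpatched interpreter, the sketch below parses a hand-built message with the legacy email.policy.compat32 policy and picks the part by content type, so the Content-Disposition parameters are never decoded. This is an assumption about a workable workaround, not something confirmed in the tickets above; the message, boundary, payload, and the percent helper are made up for illustration.

```python
import base64
import email.policy
from email import message_from_bytes


def percent(data: bytes) -> bytes:
    """Percent-encode every byte (illustrative helper)."""
    return "".join(f"%{b:02X}" for b in data).encode("ascii")


payload = b"%PDF-1.4 placeholder attachment body"
jis = "子".encode("iso-2022-jp")  # escape, one 2-byte character, escape

# The filename parameter is cut after 4 bytes, i.e. inside the character.
raw = b"\r\n".join([
    b"MIME-Version: 1.0",
    b'Content-Type: multipart/mixed; boundary="BOUNDARY"',
    b"",
    b"--BOUNDARY",
    b'Content-Type: text/plain; charset="utf-8"',
    b"",
    b"This is a test",
    b"--BOUNDARY",
    b"Content-Type: application/pdf",
    b"Content-Transfer-Encoding: base64",
    b"Content-Disposition: attachment;",
    b" filename*0*=ISO-2022-JP''" + percent(jis[:4]) + b";",
    b" filename*1*=" + percent(jis[4:]) + b".pdf",
    b"",
    base64.b64encode(payload),
    b"--BOUNDARY--",
    b"",
])

# compat32 hands header values back as raw strings; only Content-Type and
# Content-Transfer-Encoding are consulted here, never Content-Disposition.
msg = message_from_bytes(raw, policy=email.policy.compat32)
attachments = [
    part.get_payload(decode=True)
    for part in msg.walk()
    if part.get_content_type() == "application/pdf"
]

print(len(attachments), "attachment(s) recovered")
```

The filename is deliberately left untouched in this sketch; it only addresses getting at the file contents.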

Thank you.

CPython versions tested on:

3.13

Operating systems tested on:

Linux
