Description
Bug report
Bug description:
My underlying issue is related to #79728 (comment), where the parameter of a MIME header has been encoded, but the bytes of a multi-byte character are split across multiple lines. Python is unable to parse this.
Of course, ideally, the client would handle this properly, but I don't have any control over that.
In e91dee8, handling was added for UnicodeEncodeError, but UnicodeDecodeError is still unhandled and the entire parse raises at this point.
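To illustrate the failure mode, here is a minimal sketch that uses only the codec, not the email package. It assumes, as the traceback further down suggests, that each parameter section is percent-decoded and charset-decoded on its own. The helper names (`percent_encode`, `decode_per_section`, `decode_joined`) are hypothetical and exist only for this example.

```python
from urllib.parse import unquote_to_bytes

FILENAME = "作業報告書【子】.pdf"
CHARSET = "iso-2022-jp"


def percent_encode(raw: bytes) -> str:
    # Percent-encode every byte, as in an RFC 2231 extended parameter value.
    return "".join(f"%{byte:02X}" for byte in raw)


encoded = FILENAME.encode(CHARSET)
# Cut after 16 bytes: the first section then ends with 0x3B, the first byte
# of a two-byte character, and the second section starts with 0x52.
sections = [percent_encode(encoded[:16]), percent_encode(encoded[16:])]


def decode_per_section(parts, charset):
    # Decode every section separately, then join the resulting strings.
    return "".join(unquote_to_bytes(p).decode(charset) for p in parts)


def decode_joined(parts, charset):
    # Join the raw bytes of all sections first, then decode once.
    return b"".join(unquote_to_bytes(p) for p in parts).decode(charset)


try:
    decode_per_section(sections, CHARSET)
    per_section_error = None
except UnicodeDecodeError as exc:
    per_section_error = exc

# The error is a UnicodeDecodeError, which an
# `except (LookupError, UnicodeEncodeError)` clause does not catch.
caught_by_existing_clause = isinstance(
    per_section_error, (LookupError, UnicodeEncodeError)
)

# Decoding the joined bytes recovers the original filename.
recovered = decode_joined(sections, CHARSET)
```

Joining the bytes before decoding is what would be needed to recover the filename itself; handling `UnicodeDecodeError` alone would only stop the exception from propagating.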
Here is a minimal reproduction of an email a client is sending. It would be nice if the filename itself could be correctly parsed, but at a minimum, I would like to be able to get the file contents:
import email.policy
import secrets
from datetime import datetime, timezone
import sys
from email import encoders, message_from_bytes, utils
from email.message import EmailMessage
from email.mime.application import MIMEApplication
from email.utils import make_msgid


def main() -> None:
    if sys.version_info[:2] < (3, 11):
        now = datetime.now(timezone.utc)
    else:
        from datetime import UTC

        now = datetime.now(UTC)

    # placeholder addresses (example.com is reserved for documentation)
    from_ = "sender@example.com"
    msg = EmailMessage(policy=email.policy.SMTP)
    msg["date"] = utils.format_datetime(now)
    msg["subject"] = "Mime param split bytes"
    msg["from"] = from_
    msg["message-id"] = make_msgid(domain=msg["from"].addresses[0].domain)
    msg["to"] = "recipient@example.com"
    msg.set_content("This is a test", "plain")
    if not msg.is_multipart():
        msg.make_mixed()

    attachment = MIMEApplication(
        secrets.token_bytes(256),
        "pdf",
        encoders.encode_base64,
        policy=email.policy.SMTP,
    )
    attachment.add_header("Content-Disposition", "attachment", filename="test")
    msg.attach(attachment)
    msg_bytes = msg.as_bytes()

    # split the bytes of a multi-byte character across lines.
    filename = "作業報告書【子】.pdf"
    encoded = ("%" + filename.encode("iso-2022-jp").hex("%").upper()).encode("ascii")
    # partition() keeps the separator, so %52 (the second byte of 子) starts
    # the second section instead of being dropped, as split() would do.
    head, sep, tail = encoded.partition(b"%52")
    filename_bytes = [head, sep + tail]
    msg_bytes = msg_bytes.replace(
        b'attachment; filename="test"',
        b"attachment;\r\n filename*0*=ISO-2022-JP''"
        + b"; \r\n filename*1*=".join(filename_bytes),
    )
    print(msg_bytes.decode())

    # trigger parsing the mime-part with the filename
    message_from_bytes(msg_bytes, policy=email.policy.SMTP).get_body("html")


if __name__ == "__main__":
    main()
I ran this with Python 3.8 through 3.14, all with the same result.
$ uv run --no-config --managed-python --python 3.13 run.py
Traceback (most recent call last):
  File "/tmp/run.py", line 65, in <module>
    main()
    ~~~~^^
  File "/tmp/run.py", line 62, in main
    message_from_bytes(msg_bytes, policy=email.policy.SMTP).get_body("html")
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/message.py", line 1054, in get_body
    for prio, part in self._find_body(self, preferencelist):
                      ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/message.py", line 1025, in _find_body
    yield from self._find_body(subpart, preferencelist)
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/message.py", line 1014, in _find_body
    if part.is_attachment():
       ~~~~~~~~~~~~~~~~~~^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/message.py", line 1010, in is_attachment
    c_d = self.get('content-disposition')
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/message.py", line 507, in get
    return self.policy.header_fetch_parse(k, v)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
           ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/headerregistry.py", line 604, in __call__
    return self[name](name, value)
           ~~~~~~~~~~^^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/headerregistry.py", line 192, in __new__
    cls.parse(value, kwds)
    ~~~~~~~~~^^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/headerregistry.py", line 449, in parse
    kwds['decoded'] = str(parse_tree)
                      ~~~^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/_header_value_parser.py", line 136, in __str__
    return ''.join(str(x) for x in self)
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/_header_value_parser.py", line 136, in <genexpr>
    return ''.join(str(x) for x in self)
                   ~~~^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/_header_value_parser.py", line 814, in __str__
    for name, value in self.params:
                       ^^^^^^^^^^^
  File "/home/aclemons/.local/share/uv/python/cpython-3.13.3-linux-aarch64-gnu/lib/python3.13/email/_header_value_parser.py", line 799, in params
    value = value.decode(charset, 'surrogateescape')
UnicodeDecodeError: 'iso2022_jp' codec can't decode byte 0x3b in position 15: incomplete multibyte sequence
decoding with 'ISO-2022-JP' codec failed
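Until this is handled, one possible workaround for getting at the file contents is to avoid touching the Content-Disposition header at all. The traceback shows that parsing the message succeeds and the error only occurs when that header is fetched, so the sketch below walks the parts and reads only Content-Type and Content-Transfer-Encoding. The function name `non_text_payloads` and the hand-built sample message are assumptions made for this example.

```python
import email.policy
from email import message_from_bytes


def non_text_payloads(raw_message: bytes):
    """Return (content_type, decoded_bytes) for every non-text leaf part."""
    msg = message_from_bytes(raw_message, policy=email.policy.SMTP)
    results = []
    for part in msg.walk():
        # Skip containers and text bodies; never read Content-Disposition.
        if part.is_multipart() or part.get_content_maintype() == "text":
            continue
        results.append((part.get_content_type(), part.get_payload(decode=True)))
    return results


def percent_encode(raw: bytes) -> str:
    return "".join(f"%{byte:02X}" for byte in raw)


# Hand-built sample: the filename parameter is split inside a two-byte
# character, like in the reproduction above.
encoded = "作業報告書【子】.pdf".encode("iso-2022-jp")
disposition = (
    "Content-Disposition: attachment;\r\n"
    f" filename*0*=ISO-2022-JP''{percent_encode(encoded[:16])};\r\n"
    f" filename*1*={percent_encode(encoded[16:])}\r\n"
).encode("ascii")

RAW = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b"\r\n"
    b"This is a test\r\n"
    b"--XYZ\r\n"
    b"Content-Type: application/pdf\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    + disposition
    + b"\r\n"
    b"aGVsbG8=\r\n"
    b"--XYZ--\r\n"
)

payloads = non_text_payloads(RAW)
```

This only recovers the contents, not the filename, and it treats every non-text part as an attachment, which may be too broad for messages with inline images.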
If I patch my Python and add UnicodeDecodeError to the except on line 800 in _header_value_parser.py (cpython/Lib/email/_header_value_parser.py, line 800 in e846244):
except (LookupError, UnicodeEncodeError):
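A minimal sketch of what the patched handler could look like, written as a standalone function rather than as the actual CPython code. The function name `decode_section` is hypothetical, and the fallback to us-ascii with surrogateescape is an assumption about what the existing except branch does.

```python
def decode_section(raw: bytes, charset: str) -> str:
    """Decode one parameter section, falling back instead of raising."""
    try:
        return raw.decode(charset, "surrogateescape")
    except (LookupError, UnicodeEncodeError, UnicodeDecodeError):
        # Assumed fallback: treat the bytes as ASCII and escape anything else.
        return raw.decode("us-ascii", "surrogateescape")


full = "作業報告書【子】.pdf".encode("iso-2022-jp")
truncated = full[:16]  # ends in the middle of a two-byte character

intact_result = decode_section(full, "iso-2022-jp")
fallback_result = decode_section(truncated, "iso-2022-jp")
unknown_charset_result = decode_section(b"abc", "no-such-charset")
```

With this change the truncated section no longer raises, but it decodes to the raw ISO-2022-JP escape bytes rather than to readable text, so the filename would still be wrong.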
I checked other tickets to see if this had already been reported. I mentioned the root problem, the handling of parameters with split bytes, in #79728, but I also saw #116705, which asked about handling UnicodeDecodeError at exactly this point too.
Thank you.
CPython versions tested on:
3.13
Operating systems tested on:
Linux