Decode MIME-encoded author names retrieved from email field when displaying in project sidebars #18930

@webknjaz

Describe the bug

I stumbled upon the problem described in the title by accident.

This is what https://pypi.org/p/typer and https://pypi.org/p/typer-slim display:

Author: =?utf-8?q?Sebasti=C3=A1n_Ram=C3=ADrez?=

Expected behavior

It should render

Author: Sebastián Ramírez

instead.

To Reproduce

It seems that uploading a package whose author name contains non-ASCII characters, stored MIME-encoded in the Author-Email field, is enough to trigger this.

My Platform

N/A

Additional context

The rendered text is exactly as it appears in the metadata.

I imagine Warehouse probably assumes that whatever is in the name portion of the email field is plain Latin-1 text that doesn't need decoding, which is evidently not the case here.
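A minimal sketch of the missing decoding step, assuming Python's standard `email.header` module is an acceptable tool for it. This is not Warehouse's actual code, and `decode_display_name` is a hypothetical helper name:

```python
from email.header import decode_header, make_header


def decode_display_name(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a display name (hypothetical helper)."""
    # decode_header() splits the value into (payload, charset) chunks;
    # make_header() reassembles them into a Header whose str() is Unicode text.
    return str(make_header(decode_header(raw)))


decoded = decode_display_name("=?utf-8?q?Sebasti=C3=A1n_Ram=C3=ADrez?=")
print(f"Author: {decoded}")
```

Plain ASCII names contain no encoded-words, so they pass through this helper unchanged.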

The projects seem to be built with PDM. Additionally, the PEP 621 metadata field uses Unicode literals in pyproject.toml: https://github.com/fastapi/typer/blob/7be1c8db9fa2475f1c4537e57c053cb864aed963/pyproject.toml#L10C14-L10C31.

I checked the metadata across the versions and discovered that it's all v2.1 but the way author is represented changed twice over time:

typer <= 0.10.0

Author: Sebastián Ramírez
Author-email: [email protected]

typer >= 0.11.0, <= 0.12.3

Author-Email: Sebastián Ramírez <[email protected]>

typer >= 0.12.4

Author-Email: =?utf-8?q?Sebasti=C3=A1n_Ram=C3=ADrez?= <[email protected]>
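The three variants above could be normalized to one display name along these lines. This is only a sketch under stated assumptions: the metadata is already available as a dict, there is a single author, `user@example.com` is a placeholder address, and `author_display_name` and `_decode_words` are hypothetical names rather than anything from Warehouse:

```python
from email.header import decode_header
from email.utils import parseaddr


def _decode_words(value: str) -> str:
    """Decode any RFC 2047 encoded-words; leave plain text untouched."""
    parts = []
    for chunk, charset in decode_header(value):
        # Encoded-words come back as bytes plus a charset name;
        # values without encoded-words come back as a plain str.
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


def author_display_name(metadata: dict) -> "str | None":
    """Pick a human-readable author name from core metadata fields."""
    # Field names are compared case-insensitively (Author-email / Author-Email).
    fields = {key.lower(): value for key, value in metadata.items()}
    author = fields.get("author")
    if author:
        return _decode_words(author)
    # Fall back to the name portion of "Name <address>".
    name, _address = parseaddr(fields.get("author-email", ""))
    return _decode_words(name) if name else None


ADDRESS = "user@example.com"  # placeholder, not the real address
variants = [
    {"Author": "Sebastián Ramírez", "Author-email": ADDRESS},
    {"Author-Email": f"Sebastián Ramírez <{ADDRESS}>"},
    {"Author-Email": f"=?utf-8?q?Sebasti=C3=A1n_Ram=C3=ADrez?= <{ADDRESS}>"},
]
names = [author_display_name(variant) for variant in variants]
print(names)
```

Multiple comma-separated authors would need `email.utils.getaddresses` instead of `parseaddr`; that case is left out here.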

The metadata spec says that this field is expected to follow RFC 822, which is rather antiquated these days. It's obsoleted by RFC 2822, and updated by RFC 1123, RFC 2156, RFC 1327, RFC 1138 and RFC 1148. RFC 2822, in turn, is obsoleted by RFC 5322 and updated by RFC 5335, RFC 5336.

It seems like those newer RFCs explicitly document that for non-US ASCII chars, MIME encoding is expected to be used. So I think that PDM does the correct thing in the most recent representation variant and PyPI should process it accordingly too.
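For illustration, Python's own standard library applies the same kind of encoding (RFC 2047 encoded-words) when asked to format an address with a non-ASCII display name. The sketch below only demonstrates that stdlib behavior and the round trip back to Unicode; it does not claim that this is the code path PDM actually uses, and `user@example.com` is a placeholder address:

```python
from email.header import decode_header, make_header
from email.utils import formataddr, parseaddr

name = "Sebastián Ramírez"
address = "user@example.com"  # placeholder, not the real address

# formataddr() MIME-encodes the display name when it is not pure ASCII.
encoded = formataddr((name, address), charset="utf-8")
print(encoded)

# Reverse direction: split off the name, then decode its encoded-words.
encoded_name, parsed_address = parseaddr(encoded)
round_tripped = str(make_header(decode_header(encoded_name)))
print(round_tripped)
```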

It may be reasonable to update the metadata spec to be a bit more explicit about this, though.

Typer v0.12.4 was released on Aug 17, 2024, which means it was probably produced with PDM v2.18.1 (published just a day before, on Aug 16, 2024). However, I haven't noticed anything related in the changelog: https://pdm-project.org/en/latest/dev/changelog/#release-v2181-2024-08-16. So the change might be coming from a transitive dependency or from the CPython runtime having been updated at the same time.

Labels: UX/UI, bug 🐛, data quality, i18n, metadata
