Commit 5774935

Prevent ZIP confusion attacks, warn on RECORD mismatch (#18492)
Co-authored-by: Mike Fiedler <[email protected]>
1 parent 26ccf9b commit 5774935

54 files changed: +1104 -3 lines changed
Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
---
title: Preventing ZIP parser confusion attacks on Python package installers
description: PyPI will begin warning and will later reject wheels that contain differentiable ZIP features or incorrect RECORD files.
authors:
  - sethmlarson
date: 2025-08-07
tags:
  - security
  - publishing
  - deprecation
---

The Python Package Index is introducing new restrictions to protect
Python package installers and inspectors from confusion attacks arising
from differences between ZIP parser implementations. These restrictions respond to
the discovery that the popular installer uv extracts archives differently
from many Python-based installers that use the ZIP parser implementation
provided by the `zipfile` standard library module.

## Summary

* ZIP archives constructed to exploit ZIP confusion attacks are now rejected by PyPI.
* There is no evidence that this vulnerability has been exploited using PyPI.
* PyPI is deprecating wheel distributions with incorrect `RECORD` files.

Please see [this blog post](https://astral.sh/blog/uv-security-advisory-cve-2025-54368) and [CVE-2025-54368](https://github.com/astral-sh/uv/security/advisories/GHSA-8qf3-x8v5-2pj8)
for more information on uv's patch.

<!-- more -->
31+
## Wheels are ZIPs, and ZIPs are complicated
32+
33+
Python package "wheels" (or "binary distributions"), like many other file formats,
34+
actually a ZIP in disguise. The [ZIP archive standard](https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) was created in 1989, where large archives
35+
might need to be stored across multiple distinct storage units due to size constraints. This requirement influenced
36+
the design of the ZIP archive standard, such as being able to update or delete already-archived
37+
files by appending new records to the end of a ZIP instead of having to rewrite the entire ZIP
38+
from scratch which might potentially be on another disk.
39+
40+
These design considerations meant that the ZIP standard is complicated to implement, and
41+
in many ways is ambiguous in what the "result" of extracting a valid ZIP file should be.
42+
43+
The ["Binary Distribution Format" specification](https://packaging.python.org/en/latest/specifications/binary-distribution-format/#binary-distribution-format)
44+
defines how a wheel is [meant to be installed](https://packaging.python.org/en/latest/specifications/binary-distribution-format/#installing-a-wheel-distribution-1-0-py32-none-any-whl).
45+
However, the specification leaves many of the details on how exactly to extract the archive
46+
and handle ZIP-specific features to implementations. The most detail provided is:
47+
48+
> Although a specialized installer is recommended, a wheel file may be installed by simply unpacking into site-packages with the standard ‘unzip’ tool while preserving enough information to spread its contents out onto their final paths at any later time.
49+
50+
This means that ZIP ambiguities are unlikely to be caught by installers, as there are no
51+
restrictions for which ZIP features are allowed in a valid wheel archive.
52+
53+
There's also a Python packaging specific mechanism for which files are meant to be included
54+
in a wheel. The `RECORD` file included inside wheel `.dist-info` directories
55+
lists files by name and optionally a checksum (like SHA256).
56+
The [specification for the `.dist-info` directory](https://packaging.python.org/en/latest/specifications/binary-distribution-format/#the-dist-info-directory)
57+
details how installers are supposed to check the contents of the ZIP archive against `RECORD`:
58+
59+
> Apart from `RECORD` and its signatures, installation will fail if any file in the archive is not both mentioned and correctly hashed in `RECORD`.
60+
61+
However, most Python installers today do not do this check and extract the contents
62+
of the ZIP archive similar to `unzip` and then amend the installed `RECORD` within the
63+
virtual environment so that uninstalling the package works as expected.
64+
65+
This means that there is no forcing function on Python projects and
66+
packaging tools to follow packaging standards or normalize their use of ZIP archive features.
67+
This leads to the ambiguous situation today where no one installer can start
68+
enforcing standards without accidentally "breaking" projects and archives
69+
that already exist on PyPI.
70+
71+
PyPI is adopting a few measures to prevent attackers from abusing the complexities
72+
of ZIP archives and installers not checking `RECORD` files to smuggle files past
73+
manual review processes and automated detection tools.
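As a simplified illustration of how two readers can disagree about the same bytes, the sketch below concatenates two small archives. It is not the construction behind CVE-2025-54368, and the helper names (`make_zip`, `central_directory_names`, `first_local_header_name`) are hypothetical. It only uses the standard library to contrast a Central Directory based reader with a naive reader that trusts the first Local File Header it sees.

```python
import io
import struct
import zipfile

LOCAL_FILE_HEADER_SIGNATURE = b"PK\x03\x04"


def make_zip(name: str, content: bytes) -> bytes:
    """Build a small single-file ZIP archive in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, content)
    return buffer.getvalue()


def central_directory_names(zip_bytes: bytes) -> list[str]:
    """List entries the way `zipfile` does: from the Central Directory that
    the End of Central Directory record at the end of the data points to."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.namelist()


def first_local_header_name(zip_bytes: bytes) -> str:
    """Return the filename a naive streaming reader would see first, by
    parsing the Local File Header at offset 0."""
    if zip_bytes[:4] != LOCAL_FILE_HEADER_SIGNATURE:
        raise ValueError("data does not start with a Local File Header")
    # The filename length is a little-endian uint16 at offset 26; the name
    # itself starts right after the 30-byte fixed header.
    (name_length,) = struct.unpack_from("<H", zip_bytes, 26)
    return zip_bytes[30 : 30 + name_length].decode("utf-8")


# Two complete archives glued together: the first becomes "prepended data"
# from the point of view of a reader that starts at the end of the file.
combined = make_zip("benign.py", b"print('hello')\n") + make_zip(
    "surprise.py", b"print('unexpected')\n"
)
print(central_directory_names(combined))
print(first_local_header_name(combined))
```

With CPython's `zipfile`, the Central Directory view lists only `surprise.py`, while the first Local File Header names `benign.py`. A reviewer using one tool and an installer using another could therefore see different contents.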
## What is PyPI doing to prevent ZIP confusion attacks?

The correct way to unpack a ZIP is to read the Central Directory
of files first, before extracting entries. See this [blog post](https://www.crowdstrike.com/en-us/blog/how-to-prevent-zip-file-exploitation/)
for a more detailed explanation of ZIP confusion attacks.

PyPI is implementing the following logic to prevent ZIP confusion attacks on
the upload of wheels and ZIPs:

* Rejecting ZIP archives with invalid record and framing information.
* Rejecting ZIP archives with duplicate filenames in Local File and Central Directory headers.
* Rejecting ZIP archives where files included in Local File and Central Directory headers don't match.
* Rejecting ZIP archives with trailing data or multiple End of Central Directory headers.
* Rejecting ZIP archives with incorrect End of Central Directory Locator values.

PyPI already implements ZIP and tarball compression-bomb detection
as a part of upload processing.

PyPI will also begin sending emails to **warn users when wheels are published
whose ZIP contents don't match the included `RECORD` metadata file**. After 6 months of warnings,
on February 1st, 2026, PyPI will begin **rejecting** newly uploaded wheels whose ZIP contents
don't match the included `RECORD` metadata file.

We encourage all Python installers to use this opportunity to
implement cross-checking of extracted wheel contents with the `RECORD` metadata file.
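A minimal sketch of such a cross-check follows, using only the standard library. It is not PyPI's or any installer's actual implementation. The function name `record_mismatches` is hypothetical, and the sketch assumes `RECORD` rows of the form `path,algorithm=urlsafe-base64-digest,size` and accepts only a small set of SHA-2 algorithms.

```python
import base64
import csv
import hashlib
import io
import zipfile

# Assumption for this sketch: only these hash algorithms are accepted.
ALLOWED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def record_mismatches(wheel_bytes: bytes) -> list[str]:
    """Compare a wheel's ZIP entries with its RECORD file and return a list
    of human-readable problems (an empty list means they agree)."""
    problems: list[str] = []
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as wheel:
        # Ignore directory entries; only files are listed in RECORD.
        names = [n for n in wheel.namelist() if not n.endswith("/")]
        record_paths = [
            n for n in names if n.endswith(".dist-info/RECORD") and n.count("/") == 1
        ]
        if len(record_paths) != 1:
            return ["expected exactly one top-level .dist-info/RECORD file"]
        record_path = record_paths[0]
        dist_info = record_path.rsplit("/", 1)[0]
        # RECORD and its signature files are exempt from the hash check.
        exempt = {record_path, f"{dist_info}/RECORD.jws", f"{dist_info}/RECORD.p7s"}

        rows = csv.reader(io.StringIO(wheel.read(record_path).decode("utf-8")))
        recorded = {row[0]: (row[1] if len(row) > 1 else "") for row in rows if row}

        for name in names:
            if name in exempt:
                continue
            if name not in recorded:
                problems.append(f"{name}: in archive but not listed in RECORD")
                continue
            algorithm, _, expected = recorded[name].partition("=")
            if algorithm not in ALLOWED_ALGORITHMS or not expected:
                problems.append(f"{name}: missing or unsupported hash in RECORD")
                continue
            digest = hashlib.new(algorithm, wheel.read(name)).digest()
            # RECORD stores URL-safe base64 digests without "=" padding.
            actual = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
            if actual != expected:
                problems.append(f"{name}: hash does not match RECORD")

        for path in recorded:
            if path not in names:
                problems.append(f"{path}: listed in RECORD but not in archive")
    return problems
```

An installer could call a function like this before extraction and either warn or abort when the returned list is non-empty.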
## `RECORD` and ZIP issues in top Python packages

Almost all the top 15,000 Python packages by downloads (of which 13,468 publish wheels)
have no issues with the ZIP format or the `RECORD` metadata file.
This makes us confident that we can deploy
these changes without major disruption of existing Python project
development.

| Status                              | Number of Projects |
|-------------------------------------|--------------------|
| No `RECORD` or ZIP issues           | 13,460             |
| Missing file from `RECORD`          | 4                  |
| Mismatched `RECORD` and ZIP headers | 2                  |
| Duplicate files in ZIP headers      | 2                  |
| Other ZIP format issues             | 0                  |

Note that there are more occurrences of ZIP and `RECORD` issues
that have been reported for other projects on PyPI, but those projects
are not in the top 15,000 by downloads.
## What actions should I take?

The mitigations above mean that
users of PyPI, regardless of their installer, don't need to take immediate action
to be safe. We recommend the following actions to users of PyPI to ensure
compliance with Python package and ZIP standards:

* **For users installing PyPI projects**: Make sure your installer tools are up-to-date.
* **For maintainers of PyPI projects**: If you encounter an error during upload,
  read the error message and update your own build process or report the issue
  to your build tool, if applicable.
* **For maintainers of installer projects**: Ensure that your ZIP implementation follows the ZIP standard
  and checks the Central Directory before proceeding with decompression.
  See the CPython `zipfile` module for a ZIP implementation that implements this
  logic. Begin checking the `RECORD` file against ZIP contents, and raise an error
  or warn the user when the wheel is incorrectly formatted.
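For installer maintainers, two of the structural checks that PyPI now applies on upload (duplicate filenames and trailing data after the End of Central Directory record) can be approximated with the standard library. The sketch below is not PyPI's implementation. The function names are hypothetical, and the trailing-data check assumes that the last `PK\x05\x06` signature in the file is the real End of Central Directory record, which a hostile archive could violate.

```python
import io
import struct
import zipfile
from collections import Counter

EOCD_SIGNATURE = b"PK\x05\x06"
EOCD_FIXED_SIZE = 22  # size of the record without its trailing comment


def duplicate_names(zip_bytes: bytes) -> list[str]:
    """Return filenames that appear more than once in the Central Directory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        counts = Counter(info.filename for info in archive.infolist())
    return sorted(name for name, count in counts.items() if count > 1)


def has_trailing_data(zip_bytes: bytes) -> bool:
    """Report whether bytes follow the End of Central Directory record.

    Simplification: the last EOCD signature found is treated as the real one.
    """
    offset = zip_bytes.rfind(EOCD_SIGNATURE)
    if offset < 0 or offset + EOCD_FIXED_SIZE > len(zip_bytes):
        raise ValueError("no complete End of Central Directory record found")
    # The comment length is a little-endian uint16 at offset 20 of the record.
    (comment_length,) = struct.unpack_from("<H", zip_bytes, offset + 20)
    return offset + EOCD_FIXED_SIZE + comment_length != len(zip_bytes)
```

Running checks like these before extraction lets an installer refuse an ambiguous archive instead of silently picking one interpretation.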
## Acknowledgements

Thanks to Caleb Brown (Google Open Source Security Team) and Tim Hatch (Netflix) for reporting this issue.

This level of coordination across Python ecosystem projects requires significant
engineering time investment. Thanks to [Alpha-Omega](https://alpha-omega.dev) who sponsors the security-focused
[Developer-in-Residence](https://www.python.org/psf/developersinresidence/) positions at the Python Software Foundation.

tests/unit/email/test_init.py

Lines changed: 86 additions & 0 deletions
@@ -6092,6 +6092,92 @@ def test_pep427_emails(
            )
        ]

    def test_wheel_record_mismatch_email(
        self,
        pyramid_request,
        pyramid_config,
        monkeypatch,
    ):
        stub_user = pretend.stub(
            id="id",
            username="username",
            name="",
            # Assumption: this line is absent from the page capture, but
            # `stub_user.email` is read below, so the stub needs the attribute.
            email="[email protected]",
            primary_email=pretend.stub(email="[email protected]", verified=True),
        )
        subject_renderer = pyramid_config.testing_add_renderer(
            "email/wheel-record-mismatch-email/subject.txt"
        )
        subject_renderer.string_response = "Email Subject"
        body_renderer = pyramid_config.testing_add_renderer(
            "email/wheel-record-mismatch-email/body.txt"
        )
        body_renderer.string_response = "Email Body"
        html_renderer = pyramid_config.testing_add_renderer(
            "email/wheel-record-mismatch-email/body.html"
        )
        html_renderer.string_response = "Email HTML Body"

        send_email = pretend.stub(
            delay=pretend.call_recorder(lambda *args, **kwargs: None)
        )
        pyramid_request.task = pretend.call_recorder(lambda *args, **kwargs: send_email)
        monkeypatch.setattr(email, "send_email", send_email)

        pyramid_request.db = pretend.stub(
            query=lambda a: pretend.stub(
                filter=lambda *a: pretend.stub(
                    one=lambda: pretend.stub(user_id=stub_user.id)
                )
            ),
        )
        pyramid_request.user = stub_user
        pyramid_request.registry.settings = {"mail.sender": "[email protected]"}

        project_name = "Test_Project"
        filename = "Test_Project-1.0-py3-none-any.whl"

        result = email.send_wheel_record_mismatch_email(
            pyramid_request,
            {stub_user},
            project_name=project_name,
            filename=filename,
        )

        assert result == {
            "project_name": project_name,
            "filename": filename,
        }
        subject_renderer.assert_(project_name=project_name)
        body_renderer.assert_(project_name=project_name)
        html_renderer.assert_(project_name=project_name)

        assert pyramid_request.task.calls == [pretend.call(send_email)]
        assert send_email.delay.calls == [
            pretend.call(
                f"{stub_user.username} <{stub_user.email}>",
                {
                    "sender": None,
                    "subject": "Email Subject",
                    "body_text": "Email Body",
                    "body_html": (
                        "<html>\n<head></head>\n"
                        "<body><p>Email HTML Body</p></body>\n</html>\n"
                    ),
                },
                {
                    "tag": "account:email:sent",
                    "user_id": stub_user.id,
                    "additional": {
                        "from_": "[email protected]",
                        "to": stub_user.email,
                        "subject": "Email Subject",
                        "redact_ip": False,
                    },
                },
            )
        ]


class TestUserTermsOfServiceUpdateEmail:
    def test_user_terms_of_service_updated(
