| 1 | +---
| 2 | +title: Preventing ZIP parser confusion attacks on Python package installers
| 3 | +description: PyPI will begin warning and will later reject wheels that contain differentiable ZIP features or incorrect RECORD files.
| 4 | +authors: |
| 5 | + - sethmlarson |
| 6 | +date: 2025-08-07 |
| 7 | +tags: |
| 8 | + - security |
| 9 | + - publishing |
| 10 | + - deprecation |
| 11 | +--- |
| 12 | + |
| 13 | +The Python Package Index is introducing new restrictions to protect |
| 14 | +Python package installers and inspectors from confusion attacks arising |
| 15 | +from ZIP parser implementations. This is in response to
| 16 | +the discovery that the popular installer uv extracts some ZIP archives differently
| 17 | +from many Python-based installers that use the ZIP parser implementation
| 18 | +provided by the `zipfile` standard library module. |
| 19 | + |
| 20 | +## Summary |
| 21 | + |
| 22 | +* ZIP archives constructed to exploit ZIP confusion attacks are now rejected by PyPI. |
| 23 | +* There is no evidence that this vulnerability has been exploited using PyPI. |
| 24 | +* PyPI is deprecating wheel distributions with incorrect `RECORD` files. |
| 25 | + |
| 26 | +Please see [this blog post](https://astral.sh/blog/uv-security-advisory-cve-2025-54368) and [CVE-2025-54368](https://github.com/astral-sh/uv/security/advisories/GHSA-8qf3-x8v5-2pj8) |
| 27 | +for more information on uv's patch. |
| 28 | + |
| 29 | +<!-- more --> |
| 30 | + |
| 31 | +## Wheels are ZIPs, and ZIPs are complicated |
| 32 | + |
| 33 | +Python package "wheels" (or "binary distributions"), like many other file formats,
| 34 | +are actually ZIPs in disguise. The [ZIP archive standard](https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT) was created in 1989, when large archives
| 35 | +might need to be stored across multiple distinct storage units due to size constraints. This requirement influenced |
| 36 | +the design of the ZIP archive standard, such as being able to update or delete already-archived |
| 37 | +files by appending new records to the end of a ZIP instead of having to rewrite the entire ZIP |
| 38 | +from scratch, which might be stored on another disk.
| 39 | + |
| 40 | +These design considerations mean that the ZIP standard is complicated to implement and,
| 41 | +in many ways, ambiguous about what the "result" of extracting a valid ZIP file should be.
| 42 | + |
| 43 | +The ["Binary Distribution Format" specification](https://packaging.python.org/en/latest/specifications/binary-distribution-format/#binary-distribution-format) |
| 44 | +defines how a wheel is [meant to be installed](https://packaging.python.org/en/latest/specifications/binary-distribution-format/#installing-a-wheel-distribution-1-0-py32-none-any-whl). |
| 45 | +However, the specification leaves many of the details on how exactly to extract the archive |
| 46 | +and handle ZIP-specific features to implementations. The most detail provided is: |
| 47 | + |
| 48 | +> Although a specialized installer is recommended, a wheel file may be installed by simply unpacking into site-packages with the standard ‘unzip’ tool while preserving enough information to spread its contents out onto their final paths at any later time. |
| 49 | +
| 50 | +This means that ZIP ambiguities are unlikely to be caught by installers, as there are no |
| 51 | +restrictions for which ZIP features are allowed in a valid wheel archive. |
| 52 | + |
| 53 | +There's also a Python-packaging-specific mechanism for declaring which files are meant to be included
| 54 | +in a wheel. The `RECORD` file included inside wheel `.dist-info` directories
| 55 | +lists files by name and, optionally, a checksum (like SHA256).
| 56 | +The [specification for the `.dist-info` directory](https://packaging.python.org/en/latest/specifications/binary-distribution-format/#the-dist-info-directory) |
| 57 | +details how installers are supposed to check the contents of the ZIP archive against `RECORD`: |
| 58 | + |
| 59 | +> Apart from `RECORD` and its signatures, installation will fail if any file in the archive is not both mentioned and correctly hashed in `RECORD`. |
| 60 | +
| 61 | +However, most Python installers today do not perform this check. Instead, they extract the contents
| 62 | +of the ZIP archive much as `unzip` would and then amend the installed `RECORD` within the
| 63 | +virtual environment so that uninstalling the package works as expected.
| 64 | + |
| 65 | +This means that there is no forcing function on Python projects and |
| 66 | +packaging tools to follow packaging standards or normalize their use of ZIP archive features. |
| 67 | +This leads to the ambiguous situation today where no one installer can start |
| 68 | +enforcing standards without accidentally "breaking" projects and archives |
| 69 | +that already exist on PyPI. |
| 70 | + |
| 71 | +PyPI is adopting a few measures to prevent attackers from abusing the complexities |
| 72 | +of ZIP archives and installers not checking `RECORD` files to smuggle files past |
| 73 | +manual review processes and automated detection tools. |
| 74 | + |
| 75 | +## What is PyPI doing to prevent ZIP confusion attacks? |
| 76 | + |
| 77 | +The correct method to unpack a ZIP is to first check the Central Directory |
| 78 | +of files before extracting entries. See this [blog post](https://www.crowdstrike.com/en-us/blog/how-to-prevent-zip-file-exploitation/) |
| 79 | +for a more detailed explanation of ZIP confusion attacks. |
| 80 | + |
| 81 | +PyPI is implementing the following logic to prevent ZIP confusion attacks on |
| 82 | +the upload of wheels and ZIPs: |
| 83 | + |
| 84 | +* Rejecting ZIP archives with invalid record and framing information. |
| 85 | +* Rejecting ZIP archives with duplicate filenames in Local File and Central Directory headers. |
| 86 | +* Rejecting ZIP archives where files included in Local File and Central Directory headers don't match. |
| 87 | +* Rejecting ZIP archives with trailing data or multiple End of Central Directory headers. |
| 88 | +* Rejecting ZIP archives with incorrect End of Central Directory Locator values. |
| 89 | + |
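As a rough illustration of two of the checks listed above (duplicate names in the Central Directory, and Local File header names that disagree with the Central Directory), here is a minimal sketch built on CPython's `zipfile` and `struct` modules. It is not PyPI's actual implementation: the names `find_zip_confusion_issues` and `make_zip` are hypothetical, and the sketch does not attempt the trailing-data or End of Central Directory checks.

```python
import io
import struct
import warnings
import zipfile

LOCAL_HEADER_SIG = b"PK\x03\x04"
# Fixed 30-byte part of a Local File header: signature, five 16-bit fields,
# three 32-bit fields, then the filename length and extra-field length.
LOCAL_HEADER_FMT = "<4s5H3L2H"


def find_zip_confusion_issues(data: bytes) -> list[str]:
    """Return human-readable problems found in a ZIP held in memory (sketch)."""
    issues = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        seen = set()
        for info in zf.infolist():  # entries as listed in the Central Directory
            if info.filename in seen:
                issues.append(f"duplicate Central Directory entry: {info.filename}")
            seen.add(info.filename)

            # Read the Local File header this entry points at and compare names.
            start = info.header_offset
            fixed = data[start:start + 30]
            if len(fixed) < 30 or fixed[:4] != LOCAL_HEADER_SIG:
                issues.append(f"no Local File header for: {info.filename}")
                continue
            fields = struct.unpack(LOCAL_HEADER_FMT, fixed)
            flags, name_len = fields[2], fields[9]
            raw_name = data[start + 30:start + 30 + name_len]
            # Bit 11 of the flags marks UTF-8 names; otherwise ZIP uses CP437.
            encoding = "utf-8" if flags & 0x800 else "cp437"
            local_name = raw_name.decode(encoding, errors="replace")
            if local_name != info.orig_filename:
                issues.append(
                    f"Local File header name {local_name!r} does not match "
                    f"Central Directory name {info.orig_filename!r}"
                )
    return issues


def make_zip(entries) -> bytes:
    """Build an in-memory ZIP from (name, content) pairs, for demonstration."""
    buf = io.BytesIO()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # zipfile warns about duplicate names
        with zipfile.ZipFile(buf, "w") as zf:
            for name, content in entries:
                zf.writestr(name, content)
    return buf.getvalue()
```

Calling `find_zip_confusion_issues(make_zip([("pkg/a.py", "x = 1"), ("pkg/a.py", "x = 2")]))` reports the duplicate entry, while an archive with unique, consistent names yields an empty list.
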
| 90 | +PyPI already implements ZIP and tarball compression-bomb detection |
| 91 | +as a part of upload processing. |
| 92 | + |
| 93 | +PyPI will also begin sending emails to **warn users when wheels are published |
| 94 | +whose ZIP contents don't match the included `RECORD` metadata file**. After 6 months of warnings, |
| 95 | +on February 1st, 2026, PyPI will begin **rejecting** newly uploaded wheels whose ZIP contents |
| 96 | +don't match the included `RECORD` metadata file. |
| 97 | + |
| 98 | +We encourage all Python installers to use this opportunity to |
| 99 | +implement cross-checking of extracted wheel contents with the `RECORD` metadata file. |
| 100 | + |
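For installer authors, the cross-check might look roughly like the sketch below. It assumes the `RECORD` format from the Binary Distribution Format specification (CSV rows of path, `algorithm=urlsafe-base64-digest-without-padding`, size) and exempts `RECORD` and its `.jws`/`.p7s` signature files. The names `check_record`, `record_hash`, and `ALLOWED_ALGORITHMS` are hypothetical, and this is not necessarily the logic PyPI itself runs.

```python
import base64
import csv
import hashlib
import io
import zipfile

# Assumption for this sketch: accept only these hash algorithms.
ALLOWED_ALGORITHMS = {"sha256", "sha384", "sha512"}


def record_hash(data: bytes, algorithm: str = "sha256") -> str:
    """Format a digest the way RECORD stores it: urlsafe base64, no padding."""
    digest = hashlib.new(algorithm, data).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"{algorithm}={encoded}"


def check_record(wheel: zipfile.ZipFile) -> list[str]:
    """Compare the archive's file entries against its RECORD file (sketch)."""
    names = [n for n in wheel.namelist() if not n.endswith("/")]
    record_paths = [
        n for n in names if n.endswith(".dist-info/RECORD") and n.count("/") == 1
    ]
    if len(record_paths) != 1:
        return ["expected exactly one top-level .dist-info/RECORD file"]
    record_path = record_paths[0]
    exempt = {record_path, record_path + ".jws", record_path + ".p7s"}

    # Map each recorded path to its "algorithm=digest" string (may be empty).
    text = wheel.read(record_path).decode("utf-8")
    recorded = {}
    for row in csv.reader(io.StringIO(text)):
        if row:
            recorded[row[0]] = row[1] if len(row) > 1 else ""

    issues = []
    for name in names:
        if name in exempt:
            continue
        if name not in recorded:
            issues.append(f"not listed in RECORD: {name}")
            continue
        algorithm = recorded[name].partition("=")[0]
        if algorithm not in ALLOWED_ALGORITHMS:
            issues.append(f"missing or unsupported hash in RECORD: {name}")
        elif record_hash(wheel.read(name), algorithm) != recorded[name]:
            issues.append(f"hash mismatch: {name}")
    for name in sorted(recorded.keys() - set(names)):
        issues.append(f"listed in RECORD but missing from archive: {name}")
    return issues
```

An installer could call `check_record(zipfile.ZipFile(path))` before unpacking and either warn or abort when the returned list is non-empty.
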
| 101 | +## `RECORD` and ZIP issues in top Python packages |
| 102 | + |
| 103 | +Almost all the top 15,000 Python packages by downloads (of which 13,468 publish wheels) |
| 104 | +have no issues with the ZIP format or the `RECORD` metadata file. |
| 105 | +This makes us confident that we can deploy |
| 106 | +these changes without major disruption of existing Python project |
| 107 | +development. |
| 108 | + |
| 109 | +| Status | Number of Projects | |
| 110 | +|-------------------------------------|--------------------| |
| 111 | +| No `RECORD` or ZIP issues | 13,460 | |
| 112 | +| Missing file from `RECORD` | 4 | |
| 113 | +| Mismatched `RECORD` and ZIP headers | 2 | |
| 114 | +| Duplicate files in ZIP headers | 2 | |
| 115 | +| Other ZIP format issues | 0 | |
| 116 | + |
| 117 | +Note that there are more occurrences of ZIP and `RECORD` issues |
| 118 | +that have been reported for other projects on PyPI, but those projects |
| 119 | +are not in the top 15,000 by downloads. |
| 120 | + |
| 121 | +## What actions should I take? |
| 122 | + |
| 123 | +The mitigations above mean that |
| 124 | +users of PyPI, regardless of their installer, don't need to take immediate action |
| 125 | +to be safe. We recommend the following actions to users of PyPI to ensure |
| 126 | +compliance with Python package and ZIP standards: |
| 127 | + |
| 128 | +* **For users installing PyPI projects**: Make sure your installer tools are up-to-date. |
| 129 | +* **For maintainers of PyPI projects**: If you encounter an error during upload, |
| 130 | + read the error message and update your own build process or report the issue |
| 131 | + to your build tool, if applicable. |
| 132 | +* **For maintainers of installer projects**: Ensure that your ZIP implementation follows the ZIP standard |
| 133 | + and checks the Central Directory before proceeding with decompression. |
| 134 | + See the CPython `zipfile` module for a ZIP implementation that implements this |
| 135 | + logic. Begin checking the `RECORD` file against ZIP contents and erroring |
| 136 | + or warning the user that the wheel is incorrectly formatted. |
| 137 | + |
| 138 | +## Acknowledgements |
| 139 | + |
| 140 | +Thanks to Caleb Brown (Google Open Source Security Team) and Tim Hatch (Netflix) for reporting this issue. |
| 141 | + |
| 142 | +This level of coordination across Python ecosystem projects requires significant |
| 143 | +engineering time investment. Thanks to [Alpha-Omega](https://alpha-omega.dev), which sponsors the security-focused
| 144 | +[Developer-in-Residence](https://www.python.org/psf/developersinresidence/) positions at the Python Software Foundation. |