@@ -2,97 +2,112 @@ PEP: 784
 Title: Adding Zstandard to the standard library
 Author: Emma Harper Smith <[email protected]>
 Sponsor: Gregory P. Smith <[email protected]>
+Discussions-To: https://discuss.python.org/t/87377
 Status: Draft
 Type: Standards Track
 Created: 06-Apr-2025
 Python-Version: 3.14
+Post-History:
+  `07-Apr-2025 <https://discuss.python.org/t/87377>`__,
+

 Abstract
 ========

-`Zstandard <https://facebook.github.io/zstd/>`_ is a widely adopted, mature,
-and highly efficient compression standard. This PEP proposes adding a new
-module to the Python standard library containing a Python wrapper around Meta's
-``zstd`` library, the default implementation. Additionally, to avoid name
-collisions with packages on PyPI and to present a unified interface to Python
-users, compression modules in the standard library will be moved under a
-``compression.*`` namespace package.
+`Zstandard`_ is a widely adopted, mature, and highly efficient compression
+standard. This PEP proposes adding a new module to the Python standard library
+containing a Python wrapper around Meta's |zstd| library, the default
+implementation. Additionally, to avoid name collisions with packages on PyPI
+and to present a unified interface to Python users, compression modules in the
+standard library will be moved under a ``compression.*`` package.
+
+.. |zstd| replace:: ``zstd``
+.. _zstd: https://facebook.github.io/zstd/
+.. _Zstandard: https://facebook.github.io/zstd/
+

 Motivation
 ==========

-CPython has modules for several different compression formats, such as `zlib
-(DEFLATE) <https://docs.python.org/3/library/zlib.html>`_,
-`bzip2 <https://docs.python.org/3/library/bz2.html>`_,
-and `lzma <https://docs.python.org/3/library/lzma.html>`_, each widely used.
-Including popular compression algorithms matches Python's "batteries included"
-philosophy of incorporating widely useful standards and utilities. The last
-compression module added to the language was ``lzma``, added in Python 3.3.
+CPython has modules for several different compression formats, such as
+:mod:`zlib (DEFLATE) <zlib>`, :mod:`bzip2 <bz2>`, and :mod:`lzma <lzma>`,
+each widely used. Including popular compression algorithms matches Python's
+"batteries included" philosophy of incorporating widely useful standards and
+utilities. :mod:`!lzma` is the most recent such module, added in Python 3.3.

-Since then, Zstandard has become the modern de facto preferred compression
+Since then, Zstandard has become the modern *de facto* preferred compression
 library for both high performance compression and decompression attaining high
 compression ratios at reasonable CPU and memory cost. Zstandard achieves a much
 higher compression ratio than bzip2 or zlib (DEFLATE) while decompressing
 significantly faster than LZMA.

-Zstandard has seen `widespread adoption in many different areas of computing
-<https://facebook.github.io/zstd/#references>`_. The numerous hardware
-implementations demonstrate long-term commitment to Zstandard and an
-expectation that Zstandard will stay the de facto choice for compression for
-years to come. Zstandard compression is also implemented in both the ZFS and
-Btrfs filesystems.
+Zstandard has seen `widespread adoption`_ in many different areas of computing.
+The numerous hardware implementations demonstrate long-term commitment to
+Zstandard and an expectation that Zstandard will stay the *de facto* choice for
+compression for years to come. Zstandard compression is also implemented in
+both the ZFS_ and Btrfs_ filesystems.

 Zstandard's highly efficient compression has supplanted other modern
-compression formats, such as brotli, lzo, and ucl due to its highly efficient
-compression. While `LZ4 <https://lz4.org/>`_ is still used in very high
-throughput scenarios, Zstandard can also be used in some of these contexts.
+compression formats, such as brotli_, lzo_, and ucl_ due to its highly
+efficient compression. While `LZ4`_ is still used in very high throughput
+scenarios, Zstandard can also be used in some of these contexts.
 While inclusion of LZ4 is out of scope, it would be a compelling future
 addition to the ``compression`` namespace introduced by this PEP.

 There are several bindings to Zstandard for Python available on PyPI, each with
 different APIs and choices of how to bind the ``zstd`` library. One goal with
 introducing an official module in the standard library is to reduce confusion
 for Python users who want simple compression/decompression APIs for Zstandard.
-The existing packages can continue providing extended APIs and bindings for
-other Python implementations such as PyPy or integrate features from newer
-Zstandard versions.
+The existing packages can continue providing extended APIs or integrate
+features from newer Zstandard versions.

 Another reason to add Zstandard support to the standard library is to resolve
-a long standing `open issue <https://github.com/python/cpython/issues/81276>`_
-requesting Zstandard support in the ``tarfile`` module. This issue has the 5th
-most "thumbs up" of open issues on the CPython tracker, and has garnered a
-significant amount of discussion and interest. Additionally, the `ZIP format
-standardizes a Zstandard compression format ID
-<https://pkwaredownloads.blob.core.windows.net/pkware-general/Documentation/APPNOTE-6.3.8.TXT>`_,
-and integration with ``zipfile`` would allow opening ZIP archives using
-Zstandard compression. The reference implementation for this PEP contains
-integration with the ``zipfile``, ``tarfile``, and ``shutil`` modules.
+a long standing open issue (`python/cpython#81276`_) requesting Zstandard
+support in the :mod:`tarfile` module. This issue has the 5th most "thumbs up"
+of open issues on the CPython tracker, and has garnered a significant amount of
+discussion and interest. Additionally, the ZIP format standardizes a
+`Zstandard compression format ID`_, and integration with the :mod:`zipfile`
+module would allow opening ZIP archives using Zstandard compression. The
+reference implementation for this PEP contains integration with the
+:mod:`!zipfile`, :mod:`!tarfile`, and :mod:`shutil` modules.

 Zstandard compression could also be used to make Python wheel packages smaller
 and significantly faster to install. Anaconda found a sizeable speedup when
-adopting Zstandard for the conda package format
+adopting Zstandard for the conda package format:

 .. epigraph::

     Conda's download sizes are reduced ~30-40%, and extraction is dramatically faster.
     [...]
     We see approximately a 2.5x overall speedup, almost all thanks to the dramatically faster extraction speed of the zstd compression used in the new file format.

-    -- `Anaconda blog on Zstandard adoption <https://www.anaconda.com/blog/how-we-made-conda-faster-4-7>`_
+    -- `Anaconda blog on Zstandard adoption`_

-`According to lzbench <https://github.com/inikep/lzbench?tab=readme-ov-file#benchmarks>`_,
-a comprehensive benchmark of many different compression libraries and formats,
 Zstandard has a significantly higher compression ratio compared to wheel's
-existing zlib-based compression. While this PEP does *not* prescribe any
-changes to the wheel format or other packaging standards, having Zstandard
-bindings in the standard library would enable a future PEP to improve the user
-experience for Python wheel packages.
+existing zlib-based compression, `according to lzbench`_, a comprehensive
+benchmark of many different compression libraries and formats.
+While this PEP does *not* prescribe any changes to the wheel format or other
+packaging standards, having Zstandard bindings in the standard library would
+enable a future PEP to improve the user experience for Python wheel packages.
+
+.. _widespread adoption: https://facebook.github.io/zstd/#references
+.. _ZFS: https://en.wikipedia.org/wiki/ZFS
+.. _Btrfs: https://btrfs.readthedocs.io/
+.. _brotli: https://brotli.org/
+.. _lzo: https://www.oberhumer.com/opensource/lzo/
+.. _ucl: https://www.oberhumer.com/opensource/ucl/
+.. _LZ4: https://lz4.org/
+.. _python/cpython#81276: https://github.com/python/cpython/issues/81276
+.. _Zstandard compression format ID: https://pkwaredownloads.blob.core.windows.net/pkware-general/Documentation/APPNOTE-6.3.8.TXT
+.. _according to lzbench: https://github.com/inikep/lzbench#benchmarks
+.. _Anaconda blog on Zstandard adoption: https://www.anaconda.com/blog/how-we-made-conda-faster-4-7
+

 Rationale
 =========

-Introduction of a ``compression`` namespace
---------------------------------------------
+Introduction of a ``compression`` package
+-----------------------------------------

 Both the ``zstd`` and ``zstandard`` import names are claimed by projects on
 PyPI. To avoid breaking users of one of the existing bindings, this PEP
@@ -130,13 +145,17 @@ name otherwise.
 Implementation based on ``pyzstd``
 ----------------------------------

-The implementation for this PEP is based on the `pyzstd project <https://github.com/Rogdham/pyzstd>`_.
-This project was chosen as the code was `originally written to be upstreamed <https://github.com/python/cpython/issues/81276#issuecomment-1093824963>`_
-to CPython by Ma Lin, who also wrote the `output buffer implementation used in
-the standard library today <https://github.com/python/cpython/commit/f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231>`_.
+The implementation for this PEP is based on the `pyzstd project`_.
+This project was chosen as the code was `originally written to be upstreamed`_
+to CPython by Ma Lin, who also wrote the `output buffer implementation`_ used in
+the standard library today.
 The project has since been taken over by Rogdham and is published to PyPI. The
 APIs in ``pyzstd`` are similar to the APIs for other compression modules in the
-standard library such as ``bz2`` and ``lzma``.
+standard library such as :mod:`!bz2` and :mod:`!lzma`.
+
+.. _pyzstd project: https://github.com/Rogdham/pyzstd
+.. _originally written to be upstreamed: https://github.com/python/cpython/issues/81276#issuecomment-1093824963
+.. _output buffer implementation: https://github.com/python/cpython/commit/f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231

 Minimum supported Zstandard version
 -----------------------------------
@@ -149,13 +168,14 @@ compatibility with existing LTS Linux distributions, but a newer Zstandard
 version could likely be chosen given that newer Python releases are generally
 packaged as part of newer distribution releases.

+
 Specification
 =============

 The ``compression`` namespace
 -----------------------------

-A new namespace package for compression modules will be introduced named
+A new namespace for compression modules will be introduced named
 ``compression``. The top-level module for this package will be empty to begin
 with, but a standard API for interacting with compression routines may be
 added in the future to the toplevel.
@@ -167,17 +187,18 @@ A new module, ``compression.zstd`` will be introduced with Zstandard
 compression APIs that match other compression modules in the standard library,
 namely

-* ``compress`` / ``decompress`` - APIs for one-shot compression/decompression
-* ``ZstdFile`` / ``open`` - APIs for interacting with streams and file-like
-  objects
-* ``ZstdCompressor`` / ``ZstdDecompressor`` - APIs for incremental compression/
-  decompression
+* :func:`!compress` / :func:`!decompress` - APIs for one-shot compression
+  or decompression
+* :class:`!ZstdFile` / :func:`!open` - APIs for interacting with streams
+  and file-like objects
+* :class:`!ZstdCompressor` / :class:`!ZstdDecompressor` - APIs for incremental
+  compression or decompression

-It will also contain some Zstandard-specific functionality
+It will also contain some Zstandard-specific functionality:

-* ``ZstdDict`` / ``train_dict`` / ``finalize_dict`` - APIs for interacting with
-  Zstandard dictionaries, which are useful for compressing many small chunks of
-  similar data
+* :class:`!ZstdDict` / :func:`!train_dict` / :func:`!finalize_dict` - APIs for
+  interacting with Zstandard dictionaries, which are useful for compressing
+  many small chunks of similar data

 ``libzstd`` optional dependency
 -------------------------------
@@ -222,11 +243,12 @@ Backwards Compatibility

 The main compatibility concern is usage of existing standard library
 compression APIs with the existing import names. These names will be
-deprecated in 3.19 and will be removed in 3.24. Given the long coexistance of
+deprecated in 3.19 and will be removed in 3.24. Given the long coexistence of
 the modules and a 5 year deprecation period, most users will likely migrate to
 the new import names well before then. Additionally, a libCST codemod can be
 provided to automatically rewrite imports, easing the migration.

+
 Security Implications
 =====================

@@ -241,13 +263,15 @@ Taking on a new dependency also always has security risks, but the ``zstd``
 library is mature, fuzzed on each commit, and `participates in Meta's bug bounty
 program <https://github.com/facebook/zstd/blob/dev/SECURITY.md>`_.

+
 How to Teach This
 =================

 Documentation for the new module is in the reference implementation branch. The
 documentation for other modules will be updated to discuss the deprecation of
 their existing import names, and how to migrate.

+
 Reference Implementation
 ========================

@@ -258,6 +282,7 @@ integration added. It also contains the re-exports of other compression
 modules. Deprecations for the existing import names will be added once a
 decision is reached regarding the open issues.

+
 Rejected Ideas
 ==============

@@ -273,6 +298,7 @@ import name ``lz4``. Instead of solving this issue for each compression format,
 it is better to solve it once and for all by using the already-claimed
 ``compression`` namespace.

+
 Copyright
 =========

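The specification in this diff names the new module's one-shot, file, and incremental APIs but does not show them in use. The snippet below is a minimal, hedged sketch of how those three API families fit together. It is not part of this commit. It assumes that ``compress``, ``decompress``, ``open``, ``ZstdCompressor``, and ``ZstdDecompressor`` can be imported from ``compression.zstd`` on Python 3.14 or later. When they cannot, it falls back to ``bz2``, whose API shape the PEP says the new module matches. The aliases ``Compressor``, ``Decompressor``, and ``open_compressed`` are hypothetical names introduced only for this illustration.

```python
import os
import tempfile

# Assumption: prefer the module proposed by this PEP (compression.zstd,
# Python 3.14+). Otherwise fall back to bz2, which exposes the same API shape.
# The aliases below are hypothetical, chosen only so both paths share names.
try:
    from compression.zstd import (
        ZstdCompressor as Compressor,
        ZstdDecompressor as Decompressor,
        compress,
        decompress,
        open as open_compressed,
    )
except ImportError:
    from bz2 import (
        BZ2Compressor as Compressor,
        BZ2Decompressor as Decompressor,
        compress,
        decompress,
        open as open_compressed,
    )

payload = b"batteries included " * 1000

# One-shot API: compress the whole buffer at once, then restore it.
blob = compress(payload)
restored_one_shot = decompress(blob)

# Incremental API: feed fixed-size chunks, then flush to finish the stream.
compressor = Compressor()
pieces = [
    compressor.compress(payload[i:i + 4096])
    for i in range(0, len(payload), 4096)
]
pieces.append(compressor.flush())
stream = b"".join(pieces)
restored_incremental = Decompressor().decompress(stream)

# File API: open() returns a file object that compresses on write
# and decompresses on read.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.bin")
    with open_compressed(path, "wb") as f:
        f.write(payload)
    with open_compressed(path, "rb") as f:
        restored_file = f.read()

print(f"{len(payload)} bytes -> {len(blob)} bytes (one-shot)")
```

The ``bz2`` fallback lets the sketch run on interpreters that lack ``compression.zstd``. The Zstandard-specific dictionary APIs (``ZstdDict``, ``train_dict``, ``finalize_dict``) are left out on purpose, because this diff does not describe their parameters.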