Commit ded32dd

PEP 784: Editorial review (#4350)
1 parent 498c5b7 commit ded32dd

1 file changed: +88 -62 lines changed

peps/pep-0784.rst

Lines changed: 88 additions & 62 deletions
Original file line number | Diff line number | Diff line change
@@ -2,97 +2,112 @@ PEP: 784
22
Title: Adding Zstandard to the standard library
33
Author: Emma Harper Smith <[email protected]>
44
Sponsor: Gregory P. Smith <[email protected]>
5+
Discussions-To: https://discuss.python.org/t/87377
56
Status: Draft
67
Type: Standards Track
78
Created: 06-Apr-2025
89
Python-Version: 3.14
10+
Post-History:
11+
`07-Apr-2025 <https://discuss.python.org/t/87377>`__,
12+
913

1014
Abstract
1115
========
1216

13-
`Zstandard <https://facebook.github.io/zstd/>`_ is a widely adopted, mature,
14-
and highly efficient compression standard. This PEP proposes adding a new
15-
module to the Python standard library containing a Python wrapper around Meta's
16-
``zstd`` library, the default implementation. Additionally, to avoid name
17-
collisions with packages on PyPI and to present a unified interface to Python
18-
users, compression modules in the standard library will be moved under a
19-
``compression.*`` namespace package.
17+
`Zstandard`_ is a widely adopted, mature, and highly efficient compression
18+
standard. This PEP proposes adding a new module to the Python standard library
19+
containing a Python wrapper around Meta's |zstd| library, the default
20+
implementation. Additionally, to avoid name collisions with packages on PyPI
21+
and to present a unified interface to Python users, compression modules in the
22+
standard library will be moved under a ``compression.*`` package.
23+
24+
.. |zstd| replace:: ``zstd``
25+
.. _zstd: https://facebook.github.io/zstd/
26+
.. _Zstandard: https://facebook.github.io/zstd/
27+
2028

2129
Motivation
2230
==========
2331

24-
CPython has modules for several different compression formats, such as `zlib
25-
(DEFLATE) <https://docs.python.org/3/library/zlib.html>`_,
26-
`bzip2 <https://docs.python.org/3/library/bz2.html>`_,
27-
and `lzma <https://docs.python.org/3/library/lzma.html>`_, each widely used.
28-
Including popular compression algorithms matches Python's "batteries included"
29-
philosophy of incorporating widely useful standards and utilities. The last
30-
compression module added to the language was ``lzma``, added in Python 3.3.
32+
CPython has modules for several different compression formats, such as
33+
:mod:`zlib (DEFLATE) <zlib>`, :mod:`bzip2 <bz2>`, and :mod:`lzma <lzma>`,
34+
each widely used. Including popular compression algorithms matches Python's
35+
"batteries included" philosophy of incorporating widely useful standards and
36+
utilities. :mod:`!lzma` is the most recent such module, added in Python 3.3.
3137

32-
Since then, Zstandard has become the modern de facto preferred compression
38+
Since then, Zstandard has become the modern *de facto* preferred compression
3339
library for both high performance compression and decompression attaining high
3440
compression ratios at reasonable CPU and memory cost. Zstandard achieves a much
3541
higher compression ratio than bzip2 or zlib (DEFLATE) while decompressing
3642
significantly faster than LZMA.
3743

38-
Zstandard has seen `widespread adoption in many different areas of computing
39-
<https://facebook.github.io/zstd/#references>`_. The numerous hardware
40-
implementations demonstrate long-term commitment to Zstandard and an
41-
expectation that Zstandard will stay the de facto choice for compression for
42-
years to come. Zstandard compression is also implemented in both the ZFS and
43-
Btrfs filesystems.
44+
Zstandard has seen `widespread adoption`_ in many different areas of computing.
45+
The numerous hardware implementations demonstrate long-term commitment to
46+
Zstandard and an expectation that Zstandard will stay the *de facto* choice for
47+
compression for years to come. Zstandard compression is also implemented in
48+
both the ZFS_ and Btrfs_ filesystems.
4449

4550
Zstandard's highly efficient compression has supplanted other modern
46-
compression formats, such as brotli, lzo, and ucl due to its highly efficient
47-
compression. While `LZ4 <https://lz4.org/>`_ is still used in very high
48-
throughput scenarios, Zstandard can also be used in some of these contexts.
51+
compression formats, such as brotli_, lzo_, and ucl_ due to its highly
52+
efficient compression. While `LZ4`_ is still used in very high throughput
53+
scenarios, Zstandard can also be used in some of these contexts.
4954
While inclusion of LZ4 is out of scope, it would be a compelling future
5055
addition to the ``compression`` namespace introduced by this PEP.
5156

5257
There are several bindings to Zstandard for Python available on PyPI, each with
5358
different APIs and choices of how to bind the ``zstd`` library. One goal with
5459
introducing an official module in the standard library is to reduce confusion
5560
for Python users who want simple compression/decompression APIs for Zstandard.
56-
The existing packages can continue providing extended APIs and bindings for
57-
other Python implementations such as PyPy or integrate features from newer
58-
Zstandard versions.
61+
The existing packages can continue providing extended APIs or integrate
62+
features from newer Zstandard versions.
5963

6064
Another reason to add Zstandard support to the standard library is to resolve
61-
a long standing `open issue <https://github.com/python/cpython/issues/81276>`_
62-
requesting Zstandard support in the ``tarfile`` module. This issue has the 5th
63-
most "thumbs up" of open issues on the CPython tracker, and has garnered a
64-
significant amount of discussion and interest. Additionally, the `ZIP format
65-
standardizes a Zstandard compression format ID
66-
<https://pkwaredownloads.blob.core.windows.net/pkware-general/Documentation/APPNOTE-6.3.8.TXT>`_,
67-
and integration with ``zipfile`` would allow opening ZIP archives using
68-
Zstandard compression. The reference implementation for this PEP contains
69-
integration with the ``zipfile``, ``tarfile``, and ``shutil`` modules.
65+
a long standing open issue (`python/cpython#81276`_) requesting Zstandard
66+
support in the :mod:`tarfile` module. This issue has the 5th most "thumbs up"
67+
of open issues on the CPython tracker, and has garnered a significant amount of
68+
discussion and interest. Additionally, the ZIP format standardizes a
69+
`Zstandard compression format ID`_, and integration with the :mod:`zipfile`
70+
module would allow opening ZIP archives using Zstandard compression. The
71+
reference implementation for this PEP contains integration with the
72+
:mod:`!zipfile`, :mod:`!tarfile`, and :mod:`shutil` modules.
7073

7174
Zstandard compression could also be used to make Python wheel packages smaller
7275
and significantly faster to install. Anaconda found a sizeable speedup when
73-
adopting Zstandard for the conda package format
76+
adopting Zstandard for the conda package format:
7477

7578
.. epigraph::
7679

7780
Conda's download sizes are reduced ~30-40%, and extraction is dramatically faster.
7881
[...]
7982
We see approximately a 2.5x overall speedup, almost all thanks to the dramatically faster extraction speed of the zstd compression used in the new file format.
8083

81-
-- `Anaconda blog on Zstandard adoption <https://www.anaconda.com/blog/how-we-made-conda-faster-4-7>`_
84+
-- `Anaconda blog on Zstandard adoption`_
8285

83-
`According to lzbench <https://github.com/inikep/lzbench?tab=readme-ov-file#benchmarks>`_,
84-
a comprehensive benchmark of many different compression libraries and formats,
8586
Zstandard has a significantly higher compression ratio compared to wheel's
86-
existing zlib-based compression. While this PEP does *not* prescribe any
87-
changes to the wheel format or other packaging standards, having Zstandard
88-
bindings in the standard library would enable a future PEP to improve the user
89-
experience for Python wheel packages.
87+
existing zlib-based compression, `according to lzbench`_, a comprehensive
88+
benchmark of many different compression libraries and formats.
89+
While this PEP does *not* prescribe any changes to the wheel format or other
90+
packaging standards, having Zstandard bindings in the standard library would
91+
enable a future PEP to improve the user experience for Python wheel packages.
92+
93+
.. _widespread adoption: https://facebook.github.io/zstd/#references
94+
.. _ZFS: https://en.wikipedia.org/wiki/ZFS
95+
.. _Btrfs: https://btrfs.readthedocs.io/
96+
.. _brotli: https://brotli.org/
97+
.. _lzo: https://www.oberhumer.com/opensource/lzo/
98+
.. _ucl: https://www.oberhumer.com/opensource/ucl/
99+
.. _LZ4: https://lz4.org/
100+
.. _python/cpython#81276: https://github.com/python/cpython/issues/81276
101+
.. _Zstandard compression format ID: https://pkwaredownloads.blob.core.windows.net/pkware-general/Documentation/APPNOTE-6.3.8.TXT
102+
.. _according to lzbench: https://github.com/inikep/lzbench#benchmarks
103+
.. _Anaconda blog on Zstandard adoption: https://www.anaconda.com/blog/how-we-made-conda-faster-4-7
104+
90105

91106
Rationale
92107
=========
93108

94-
Introduction of a ``compression`` namespace
95-
-------------------------------------------
109+
Introduction of a ``compression`` package
110+
-----------------------------------------
96111

97112
Both the ``zstd`` and ``zstandard`` import names are claimed by projects on
98113
PyPI. To avoid breaking users of one of the existing bindings, this PEP
@@ -130,13 +145,17 @@ name otherwise.
130145
Implementation based on ``pyzstd``
131146
----------------------------------
132147

133-
The implementation for this PEP is based on the `pyzstd project <https://github.com/Rogdham/pyzstd>`_.
134-
This project was chosen as the code was `originally written to be upstreamed <https://github.com/python/cpython/issues/81276#issuecomment-1093824963>`_
135-
to CPython by Ma Lin, who also wrote the `output buffer implementation used in
136-
the standard library today <https://github.com/python/cpython/commit/f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231>`_.
148+
The implementation for this PEP is based on the `pyzstd project`_.
149+
This project was chosen as the code was `originally written to be upstreamed`_
150+
to CPython by Ma Lin, who also wrote the `output buffer implementation`_ used in
151+
the standard library today.
137152
The project has since been taken over by Rogdham and is published to PyPI. The
138153
APIs in ``pyzstd`` are similar to the APIs for other compression modules in the
139-
standard library such as ``bz2`` and ``lzma``.
154+
standard library such as :mod:`!bz2` and :mod:`!lzma`.
155+
156+
.. _pyzstd project: https://github.com/Rogdham/pyzstd
157+
.. _originally written to be upstreamed: https://github.com/python/cpython/issues/81276#issuecomment-1093824963
158+
.. _output buffer implementation: https://github.com/python/cpython/commit/f9bedb630e8a0b7d94e1c7e609b20dfaa2b22231
140159

141160
Minimum supported Zstandard version
142161
-----------------------------------
@@ -149,13 +168,14 @@ compatibility with existing LTS Linux distributions, but a newer Zstandard
149168
version could likely be chosen given that newer Python releases are generally
150169
packaged as part of newer distribution releases.
151170

171+
152172
Specification
153173
=============
154174

155175
The ``compression`` namespace
156176
-----------------------------
157177

158-
A new namespace package for compression modules will be introduced named
178+
A new namespace for compression modules will be introduced named
159179
``compression``. The top-level module for this package will be empty to begin
160180
with, but a standard API for interacting with compression routines may be
161181
added in the future to the toplevel.
@@ -167,17 +187,18 @@ A new module, ``compression.zstd`` will be introduced with Zstandard
167187
compression APIs that match other compression modules in the standard library,
168188
namely
169189

170-
* ``compress`` / ``decompress`` - APIs for one-shot compression/decompression
171-
* ``ZstdFile`` / ``open`` - APIs for interacting with streams and file-like
172-
objects
173-
* ``ZstdCompressor`` / ``ZstdDecompressor`` - APIs for incremental compression/
174-
decompression
190+
* :func:`!compress` / :func:`!decompress` - APIs for one-shot compression
191+
or decompression
192+
* :class:`!ZstdFile` / :func:`!open` - APIs for interacting with streams
193+
and file-like objects
194+
* :class:`!ZstdCompressor` / :class:`!ZstdDecompressor` - APIs for incremental
195+
compression or decompression
175196

176-
It will also contain some Zstandard-specific functionality
197+
It will also contain some Zstandard-specific functionality:
177198

178-
* ``ZstdDict`` / ``train_dict`` / ``finalize_dict`` - APIs for interacting with
179-
Zstandard dictionaries, which are useful for compressing many small chunks of
180-
similar data
199+
* :class:`!ZstdDict` / :func:`!train_dict` / :func:`!finalize_dict` - APIs for
200+
interacting with Zstandard dictionaries, which are useful for compressing
201+
many small chunks of similar data
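
As an illustration of the API shape listed above, the following sketch
exercises the one-shot and incremental interfaces that ``bz2`` and ``lzma``
already provide, and tries ``compression.zstd`` only when it is importable.
It assumes, per the text above, that the new module exposes ``compress`` and
``decompress`` like the existing modules. The helper names ``load_codecs``,
``roundtrip``, and ``incremental_bz2`` are hypothetical and not part of the PEP.

```python
import bz2
import importlib
import lzma


def load_codecs():
    """Return {name: module} for codecs exposing compress()/decompress()."""
    codecs = {"bz2": bz2, "lzma": lzma}
    try:
        # Assumption: only interpreters shipping the module proposed by this
        # PEP (and built with libzstd) can import it; others skip it.
        codecs["zstd"] = importlib.import_module("compression.zstd")
    except ImportError:
        pass
    return codecs


def roundtrip(module, payload):
    """One-shot compression followed by one-shot decompression."""
    return module.decompress(module.compress(payload))


def incremental_bz2(chunks):
    """Compress an iterable of byte chunks with the incremental bz2 API."""
    compressor = bz2.BZ2Compressor()
    parts = [compressor.compress(chunk) for chunk in chunks]
    parts.append(compressor.flush())  # emit any data still buffered
    return b"".join(parts)


payload = b"example " * 1000
results = {
    name: roundtrip(module, payload) == payload
    for name, module in load_codecs().items()
}
```

The same ``roundtrip`` helper works for every module in ``results`` because it
relies only on the shared one-shot function names.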
181202

182203
``libzstd`` optional dependency
183204
-------------------------------
@@ -222,11 +243,12 @@ Backwards Compatibility
222243

223244
The main compatibility concern is usage of existing standard library
224245
compression APIs with the existing import names. These names will be
225-
deprecated in 3.19 and will be removed in 3.24. Given the long coexistance of
246+
deprecated in 3.19 and will be removed in 3.24. Given the long coexistence of
226247
the modules and a 5 year deprecation period, most users will likely migrate to
227248
the new import names well before then. Additionally, a libCST codemod can be
228249
provided to automatically rewrite imports, easing the migration.
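
During the coexistence period, code that must run on interpreters with and
without the new package could guard its import. This is a minimal sketch that
assumes the re-exported module is importable as ``compression.lzma``; that
path is inferred from the ``compression.*`` naming in this PEP rather than
quoted from it.

```python
try:
    # New-style import name (assumed path under the compression package).
    from compression import lzma
except ImportError:
    # Interpreters that predate the compression package.
    import lzma

data = b"migration example " * 100
restored = lzma.decompress(lzma.compress(data))
```

Either branch binds the same name, so the rest of the module needs no changes.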
229250

251+
230252
Security Implications
231253
=====================
232254

@@ -241,13 +263,15 @@ Taking on a new dependency also always has security risks, but the ``zstd``
241263
library is mature, fuzzed on each commit, and `participates in Meta's bug bounty
242264
program <https://github.com/facebook/zstd/blob/dev/SECURITY.md>`_.
243265

266+
244267
How to Teach This
245268
=================
246269

247270
Documentation for the new module is in the reference implementation branch. The
248271
documentation for other modules will be updated to discuss the deprecation of
249272
their existing import names, and how to migrate.
250273

274+
251275
Reference Implementation
252276
========================
253277

@@ -258,6 +282,7 @@ integration added. It also contains the re-exports of other compression
258282
modules. Deprecations for the existing import names will be added once a
259283
decision is reached regarding the open issues.
260284

285+
261286
Rejected Ideas
262287
==============
263288

@@ -273,6 +298,7 @@ import name ``lz4``. Instead of solving this issue for each compression format,
273298
it is better to solve it once and for all by using the already-claimed
274299
``compression`` namespace.
275300

301+
276302
Copyright
277303
=========
278304
