Commit f7d669e

PEP 784: Updates to reflect inclusion of gzip and discussed rejected ideas (#4375)
1 parent 5b8e28b commit f7d669e

1 file changed

+80
-15
lines changed

peps/pep-0784.rst

Lines changed: 80 additions & 15 deletions
@@ -30,10 +30,11 @@ Motivation
3030
==========
3131

3232
CPython has modules for several different compression formats, such as
33-
:mod:`zlib (DEFLATE) <zlib>`, :mod:`bzip2 <bz2>`, and :mod:`lzma <lzma>`,
34-
each widely used. Including popular compression algorithms matches Python's
35-
"batteries included" philosophy of incorporating widely useful standards and
36-
utilities. :mod:`!lzma` is the most recent such module, added in Python 3.3.
33+
:mod:`zlib (DEFLATE) <zlib>`, :mod:`gzip <gzip>`, :mod:`bzip2 <bz2>`, and
34+
:mod:`lzma <lzma>`, each widely used. Including popular compression algorithms
35+
matches Python's "batteries included" philosophy of incorporating widely useful
36+
standards and utilities. :mod:`!lzma` is the most recent such module, added in
37+
Python 3.3.
3738

3839
Since then, Zstandard has become the modern *de facto* preferred compression
3940
library for both high performance compression and decompression attaining high
@@ -216,9 +217,10 @@ used to build libraries CPython depends on for Windows.
216217
Other compression modules
217218
-------------------------
218219

219-
New import names ``compression.lzma``, ``compression.bz2``, and
220-
``compression.zlib`` will be introduced in Python 3.14 re-exporting the
221-
contents of the existing ``lzma``, ``bz2``, and ``zlib`` modules respectively.
220+
New import names ``compression.lzma``, ``compression.bz2``,
221+
``compression.gzip`` and ``compression.zlib`` will be introduced in Python 3.14
222+
re-exporting the contents of the existing ``lzma``, ``bz2``, ``gzip`` and
223+
``zlib`` modules respectively.
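
Reviewer note: a minimal sketch of how calling code might adopt the new import names while still running on older interpreters. It assumes the ``compression`` package exists only where this PEP has been implemented (Python 3.14 per the text above) and that a failed import raises ``ImportError``; the ``payload`` value and variable names are illustrative only.

```python
# Prefer the namespaced import described in the PEP; fall back to the
# long-standing top-level modules on interpreters that lack the package.
try:
    from compression import gzip, zlib
except ImportError:
    import gzip
    import zlib

payload = b"PEP 784 " * 64

# Either import path is expected to expose the same module contents,
# so a round trip behaves identically in both cases.
gzip_roundtrip = gzip.decompress(gzip.compress(payload))
zlib_roundtrip = zlib.decompress(zlib.compress(payload))

print(gzip_roundtrip == payload, zlib_roundtrip == payload)
```

Because the new names are described as re-exports, the fallback branch should need no other code changes.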
222224

223225
The ``_compression`` module, given that it is marked private, will be
224226
immediately renamed to ``compression._common.streams``. The new name was
@@ -289,17 +291,80 @@ decision is reached regarding the open issues.
289291
Rejected Ideas
290292
==============
291293

292-
Name the module ``libzstd`` and do not make a new ``compression`` namespace
294+
Name the module ``zstdlib`` and do not make a new ``compression`` namespace
293295
---------------------------------------------------------------------------
294296

295297
One option instead of making a new ``compression`` namespace would be to find
296-
a different name, such as ``libzstd``, as the import name. However, the issue
297-
of existing import names is likely to persist for future compression formats
298-
added to the standard library. LZ4, a common high speed compression format,
299-
has `a package on PyPI <https://pypi.org/project/lz4/>`_, ``lz4``, with the
300-
import name ``lz4``. Instead of solving this issue for each compression format,
301-
it is better to solve it once and for all by using the already-claimed
302-
``compression`` namespace.
298+
a different name, such as ``zstdlib``, as the import name. Several other names,
299+
such as ``zst``, ``libzstd``, and ``zstdcomp`` were proposed as well. In
300+
discussion, the names were found to either be too easy to typo, or unintuitive.
301+
Furthermore, the issue of existing import names is likely to persist for future
302+
compression formats added to the standard library. LZ4, a common high speed
303+
compression format, has `a package on PyPI <https://pypi.org/project/lz4/>`_,
304+
``lz4``, with the import name ``lz4``. Instead of solving this issue for each
305+
compression format, it is better to solve it once and for all by using the
306+
already-claimed ``compression`` namespace.
307+
308+
Introduce an experimental ``_zstd`` package in Python 3.14
309+
----------------------------------------------------------
310+
311+
Since this PEP was published close to the beta cutoff for new features for
312+
Python 3.14, one proposal was to add the package as a private module ``_zstd``
313+
so that packaging tools could use it sooner, while deferring the public name. This
314+
would allow more time for discussion of the final module name during the 3.15
315+
development window. However, introducing a private module was not popular. The
316+
expectations and contract for external usage of a private module in the
317+
standard library are unclear.
318+
319+
Introduce a standard library namespace instead of ``compression``
320+
-----------------------------------------------------------------
321+
322+
One alternative to a ``compression`` namespace would be to introduce a
323+
``std`` namespace for the entire standard library. However, this was seen as
324+
too significant a change for 3.14, with no agreed upon semantics, migration
325+
path, or name for the package. Furthermore, a future PEP introducing a ``std``
326+
namespace could always define that the ``compression`` sub-modules be flattened
327+
into the ``std`` namespace.
328+
329+
Include ``zipfile`` and ``tarfile`` in ``compression``
330+
------------------------------------------------------
331+
332+
Compression is often used with archiving tools, so putting both :mod:`zipfile`
333+
and :mod:`tarfile` under the ``compression`` namespace is appealing. However,
334+
compression can be used beyond just archiving tools. For example, network
335+
requests can be gzip compressed. Furthermore, formats like tar do not include
336+
compression themselves, instead relying on external compression. Therefore,
337+
this PEP does not propose moving :mod:`!zipfile` or :mod:`!tarfile` under
338+
``compression``.
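
Reviewer note: the point that tar relies on external compression can be illustrated with a small sketch. It builds an uncompressed tar archive in memory, compresses it as a separate step with :mod:`!gzip`, and then reads the result back through ``tarfile``. The file name ``hello.txt`` and the payload are made up for illustration.

```python
import gzip
import io
import tarfile

payload = b"hello from inside the archive\n"

# Step 1: archiving only. Mode "w" writes a plain, uncompressed tar stream.
raw_tar = io.BytesIO()
with tarfile.open(fileobj=raw_tar, mode="w") as tar:
    info = tarfile.TarInfo(name="hello.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Step 2: compression only. gzip knows nothing about the tar structure.
tar_gz = gzip.compress(raw_tar.getvalue())

# Reading back: "r:gz" undoes the gzip layer, then parses the tar layer.
with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as tar:
    extracted = tar.extractfile("hello.txt").read()

print(extracted == payload)
```

The two steps are independent, which is consistent with keeping archive modules outside the ``compression`` namespace.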
339+
340+
Do not include ``gzip`` under ``compression``
341+
---------------------------------------------
342+
343+
The :rfc:`GZip format RFC <1952>` defines a format which can include multiple
344+
blocks and metadata about its contents. In this way GZip is rather similar to
345+
archive formats like ZIP and tar. Despite that, in usage GZip is often treated
346+
as a compression format rather than an archive format. Looking at how different
347+
languages classify GZip, the prevailing trend is to classify it as a
348+
compression format and not an archiving format.
349+
350+
========== ======================== ==============================================================================
351+
Language   Compression or Archive   Documentation Link
352+
========== ======================== ==============================================================================
353+
Golang     Compression              https://pkg.go.dev/compress/gzip
354+
Ruby       Compression              https://docs.ruby-lang.org/en/master/Zlib/GzipFile.html
355+
Rust       Compression              https://github.com/rust-lang/flate2-rs
356+
Haskell    Compression              https://hackage.haskell.org/package/zlib
357+
C#         Compression              https://learn.microsoft.com/en-us/dotnet/api/system.io.compression.gzipstream
358+
Java       Archive                  https://docs.oracle.com/javase/8/docs/api/java/util/zip/package-summary.html
359+
NodeJS     Compression              https://nodejs.org/api/zlib.html
360+
Web APIs   Compression              https://developer.mozilla.org/en-US/docs/Web/API/Compression_Streams_API
361+
PHP        Compression              https://www.php.net/manual/en/function.gzcompress.php
362+
Perl       Compression              https://perldoc.perl.org/IO::Compress::Gzip
363+
========== ======================== ==============================================================================
364+
365+
In addition, the :mod:`!gzip` module in Python mostly focuses on single block
366+
content and has an API similar to other compression modules, making it a good
367+
fit for the ``compression`` namespace.
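
Reviewer note: a short sketch of the API similarity mentioned above, using the existing top-level modules. It relies only on the one-shot ``compress`` and ``decompress`` functions that :mod:`!gzip`, :mod:`!bz2`, :mod:`!lzma` and :mod:`!zlib` all provide. The last part concatenates two gzip streams (what RFC 1952 calls "members", assumed here to correspond to the "blocks" in the text) to show that the format can carry more than one.

```python
import bz2
import gzip
import lzma
import zlib

data = b"the same payload for every module " * 32

# All four modules expose the same one-shot compress()/decompress() pair.
results = {}
for module in (gzip, bz2, lzma, zlib):
    results[module.__name__] = module.decompress(module.compress(data)) == data

# Two independently compressed gzip streams, concatenated back to back.
multi_member = gzip.compress(b"first ") + gzip.compress(b"second")
joined = gzip.decompress(multi_member)

print(results, joined)
```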
303368

304369

305370
Copyright
