
Commit 8e0b1ea

Merge pull request #156 from pycompression/release_1.4.0
Release 1.4.0
2 parents 6e3c067 + 693dbd1 commit 8e0b1ea

19 files changed, +1428 -263 lines changed

.github/workflows/ci.yml

Lines changed: 9 additions & 10 deletions
@@ -19,10 +19,10 @@ jobs:
       - uses: actions/[email protected]
         with:
           submodules: recursive
-      - name: Set up Python 3.7
+      - name: Set up Python 3.8
         uses: actions/[email protected]
         with:
-          python-version: 3.7
+          python-version: 3.8
       - name: Install tox
         run: pip install tox
       - name: Lint
@@ -39,10 +39,10 @@ jobs:
       - uses: actions/[email protected]
         with:
           submodules: recursive
-      - name: Set up Python 3.7
+      - name: Set up Python 3.8
         uses: actions/[email protected]
         with:
-          python-version: 3.7
+          python-version: 3.8
       - name: Install isal
         run: sudo apt-get install libisal-dev
       - name: Install tox and upgrade setuptools and pip
@@ -57,20 +57,19 @@ jobs:
     strategy:
       matrix:
         python-version:
-          - "3.7"
           - "3.8"
           - "3.9"
           - "3.10"
           - "3.11"
-          - "pypy-3.7"
-          - "pypy-3.8"
+          - "3.12"
           - "pypy-3.9"
+          - "pypy-3.10"
         os: ["ubuntu-latest"]
         include:
           - os: "macos-latest"
-            python-version: 3.7
+            python-version: 3.8
           - os: "windows-latest"
-            python-version: 3.7
+            python-version: 3.8
     steps:
       - uses: actions/[email protected]
         with:
@@ -106,7 +105,7 @@ jobs:
     strategy:
       matrix:
         python_version:
-          - "3.7"
+          - "3.8"
     steps:
       - uses: actions/[email protected]
         with:

CHANGELOG.rst

Lines changed: 15 additions & 0 deletions
@@ -7,6 +7,21 @@ Changelog
 .. This document is user facing. Please word the changes in such a way
 .. that users understand how the changes affect the new version.

+version 1.4.0
+-----------------
++ Drop support for python 3.7 and PyPy 3.8 as these are no longer supported.
+  Add testing and support for python 3.12 and PyPy 3.10.
++ Added an experimental ``isal.igzip_threaded`` module which has an
+  ``open`` function.
+  This can be used to read and write large files in a streaming fashion
+  while escaping the GIL.
++ The internal ``igzip._IGzipReader`` has been rewritten in C. As a result the
+  overhead of decompressing files has significantly been reduced and
+  ``python -m isal.igzip`` is now very close to the C ``igzip`` application.
++ The ``igzip._IGZipReader`` in C is now used in ``igzip.decompress``. The
+  ``_GzipReader`` also can read from objects that support the buffer protocol.
+  This has reduced overhead significantly.
+
 version 1.3.0
 -----------------
 + Gzip headers are now actively checked for a BGZF extra field. If found the

README.rst

Lines changed: 18 additions & 1 deletion
@@ -45,13 +45,17 @@ Acceleration Library (ISA-L) implements several key algorithms in `assembly
 language <https://en.wikipedia.org/wiki/Assembly_language>`_. This includes
 a variety of functions to provide zlib/gzip-compatible compression.

-``python-isal`` provides the bindings by offering three modules:
+``python-isal`` provides the bindings by offering four modules:

 + ``isal_zlib``: A drop-in replacement for the zlib module that uses ISA-L to
   accelerate its performance.
 + ``igzip``: A drop-in replacement for the gzip module that uses ``isal_zlib``
   instead of ``zlib`` to perform its compression and checksum tasks, which
   improves performance.
++ ``igzip_threaded`` offers an ``open`` function which returns buffered read
+  or write streams that can be used to read and write large files while
+  escaping the GIL using one or multiple threads. This functionality only
+  works for streaming, seeking is not supported.
 + ``igzip_lib``: Provides compression functions which have full access to the
   API of ISA-L's compression functions.

@@ -145,6 +149,10 @@ Differences with zlib and gzip modules
   the compression levels are not compatible, a difference in naming was chosen
   to reflect this. ``igzip.GzipFile`` does exist as an alias of
   ``igzip.IGzipFile`` for compatibility reasons.
++ ``igzip._GzipReader`` has been rewritten in C. Since this is a private member
+  it should not affect compatibility, but it may cause some issues for
+  instances where this code is used directly. If such issues should occur,
+  please report them so the compatibility issues can be fixed.

 .. differences end

@@ -181,6 +189,15 @@ This project builds upon the software and experience of many. Many thanks to:
   <https://github.com/pycompression/xopen>`_ and by extension `cutadapt
   <https://github.com/marcelm/cutadapt>`_ projects. This gave python-isal its
   first users who used python-isal in production.
++ Mark Adler (@madler) for the excellent comments in his pigz code which made
+  it very easy to replicate the behaviour for writing gzip with multiple
+  threads using the ``threading`` and ``isal_zlib`` modules. Another thanks
+  for his permissive license, which allowed the crc32_combine code to be
+  included in the project. (ISA-L does not provide a crc32_combine function,
+  unlike zlib.) And yet another thanks to Mark Adler and also for
+  Jean-loup Gailly for creating the gzip format which is very heavily used
+  in bioinformatics. Without that, I would have never written this library
+  from which I have learned so much.
 + The `github actions team <https://github.com/orgs/actions/people>`_ for
   creating the actions CI service that enables building and testing on all
   three major operating systems.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+import sys
+
+from isal.isal_zlib import _GzipReader
+
+if __name__ == "__main__":
+    with open(sys.argv[1], "rb") as f:
+        reader = _GzipReader(f, 512 * 1024)
+        while True:
+            block = reader.read(128 * 1024)
+            if not block:
+                break
Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+import sys
+
+from isal import igzip_threaded
+
+with igzip_threaded.open(sys.argv[1], "rb") as gzip_file:
+    while True:
+        block = gzip_file.read(128 * 1024)
+        if not block:
+            break
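
The streaming read loop above also works in the other direction. The following is a minimal sketch, not part of this commit, that writes a payload with ``igzip_threaded.open`` and streams it back in 128 KiB blocks. The helper names ``open_gzip`` and ``roundtrip`` are hypothetical, and the fallback to the standard-library ``gzip`` module when ``isal`` is not installed is an assumption made so the sketch runs anywhere. Only the ``open(path, mode)`` form shown in this diff is used.

```python
import gzip
import os
import tempfile

try:
    # Assumption: python-isal >= 1.4.0 is installed.
    from isal import igzip_threaded
except ImportError:
    igzip_threaded = None


def open_gzip(path, mode="rb"):
    """Hypothetical helper: prefer igzip_threaded.open, else gzip.open."""
    if igzip_threaded is not None:
        return igzip_threaded.open(path, mode)
    return gzip.open(path, mode)


def roundtrip(payload, chunk_size=128 * 1024):
    """Write payload to a temporary gzip file and stream it back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "example.gz")
        with open_gzip(path, "wb") as out:
            out.write(payload)
        chunks = []
        # Streaming only: read sequentially until an empty block is returned.
        with open_gzip(path, "rb") as src:
            while True:
                block = src.read(chunk_size)
                if not block:
                    break
                chunks.append(block)
    return b"".join(chunks)
```

Because the README states that seeking is not supported, the sketch only reads and writes sequentially.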

docs/index.rst

Lines changed: 7 additions & 0 deletions
@@ -124,6 +124,13 @@ API-documentation: igzip
    :members:
    :special-members: __init__

+=================================
+API-documentation: igzip_threaded
+=================================
+
+.. automodule:: isal.igzip_threaded
+   :members: open
+
 ============================
 API Documentation: igzip_lib
 ============================

setup.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 [metadata]
-license_file=LICENSE
+license_files=LICENSE

setup.py

Lines changed: 3 additions & 3 deletions
@@ -135,7 +135,7 @@ def build_isa_l():

 setup(
     name="isal",
-    version="1.3.0",
+    version="1.4.0",
     description="Faster zlib and gzip compatible compression and "
                 "decompression by providing python bindings for the ISA-L "
                 "library.",
@@ -158,11 +158,11 @@ def build_isa_l():
     classifiers=[
         "Programming Language :: Python :: 3 :: Only",
         "Programming Language :: Python :: 3",
-        "Programming Language :: Python :: 3.7",
         "Programming Language :: Python :: 3.8",
         "Programming Language :: Python :: 3.9",
         "Programming Language :: Python :: 3.10",
         "Programming Language :: Python :: 3.11",
+        "Programming Language :: Python :: 3.12",
         "Programming Language :: Python :: Implementation :: CPython",
         "Programming Language :: Python :: Implementation :: PyPy",
         "Programming Language :: C",
@@ -173,6 +173,6 @@ def build_isa_l():
         "Operating System :: MacOS",
         "Operating System :: Microsoft :: Windows",
     ],
-    python_requires=">=3.7",  # We use METH_FASTCALL
+    python_requires=">=3.8",  # BadGzipFile imported
     ext_modules=EXTENSIONS
 )

src/isal/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -27,4 +27,4 @@
     "__version__"
 ]

-__version__ = "1.3.0"
+__version__ = "1.4.0"

src/isal/crc32_combine.h

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@
+/* pigz.c -- parallel implementation of gzip
+ * Copyright (C) 2007-2023 Mark Adler
+ * Version 2.8 19 Aug 2023 Mark Adler
+ */
+
+/*
+  This software is provided 'as-is', without any express or implied
+  warranty. In no event will the author be held liable for any damages
+  arising from the use of this software.
+
+  Permission is granted to anyone to use this software for any purpose,
+  including commercial applications, and to alter it and redistribute it
+  freely, subject to the following restrictions:
+
+  1. The origin of this software must not be misrepresented; you must not
+     claim that you wrote the original software. If you use this software
+     in a product, an acknowledgment in the product documentation would be
+     appreciated but is not required.
+  2. Altered source versions must be plainly marked as such, and must not be
+     misrepresented as being the original software.
+  3. This notice may not be removed or altered from any source distribution.
+
+  Mark Adler
+
+
+*/
+
+/*
+  Alterations from original:
+  - typedef for crc_t
+  - local declarations replaced with static inline
+  - g.block selector in crc32_comb removed
+*/
+
+#include <stdint.h>
+#include <stddef.h>
+
+typedef uint32_t crc_t;
+
+// CRC-32 polynomial, reflected.
+#define POLY 0xedb88320
+
+// Return a(x) multiplied by b(x) modulo p(x), where p(x) is the CRC
+// polynomial, reflected. For speed, this requires that a not be zero.
+static inline crc_t multmodp(crc_t a, crc_t b) {
+    crc_t m = (crc_t)1 << 31;
+    crc_t p = 0;
+    for (;;) {
+        if (a & m) {
+            p ^= b;
+            if ((a & (m - 1)) == 0)
+                break;
+        }
+        m >>= 1;
+        b = b & 1 ? (b >> 1) ^ POLY : b >> 1;
+    }
+    return p;
+}
+
+// Table of x^2^n modulo p(x).
+static const crc_t x2n_table[] = {
+    0x40000000, 0x20000000, 0x08000000, 0x00800000, 0x00008000,
+    0xedb88320, 0xb1e6b092, 0xa06a2517, 0xed627dae, 0x88d14467,
+    0xd7bbfe6a, 0xec447f11, 0x8e7ea170, 0x6427800e, 0x4d47bae0,
+    0x09fe548f, 0x83852d0f, 0x30362f1a, 0x7b5a9cc3, 0x31fec169,
+    0x9fec022a, 0x6c8dedc4, 0x15d6874d, 0x5fde7a4e, 0xbad90e37,
+    0x2e4e5eef, 0x4eaba214, 0xa8a472c0, 0x429a969e, 0x148d302a,
+    0xc40ba6d0, 0xc4e22c3c};
+
+// Return x^(n*2^k) modulo p(x).
+static inline crc_t x2nmodp(size_t n, unsigned k) {
+    crc_t p = (crc_t)1 << 31;  // x^0 == 1
+    while (n) {
+        if (n & 1)
+            p = multmodp(x2n_table[k & 31], p);
+        n >>= 1;
+        k++;
+    }
+    return p;
+}
+
+// This uses the pre-computed g.shift value most of the time. Only the last
+// combination requires a new x2nmodp() calculation.
+static inline unsigned long crc32_comb(unsigned long crc1, unsigned long crc2,
+                                       size_t len2) {
+    return multmodp(x2nmodp(len2, 3), crc1) ^ crc2;
+}
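
To make the purpose of ``crc32_comb`` concrete, here is a hedged Python transliteration of the header above. It is a sketch for illustration only and is not code from this commit. One deliberate difference: rather than copying the constant table, it rebuilds ``x2n_table`` by repeated squaring, on the assumption (consistent with the first six table entries) that entry ``n`` is ``x^(2^n) mod p(x)``. The result is checked against ``zlib.crc32`` from the standard library.

```python
import zlib

POLY = 0xEDB88320  # CRC-32 polynomial, reflected


def multmodp(a, b):
    """Return a(x) * b(x) modulo p(x). Requires a != 0, as in the C code."""
    m = 1 << 31
    p = 0
    while True:
        if a & m:
            p ^= b
            if (a & (m - 1)) == 0:
                break
        m >>= 1
        b = (b >> 1) ^ POLY if b & 1 else b >> 1
    return p


# Rebuild the table of x^(2^n) mod p(x) by squaring the previous entry.
# In the reflected representation, x^1 is the bit just below the top bit.
X2N_TABLE = [1 << 30]
for _ in range(31):
    X2N_TABLE.append(multmodp(X2N_TABLE[-1], X2N_TABLE[-1]))


def x2nmodp(n, k):
    """Return x^(n * 2^k) modulo p(x)."""
    p = 1 << 31  # x^0 == 1
    while n:
        if n & 1:
            p = multmodp(X2N_TABLE[k & 31], p)
        n >>= 1
        k += 1
    return p


def crc32_comb(crc1, crc2, len2):
    """Combine the CRC-32 of two adjacent blocks; len2 is in bytes."""
    return multmodp(x2nmodp(len2, 3), crc1) ^ crc2


first, second = b"hello ", b"world"
combined = crc32_comb(zlib.crc32(first), zlib.crc32(second), len(second))
print(combined == zlib.crc32(first + second))
```

This is what lets a multi-threaded gzip writer checksum each block independently and still emit a single CRC-32 in the trailer: each worker computes the CRC of its own block, and the results are folded together in order using only the block lengths.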
