Commit c177799

Merge pull request #18 from pycompression/release_0.1.0
Release 0.1.0
2 parents 69f155b + ecc6996 commit c177799

34 files changed: +4454 -6 lines changed

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+
+### Checklist
+- [ ] Pull request details were added to CHANGELOG.rst

.github/release_checklist.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+Release checklist
+- [ ] Check outstanding issues on JIRA and GitHub.
+- [ ] Check that the [latest documentation](https://python-isal.readthedocs.io/en/latest/) looks fine.
+- [ ] Create a release branch.
+- [ ] Set version to a stable number.
+- [ ] Change current development version in `CHANGELOG.rst` to stable version.
+- [ ] Merge the release branch into `main`.
+- [ ] Create a test PyPI package from the `main` branch. ([Instructions.](
+https://packaging.python.org/tutorials/packaging-projects/#generating-distribution-archives
+))
+- [ ] Install the packages from the test PyPI repository to see if they work.
+- [ ] Create an annotated tag with the stable version number. Include changes
+from CHANGELOG.rst.
+- [ ] Push tag to remote.
+- [ ] Push tested packages to PyPI.
+- [ ] Merge `main` branch back into `develop`.
+- [ ] Add updated version number to `develop`.
+- [ ] Build the new tag on readthedocs. Only build the last patch version of
+each minor version. So `1.1.1` and `1.2.0` but not `1.1.0`, `1.1.1` and `1.2.0`.
+- [ ] Create a new release on GitHub.
+- [ ] Update the package on conda-forge.
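
Review note on the readthedocs step: the rule "only build the last patch version of each minor version" can be sketched in Python. This is an illustration only. The helper name `latest_patch_per_minor` and the assumption that tags are plain `MAJOR.MINOR.PATCH` strings are not part of this commit.

```python
from typing import Dict, Iterable, List, Tuple


def latest_patch_per_minor(tags: Iterable[str]) -> List[str]:
    """Return the highest patch release for every (major, minor) pair.

    Hypothetical helper: assumes every tag looks like 'MAJOR.MINOR.PATCH'.
    """
    newest: Dict[Tuple[int, int], int] = {}
    for tag in tags:
        major, minor, patch = (int(part) for part in tag.split("."))
        key = (major, minor)
        # Keep only the largest patch number seen for this minor version.
        if patch > newest.get(key, -1):
            newest[key] = patch
    return ["{0}.{1}.{2}".format(major, minor, patch)
            for (major, minor), patch in sorted(newest.items())]


# The example from the checklist: build 1.1.1 and 1.2.0, skip 1.1.0.
print(latest_patch_per_minor(["1.1.0", "1.1.1", "1.2.0"]))
# -> ['1.1.1', '1.2.0']
```

Tags with suffixes such as release candidates would need a real version parser. The sketch deliberately ignores them.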

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,10 @@ __pycache__/
 # C extensions
 *.so
 
+# Cython generated files
+*.c
+*.html
+
 # Distribution / packaging
 .Python
 build/

.readthedocs.yml

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+version: 2
+formats: [] # Do not build epub and pdf
+
+python:
+  install:
+    - method: pip
+      path: .
+conda:
+  environment: docs/conda-environment.yml

.travis.yml

Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+language: python
+
+before_install:
+  # Install conda
+  - export MINICONDA=${HOME}/miniconda
+  - export PATH=${MINICONDA}/bin:${PATH}
+  - wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
+  - bash miniconda.sh -b -f -p ${MINICONDA}
+  - conda config --set always_yes yes
+  - conda config --add channels defaults
+  - conda config --add channels conda-forge
+
+install:
+  - conda create -n python-isal python=$TRAVIS_PYTHON_VERSION tox isa-l
+  - source activate python-isal
+
+python: 3.6 # Use the oldest supported version of python as default.
+script:
+  - tox -e $TOX_ENV
+matrix:
+  include:
+    # TEST DOCS AND LINTING
+    # Use default python3 version here.
+    - env: TOX_ENV=docs
+    - env: TOX_ENV=lint
+    # UNIT TESTS
+    # On most recent versions of python.
+    - env: TOX_ENV=py36
+      after_success:
+        - pip install codecov
+        - codecov -v # -v to make sure coverage upload works.
+    - python: 3.7
+      env: TOX_ENV=py37
+    - python: 3.8
+      env: TOX_ENV=py38

CHANGELOG.rst

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
+==========
+Changelog
+==========
+
+.. Newest changes should be on top.
+
+.. This document is user facing. Please word the changes in such a way
+.. that users understand how the changes affect the new version.
+
+version 0.1.0
+-----------------
++ Publish API documentation on readthedocs.
++ Add API documentation.
++ Ensure the igzip module is fully compatible with the gzip stdlib module.
++ Add compliance tests from CPython to ensure isal_zlib and igzip are validated
+  to the same standards as the zlib and gzip modules.
++ Add a working gzip app using ``python -m isal.igzip``.
++ Add test suite that tests all possible settings for functions on the
+  isal_zlib module.
++ Create igzip module which implements all gzip functions and methods.
++ Create isal_zlib module which implements all zlib functions and methods.

README.md

Lines changed: 0 additions & 6 deletions
This file was deleted.

README.rst

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
+.. image:: https://readthedocs.org/projects/python-isal/badge
+   :target: https://python-isal.readthedocs.io
+   :alt:
+
+
+python-isal
+===========
+
+Faster zlib and gzip compatible compression and decompression
+by providing python bindings for the isa-l library.
+
+This package provides Python bindings for the `isa-l
+<https://github.com/intel/isa-l>`_ library. The Intel Intelligent Storage
+Acceleration Library (isa-l) implements several key algorithms in `assembly
+language <https://en.wikipedia.org/wiki/Assembly_language>`_. This includes
+a variety of functions to provide zlib/gzip-compatible compression.
+
+``python-isal`` provides the bindings by offering an ``isal_zlib`` and
+``igzip`` module which are usable as drop-in replacements for the ``zlib``
+and ``gzip`` modules from the stdlib (with some minor exceptions, see below).
+
+Installation
+------------
+
+isa-l version 2.26.0 or higher is needed. This includes bindings for the
+adler32 function.
+
+isa-l is available in numerous Linux distros as well as on conda via the
+conda-forge channel. Check out the `ports documentation
+<https://github.com/intel/isa-l/wiki/Ports--Repos>`_ on the isa-l project wiki
+to find out how to install it.
+
+The latest development version of python-isal can be installed with
+
+.. code-block::
+
+    pip install git+https://github.com/rhpvorderman/python-isal.git
+
+Usage
+-----
+
+Python-isal has faster versions of the stdlib's ``zlib`` and ``gzip`` modules.
+These are called ``isal_zlib`` and ``igzip`` respectively.
+
+They can be imported as follows
+
+.. code-block:: python
+
+    from isal import isal_zlib
+    from isal import igzip
+
+``isal_zlib`` and ``igzip`` are meant to be used as drop-in replacements, so
+their API and functions are the same as the stdlib's modules, except where
+isa-l does not support the same calls as zlib (see differences below).
+
+Full API documentation can be found on `our readthedocs page
+<https://python-isal.readthedocs.io>`_.
+
+``python -m isal.igzip`` implements a simple gzip-like command line
+application (just like ``python -m gzip``).
+
+Differences with zlib and gzip modules
+--------------------------------------
+
++ Compression level 0 in ``zlib`` and ``gzip`` means **no compression**, while
+  in ``isal_zlib`` and ``igzip`` this is the **lowest compression level**.
+  This is a design choice that was inherited from the isa-l library.
++ Compression levels range from 0 to 3, not 1 to 9.
++ ``isal_zlib.crc32`` and ``isal_zlib.adler32`` do not support negative
+  numbers for the value parameter.
++ ``zlib.Z_DEFAULT_STRATEGY``, ``zlib.Z_RLE`` etc. are exposed as
+  ``isal_zlib.Z_DEFAULT_STRATEGY``, ``isal_zlib.Z_RLE`` etc. for compatibility
+  reasons. However, ``isal_zlib`` only supports a default strategy and will
+  give warnings when other strategies are used.
++ ``zlib`` supports different memory levels from 1 to 9 (with 8 default).
+  ``isal_zlib`` supports memory levels smallest, small, medium, large and
+  largest. These have been mapped to levels 1, 2-3, 4-6, 7-8 and 9. So
+  ``isal_zlib`` can be used with zlib compatible memory levels.
++ ``isal_zlib`` only supports ``FLUSH``, ``SYNC_FLUSH`` and ``FULL_FLUSH``.
+  ``FINISH`` is aliased to ``FULL_FLUSH`` (and works correctly as such).
++ ``isal_zlib`` has a ``compressobj`` and ``decompressobj`` implementation.
+  However, the unused_data and unconsumed_tail for the Decompress object only
+  work properly when using gzip compatible compression (25 <= wbits <= 31).
++ The flush implementation for the Compress object behaves differently from
+  the zlib equivalent.
+
+Contributing
+------------
+Please make a PR or issue if you feel anything can be improved. Bug reports
+are also very welcome. Please report them on the `github issue tracker
+<https://github.com/rhpvorderman/python-isal/issues>`_.
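
Review note on the README: the drop-in claim and the memory-level mapping can be illustrated with a short sketch. It assumes nothing beyond what the README states. The alias `zlib_impl` and the helper `isal_memory_level` are hypothetical names introduced here, and the fallback to the stdlib `zlib` is only there so the example runs without python-isal installed.

```python
try:
    # Prefer the accelerated module when python-isal is installed.
    from isal import isal_zlib as zlib_impl
except ImportError:
    # Fall back to the standard library so the sketch still runs.
    import zlib as zlib_impl


def isal_memory_level(zlib_mem_level: int) -> str:
    """Name the isa-l memory level for a zlib memory level (1 to 9).

    Hypothetical helper that mirrors the mapping listed in the README:
    1 -> smallest, 2-3 -> small, 4-6 -> medium, 7-8 -> large, 9 -> largest.
    """
    if not 1 <= zlib_mem_level <= 9:
        raise ValueError("memory level must be between 1 and 9")
    if zlib_mem_level == 1:
        return "smallest"
    if zlib_mem_level <= 3:
        return "small"
    if zlib_mem_level <= 6:
        return "medium"
    if zlib_mem_level <= 8:
        return "large"
    return "largest"


# Round trip with default settings; the call is identical for both modules.
payload = b"ACGT" * 1024
compressed = zlib_impl.compress(payload)
assert zlib_impl.decompress(compressed) == payload

# The zlib default memory level (8) falls in the "large" bucket.
print(isal_memory_level(8))
```

Only the default compression level is used on purpose: explicit levels mean different things in the two modules, as the differences list explains.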

benchmark.py

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
+import argparse
+import gzip
+import timeit
+import zlib
+from pathlib import Path
+from typing import Dict
+
+from isal import igzip, isal_zlib  # noqa: F401 used in timeit strings
+
+DATA_DIR = Path(__file__).parent / "tests" / "data"
+COMPRESSED_FILE = DATA_DIR / "test.fastq.gz"
+with gzip.open(str(COMPRESSED_FILE), mode="rb") as file_h:
+    data = file_h.read()
+
+sizes: Dict[str, bytes] = {
+    "0b": b"",
+    "8b": data[:8],
+    "128b": data[:128],
+    "1kb": data[:1024],
+    "8kb": data[:8 * 1024],
+    "16kb": data[:16 * 1024],
+    "32kb": data[:32 * 1024],
+    "64kb": data[:64 * 1024],
+    # "128kb": data[:128*1024],
+    # "512kb": data[:512*1024]
+}
+compressed_sizes = {name: zlib.compress(data_block)
+                    for name, data_block in sizes.items()}
+
+compressed_sizes_gzip = {name: gzip.compress(data_block)
+                         for name, data_block in sizes.items()}
+
+
+def show_sizes():
+    print("zlib sizes")
+    print("name\t" + "\t".join(str(level) for level in range(-1, 10)))
+    for name, data_block in sizes.items():
+        orig_size = max(len(data_block), 1)
+        rel_sizes = (
+            str(round(len(zlib.compress(data_block, level)) / orig_size, 3))
+            for level in range(-1, 10))
+        print(name + "\t" + "\t".join(rel_sizes))
+
+    print("isal sizes")
+    print("name\t" + "\t".join(str(level) for level in range(0, 4)))
+    for name, data_block in sizes.items():
+        orig_size = max(len(data_block), 1)
+        rel_sizes = (
+            str(round(len(isal_zlib.compress(data_block, level)) / orig_size,
+                      3))
+            for level in range(0, 4))
+        print(name + "\t" + "\t".join(rel_sizes))
+
+
+def benchmark(name: str,
+              names_and_data: Dict[str, bytes],
+              isal_string: str,
+              zlib_string: str,
+              number: int = 10_000,
+              **kwargs):
+    print(name)
+    print("name\tisal\tzlib\tratio")
+    for name, data_block in names_and_data.items():
+        timeit_kwargs = dict(globals=dict(**globals(), **locals()),
+                             number=number, **kwargs)
+        isal_time = timeit.timeit(isal_string, **timeit_kwargs)
+        zlib_time = timeit.timeit(zlib_string, **timeit_kwargs)
+        isal_microsecs = round(isal_time * (1_000_000 / number), 2)
+        zlib_microsecs = round(zlib_time * (1_000_000 / number), 2)
+        ratio = round(isal_time / zlib_time, 2)
+        print("{0}\t{1}\t{2}\t{3}".format(name,
+                                          isal_microsecs,
+                                          zlib_microsecs,
+                                          ratio))
+
+
+# show_sizes()
+
+def argument_parser() -> argparse.ArgumentParser:
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--all", action="store_true")
+    parser.add_argument("--checksums", action="store_true")
+    parser.add_argument("--functions", action="store_true")
+    parser.add_argument("--gzip", action="store_true")
+    return parser
+
+
+if __name__ == "__main__":
+    args = argument_parser().parse_args()
+    if args.checksums or args.all:
+        benchmark("CRC32", sizes,
+                  "isal_zlib.crc32(data_block)",
+                  "zlib.crc32(data_block)")
+
+        benchmark("Adler32", sizes,
+                  "isal_zlib.adler32(data_block)",
+                  "zlib.adler32(data_block)")
+    if args.functions or args.all:
+        benchmark("Compression", sizes,
+                  "isal_zlib.compress(data_block, 1)",
+                  "zlib.compress(data_block, 1)")
+
+        benchmark("Decompression", compressed_sizes,
+                  "isal_zlib.decompress(data_block)",
+                  "zlib.decompress(data_block)")
+
+    if args.gzip or args.all:
+        benchmark("Compression", sizes,
+                  "igzip.compress(data_block, 1)",
+                  "gzip.compress(data_block, 1)")
+
+        benchmark("Decompression", compressed_sizes_gzip,
+                  "igzip.decompress(data_block)",
+                  "gzip.decompress(data_block)")

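Review note on `benchmark.py`: the script scales the total `timeit` result by `1_000_000 / number`, which yields microseconds per call. A minimal, self-contained sketch of that conversion follows. It uses only the stdlib `zlib`, so it runs without python-isal, and the helper name `microseconds_per_call` is hypothetical and not part of this commit.

```python
import timeit
import zlib


def microseconds_per_call(statement: str,
                          number: int = 1_000,
                          **names) -> float:
    """Time `statement` and return the mean cost of one call in microseconds.

    Hypothetical helper: `names` become the globals the statement can see.
    """
    total_seconds = timeit.timeit(statement, globals=names, number=number)
    # Seconds to microseconds, then divide by the number of calls.
    return total_seconds * 1_000_000 / number


data_block = b"ACGT" * 256
cost = microseconds_per_call("zlib.crc32(data_block)",
                             zlib=zlib, data_block=data_block)
print("crc32: {0:.2f} microseconds per call".format(cost))
```

Passing the names explicitly keeps the timed statement from seeing unrelated module state, unlike merging `globals()` and `locals()`.
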
docs/Makefile

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?=
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = .
+BUILDDIR      = _build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
