.. currentmodule:: hdf5plugin
hdf5plugin allows using additional HDF5 compression filters with h5py for reading and writing compressed datasets.
In order to read a compressed dataset with h5py, use:

.. code-block:: python

    import hdf5plugin

This registers the compression filters supported by hdf5plugin with the HDF5 library used by h5py.
Hence, HDF5 compressed datasets can be read like any other dataset (see the h5py documentation).
.. note::

    HDF5 datasets compressed with Blosc2 can require additional plugins to enable decompression, such as blosc2-grok or blosc2-openhtj2k.
    See the list of Blosc2 filters and codecs.
As for reading compressed datasets, ``import hdf5plugin`` is required to enable the supported compression filters.
To create a compressed dataset, use ``h5py.Group.create_dataset`` and set the ``compression`` and ``compression_opts`` arguments.

hdf5plugin provides helpers to prepare those compression options: :class:`Bitshuffle`, :class:`Blosc`, :class:`Blosc2`, :class:`BZip2`, :class:`FciDecomp`, :class:`LZ4`, :class:`Sperr`, :class:`SZ`, :class:`SZ3`, :class:`Zfp`, :class:`Zstd`.
Sample code:

.. code-block:: python

    import numpy
    import h5py
    import hdf5plugin

    # Compression
    f = h5py.File('test.h5', 'w')
    f.create_dataset('data', data=numpy.arange(100), compression=hdf5plugin.LZ4())
    f.close()

    # Decompression
    f = h5py.File('test.h5', 'r')
    data = f['data'][()]
    f.close()

Relevant h5py documentation: Filter pipeline and Chunked Storage.
.. autoclass:: Bitshuffle
   :members:
   :undoc-members:

.. autoclass:: Blosc
   :members:
   :undoc-members:

.. autoclass:: Blosc2
   :members:
   :undoc-members:

.. autoclass:: BZip2
   :members:
   :undoc-members:

.. autoclass:: FciDecomp
   :members:
   :undoc-members:

.. autoclass:: LZ4
   :members:
   :undoc-members:

.. autoclass:: Sperr
   :members:
   :undoc-members:

.. autoclass:: SZ
   :members:
   :undoc-members:

.. autoclass:: SZ3
   :members:
   :undoc-members:

.. autoclass:: Zfp
   :members:
   :undoc-members:

.. autoclass:: Zstd
   :members:
   :undoc-members:
For compression filters provided by HDF5 and h5py (i.e., GZIP, LZF, SZIP), the dataset compression configuration can be retrieved with the ``compression`` and ``compression_opts`` properties of ``h5py.Dataset``.

For third-party compression filters such as those supported by hdf5plugin, the dataset compression configuration is stored in the HDF5 filter pipeline.
This filter pipeline configuration can be retrieved with the ``h5py.Dataset`` "low level" API.
For a given ``h5py.Dataset``, named ``dataset`` here:

.. code-block:: python

    create_plist = dataset.id.get_create_plist()
    for index in range(create_plist.get_nfilters()):
        filter_id, _, filter_options, _ = create_plist.get_filter(index)
        print(filter_id, filter_options)

For compression filters supported by hdf5plugin, :func:`hdf5plugin.from_filter_options` instantiates the filter configuration from the filter ID and options.
.. autofunction:: from_filter_options
Constants:

.. py:data:: PLUGIN_PATH

   Directory where the provided HDF5 filter plugins are stored.
Functions:
.. autofunction:: get_filters
.. autofunction:: get_config
When imported, hdf5plugin initialises and registers the filters it embeds, provided no filter is already registered for the corresponding filter IDs.
h5py gives access to the HDF5 functions handling registered filters through ``h5py.h5z``. This module allows checking filter availability and registering/unregistering filters.
hdf5plugin provides an extra :func:`register` function to register the filters it provides, e.g., to override an already loaded filter. Registering with this function is required to perform additional initialisation and to enable writing compressed data with the given filter.
.. autofunction:: register
Non-h5py or non-Python users can also benefit from the supplied HDF5 compression filters for reading compressed datasets by setting the ``HDF5_PLUGIN_PATH`` environment variable to the value of ``hdf5plugin.PLUGIN_PATH``, which can be retrieved from the command line with:

.. code-block:: bash

    python -c "import hdf5plugin; print(hdf5plugin.PLUGIN_PATH)"
For instance:

.. code-block:: bash

    export HDF5_PLUGIN_PATH=$(python -c "import hdf5plugin; print(hdf5plugin.PLUGIN_PATH)")

should allow MATLAB or IDL users to read data compressed using the supported plugins.
Setting the ``HDF5_PLUGIN_PATH`` environment variable allows existing programs or Python code to read compressed data without any modification.