Releases: deshaw/versioned-hdf5
v2.3.0
Major Changes
- Linux wheels won't be released to PyPI for this version, due to incompatibility with
  the h5py CI stack. In future releases, Linux wheels will introduce a strict upper pin
  on the latest released version of h5py at the moment of publishing.

  Past versions' compatibility:

  | versioned-hdf5 | h5py version compatibility |
  |---|---|
  | 2.2.1 Linux wheels | h5py >=3.15.0,<=3.15.1 |
  | 2.1.0 Linux wheels | h5py >=3.8.0,<3.15.0 |

  MacOSX wheels, conda-forge binaries, and sources are compatible with all h5py releases.
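For example, to install a past Linux wheel together with an h5py pin matching the table above (version specifiers taken from the table; adjust to the row you need):

```shell
# Pin h5py to the range compatible with the versioned-hdf5 2.2.1 Linux wheels
pip install "versioned-hdf5==2.2.1" "h5py>=3.15.0,<=3.15.1"
```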
Minor Changes
- Fix NumPy version parsing to account for local versions, e.g. `2.3.0+myvariant0`
- Fix bug in `delete_versions` when the dataset is equal to `fillvalue` across all versions
- Fix bug in `create_dataset` where dtype metadata for object strings would be discarded
Developers-only Changes
- The development workflow has been migrated to Pixi. This allows for fully reproducible
  builds between local environments and CI and makes dev deployment much easier.
- Added support for editable installs (`pip install --editable . --no-build-isolation`)
- The full unit test suite now runs successfully on Windows and MacOS CI
v2.2.1
Fixed a wheel-publishing issue on Linux.
v2.2.0
Major Changes
- Fixed binary incompatibility of versioned-hdf5 Linux wheels vs. the wheels for h5py >=3.15.0. Starting from this release, versioned-hdf5 wheels for Linux on PyPI require h5py >=3.15.0 wheels. MacOSX wheels, conda-forge packages, and builds from source still require h5py >=3.8.0.
- Added wheels for Python 3.14
- Dropped support for Python 3.9
Minor Changes
Filters have been overhauled:
- Added `shuffle`, `fletcher32`, and `scaleoffset` parameters to `create_dataset` and
  `modify_metadata`
- Fixed bug where `modify_metadata` would revert `compression` and `compression_opts` to
  their default value when they are not explicitly listed. For example,
  `modify_metadata(ds, fillvalue=123)` would decompress a dataset. To decompress a
  dataset, you now have to explicitly pass `modify_metadata(ds, compression=None)`.
- `.compression` and `.compression_opts` properties now return the numerical IDs for
  custom compression filters (e.g. Blosc, Blosc2) in staged datasets. Note that this is
  unlike `h5py.Dataset.compression`, which incorrectly returns `None`.
- Fixed bug where the `.compression` and `.compression_opts` properties of staged
  datasets would incorrectly return `None`
- Fixed bug where passing a path to `create_dataset` would silently disregard the
  `compression` and `compression_opts` parameters
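The "only touch what was explicitly passed" behaviour described above is the classic sentinel-default pattern, which distinguishes "argument not passed" from "explicitly passed `None`". A minimal stdlib-only sketch with hypothetical names (not the actual versioned-hdf5 implementation):

```python
# Sentinel object: distinguishes "not passed" from "explicitly passed None"
_UNSET = object()

def modify_metadata(metadata: dict, compression=_UNSET, fillvalue=_UNSET) -> dict:
    """Return a copy of `metadata`, updating only explicitly passed fields."""
    new = dict(metadata)
    if compression is not _UNSET:
        new["compression"] = compression  # None explicitly disables compression
    if fillvalue is not _UNSET:
        new["fillvalue"] = fillvalue
    return new

meta = {"compression": "gzip", "fillvalue": 0}
# Changing only the fillvalue no longer resets compression to its default...
assert modify_metadata(meta, fillvalue=123)["compression"] == "gzip"
# ...while passing compression=None explicitly disables it.
assert modify_metadata(meta, compression=None)["compression"] is None
```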
v2.1.0
Major Changes
- Binaries are now available:
  - conda-forge packages for Linux, MacOSX, and Windows;
  - pip wheels for Linux and MacOSX (but not Windows).
- Added support for StringDType, a.k.a. NpyStrings.
  Requires h5py >=3.14.0 and NumPy >=2.0.
  Like in h5py, StringDType is completely opt-in.
- The `astype()` method is now functional; it returns a lazy read-only accessor just like
  in h5py.
- Fixed breakage in `delete_versions()` after upgrading to h5py 3.14.0.
Minor Changes
- Fixed crash when handling empty multidimensional datasets.
- Fixed bugs where `InMemoryArrayDataset.__getitem__()` and `__array__()` would return
  the wrong dtype.
- The `.chunks` property of a Dataset would occasionally return an unnecessary
  `ndindex.ChunkType`; it now always returns a tuple of ints like in h5py.
- Fixed bug where the `ENABLE_CHUNK_REUSE_VALIDATION` environment variable would incur a
  false positive when the first element of an object array is bytes, but later on there
  are str elements.
- Overhauled ASV benchmarks support.
- `resize()`:
  - Fixed crash in `InMemorySparseDataset.resize()`.
  - Fixed issue where calling `InMemoryArrayDataset.resize()` to enlarge a dataset was
    slow and caused huge memory usage.
- `create_dataset()`:
  - The `dtype=` parameter now accepts any DTypeLike, e.g. `"i4"` or `int`, in addition
    to actual `numpy.dtype` objects.
  - Fixed bug that could lead to overflow/underflow/loss of definition after calling
    `__setitem__` on the initial version of a dataset.
  - Avoid unnecessary double dtype conversions when passing a list to the `data=`
    parameter with an explicit `dtype=` parameter.
  - Fixed crashes when the `chunks=` parameter was either a bare integer or `True`.
    `chunks=False` is now explicitly disallowed (before it would lead to an uncontrolled
    crash).
  - Warn for all ignored kwargs, not just for `maxshape`.
- `modify_metadata()`:
  - Fixed bug where setting `fillvalue=0` would retain the previous fillvalue.
  - Speed up when the fillvalue doesn't change.
  - Fixed bug when changing dtype and fillvalue at the same time, which could cause the
    new fillvalue to be transitorily cast to the old dtype and overflow, underflow, or
    lose definition.
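The `ENABLE_CHUNK_REUSE_VALIDATION` fix above concerns hashing object arrays that mix bytes and str elements. A conceptual stdlib-only sketch of one way to hash such an array deterministically, with a type tag per element so bytes and str never collide (a hypothetical helper, not the library's actual hash function):

```python
import hashlib

def hash_object_array(items) -> str:
    """Deterministically hash a sequence that may mix bytes and str.

    Each element is prefixed with a type tag, so b"x" and "x" hash
    differently and the result does not depend on which type happens
    to appear first in the array.
    """
    h = hashlib.sha256()
    for item in items:
        if isinstance(item, bytes):
            h.update(b"B")  # tag: raw bytes element
            h.update(item)
        else:
            h.update(b"S")  # tag: str element, encoded as UTF-8
            h.update(item.encode("utf-8"))
        h.update(b"\x00")   # element separator
    return h.hexdigest()

# Equal arrays hash equally; bytes vs. str of the same text do not collide
assert hash_object_array(["a", "b"]) == hash_object_array(["a", "b"])
assert hash_object_array([b"a"]) != hash_object_array(["a"])
```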
v2.0.2
- Fixed regression which would cause a crash when invoking `resize()` with a tuple of
  `numpy.int64` as argument instead of a tuple of ints, e.g. one constructed from
  `h5py.Dataset.size`.
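Coercing arbitrary integer-likes (such as `numpy.int64`) to built-in ints is commonly done with `operator.index`; a minimal stdlib-only sketch of the idea (not the library's actual code, and using a stand-in class instead of NumPy):

```python
import operator

def normalize_shape(shape):
    """Coerce a tuple of integer-likes (e.g. numpy.int64) to plain ints.

    operator.index calls __index__, which any proper integer type
    implements, and rejects floats outright.
    """
    return tuple(operator.index(s) for s in shape)

class FakeInt64:
    # Stand-in for numpy.int64: integer-like via __index__
    def __init__(self, value):
        self.value = value
    def __index__(self):
        return self.value

assert normalize_shape((FakeInt64(3), 4)) == (3, 4)
assert all(type(s) is int for s in normalize_shape((FakeInt64(3), 4)))
```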
v2.0.1
- Fixed regression, introduced in v2.0.0, which would cause the chunk hash map to become
  corrupted when calling `resize()` to shrink a dataset followed by `delete_versions()`.
v2.0.0
- `stage_dataset` has been reimplemented from scratch. The new engine is expected to be
  much faster in most cases.
- `__getitem__` on staged datasets used to never cache data when reading from unmodified
  datasets (before the first call to `__setitem__` or `resize()`) and used to cache the
  whole loaded area on modified datasets (where the user had previously changed a single
  point anywhere within the same staged version).

  This has now been changed to always use the libhdf5 cache. Since this cache is very
  small by default, users on slow disk backends may observe a slowdown in
  read-update-write use cases that don't overwrite whole chunks, e.g. `ds[::2] += 1`.
  They should experiment with sizing the libhdf5 cache so that it's larger than the
  work area, e.g.:

  ```python
  with h5py.File(path, "r+", rdcc_nbytes=2**30, rdcc_nslots=100_000) as f:
      vf = VersionedHDF5File(f)
      with vf.stage_version("r123") as sv:
          sv["some_ds"][::2] += 1
  ```

  (this recommendation applies to plain h5py datasets too).

  Note that this change exclusively impacts `stage_dataset`; `current_version`,
  `get_version_by_name`, and `get_version_by_timestamp` are not impacted and continue
  not to cache anything regardless of libhdf5 cache size.
- Added support for Ellipsis (`...`) in indexing.
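Ellipsis expansion in an index tuple is a standard NumPy-style operation: the `...` stands for however many full slices are needed to index every dimension. A stdlib-only sketch of the idea (a hypothetical helper, not versioned-hdf5's actual code):

```python
def expand_ellipsis(index: tuple, ndim: int) -> tuple:
    """Replace a single Ellipsis in `index` with enough full slices
    so that the result has exactly `ndim` entries."""
    if Ellipsis not in index:
        return index
    i = index.index(Ellipsis)
    n_missing = ndim - (len(index) - 1)  # slots the Ellipsis must fill
    return index[:i] + (slice(None),) * n_missing + index[i + 1:]

# (..., 0) on a 3-d dataset selects [:, :, 0]
assert expand_ellipsis((..., 0), ndim=3) == (slice(None), slice(None), 0)
assert expand_ellipsis((1, ...), ndim=2) == (1, slice(None))
```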
v1.8.2
Fixed a build regression introduced in 1.8.1
v1.8.1
What's Changed
- Integer array and boolean array indices are transparently converted to slices when
  possible, either globally or locally to each chunk. This can result in major speedups.
  by @crusaderky in #388
- Monotonic ascending integer array indices have been sped up from O(n^2) to O(n*log n)
  (where n is the number of chunks along the indexed axis). by @crusaderky in #388
- `as_subchunk_map` has been reimplemented in Cython, providing a speedup. by @ArvidJB
  and @crusaderky in #364 and #372
- Improved the exceptions raised by `create_dataset`. by @peytondmurray in #368
- Fixed a libhdf5 resource leak in `build_data_dict`; the function has also been sped
  up. by @crusaderky in #376 and #383
- Slightly sped up hashing algorithm. by @crusaderky in #397
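Converting a monotonic integer index to an equivalent slice when the stride is constant is the core trick behind the first two items above; a stdlib-only sketch of the check (not the actual Cython implementation):

```python
from typing import Optional

def as_slice(idx: list) -> Optional[slice]:
    """Return an equivalent slice for a strictly ascending integer index
    with a constant step, or None if no such slice exists."""
    if len(idx) == 0:
        return slice(0, 0)
    if len(idx) == 1:
        return slice(idx[0], idx[0] + 1)
    step = idx[1] - idx[0]
    if step <= 0 or any(b - a != step for a, b in zip(idx, idx[1:])):
        return None
    return slice(idx[0], idx[-1] + 1, step)

assert as_slice([2, 4, 6]) == slice(2, 7, 2)   # data[2:7:2]
assert as_slice([5]) == slice(5, 6)
assert as_slice([1, 2, 4]) is None             # irregular stride: keep fancy indexing
```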
Full Changelog: v1.8.0...v1.8.1
v1.8.0
What's Changed
- Force the master branch to be targeted when building docs to publish by @peytondmurray in #341
- Add version dunder back in using importlib.metadata by @peytondmurray in #344
- Test with numpy 1.24 version now that version 2.* is released by @ArvidJB in #347
- Fix chunk reuse verification for string dtype arrays by @peytondmurray in #348
- Improve read/writing performance for InMemoryDataset (PyInf#12655) by @ArvidJB in #345
- Move slicetools implementation to cython (PyInf#12655) by @ArvidJB in #346
- Fix H5S_sel_type enum declaration by @ArvidJB in #349
- Add additional non-python build dependencies for publish_docs CI job by @peytondmurray in #351
- Clean up `pytest` config; make test workflow print `hdf5` config by @peytondmurray in #353
- Make `InMemoryGroup` close child instances if self is closed by @peytondmurray in #354
- Explicitly add all hdf5 types to slicetools to fix local tests by @peytondmurray in #355
- Fix nondefault compression handling by @peytondmurray in #358
- Make the tests workflow update before installing native deps by @peytondmurray in #360
- Improve Hashtable initialization by @ArvidJB in #359
- Label hypothesis tests as slow by @crusaderky in #362
- Nitpicks in design doc by @crusaderky in #361
- Release prep v1.8.0 by @peytondmurray in #363
New Contributors
- @crusaderky made their first contribution in #362
Full Changelog: 1.7.0...v1.8.0