Releases: deshaw/versioned-hdf5

v2.3.0

02 Feb 16:05
d7003e1

Major Changes

  • Linux wheels won't be released to PyPI for this version, due to an incompatibility with
    the h5py CI stack. From the next release onwards, Linux wheels will carry a strict upper
    pin to the latest h5py version released at the moment of publishing.

    Past versions compatibility:

    versioned-hdf5        h5py version compatibility
    2.2.1 Linux wheels    h5py >=3.15.0,<=3.15.1
    2.1.0 Linux wheels    h5py >=3.8.0,<3.15.0

    MacOSX wheels, conda-forge binaries, and sources are compatible with all h5py releases.

Minor Changes

  • Fix NumPy version parsing to account for local versions, e.g. 2.3.0+myvariant0
  • Fix bug in delete_versions when the dataset is equal to fillvalue across all versions
  • Fix bug in create_dataset where dtype metadata for object strings would be discarded
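
    The local-version fix above can be illustrated with a minimal sketch. This is a
    hypothetical helper for illustration only, not the library's actual parser: naive
    integer parsing of the last version component would choke on a local suffix such
    as "+myvariant0", so the local segment must be stripped first.

    ```python
    def parse_numpy_version(version: str) -> tuple[int, int, int]:
        """Parse a NumPy version string, tolerating a local version suffix
        such as "2.3.0+myvariant0" (hypothetical helper, for illustration)."""
        public, _, _local = version.partition("+")  # strip the local segment
        major, minor, patch = (public.split(".") + ["0", "0"])[:3]
        return int(major), int(minor), int(patch)

    print(parse_numpy_version("2.3.0+myvariant0"))  # (2, 3, 0)
    print(parse_numpy_version("1.26.4"))            # (1, 26, 4)
    ```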

Developers-only Changes

  • The development workflow has been migrated to Pixi. This allows for fully reproducible
    builds between local environments and CI and makes dev deployment much easier.
  • Added support for editable installs (pip install --no-build-isolation --editable .)
  • The full unit test suite now runs successfully on Windows and MacOS CI

v2.2.1

07 Jan 17:15
1c1814c

Fix wheels publishing issue on Linux.

v2.2.0

07 Jan 16:20
c6b5d65

Major Changes

  • Fixed binary incompatibility of versioned-hdf5 Linux wheels with the wheels for h5py >=3.15.0. Starting from this release, versioned-hdf5 Linux wheels on PyPI require h5py >=3.15.0. MacOSX wheels, conda-forge packages, and builds from source still support h5py >=3.8.0.
  • Added wheels for Python 3.14
  • Dropped support for Python 3.9

Minor Changes

Filters have been overhauled:

  • Added shuffle, fletcher32, and scaleoffset parameters to create_dataset and modify_metadata
  • Fixed bug where modify_metadata would revert compression and compression_opts to their default values when they were not explicitly passed. For example, modify_metadata(ds, fillvalue=123) would decompress a dataset. To disable compression, you now have to explicitly pass modify_metadata(ds, compression=None).
  • .compression and .compression_opts properties now return the numerical IDs for custom compression filters (e.g. Blosc, Blosc2) in staged datasets. Note that this is unlike h5py.Dataset.compression, which incorrectly returns None.
  • Fixed bug where the .compression and .compression_opts properties of staged datasets would incorrectly return None
  • Fixed bug where passing a path to create_dataset would silently disregard the compression and compression_opts parameters
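
The fixed modify_metadata semantics can be sketched with a toy merge function. This is a hypothetical model, not the library's implementation: settings that are not passed keep their current values, and clearing one now requires passing it explicitly.

```python
_UNSET = object()  # sentinel meaning "parameter not passed"

def merged_metadata(current: dict, *, compression=_UNSET,
                    compression_opts=_UNSET, fillvalue=_UNSET) -> dict:
    """Toy model of the fixed behavior: unspecified settings keep their
    current value instead of reverting to defaults."""
    out = dict(current)
    for key, value in [("compression", compression),
                       ("compression_opts", compression_opts),
                       ("fillvalue", fillvalue)]:
        if value is not _UNSET:
            out[key] = value
    return out

meta = {"compression": "gzip", "compression_opts": 4, "fillvalue": 0}
# Changing the fillvalue preserves compression (previously it was reset):
print(merged_metadata(meta, fillvalue=123))
# Disabling compression must now be explicit:
print(merged_metadata(meta, compression=None, compression_opts=None))
```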

v2.1.0

12 Aug 10:48
6943ceb

Major Changes

  • Binaries are now available:

    • conda-forge packages for Linux, MacOSX, and Windows;
    • pip wheels for Linux and MacOSX (but not Windows).
  • Added support for StringDType, a.k.a. NpyStrings.
    Requires h5py >=3.14.0 and NumPy >=2.0.
    Like in h5py, StringDType is completely opt-in.

  • The astype() method is now functional; it returns a lazy read-only accessor just like in h5py.

  • Fixed breakage in delete_versions() after upgrading to h5py 3.14.0.
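
    The opt-in nature of StringDType can be shown with plain NumPy (this sketch
    assumes NumPy >= 2.0 and does not touch versioned-hdf5 itself): you only get
    NpyStrings when you explicitly ask for the dtype.

    ```python
    import numpy as np
    from numpy.dtypes import StringDType  # requires NumPy >= 2.0

    # Without an explicit dtype, NumPy still builds a fixed-width unicode array;
    # StringDType (variable-width NpyStrings) is strictly opt-in:
    default = np.array(["spam", "a longer string"])
    opted_in = np.array(["spam", "a longer string"], dtype=StringDType())

    print(default.dtype)   # fixed-width unicode, e.g. <U15
    print(opted_in.dtype)  # StringDType
    ```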

Minor Changes

  • Fixed crash when handling empty multidimensional datasets.

  • Fixed bugs where InMemoryArrayDataset.__getitem__() and __array__()
    would return the wrong dtype.

  • The .chunks property of a Dataset would occasionally return an unnecessary
    ndindex.ChunkType; it now always returns a tuple of ints like in h5py.

  • Fixed bug where the ENABLE_CHUNK_REUSE_VALIDATION environment variable would trigger
    a false positive when the first element of an object array is bytes but later
    elements are str.

  • Overhauled ASV benchmarks support.

  • resize():

    • Fixed crash in InMemorySparseDataset.resize().
    • Fixed issue where calling InMemoryArrayDataset.resize() to enlarge a dataset was
      slow and caused huge memory usage.
  • create_dataset():

    • The dtype= parameter now accepts any DTypeLike, e.g. "i4" or int, in
      addition to actual numpy.dtype objects.
    • Fixed bug that could lead to overflow, underflow, or loss of precision after calling
      __setitem__ on the initial version of a dataset.
    • Avoid unnecessary double dtype conversions when passing a list to the data=
      parameter with an explicit dtype= parameter.
    • Fixed crashes when chunks= parameter was either a bare integer or True.
    • chunks=False is now explicitly disallowed
      (before it would lead to an uncontrolled crash).
    • Warn for all ignored kwargs, not just for maxshape.
  • modify_metadata():

    • Fixed bug where setting fillvalue=0 would retain the previous fillvalue.
    • Speed up when the fillvalue doesn't change.
    • Fixed bug when changing dtype and fillvalue at the same time, which could cause the
      new fillvalue to be transiently cast to the old dtype and overflow, underflow,
      or lose precision.
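
The relaxed dtype= parameter of create_dataset means anything NumPy accepts as DTypeLike now works. A quick plain-NumPy illustration of the equivalence (this only demonstrates NumPy's own dtype normalization, not versioned-hdf5 internals):

```python
import numpy as np

# All of these are DTypeLike spellings of the same 4-byte integer dtype,
# so any of them can now be passed as dtype= to create_dataset:
assert np.dtype("i4") == np.dtype(np.int32) == np.dtype("int32")

# Plain Python types and type-code strings normalize the same way:
assert np.dtype(float) == np.dtype("f8")
print("all DTypeLike spellings normalize to the same numpy.dtype")
```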

v2.0.2

23 Jan 14:51
7013d82

  • Fixed regression which would cause a crash when invoking resize() with a tuple of
    numpy.int64 as argument instead of a tuple of ints, such as one constructed
    from h5py.Dataset.size.

v2.0.1

22 Jan 13:31
c292b04

  • Fixed regression, introduced in v2.0.0, which would cause the chunk hash map to become
    corrupted when calling resize() to shrink a dataset followed by delete_versions().

v2.0.0

05 Dec 12:36
7587b7e

  • stage_dataset has been reimplemented from scratch. The new engine is
    expected to be much faster in most cases.

  • __getitem__ on staged datasets used to skip caching entirely when reading from
    unmodified datasets (before the first call to __setitem__ or resize()), and to cache
    the whole loaded area on modified datasets (where the user had previously changed
    even a single point anywhere within the same staged version).

    This has now been changed to always use the libhdf5 cache. As this cache is very
    small by default, users on slow disk backends may observe a slowdown in
    read-update-write use cases that don't overwrite whole chunks, e.g. ds[::2] += 1.
    They should experiment with sizing the libhdf5 cache so that it's larger than the
    working area, e.g.:

    with h5py.File(path, "r+", rdcc_nbytes=2**30, rdcc_nslots=100_000) as f:
        vf = VersionedHDF5File(f)
        with vf.stage_version("r123") as sv:
            sv["some_ds"][::2] += 1

    (this recommendation applies to plain h5py datasets too).

    Note that this change exclusively impacts stage_dataset; current_version,
    get_version_by_name, and get_version_by_timestamp are not impacted and
    continue not to cache anything regardless of libhdf5 cache size.

  • Added support for Ellipsis (...) in indexing.
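
    Ellipsis in indexing follows the usual NumPy semantics, where ... expands to as
    many full slices as needed. A plain-NumPy illustration (not versioned-hdf5 itself,
    but the behavior its datasets now mirror):

    ```python
    import numpy as np

    a = np.arange(24).reshape(2, 3, 4)

    # "..." stands in for all the unindexed axes:
    assert (a[..., 0] == a[:, :, 0]).all()
    assert (a[0, ...] == a[0]).all()
    print(a[..., 0].shape)  # (2, 3)
    ```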

v1.8.2

21 Nov 21:55
ea91964

Fixed a build regression introduced in v1.8.1.

v1.8.1

21 Nov 19:20
824d16f

What's Changed

  • Integer array and boolean array indices are transparently converted to slices when
    possible, either globally or locally to each chunk.
    This can result in major speedups. by @crusaderky in #388
  • Monotonic ascending integer array indices have been sped up from O(n²) to O(n log n)
    (where n is the number of chunks along the indexed axis). by @crusaderky in #388
  • as_subchunk_map has been reimplemented in Cython, providing a speedup. by @ArvidJB and @crusaderky in #364 and #372
  • Improved the exceptions raised by create_dataset. by @peytondmurray in #368
  • Fixed a libhdf5 resource leak in build_data_dict; the function has also been sped up. by @crusaderky in #376 and #383
  • Slightly sped up hashing algorithm. by @crusaderky in #397
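
The index-to-slice conversion above can be sketched in a few lines. This is a hypothetical illustration of the idea, not the library's Cython implementation: a monotonically ascending, evenly spaced integer index is equivalent to a slice, which is much cheaper to apply than fancy indexing.

```python
import numpy as np

def as_slice(idx):
    """Return a slice equivalent to a monotonically ascending, evenly
    spaced 1-D integer index array, or None if no such slice exists
    (hypothetical sketch of the optimization)."""
    if idx.ndim != 1 or idx.size == 0:
        return None
    if idx.size == 1:
        return slice(int(idx[0]), int(idx[0]) + 1, 1)
    steps = np.diff(idx)
    step = int(steps[0])
    if step <= 0 or (steps != step).any():
        return None  # not convertible; fall back to fancy indexing
    return slice(int(idx[0]), int(idx[-1]) + 1, step)

print(as_slice(np.array([2, 4, 6, 8])))  # slice(2, 9, 2)
print(as_slice(np.array([3, 1, 2])))     # None
```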

Full Changelog: v1.8.0...v1.8.1

v1.8.0

09 Aug 23:35
5dff0de

Full Changelog: 1.7.0...v1.8.0