Releases: deshaw/versioned-hdf5

v2.3.0

02 Feb 16:05
d7003e1

Major Changes

  • Linux wheels won't be released to PyPI for this version, due to an incompatibility with
    the h5py CI stack. From the next release onwards, Linux wheels will carry a strict upper
    pin to the latest h5py version released at the moment of publishing.

    Past versions compatibility:

    versioned-hdf5        h5py version compatibility
    2.2.1 Linux wheels    h5py >=3.15.0,<=3.15.1
    2.1.0 Linux wheels    h5py >=3.8.0,<3.15.0

    MacOSX wheels, conda-forge binaries, and sources are compatible with all h5py releases.

Minor Changes

  • Fix NumPy version parsing to account for local versions, e.g. 2.3.0+myvariant0
  • Fix bug in delete_versions when the dataset is equal to fillvalue across all versions
  • Fix bug in create_dataset where dtype metadata for object strings would be discarded
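
    The local-version fix above can be illustrated with a minimal sketch. This is a
    hypothetical helper for illustration only, not the library's actual parser: naive
    integer parsing of the last version component would choke on a local suffix such
    as "+myvariant0", so the local segment must be stripped first.

    ```python
    def parse_numpy_version(version: str) -> tuple[int, int, int]:
        """Parse a NumPy version string, tolerating a local version suffix
        such as "2.3.0+myvariant0" (hypothetical helper, for illustration)."""
        public, _, _local = version.partition("+")  # strip the local segment
        major, minor, patch = (public.split(".") + ["0", "0"])[:3]
        return int(major), int(minor), int(patch)

    print(parse_numpy_version("2.3.0+myvariant0"))  # (2, 3, 0)
    print(parse_numpy_version("1.26.4"))            # (1, 26, 4)
    ```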

Developers-only Changes

  • The development workflow has been migrated to Pixi. This allows for fully reproducible
    builds between local environments and CI and makes dev deployment much easier.
  • Added support for editable installs (pip install --no-build-isolation --editable .)
  • The full unit test suite now runs successfully on Windows and MacOS CI

v2.2.1

07 Jan 17:15
1c1814c

Fix wheels publishing issue on Linux.

v2.2.0

07 Jan 16:20
c6b5d65

Major Changes

  • Fixed binary incompatibility of versioned-hdf5 Linux wheels with the wheels for h5py >=3.15.0. Starting from this release, versioned-hdf5 Linux wheels on PyPI require h5py >=3.15.0. MacOSX wheels, conda-forge packages, and builds from source still support h5py >=3.8.0.
  • Added wheels for Python 3.14
  • Dropped support for Python 3.9

Minor Changes

Filters have been overhauled:

  • Added shuffle, fletcher32, and scaleoffset parameters to create_dataset and modify_metadata
  • Fixed bug where modify_metadata would revert compression and compression_opts to their default values when they were not explicitly passed. For example, modify_metadata(ds, fillvalue=123) would decompress a dataset. To disable compression, you now have to explicitly pass modify_metadata(ds, compression=None).
  • .compression and .compression_opts properties now return the numerical IDs for custom compression filters (e.g. Blosc, Blosc2) in staged datasets. Note that this is unlike h5py.Dataset.compression, which incorrectly returns None.
  • Fixed bug where the .compression and .compression_opts properties of staged datasets would incorrectly return None
  • Fixed bug where passing a path to create_dataset would silently disregard the compression and compression_opts parameters
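
The fixed modify_metadata semantics can be sketched with a toy merge function. This is a hypothetical model, not the library's implementation: settings that are not passed keep their current values, and clearing one now requires passing it explicitly.

```python
_UNSET = object()  # sentinel meaning "parameter not passed"

def merged_metadata(current: dict, *, compression=_UNSET,
                    compression_opts=_UNSET, fillvalue=_UNSET) -> dict:
    """Toy model of the fixed behavior: unspecified settings keep their
    current value instead of reverting to defaults."""
    out = dict(current)
    for key, value in [("compression", compression),
                       ("compression_opts", compression_opts),
                       ("fillvalue", fillvalue)]:
        if value is not _UNSET:
            out[key] = value
    return out

meta = {"compression": "gzip", "compression_opts": 4, "fillvalue": 0}
# Changing the fillvalue preserves compression (previously it was reset):
print(merged_metadata(meta, fillvalue=123))
# Disabling compression must now be explicit:
print(merged_metadata(meta, compression=None, compression_opts=None))
```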

v2.1.0

12 Aug 10:48
6943ceb

Major Changes

  • Binaries are now available:

    • conda-forge packages for Linux, MacOSX, and Windows;
    • pip wheels for Linux and MacOSX (but not Windows).
  • Added support for StringDType, a.k.a. NpyStrings.
    Requires h5py >=3.14.0 and NumPy >=2.0.
    Like in h5py, StringDType is completely opt-in.

  • The astype() method is now functional; it returns a lazy read-only accessor just like in h5py.

  • Fixed breakage in delete_versions() after upgrading to h5py 3.14.0.
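
    The opt-in nature of StringDType can be shown with plain NumPy (this sketch
    assumes NumPy >= 2.0 and does not touch versioned-hdf5 itself): you only get
    NpyStrings when you explicitly ask for the dtype.

    ```python
    import numpy as np
    from numpy.dtypes import StringDType  # requires NumPy >= 2.0

    # Without an explicit dtype, NumPy still builds a fixed-width unicode array;
    # StringDType (variable-width NpyStrings) is strictly opt-in:
    default = np.array(["spam", "a longer string"])
    opted_in = np.array(["spam", "a longer string"], dtype=StringDType())

    print(default.dtype)   # fixed-width unicode, e.g. <U15
    print(opted_in.dtype)  # StringDType
    ```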

Minor Changes

  • Fixed crash when handling empty multidimensional datasets.

  • Fixed bugs where InMemoryArrayDataset.__getitem__() and __array__()
    would return the wrong dtype.

  • The .chunks property of a Dataset would occasionally return an unnecessary
    ndindex.ChunkType; it now always returns a tuple of ints like in h5py.

  • Fixed bug where the ENABLE_CHUNK_REUSE_VALIDATION environment variable would trigger
    a false positive when the first element of an object array is bytes but later
    elements are str.

  • Overhauled ASV benchmarks support.

  • resize():

    • Fixed crash in InMemorySparseDataset.resize().
    • Fixed issue where calling InMemoryArrayDataset.resize() to enlarge a dataset was
      slow and caused huge memory usage.
  • create_dataset():

    • The dtype= parameter now accepts any DTypeLike, e.g. "i4" or int, in
      addition to actual numpy.dtype objects.
    • Fixed bug that could lead to overflow, underflow, or loss of precision after calling
      __setitem__ on the initial version of a dataset.
    • Avoid unnecessary double dtype conversions when passing a list to the data=
      parameter with an explicit dtype= parameter.
    • Fixed crashes when chunks= parameter was either a bare integer or True.
    • chunks=False is now explicitly disallowed
      (before it would lead to an uncontrolled crash).
    • Warn for all ignored kwargs, not just for maxshape.
  • modify_metadata():

    • Fixed bug where setting fillvalue=0 would retain the previous fillvalue.
    • Speed up when the fillvalue doesn't change.
    • Fixed bug when changing dtype and fillvalue at the same time, which could cause the
      new fillvalue to be transiently cast to the old dtype and overflow, underflow,
      or lose precision.
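
The relaxed dtype= parameter of create_dataset means anything NumPy accepts as DTypeLike now works. A quick plain-NumPy illustration of the equivalence (this only demonstrates NumPy's own dtype normalization, not versioned-hdf5 internals):

```python
import numpy as np

# All of these are DTypeLike spellings of the same 4-byte integer dtype,
# so any of them can now be passed as dtype= to create_dataset:
assert np.dtype("i4") == np.dtype(np.int32) == np.dtype("int32")

# Plain Python types and type-code strings normalize the same way:
assert np.dtype(float) == np.dtype("f8")
print("all DTypeLike spellings normalize to the same numpy.dtype")
```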

v2.0.2

23 Jan 14:51
7013d82

  • Fixed regression which would cause a crash when invoking resize() with a tuple of
    numpy.int64 as argument instead of a tuple of ints, such as one constructed
    from h5py.Dataset.size.

v2.0.1

22 Jan 13:31
c292b04

  • Fixed regression, introduced in v2.0.0, which would cause the chunk hash map to become
    corrupted when calling resize() to shrink a dataset followed by delete_versions().

v2.0.0

05 Dec 12:36
7587b7e

  • stage_dataset has been reimplemented from scratch. The new engine is
    expected to be much faster in most cases.

  • __getitem__ on staged datasets used to skip caching entirely when reading from
    unmodified datasets (before the first call to __setitem__ or resize()), and to cache
    the whole loaded area on modified datasets (where the user had previously changed
    even a single point anywhere within the same staged version).

    This has now been changed to always use the libhdf5 cache. As this cache is very
    small by default, users on slow disk backends may observe a slowdown in
    read-update-write use cases that don't overwrite whole chunks, e.g. ds[::2] += 1.
    They should experiment with sizing the libhdf5 cache so that it's larger than the
    working area, e.g.:

    with h5py.File(path, "r+", rdcc_nbytes=2**30, rdcc_nslots=100_000) as f:
        vf = VersionedHDF5File(f)
        with vf.stage_version("r123") as sv:
            sv["some_ds"][::2] += 1

    (this recommendation applies to plain h5py datasets too).

    Note that this change exclusively impacts stage_dataset; current_version,
    get_version_by_name, and get_version_by_timestamp are not impacted and
    continue not to cache anything regardless of libhdf5 cache size.

  • Added support for Ellipsis (...) in indexing.
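
    Ellipsis in indexing follows the usual NumPy semantics, where ... expands to as
    many full slices as needed. A plain-NumPy illustration (not versioned-hdf5 itself,
    but the behavior its datasets now mirror):

    ```python
    import numpy as np

    a = np.arange(24).reshape(2, 3, 4)

    # "..." stands in for all the unindexed axes:
    assert (a[..., 0] == a[:, :, 0]).all()
    assert (a[0, ...] == a[0]).all()
    print(a[..., 0].shape)  # (2, 3)
    ```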

v1.8.2

21 Nov 21:55
ea91964

Fixed a build regression introduced in v1.8.1.

v1.8.1

21 Nov 19:20
824d16f

What's Changed

  • Integer array and boolean array indices are transparently converted to slices when
    possible, either globally or locally to each chunk.
    This can result in major speedups. by @crusaderky in #388
  • Monotonic ascending integer array indices have been sped up from O(n²) to O(n log n)
    (where n is the number of chunks along the indexed axis). by @crusaderky in #388
  • as_subchunk_map has been reimplemented in Cython, providing a speedup. by @ArvidJB and @crusaderky in #364 and #372
  • Improved the exceptions raised by create_dataset. by @peytondmurray in #368
  • Fixed a libhdf5 resource leak in build_data_dict; the function has also been sped up. by @crusaderky in #376 and #383
  • Slightly sped up hashing algorithm. by @crusaderky in #397
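
The index-to-slice conversion above can be sketched in a few lines. This is a hypothetical illustration of the idea, not the library's Cython implementation: a monotonically ascending, evenly spaced integer index is equivalent to a slice, which is much cheaper to apply than fancy indexing.

```python
import numpy as np

def as_slice(idx):
    """Return a slice equivalent to a monotonically ascending, evenly
    spaced 1-D integer index array, or None if no such slice exists
    (hypothetical sketch of the optimization)."""
    if idx.ndim != 1 or idx.size == 0:
        return None
    if idx.size == 1:
        return slice(int(idx[0]), int(idx[0]) + 1, 1)
    steps = np.diff(idx)
    step = int(steps[0])
    if step <= 0 or (steps != step).any():
        return None  # not convertible; fall back to fancy indexing
    return slice(int(idx[0]), int(idx[-1]) + 1, step)

print(as_slice(np.array([2, 4, 6, 8])))  # slice(2, 9, 2)
print(as_slice(np.array([3, 1, 2])))     # None
```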

Full Changelog: v1.8.0...v1.8.1

v1.8.0

09 Aug 23:35
5dff0de

Full Changelog: 1.7.0...v1.8.0