Release notes

v1.3.3 (unreleased)

New Features

Breaking changes

  • The set of variables that are loadable by default has changed. The default is now to load the same variables that xarray.open_dataset would create indexes for: one-dimensional coordinate variables whose name matches the name of their only dimension (also known as "dimension coordinates"). Pandas indexes are now also created by default for these loadable variables. This is intended as a friendlier default, since you will often want these small variables to be loaded (or "inlined", for efficient storage in icechunk/kerchunk) and to have in-memory indexes for them (so that xarray.combine_by_coords can sort using them). The old behaviour is equivalent to passing loadable_variables=[] and indexes={}. (:issue:`335`, :pull:`477`) By Tom Nicholas.
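The selection rule described above can be sketched in plain Python. This is an illustration only, not VirtualiZarr's implementation: the function name default_loadable_variables and the example variable layout are hypothetical, and the input is assumed to be a simple mapping from variable name to dimension names.

```python
from typing import Mapping, Sequence


def default_loadable_variables(var_dims: Mapping[str, Sequence[str]]) -> list[str]:
    """Return the names of variables that are dimension coordinates.

    `var_dims` maps each variable name to the names of its dimensions.
    A variable qualifies when it is one-dimensional and its name equals
    the name of its only dimension.
    """
    return [
        name
        for name, dims in var_dims.items()
        if len(dims) == 1 and dims[0] == name
    ]


# Hypothetical dataset layout, for illustration only.
example = {
    "time": ("time",),
    "lat": ("lat",),
    "lon": ("lon",),
    "air": ("time", "lat", "lon"),    # multi-dimensional data variable
    "station_id": ("station",),       # 1D, but name differs from its dimension
}

print(default_loadable_variables(example))  # ['time', 'lat', 'lon']
```

Under this rule "air" and "station_id" stay virtual, while "time", "lat" and "lon" would be loaded and indexed.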

Deprecations

Bug fixes

Documentation

Internal Changes

  • ManifestArrays now internally use zarr.core.metadata.v3.ArrayV3Metadata. This replaces the ZArray class that was previously used to store metadata about manifest arrays. (:pull:`429`) By Aimee Barciauskas. Notable internal changes:
    • Make zarr-python a required dependency with a minimum version >=3.0.2.
    • Specify a minimum numcodecs version of >=0.15.1.
    • When creating a ManifestArray, the metadata property should be a zarr.core.metadata.v3.ArrayV3Metadata object. There is a helper function create_v3_array_metadata which should be used, as it has some useful defaults and includes convert_to_codec_pipeline (see next bullet).
    • The function convert_to_codec_pipeline ensures the codec pipeline passed to ArrayV3Metadata has valid codecs in the expected order (ArrayArrayCodecs, ArrayBytesCodec, BytesBytesCodecs) and includes the required ArrayBytesCodec, using the default for the data type. Note: convert_to_codec_pipeline uses the zarr-python function get_codec_class to convert codec configurations (i.e. dicts with a name and configuration key, see parse_named_configuration) to valid Zarr V3 codec classes.
    • Reader changes are minimal.
    • Writer changes: Kerchunk uses Zarr format version 2, so we convert ArrayV3Metadata to ArrayV2Metadata using the convert_v3_to_v2_metadata function. This makes the to_kerchunk_json function somewhat more complex, because it now converts the ArrayV2Metadata filters and compressor to serializable objects.
    • zarr-python 3.0 does not yet support big-endian data types. This means that FITS and NetCDF-3 files are not currently supported (zarr-python issue #2324).
    • zarr-python 3.0 does not yet support datetime and timedelta data types (zarr-python issue #2616).
  • The continuous integration workflows and developer environment now use pixi (:pull:`407`).
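The codec ordering that convert_to_codec_pipeline is described as enforcing can be sketched without zarr-python. Everything here is an assumption made for illustration: the function order_codecs, the name-to-stage lookup table, and the fallback "bytes" codec configuration are hypothetical stand-ins, whereas the real function resolves codec classes through get_codec_class and picks the default for the data type.

```python
from typing import Any

# Hypothetical lookup of codec name -> pipeline stage, for illustration only.
# The real function resolves codec classes via zarr-python instead.
STAGE_BY_NAME = {
    "transpose": "array_array",
    "bytes": "array_bytes",
    "gzip": "bytes_bytes",
    "zstd": "bytes_bytes",
    "blosc": "bytes_bytes",
}
STAGE_ORDER = ("array_array", "array_bytes", "bytes_bytes")

# Assumed fallback; the real default depends on the data type.
DEFAULT_ARRAY_BYTES = {"name": "bytes", "configuration": {"endian": "little"}}


def order_codecs(codecs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Sort named codec configurations into array->array, array->bytes,
    bytes->bytes order, inserting an array->bytes codec if none was given."""
    staged: dict[str, list[dict[str, Any]]] = {stage: [] for stage in STAGE_ORDER}
    for codec in codecs:
        staged[STAGE_BY_NAME[codec["name"]]].append(codec)

    if len(staged["array_bytes"]) > 1:
        raise ValueError("a pipeline may contain only one array->bytes codec")
    if not staged["array_bytes"]:
        staged["array_bytes"].append(dict(DEFAULT_ARRAY_BYTES))

    # Relative order within each stage is preserved.
    return [codec for stage in STAGE_ORDER for codec in staged[stage]]


pipeline = order_codecs(
    [
        {"name": "zstd", "configuration": {"level": 3}},
        {"name": "transpose", "configuration": {"order": [1, 0]}},
    ]
)
print([codec["name"] for codec in pipeline])  # ['transpose', 'bytes', 'zstd']
```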

v1.3.2 (3rd Mar 2025)

A small release that fixes a problem which caused the docs to be out of date, fixes some issues with unclosed file handles in the tests, and also improves the performance of writing large numbers of virtual references to Icechunk!

New Features

Breaking changes

Deprecations

Bug fixes

Documentation

Internal Changes

  • Updates store.set_virtual_ref to store.set_virtual_refs in write_manifest_virtual_refs (:pull:`443`) By Raphael Hagen.

v1.3.1 (18th Feb 2025)

New Features

  • Examples use new Icechunk syntax

Breaking changes

Deprecations

Bug fixes

Documentation

Internal Changes

v1.3.0 (3rd Feb 2025)

This release stabilises our dependencies - you can now use released versions of VirtualiZarr, Kerchunk, and Icechunk all in the same environment!

It also fixes a number of bugs, adds minor features, changes the default reader for HDF/netCDF4 files, and includes refactors to reduce code redundancy with zarr-python v3. You can also choose which sets of dependencies you want at installation time.

New Features

  • Optional dependencies can now be installed in groups via pip. See the installation docs. (:pull:`309`) By Tom Nicholas.
  • Added a .nbytes accessor method which displays the bytes needed to hold the virtual references in memory. (:issue:`167`, :pull:`227`) By Tom Nicholas.
  • Upgrade icechunk dependency to >=0.1.0a12. (:pull:`406`) By Julia Signell.
  • Sync with Icechunk v0.1.0a8 (:pull:`368`) By Matthew Iannucci (https://github.com/mpiannucci). This also adds support for the to_icechunk method to add timestamps as checksums when writing virtual references to an icechunk store. This is useful for ensuring that virtual references are not stale when reading from an icechunk store, which can happen if the underlying data has changed since the virtual references were written.
  • Add group=None keyword-only parameter to the VirtualiZarrDatasetAccessor.to_icechunk method to allow writing to a nested group at a specified group path (rather than defaulting to the root group, when no group is specified). (:issue:`341`) By Chuck Daniels.

Breaking changes

  • Passing group=None (the default) to open_virtual_dataset for a file with multiple groups no longer raises an error, instead it gives you the root group. This new behaviour is more consistent with xarray.open_dataset. (:issue:`336`, :pull:`338`) By Tom Nicholas.
  • Indexes are now created by default for any loadable one-dimensional coordinate variables. A warning is also no longer raised when indexes=None is passed to open_virtual_dataset, and the recommendations in the docs have been updated to match. This also means that xarray.combine_by_coords will now work when the necessary dimension coordinates are specified in loadable_variables. (:issue:`18`, :pull:`357`, :pull:`358`) By Tom Nicholas.
  • The append_dim and last_updated_at parameters of the VirtualiZarrDatasetAccessor.to_icechunk method are now keyword-only parameters, rather than positional or keyword. This change is breaking _only_ where arguments for these parameters are currently given positionally. (:issue:`341`) By Chuck Daniels.
  • The default backend for netCDF4 and HDF5 is now the custom HDFVirtualBackend replacing the previous default which was a wrapper around the kerchunk backend. (:issue:`374`, :pull:`395`) By Julia Signell.
  • The optional dependency on kerchunk is now the newly released v0.2.8. This release of kerchunk is compatible with zarr-python v3.0.0, which means a released version of kerchunk can now be used with both VirtualiZarr and Icechunk. (:issue:`392`, :pull:`406`, :pull:`412`) By Julia Signell and Tom Nicholas.
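To show why the keyword-only change to to_icechunk only breaks positional callers, here is a minimal sketch. The function to_icechunk_sketch is a hypothetical stand-in, not the real accessor method: it only mirrors the parameter names mentioned above (group, append_dim, last_updated_at) and returns its arguments so that the behaviour can be inspected.

```python
from typing import Any, Optional


def to_icechunk_sketch(
    store: Any,
    *,  # everything after this marker must be passed by keyword
    group: Optional[str] = None,
    append_dim: Optional[str] = None,
    last_updated_at: Optional[Any] = None,
) -> dict[str, Any]:
    """Hypothetical stand-in that just echoes back its arguments."""
    return {
        "store": store,
        "group": group,
        "append_dim": append_dim,
        "last_updated_at": last_updated_at,
    }


# Keyword usage keeps working after the change.
ok = to_icechunk_sketch("my-store", append_dim="time")

# Positional usage of append_dim now fails with a TypeError.
try:
    to_icechunk_sketch("my-store", "time")
except TypeError as err:
    positional_error: Optional[TypeError] = err
else:
    positional_error = None

print(ok["append_dim"], type(positional_error).__name__)
```

Callers that already wrote append_dim=... or last_updated_at=... need no changes.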

Deprecations

Bug fixes

Documentation

  • Change intro text in readme and docs landing page to be clearer, less about the relationship to Kerchunk, and more about why you would want virtual datasets in the first place. (:pull:`337`) By Tom Nicholas.

Internal Changes

v1.2.0 (5th Dec 2024)

This release brings a stricter internal model for manifest paths, support for appending to existing icechunk stores, an experimental non-kerchunk-based HDF5 reader, handling of nested groups in DMR++ files, as well as many other bugfixes and documentation improvements.

New Features

Breaking changes

Deprecations

Bug fixes

Documentation

Internal Changes

  • Added experimental new HDF file reader which doesn't use kerchunk, accessible by importing virtualizarr.readers.hdf.HDFVirtualBackend. (:pull:`87`) By Sean Harkins.
  • Support downstream type checking by adding py.typed marker file. (:pull:`306`) By Max Jones.
  • File paths in chunk manifests are now always stored as absolute URIs. (:pull:`243`) By Tom Nicholas.
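As a rough illustration of what storing paths as absolute URIs can look like, the sketch below normalises a path using only the standard library. It is an assumption-laden example rather than VirtualiZarr's actual validation logic: the helper name to_absolute_uri is hypothetical, and treating any scheme longer than one character as an existing URI is a simplification chosen here to avoid mistaking Windows drive letters for schemes.

```python
import os
from pathlib import Path
from urllib.parse import urlparse


def to_absolute_uri(path: str) -> str:
    """Return `path` as an absolute URI.

    Strings that already carry a URI scheme (s3://, https://, file://, ...)
    are returned unchanged. Local paths, relative or absolute, are made
    absolute and converted to a file:// URI.
    """
    scheme = urlparse(path).scheme
    if len(scheme) > 1:  # a one-letter "scheme" is probably a Windows drive
        return path
    return Path(os.path.abspath(path)).as_uri()


print(to_absolute_uri("s3://bucket/data.nc"))
print(to_absolute_uri("data/file.nc"))  # file:///<current directory>/data/file.nc
```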

v1.1.0 (22nd Oct 2024)

New Features

Breaking changes

  • Serialize valid ZarrV3 metadata and require full compressor numcodec config (for :pull:`193`) By Gustavo Hidalgo.
  • VirtualiZarr's ZArray, ChunkEntry, and Codec no longer subclass pydantic.BaseModel (:pull:`210`)
  • ZArray's __init__ signature has changed to match zarr.Array's (:pull:`210`)

Deprecations

Bug fixes

Documentation

Internal Changes

  • Refactored internal structure significantly to split up everything to do with reading references from that to do with writing references. (:issue:`229`) (:pull:`231`) By Tom Nicholas.
  • Refactored readers to consider every filetype as a separate reader, all standardized to present the same open_virtual_dataset interface internally. (:pull:`261`) By Tom Nicholas.

v1.0.0 (9th July 2024)

This release marks VirtualiZarr as mostly feature-complete, in the sense of achieving feature parity with kerchunk's logic for combining datasets, providing an easier way to manipulate kerchunk references in memory and generate kerchunk reference files on disk.

Future VirtualiZarr development will focus on generalizing and upstreaming useful concepts into the Zarr specification, the Zarr-Python library, Xarray, and possibly some new packages. See the roadmap in the documentation for details.

New Features

Breaking changes

Deprecations

Bug fixes

Documentation

Internal Changes

  • Refactor ChunkManifest class to store chunk references internally using numpy arrays. (:pull:`107`) By Tom Nicholas.
  • Mark tests which require network access so that they are only run when --run-network-tests is passed as a command-line argument to pytest. (:pull:`144`) By Tom Nicholas.
  • Determine file format from magic bytes rather than name suffix (:pull:`143`) By Scott Henderson.
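Detecting a file format from magic bytes rather than from the name suffix can be sketched as follows. This is a simplified illustration, not the code from the pull request: the function names and the returned labels are hypothetical, only a handful of well-known signatures are checked, and an HDF5 signature placed after a user block at a later offset is not handled.

```python
def sniff_format(header: bytes) -> str:
    """Guess a file format from the first bytes of a file."""
    if header.startswith(b"\x89HDF\r\n\x1a\n"):
        return "hdf5"  # netCDF4 files are HDF5 files, so they land here too
    if header[:3] == b"CDF" and header[3:4] in (b"\x01", b"\x02", b"\x05"):
        return "netcdf3"  # classic, 64-bit offset, or CDF-5 variants
    if header.startswith(b"SIMPLE"):
        return "fits"
    if header[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read just enough of the file at `path` to identify its format."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))


# The file name plays no role: only the leading bytes are inspected.
print(sniff_format(b"\x89HDF\r\n\x1a\n"))  # hdf5
print(sniff_format(b"CDF\x01\x00\x00\x00\x00"))  # netcdf3
```

A file named data.nc may be either netCDF-3 or netCDF4/HDF5, which is the kind of ambiguity a suffix check cannot resolve.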

v0.1 (17th June 2024)

v0.1 is the first release of VirtualiZarr!! It contains functionality for using kerchunk to find byte ranges in netCDF files, constructing an xarray.Dataset containing ManifestArray objects, then writing out such a dataset to kerchunk references as either json or parquet.

New Features

Breaking changes

Deprecations

Bug fixes

Documentation

Internal Changes