- Minimum required version of `obspec_utils` is now `0.9.0`.
- Fix setting `fill_value` for Zarr V2 arrays if the data type is a subtype of integer or float. (#845) By Hauke Schulz.
- Fix reading kerchunk parquet references with sparse arrays (missing chunks represented as NULL). (#864) By Tom Nicholas.
- Raise clearer error when kerchunk references have malformed codec specifications. (#864).
- Fix warnings caused by outdated imports from `obspec_utils`. (#863) By Tom Nicholas.
- Allow `ZarrParser` to work from inside a running event loop (e.g. inside a Jupyter Notebook). (#900) By Julius Busecke.
- Fix Lithops executor to allow use of `functools.partial`, and update `get_executor` function to ensure `ProcessPoolExecutor` uses `"forkserver"` mode on platforms that default to `"fork"`. (#899) By Chuck Daniels.
This release moves the `ObjectStoreRegistry` to a separate package, `obspec_utils`, and provides a way to customize how files are read, which can allow `open_virtual_dataset` to run over ~5x faster.
- Improved `ZarrParser` performance. (#892) By Raphael Hagen.
- Added `reader_factory` parameter to `HDFParser` to allow customizing how files are read. (#844) By Max Jones.
- Move `ObjectStoreRegistry` and Reader functionality to `obspec_utils`. (#844) By Max Jones.
- `ObjectStoreRegistry` has moved from `virtualizarr.registry` to `obspec_utils.registry`. The old import path still works but emits a `DeprecationWarning` and will be removed in a future release.
- `ObstoreReader` has been removed from `virtualizarr.utils`. This should not break users' code, as it was not part of the public/documented API. See `obspec_utils` for public file handlers.
- Added `obspec_utils>=0.7.0` as a required dependency. This package provides the `ObjectStoreRegistry` that was previously part of VirtualiZarr.
- Minimum required version of `obstore` is now `0.7.0` (previously `0.5.1`). This was the first release to implement obspec protocols.
- Implement `open_virtual_datatree`. (#838) By Max Jones.
- Set `supports_consolidated_metadata` property on `ManifestStore` to `False`. (#809) By Julia Signell.
- Allow storing scalar arrays under the `'c'` key. (#836) By Max Jones.
- Improve `ManifestStore.list_dir` for arrays and nested groups. (#837) By Max Jones.
- Allow nested groups inside `ManifestStore` and `ManifestGroup` objects and update `HDFParser` to be able to create nested `zarr.Group` objects. (#790) By Ilan Gold.
- `ZarrParser` now handles Zarr V2 and V3 array parsing. (#565) By Neil Schroeder.
- Add Virtual TIFF as an optional dependency for TIFF parsing. (#810) By Max Jones.
- `ZarrParser` no longer uses `ZARR_DEFAULT_FILL_VALUE` lookup to infer missing `fill_value`. (#812) By Raphael Hagen.
- Return `None` for Zarr V2 / consolidated metadata requests. (#827) By Max Jones.
Patch release with minor bug fixes for the `DMRPParser` and Icechunk writing behavior.
- Enable `DMRPParser` to process scalar, dimensionless variables for which no chunks are present. (#666) By Miguel Jimenez-Urias.
- Enable `DMRPParser` to parse flattened dmrpp metadata reference files, which contain container attributes. (#581) By Miguel Jimenez-Urias.
- Support dtypes without an endianness. (#787) By Justus Magin.
- Change default Icechunk writing behavior to not validate or write "empty" chunks (#791). By Sean Harkins.
Extremely minor release to ensure compatibility with the soon-to-be released version of xarray (likely named v2025.07.2).
- Adjust for a minor upcoming change in the private xarray API `xarray.structure.combine._nested_combine`. (#779) By Tom Nicholas.
This release fixes a number of important bugs that could silently lead to referenced data being read back incorrectly. In particular, note that writing virtual chunks to Icechunk now requires that all virtual chunk containers are set correctly by default. It also unpins our dependency on xarray, so that VirtualiZarr is compatible with the latest released version of Xarray. Please upgrade!
- Expose `validate_containers` kwarg in `.to_icechunk`, allowing it to be set to `False`. (#567, #774) By Tom Nicholas.
- Writing to Icechunk now requires that virtual chunk containers are set correctly for all virtual references by default. (#774) Without this check, attempting to read data back could silently return fill values instead of real data (see #763). By Tom Nicholas.
- Update minimum required version of Icechunk to `v1.1.2`. (#774) By Tom Nicholas.
- Unpin dependency on xarray, by adjusting our tests to pass despite minor changes to the bytes of netCDF files written between versions of xarray. (#774) By Max Jones and Tom Nicholas.
- Fixed bug where VirtualiZarr was incorrectly failing to raise if virtual chunk containers with correct prefixes were not set for every virtual reference (#774). By Tom Nicholas.
- Fix handling of big-endian data in Icechunk by making sure that non-default zarr serializers are included in the zarr array metadata. (#766) By Max Jones.
- Fix handling of big-endian data in Kerchunk references. (#769) By Max Jones.
- Updated Icechunk examples now that virtual chunk containers are required by default (#774). By Tom Nicholas.
- `extract_codecs` function inside `convert_to_codec_pipeline` now raises if it encounters a codec which does not inherit from the correct `zarr.abc.codec` base classes. (#775) By Tom Nicholas.
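The container validation introduced in this release amounts to checking that every virtual chunk URL is covered by some registered virtual chunk container prefix. The function below is a simplified, self-contained sketch of that idea (the real check lives inside Icechunk/VirtualiZarr's `to_icechunk` path, and this helper's signature is hypothetical):

```python
def validate_containers(chunk_urls, container_prefixes, validate=True):
    """Raise if any virtual chunk URL is not covered by a registered
    virtual chunk container prefix (simplified illustration)."""
    if not validate:
        return
    for url in chunk_urls:
        if not any(url.startswith(prefix) for prefix in container_prefixes):
            raise ValueError(
                f"no virtual chunk container registered for {url!r}; "
                "reads would silently return fill values"
            )
```

Passing `validate=False` mirrors the `validate_containers=False` escape hatch exposed on `.to_icechunk`.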
Minor release to ensure compatibility with incoming changes to Icechunk.
- Fixed bug caused by writing empty virtual chunks to Icechunk (#745). By Tom Nicholas.
- Rewrote the internals of `ManifestArray.__getitem__` to ensure it actually obeys the array API standard under myriad edge cases. (#734) By Tom Nicholas.
- Added recommendation to use `icechunk.Repository.save_config()` to persist `icechunk.VirtualChunkContainer`s. (#746) By Tom Nicholas.
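One of the trickier array API rules that `ManifestArray.__getitem__` must honor is ellipsis expansion, including for zero-dimensional arrays. A self-contained sketch of that one normalization step (illustrative only, not VirtualiZarr's implementation):

```python
def expand_ellipsis(key: tuple, ndim: int) -> tuple:
    """Replace a single Ellipsis in an index tuple with the right number
    of full slices, per the array API standard."""
    if key.count(Ellipsis) > 1:
        raise IndexError("an index can only have a single ellipsis")
    if Ellipsis not in key:
        return key
    i = key.index(Ellipsis)
    n_missing = ndim - (len(key) - 1)  # axes not covered by other indexers
    return key[:i] + (slice(None),) * n_missing + key[i + 1 :]
```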
- Added a pluggable system of "parsers" for generating virtual references from different filetypes. These follow the [virtualizarr.parsers.typing.Parser][] typing protocol, and return [ManifestStore][virtualizarr.manifests.ManifestStore] objects wrapping obstore stores. (#498, #601)
- Added a [Zarr parser][virtualizarr.parsers.ZarrParser] that allows opening Zarr V3 stores as virtual datasets. (#271) By Raphael Hagen.
- Added [ManifestStore][virtualizarr.manifests.ManifestStore] for loading data from `ManifestArray`s. (#490) By Max Jones.
- Added [ManifestStore.to_virtual_dataset()][virtualizarr.manifests.ManifestStore.to_virtual_dataset] method. (#522) By Tom Nicholas.
- Added [open_virtual_mfdataset][virtualizarr.open_virtual_mfdataset] function. (#345, #349) By Tom Nicholas.
- Added `datatree_to_icechunk` function for writing an `xarray.DataTree` to an Icechunk store. (#244) By Chuck Daniels.
- Added a `.vz` custom accessor to `xarray.DataTree`, exposing the method `xarray.DataTree.vz.to_icechunk()` for writing an `xarray.DataTree` to an Icechunk store. (#244) By Chuck Daniels.
- Added a warning if you attempt to write an entirely non-virtual dataset to a virtual references format. (#657) By Tom Nicholas.
- Support big-endian data via zarr-python 3.0.9 and zarr v3's new data types system (#618, #677). By Max Jones and Tom Nicholas.
- Added a V1 -> V2 usage migration guide. (#637) By Raphael Hagen.
- As [virtualizarr.open_virtual_dataset][] now uses parsers, its API has changed. (#601) See the migration guide for more details.
- The recommended virtualizarr Xarray accessor name is `vz` rather than `virtualize`.
- Which variables are loadable by default has changed. The behaviour is now to make loadable by default the same variables which `xarray.open_dataset` would create indexes for: i.e. one-dimensional coordinate variables whose name matches the name of their only dimension (also known as "dimension coordinates"). Pandas indexes will also now be created by default for these loadable variables. This is intended to provide a more friendly default, as often you will want these small variables to be loaded (or "inlined", for efficiency of storage in icechunk/kerchunk), and you will also want to have in-memory indexes for these variables (to allow `xarray.combine_by_coords` to sort using them). The old behaviour is equivalent to passing `loadable_variables=[]` and `indexes={}`. (#335, #477) By Tom Nicholas.
- Moved `ChunkManifest`, `ManifestArray` etc. to be behind a dedicated `.manifests` namespace. (#620, #624) By Tom Nicholas.
- Now by default when writing virtual chunks to Icechunk, the `last_updated_time` for the chunk will be set to the current time. This helps protect users against reading from stale or overwritten chunks stored in Icechunk. (#436, #480) By Tom Nicholas.
- Minimum supported version of Icechunk is now `v1.0`.
- Minimum supported version of Zarr is now `v3.1.0`.
- Xarray is pinned to `v2025.6.0`. We expect to loosen the upper bound shortly.
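The "dimension coordinates" default described above can be illustrated with a tiny helper that, given each variable's dimensions, picks the ones xarray would build indexes for (an illustrative sketch; VirtualiZarr's actual selection logic lives inside `open_virtual_dataset`):

```python
def default_loadable_variables(var_dims: dict[str, tuple[str, ...]]) -> list[str]:
    """Return the 1-D variables whose name matches their only dimension,
    i.e. the "dimension coordinates" xarray would create indexes for."""
    return [
        name
        for name, dims in var_dims.items()
        if len(dims) == 1 and dims[0] == name
    ]
```

Here `time` and `x` would be loaded (and indexed) by default, while the 2-D `temp` stays virtual; passing `loadable_variables=[]` recovers the old all-virtual behaviour.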
- Fixed bug causing ManifestArrays to compare as not equal when they were actually identical (#501, #502) By Tom Nicholas.
- Fixed bug causing coordinates to be demoted to data variables when writing to Icechunk (#574, #588) By Tom Nicholas.
- Removed checks forbidding paths in virtual references without file suffixes (#659) By Tom Nicholas.
- Fixed bug when indexing a scalar ManifestArray with an ellipsis. (#596, #641) By Max Jones and Tom Nicholas.
- Added more detail to error messages when an indexer of ManifestArray is invalid (#630, #635). By Danny Kaufman.
- Added new docs page on how to write a custom parser for bespoke file formats (#452, #580) By Tom Nicholas.
- Added new docs page on how to scale VirtualiZarr effectively. (#590) By Tom Nicholas.
- Documented the new [virtualizarr.open_virtual_mfdataset][] function. (#590) By Tom Nicholas.
- Added MUR SST virtual and zarr icechunk store generation using lithops example. (#475) By Aimee Barciauskas.
- Added FAQ answer about what data can be virtualized (#430, #532) By Tom Nicholas.
- Switched docs build to use mkdocs-material instead of sphinx (#615) By Max Jones.
- Moved examples into a `V1/` directory and added notes that examples use the VirtualiZarr V1 syntax. (#644) By Raphael Hagen.
- `ManifestArray`s now internally use `zarr.core.metadata.v3.ArrayV3Metadata`. This replaces the `ZArray` class that was previously used to store metadata about manifest arrays. (#429) By Aimee Barciauskas. Notable internal changes:
    - Make zarr-python a required dependency with a minimum version `>=3.0.2`.
    - Specify a minimum numcodecs version of `>=0.15.1`.
    - When creating a `ManifestArray`, the `metadata` property should be a `zarr.core.metadata.v3.ArrayV3Metadata` object. There is a helper function `create_v3_array_metadata` which should be used, as it has some useful defaults and includes `convert_to_codec_pipeline` (see next bullet).
    - The function `convert_to_codec_pipeline` ensures the codec pipeline passed to `ArrayV3Metadata` has valid codecs in the expected order (`ArrayArrayCodec`s, `ArrayBytesCodec`, `BytesBytesCodec`s) and includes the required `ArrayBytesCodec` using the default for the data type.
        - Note: `convert_to_codec_pipeline` uses the zarr-python function `get_codec_class` to convert codec configurations (i.e. `dict`s with a name and configuration key, see `parse_named_configuration`) to valid Zarr V3 codec classes.
    - Parser changes are minimal.
    - Writer changes:
        - Kerchunk uses Zarr version format 2, so we convert `ArrayV3Metadata` to `ArrayV2Metadata` using the `convert_v3_to_v2_metadata` function. This means the `to_kerchunk_json` function is now a bit more complex because we're converting `ArrayV2Metadata` filters and compressor to serializable objects.
    - zarr-python 3.0 does not yet support the big endian data type. This means that FITS and NetCDF-3 are not currently supported (zarr-python issue #2324).
    - zarr-python 3.0 does not yet support datetime and timedelta data types (zarr-python issue #2616).
- The continuous integration workflows and developer environment now use pixi (#407).
- Added `loadable_variables` kwarg to `ManifestStore.to_virtual_dataset`. (#543) By Tom Nicholas.
- Ensure that the `KerchunkJSONParser` can be used to parse in-memory kerchunk dictionaries using `obstore.store.MemoryStore`. (#631) By Tom Nicholas.
- Move the `virtualizarr.translators.kerchunk` module to `virtualizarr.parsers.kerchunk.translator`, to better indicate that it is private. Also refactor the two kerchunk readers into one module. (#633) By Tom Nicholas.
Small release which fixes a problem causing the docs to be out of date, fixes some issues in the tests with unclosed file handles, and also increases the performance of writing large numbers of virtual references to Icechunk!
- Minimum supported version of Icechunk is now `v0.2.4`. (#462) By Tom Nicholas.
- Updates `store.set_virtual_ref` to `store.set_virtual_refs` in `write_manifest_virtual_refs`. (#443) By Raphael Hagen.
- Examples use new Icechunk syntax.
- Reading and writing Zarr chunk manifest formats are no longer supported. (#359, #426) By Raphael Hagen.
This release stabilises our dependencies - you can now use released versions of VirtualiZarr, Kerchunk, and Icechunk all in the same environment!
It also fixes a number of bugs, adds minor features, changes the default reader for HDF/netCDF4 files, and includes refactors to reduce code redundancy with zarr-python v3. You can also choose which sets of dependencies you want at installation time.
- Optional dependencies can now be installed in groups via pip. See the installation docs. (#309) By Tom Nicholas.
- Added a `.nbytes` accessor method which displays the bytes needed to hold the virtual references in memory. (#167, #227) By Tom Nicholas.
- Upgrade icechunk dependency to `>=0.1.0a12`. (#406) By Julia Signell.
- Sync with Icechunk v0.1.0a8. (#368) By Matthew Iannucci. This also adds support for the `to_icechunk` method to add timestamps as checksums when writing virtual references to an icechunk store. This is useful for ensuring that virtual references are not stale when reading from an icechunk store, which can happen if the underlying data has changed since the virtual references were written.
- Add `group=None` keyword-only parameter to the `VirtualiZarrDatasetAccessor.to_icechunk` method to allow writing to a nested group at a specified group path (rather than defaulting to the root group, when no group is specified). (#341) By Chuck Daniels.
- Passing `group=None` (the default) to `open_virtual_dataset` for a file with multiple groups no longer raises an error; instead it gives you the root group. This new behaviour is more consistent with `xarray.open_dataset`. (#336, #338) By Tom Nicholas.
- Indexes are now created by default for any loadable one-dimensional coordinate variables. Also a warning is no longer thrown when `indexes=None` is passed to `open_virtual_dataset`, and the recommendations in the docs updated to match. This also means that `xarray.combine_by_coords` will now work when the necessary dimension coordinates are specified in `loadable_variables`. (#18, #357, #358) By Tom Nicholas.
- The `append_dim` and `last_updated_at` parameters of the `VirtualiZarrDatasetAccessor.to_icechunk` method are now keyword-only parameters, rather than positional or keyword. This change is breaking only where arguments for these parameters are currently given positionally. (#341) By Chuck Daniels.
- The default backend for netCDF4 and HDF5 is now the custom `HDFVirtualBackend`, replacing the previous default which was a wrapper around the kerchunk backend. (#374, #395) By Julia Signell.
- Optional dependency on kerchunk is now the newly-released v0.2.8. This release of kerchunk is compatible with zarr-python v3.0.0, which means a released version of kerchunk can now be used with both VirtualiZarr and Icechunk. (#392, #406, #412) By Julia Signell and Tom Nicholas.
- Fix bug preventing generating references for the root group of a file when a subgroup exists. (#336, #338) By Tom Nicholas.
- Fix bug in HDF reader where dimension names of dimensions in a subgroup would be incorrect. (#364, #366) By Tom Nicholas.
- Fix bug in dmrpp reader so _FillValue is included in variables' encodings. (#369) By Aimee Barciauskas.
- Fix bug passing arguments to FITS reader, and test it on Hubble Space Telescope data. (#363) By Tom Nicholas.
- Change intro text in readme and docs landing page to be clearer, less about the relationship to Kerchunk, and more about why you would want virtual datasets in the first place. (#337) By Tom Nicholas.
- Add netCDF3 test. (#397) By Tom Nicholas.
This release brings a stricter internal model for manifest paths, support for appending to existing icechunk stores, an experimental non-kerchunk-based HDF5 reader, handling of nested groups in DMR++ files, as well as many other bugfixes and documentation improvements.
- Add a `virtual_backend_kwargs` keyword argument to file readers and to `open_virtual_dataset`, to allow reader-specific options to be passed down. (#315) By Tom Nicholas.
- Added append functionality to `to_icechunk`. (#272) By Aimee Barciauskas.
- Minimum required version of Xarray is now v2024.10.0. (#284) By Tom Nicholas.
- Minimum required version of Icechunk is now v0.1.1. (#419) By Tom Nicholas.
- Minimum required version of Kerchunk is now v0.2.8. (#406) By Julia Signell.
- Opening kerchunk-formatted references from disk which contain relative paths now requires passing the `fs_root` keyword argument via `virtual_backend_kwargs`. (#243) By Tom Nicholas.
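Resolving relative kerchunk reference paths against `fs_root` amounts to a base-URI join; a hedged, stdlib-only sketch (the helper name here is hypothetical, not VirtualiZarr's internal function):

```python
from urllib.parse import urljoin


def resolve_reference_path(path: str, fs_root: str) -> str:
    """Resolve a possibly-relative kerchunk reference path against fs_root."""
    if "://" in path or path.startswith("/"):
        return path  # already absolute: leave untouched
    if not fs_root.endswith("/"):
        fs_root += "/"  # urljoin treats a trailing-slash base as a directory
    return urljoin(fs_root, path)
```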
- Handle root and nested groups with `dmrpp` backend. (#265) By Ayush Nag.
- Fixed bug with writing of `dimension_names` into zarr metadata. (#286) By Tom Nicholas.
- Fixed bug causing CF-compliant variables not to be identified as coordinates. (#191) By Ayush Nag.
- FAQ answers on Icechunk compatibility, converting from existing Kerchunk references to Icechunk, and how to add a new reader for a custom file format. (#266) By Tom Nicholas.
- Clarify which readers actually currently work in FAQ, and temporarily remove tiff from the auto-detection. (#291, #296) By Tom Nicholas.
- Minor improvements to the Contributing Guide. (#298) By Tom Nicholas.
- More minor improvements to the Contributing Guide. (#304) By Doug Latornell.
- Correct some links to the API. (#325) By Tom Nicholas.
- Added links to recorded presentations on VirtualiZarr. (#313) By Tom Nicholas.
- Added links to existing example notebooks. (#329, #331) By Tom Nicholas.
- Added experimental new HDF file reader which doesn't use kerchunk, accessible by importing `virtualizarr.readers.hdf.HDFVirtualBackend`. (#87) By Sean Harkins.
- Support downstream type checking by adding `py.typed` marker file. (#306) By Max Jones.
- File paths in chunk manifests are now always stored as absolute URIs. (#243) By Tom Nicholas.
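Normalizing manifest paths to absolute URIs means leaving real URIs alone and converting bare local paths to `file://` form; a minimal sketch of that rule (illustrative, not VirtualiZarr's internal normalizer):

```python
from pathlib import Path


def to_absolute_uri(path: str) -> str:
    """Return the path unchanged if it already has a scheme, otherwise
    convert a local filesystem path to an absolute file:// URI."""
    if "://" in path:
        return path
    return Path(path).absolute().as_uri()
```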
- Can open `kerchunk` reference files with `open_virtual_dataset`. (#251, #186) By Raphael Hagen & Kristen Thyng.
- Adds defaults for `open_virtual_dataset_from_v3_store`. (#234) By Raphael Hagen.
- New `group` option on `open_virtual_dataset` enables extracting specific HDF Groups. (#165) By Scott Henderson.
- Adds `decode_times` to `open_virtual_dataset`. (#232) By Raphael Hagen.
- Add parser for the OPeNDAP DMR++ XML format and integration with `open_virtual_dataset`. (#113) By Ayush Nag.
- Load scalar variables by default. (#205) By Gustavo Hidalgo.
- Support empty files (#260) By Justus Magin.
- Can write virtual datasets to Icechunk stores using `virtualize.to_icechunk`. (#256) By Matt Iannucci.
- Serialize valid ZarrV3 metadata and require full compressor numcodec config (for #193) By Gustavo Hidalgo.
- VirtualiZarr's `ZArray`, `ChunkEntry`, and `Codec` no longer subclass `pydantic.BaseModel`. (#210)
- `ZArray`'s `__init__` signature has changed to match `zarr.Array`'s. (#210)
- Deprecates `cftime_variables` in `open_virtual_dataset` in favor of `decode_times`. (#232) By Raphael Hagen.
- Exclude empty chunks during `ChunkDict` construction. (#198) By Gustavo Hidalgo.
- Fixed regression in `fill_value` handling for datetime dtypes making virtual Zarr stores unreadable. (#206) By Timothy Hodson.
- Adds virtualizarr + coiled serverless example notebook (#223) By Raphael Hagen.
- Refactored internal structure significantly to split up everything to do with reading references from that to do with writing references. (#229) (#231) By Tom Nicholas.
- Refactored readers to consider every filetype as a separate reader, all standardized to present the same `open_virtual_dataset` interface internally. (#261) By Tom Nicholas.
This release marks VirtualiZarr as mostly feature-complete, in the sense of achieving feature parity with kerchunk's logic for combining datasets, providing an easier way to manipulate kerchunk references in memory and generate kerchunk reference files on disk.
Future VirtualiZarr development will focus on generalizing and upstreaming useful concepts into the Zarr specification, the Zarr-Python library, Xarray, and possibly some new packages. See the roadmap in the documentation for details.
- Now successfully opens both tiff and FITS files. (#160, #162) By Tom Nicholas.
- Added a `.rename_paths` convenience method to rename paths in a manifest according to a function. (#152) By Tom Nicholas.
- New `cftime_variables` option on `open_virtual_dataset` enables encoding/decoding time. (#122) By Julia Signell.
- Requires numpy 2.0 (for #107). By Tom Nicholas.
- Ensure that `_ARRAY_DIMENSIONS` are dropped from variable `.attrs`. (#150, #152) By Tom Nicholas.
- Ensure that `.attrs` on coordinate variables are preserved during round-tripping. (#155, #154) By Tom Nicholas.
- Ensure that non-dimension coordinate variables described via the CF conventions are preserved during round-tripping. (#105, #156) By Tom Nicholas.
- Added example of using cftime_variables to usage docs. (#169, #174) By Tom Nicholas.
- Updated the development roadmap in preparation for v1.0. (#164) By Tom Nicholas.
- Warn if user passes `indexes=None` to `open_virtual_dataset` to indicate that this is not yet fully supported. (#170) By Tom Nicholas.
- Clarify that virtual datasets cannot be treated like normal xarray datasets. (#173) By Tom Nicholas.
- Refactor `ChunkManifest` class to store chunk references internally using numpy arrays. (#107) By Tom Nicholas.
- Mark tests which require network access so that they are only run when `--run-network-tests` is passed as a command-line argument to pytest. (#144) By Tom Nicholas.
- Determine file format from magic bytes rather than name suffix. (#143) By Scott Henderson.
v0.1 is the first release of VirtualiZarr!! It contains functionality for using kerchunk to find byte ranges in netCDF files, constructing an `xarray.Dataset` containing `ManifestArray` objects, then writing out such a dataset to kerchunk references as either json or parquet.