Commit d30e3d4

Merge branch 'main' into fix/zarrv2_str_fillval
2 parents bc8c977 + 2bbd1f9 commit d30e3d4

11 files changed: +412 -32 lines changed


conftest.py

Lines changed: 11 additions & 0 deletions
@@ -214,6 +214,17 @@ def netcdf4_file_with_data_in_multiple_groups(tmp_path: Path) -> str:
     return str(filepath)
 
 
+@pytest.fixture
+def netcdf4_file_with_data_in_sibling_groups(tmp_path: Path) -> str:
+    """Create a NetCDF4 file with data in sibling groups."""
+    filepath = tmp_path / "test.nc"
+    ds1 = xr.DataArray([1, 2, 3], name="foo").to_dataset()
+    ds1.to_netcdf(filepath, group="subgroup1")
+    ds2 = xr.DataArray([4, 5], coords={"x": [0, 1]}, dims="x", name="bar").to_dataset()
+    ds2.to_netcdf(filepath, group="subgroup2", mode="a")
+    return str(filepath)
+
+
 @pytest.fixture
 def netcdf4_files_factory(tmp_path: Path) -> Callable[[], tuple[str, str]]:
     """Factory fixture to create multiple NetCDF4 files."""

docs/api/virtualizarr.md

Lines changed: 3 additions & 0 deletions
@@ -9,6 +9,9 @@ Users can use xarray for every step apart from reading and serializing virtual r
 
 ::: virtualizarr.open_virtual_mfdataset
 
+::: virtualizarr.open_virtual_datatree
+
+
 ## Information
 
 ::: virtualizarr.accessor.VirtualiZarrDatasetAccessor.nbytes

docs/releases.md

Lines changed: 35 additions & 23 deletions
@@ -1,26 +1,37 @@
 # Release notes
 
-## Unreleased
+## unreleased
+
+### Bug fixes
+
+- Fix setting `fill_value` for Zarr V2 arrays if data type is a subtype of integer or float.
+([#845](https://github.com/zarr-developers/VirtualiZarr/pull/845)).
+By [Hauke Schulz](https://github.com/observingClouds).
+
+## v2.3.0 (20th January 2026)
 
 ### New Features
 
+- Implement `open_virtual_datatree`.
+([838](https://github.com/zarr-developers/VirtualiZarr/pull/838)).
+By [Max Jones](https://github.com/maxrjones).
 - Set `supports_consolidated_metadata` property on `ManifestStore` to `False`.
 ([809](https://github.com/zarr-developers/VirtualiZarr/pull/809)).
 By [Julia Signell](https://github.com/jsignell).
 
-### Bug fixes
+### Internal changes
 
-- Fix setting `fill_value` for Zarr V2 arrays if data type is a subtype of integer or float.
-([#845](https://github.com/zarr-developers/VirtualiZarr/pull/845)).
-By [Hauke Schulz](https://github.com/observingClouds).
+- Remove the undocumented/unfunctional wrapper of Kerchunk's TIFF parser.
+([849](https://github.com/zarr-developers/VirtualiZarr/pull/849)).
+By [Max Jones](https://github.com/maxrjones).
 
 ## v2.2.1 (17th November 2025)
 
 ### Bug fixes
 
 - Allow storing scalar arrays under 'c' key. ([#836](https://github.com/zarr-developers/VirtualiZarr/pull/836)).
 By [Max Jones](https://github.com/maxrjones)
-- Improve ManifestStore.list_dir for arrays and nested groups ([#837](https://github.com/zarr-developers/VirtualiZarr/pull/837))
+- Improve ManifestStore.list_dir for arrays and nested groups. ([#837](https://github.com/zarr-developers/VirtualiZarr/pull/837))
 By [Max Jones](https://github.com/maxrjones)
 
 ## v2.2.0 (12th November 2025)
@@ -67,6 +78,7 @@ Patch release with minor bug fixes for the DMRPParser and Icechunk writing behav
 - Support dtypes without an endianness ([#787](https://github.com/zarr-developers/VirtualiZarr/pull/787)). By [Justus Magin](https://github.com/keewis).
 
 ### Internal changes
+
 - Change default Icechunk writing behavior to not validate or write "empty" chunks ([#791](https://github.com/zarr-developers/VirtualiZarr/pull/791)). By [Sean Harkins](https://github.com/sharkinsspatial).
 
 ## v2.1.1 (14th August 2025)
@@ -151,10 +163,10 @@ Minor release to ensure compatibility with incoming changes to Icechunk.
 - Added [`open_virtual_mfdataset`][virtualizarr.open_virtual_mfdataset] function ([#345](https://github.com/zarr-developers/VirtualiZarr/issues/345), [#349](https://github.com/zarr-developers/VirtualiZarr/pull/349)).
 By [Tom Nicholas](https://github.com/TomNicholas).
 - Added `datatree_to_icechunk` function for writing an `xarray.DataTree` to
-an Icechunk store ([#244](https://github.com/zarr-developers/VirtualiZarr/issues/244)). By [Chuck Daniels](https://github.com/chuckwondo).
+an Icechunk store ([#244](https://github.com/zarr-developers/VirtualiZarr/issues/244)). By [Chuck Daniels](https://github.com/chuckwondo).
 - Added a `.vz` custom accessor to `xarray.DataTree`, exposing the method
 `xarray.DataTree.vz.to_icechunk()` for writing an `xarray.DataTree`
-to an Icechunk store ([#244](https://github.com/zarr-developers/VirtualiZarr/issues/244)). By
+to an Icechunk store ([#244](https://github.com/zarr-developers/VirtualiZarr/issues/244)). By
 [Chuck Daniels](https://github.com/chuckwondo).
 - Added a warning if you attempt to write an entirely non-virtual dataset to a virtual references format ([#657](https://github.com/zarr-developers/VirtualiZarr/pull/657)).
 By [Tom Nicholas](https://github.com/TomNicholas).
@@ -213,16 +225,16 @@ Minor release to ensure compatibility with incoming changes to Icechunk.
 ### Internal Changes
 
 - `ManifestArrays` now internally use [zarr.core.metadata.v3.ArrayV3Metadata](https://github.com/zarr-developers/zarr-python/blob/v3.0.2/src/zarr/core/metadata/v3.py). This replaces the `ZArray` class that was previously used to store metadata about manifest arrays. ([#429](https://github.com/zarr-developers/VirtualiZarr/pull/429)) By [Aimee Barciauskas](https://github.com/abarciauskas-bgse). Notable internal changes:
-- Make zarr-python a required dependency with a minimum version `>=3.0.2`.
-- Specify a minimum numcodecs version of `>=0.15.1`.
-- When creating a `ManifestArray`, the `metadata` property should be an `zarr.core.metadata.v3.ArrayV3Metadata` object. There is a helper function `create_v3_array_metadata` which should be used, as it has some useful defaults and includes `convert_to_codec_pipeline` (see next bullet).
-- The function `convert_to_codec_pipeline` ensures the codec pipeline passed to `ArrayV3Metadata` has valid codecs in the expected order (`ArrayArrayCodec`s, `ArrayBytesCodec`, `BytesBytesCodec`s) and includes the required `ArrayBytesCodec` using the default for the data type.
-- Note: `convert_to_codec_pipeline` uses the zarr-python function `get_codec_class` to convert codec configurations (i.e. `dict`s with a name and configuration key, see [parse_named_configuration](https://github.com/zarr-developers/zarr-python/blob/v3.0.2/src/zarr/core/common.py#L116-L130)) to valid Zarr V3 codec classes.
-- Parser changes are minimal.
-- Writer changes:
-- Kerchunk uses Zarr version format 2 so we convert `ArrayV3Metadata` to `ArrayV2Metadata` using the `convert_v3_to_v2_metadata` function. This means the `to_kerchunk_json` function is now a bit more complex because we're converting `ArrayV2Metadata` filters and compressor to serializable objects.
-- zarr-python 3.0 does not yet support the big endian data type. This means that FITS and NetCDF-3 are not currently supported ([zarr-python issue #2324](https://github.com/zarr-developers/zarr-python/issues/2324)).
-- zarr-python 3.0 does not yet support datetime and timedelta data types ([zarr-python issue #2616](https://github.com/zarr-developers/zarr-python/issues/2616)).
+- Make zarr-python a required dependency with a minimum version `>=3.0.2`.
+- Specify a minimum numcodecs version of `>=0.15.1`.
+- When creating a `ManifestArray`, the `metadata` property should be an `zarr.core.metadata.v3.ArrayV3Metadata` object. There is a helper function `create_v3_array_metadata` which should be used, as it has some useful defaults and includes `convert_to_codec_pipeline` (see next bullet).
+- The function `convert_to_codec_pipeline` ensures the codec pipeline passed to `ArrayV3Metadata` has valid codecs in the expected order (`ArrayArrayCodec`s, `ArrayBytesCodec`, `BytesBytesCodec`s) and includes the required `ArrayBytesCodec` using the default for the data type.
+- Note: `convert_to_codec_pipeline` uses the zarr-python function `get_codec_class` to convert codec configurations (i.e. `dict`s with a name and configuration key, see [parse_named_configuration](https://github.com/zarr-developers/zarr-python/blob/v3.0.2/src/zarr/core/common.py#L116-L130)) to valid Zarr V3 codec classes.
+- Parser changes are minimal.
+- Writer changes:
+- Kerchunk uses Zarr version format 2 so we convert `ArrayV3Metadata` to `ArrayV2Metadata` using the `convert_v3_to_v2_metadata` function. This means the `to_kerchunk_json` function is now a bit more complex because we're converting `ArrayV2Metadata` filters and compressor to serializable objects.
+- zarr-python 3.0 does not yet support the big endian data type. This means that FITS and NetCDF-3 are not currently supported ([zarr-python issue #2324](https://github.com/zarr-developers/zarr-python/issues/2324)).
+- zarr-python 3.0 does not yet support datetime and timedelta data types ([zarr-python issue #2616](https://github.com/zarr-developers/zarr-python/issues/2616)).
 - The continuous integration workflows and developer environment now use [pixi](https://pixi.sh/latest/) ([#407](https://github.com/zarr-developers/VirtualiZarr/pull/407)).
 - Added `loadable_variables` kwarg to `ManifestStore.to_virtual_dataset`.
 ([#543](https://github.com/zarr-developers/VirtualiZarr/pull/543)) By [Tom Nicholas](https://github.com/TomNicholas).
@@ -284,14 +296,14 @@ It also fixes a number of bugs, adds minor features, changes the default reader
 - Added a `.nbytes` accessor method which displays the bytes needed to hold the virtual references in memory.
 ([#167](https://github.com/zarr-developers/VirtualiZarr/issues/167), [#227](https://github.com/zarr-developers/VirtualiZarr/pull/227)) By [Tom Nicholas](https://github.com/TomNicholas).
 - Upgrade icechunk dependency to `>=0.1.0a12`. ([#406](https://github.com/zarr-developers/VirtualiZarr/pull/406)) By [Julia Signell](https://github.com/jsignell).
-- Sync with Icechunk v0.1.0a8 ([#368](https://github.com/zarr-developers/VirtualiZarr/pull/368)) By [Matthew Iannucci](https://github.com/mpiannucci). This also adds support
+- Sync with Icechunk v0.1.0a8 ([#368](https://github.com/zarr-developers/VirtualiZarr/pull/368)) By [Matthew Iannucci](https://github.com/mpiannucci). This also adds support
 for the `to_icechunk` method to add timestamps as checksums when writing virtual references to an icechunk store. This
 is useful for ensuring that virtual references are not stale when reading from an icechunk store, which can happen if the
 underlying data has changed since the virtual references were written.
 - Add `group=None` keyword-only parameter to the
 `VirtualiZarrDatasetAccessor.to_icechunk` method to allow writing to a nested group
 at a specified group path (rather than defaulting to the root group, when no group is
-specified). ([#341](https://github.com/zarr-developers/VirtualiZarr/issues/341)) By [Chuck Daniels](https://github.com/chuckwondo).
+specified). ([#341](https://github.com/zarr-developers/VirtualiZarr/issues/341)) By [Chuck Daniels](https://github.com/chuckwondo).
 
 ### Breaking changes
 
@@ -304,8 +316,8 @@ It also fixes a number of bugs, adds minor features, changes the default reader
 ([#18](https://github.com/zarr-developers/VirtualiZarr/issues/18), [#357](https://github.com/zarr-developers/VirtualiZarr/pull/357), [#358](https://github.com/zarr-developers/VirtualiZarr/pull/358)) By [Tom Nicholas](https://github.com/TomNicholas).
 - The `append_dim` and `last_updated_at` parameters of the
 `VirtualiZarrDatasetAccessor.to_icechunk` method are now keyword-only parameters,
-rather than positional or keyword. This change is breaking _only_ where arguments for
-these parameters are currently given positionally. ([#341](https://github.com/zarr-developers/VirtualiZarr/issues/341)) By
+rather than positional or keyword. This change is breaking _only_ where arguments for
+these parameters are currently given positionally. ([#341](https://github.com/zarr-developers/VirtualiZarr/issues/341)) By
 [Chuck Daniels](https://github.com/chuckwondo).
 - The default backend for netCDF4 and HDF5 is now the custom `HDFVirtualBackend` replacing
 the previous default which was a wrapper around the kerchunk backend.
@@ -322,7 +334,7 @@ It also fixes a number of bugs, adds minor features, changes the default reader
 ([#336](https://github.com/zarr-developers/VirtualiZarr/issues/336), [#338](https://github.com/zarr-developers/VirtualiZarr/pull/338)) By [Tom Nicholas](https://github.com/TomNicholas).
 - Fix bug in HDF reader where dimension names of dimensions in a subgroup would be incorrect.
 ([#364](https://github.com/zarr-developers/VirtualiZarr/issues/364), [#366](https://github.com/zarr-developers/VirtualiZarr/pull/366)) By [Tom Nicholas](https://github.com/TomNicholas).
-- Fix bug in dmrpp reader so _FillValue is included in variables' encodings.
+- Fix bug in dmrpp reader so \_FillValue is included in variables' encodings.
 ([#369](https://github.com/zarr-developers/VirtualiZarr/pull/369)) By [Aimee Barciauskas](https://github.com/abarciauskas-bgse).
 - Fix bug passing arguments to FITS reader, and test it on Hubble Space Telescope data.
 ([#363](https://github.com/zarr-developers/VirtualiZarr/pull/363)) By [Tom Nicholas](https://github.com/TomNicholas).

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -130,6 +130,7 @@ plugins:
 - https://icechunk.io/en/stable/objects.inv
 - https://lithops-cloud.github.io/docs/objects.inv
 - https://docs.dask.org/en/stable/objects.inv
+- https://virtual-tiff.readthedocs.io/en/latest/objects.inv
 # https://github.com/developmentseed/titiler/blob/50934c929cca2fa8d3c408d239015f8da429c6a8/docs/mkdocs.yml#L115-L140
 markdown_extensions:
 - admonition

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -101,6 +101,7 @@ upstream = [
 's3fs @ git+https://github.com/fsspec/s3fs',
 'kerchunk @ git+https://github.com/fsspec/kerchunk',
 'icechunk @ git+https://github.com/earth-mover/icechunk#subdirectory=icechunk-python',
+'virtual_tiff @ git+https://github.com/virtual-zarr/virtual-tiff',
 ]
 docs = [
 "mkdocs-material[imaging]>=9.6.14",

virtualizarr/__init__.py

Lines changed: 6 additions & 1 deletion
@@ -4,7 +4,11 @@
     VirtualiZarrDatasetAccessor,
     VirtualiZarrDataTreeAccessor,
 )
-from virtualizarr.xarray import open_virtual_dataset, open_virtual_mfdataset
+from virtualizarr.xarray import (
+    open_virtual_dataset,
+    open_virtual_datatree,
+    open_virtual_mfdataset,
+)
 
 try:
     __version__ = _version("virtualizarr")
@@ -18,4 +22,5 @@
     "VirtualiZarrDataTreeAccessor",
     "open_virtual_dataset",
     "open_virtual_mfdataset",
+    "open_virtual_datatree",
 ]

virtualizarr/manifests/group.py

Lines changed: 33 additions & 0 deletions
@@ -125,3 +125,36 @@ def to_virtual_dataset(self) -> xr.Dataset:
             coord_names=coord_names,
             attrs=attributes,
         )
+
+    def to_virtual_datasets(self) -> Mapping[str, xr.Dataset]:
+        """
+        Create a dictionary containing virtual datasets for all the sub-groups of a ManifestGroup. All the
+        variables in the datasets will be "virtual", i.e., they will wrap ManifestArray objects.
+
+        It is convenient to have a separate `to_virtual_datasets` function from `to_virtual_datatree` so that
+        it can be called recursively without needing to use `DataTree.to_dict() and `.from_dict()` repeatedly.
+        """
+        result = {"": self.to_virtual_dataset()}
+
+        # Recursively process all subgroups
+        for group_name, subgroup in self.groups.items():
+            subgroup_datasets = subgroup.to_virtual_datasets()
+
+            # Add the subgroup's datasets with proper path prefixes
+            for subpath, dataset in subgroup_datasets.items():
+                if subpath == "":
+                    # Direct child group
+                    full_path = group_name
+                else:
+                    # Nested subgroup
+                    full_path = f"{group_name}/{subpath}"
+                result[full_path] = dataset
+        return result
+
+    def to_virtual_datatree(self) -> xr.DataTree:
+        """
+        Create a "virtual" [xarray.DataTree][] containing the contents of one zarr group.
+
+        All variables in the returned DataTree will be "virtual", i.e. they will wrap ManifestArray objects.
+        """
+        return xr.DataTree.from_dict(self.to_virtual_datasets())
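
Reviewer note: the path-prefixing recursion in `to_virtual_datasets` can be checked in isolation. The following is a minimal sketch, not the VirtualiZarr implementation: `Group` and `flatten` are hypothetical stand-ins for `ManifestGroup` and `to_virtual_datasets`, and plain strings stand in for the virtual datasets, so it runs with the standard library only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical stand-in for ManifestGroup: one payload plus named subgroups."""

    payload: str
    groups: dict[str, "Group"] = field(default_factory=dict)

    def flatten(self) -> dict[str, str]:
        # The group itself is stored under the empty path, as in the diff.
        result = {"": self.payload}
        for group_name, subgroup in self.groups.items():
            for subpath, payload in subgroup.flatten().items():
                # "" means the direct child; anything else is a deeper descendant.
                full_path = group_name if subpath == "" else f"{group_name}/{subpath}"
                result[full_path] = payload
        return result


# Assumed example tree: root -> a -> b, and root -> c.
root = Group("root", {"a": Group("A", {"b": Group("B")}), "c": Group("C")})
paths = root.flatten()
print(paths)
```

The resulting keys have the same shape as the path-keyed mapping that the diff passes to `xr.DataTree.from_dict`: an empty string for the root and slash-separated paths for descendants.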

virtualizarr/manifests/store.py

Lines changed: 36 additions & 0 deletions
@@ -332,6 +332,42 @@ def to_virtual_dataset(
             decode_times=decode_times,
         )
 
+    def to_virtual_datatree(
+        self,
+        group="",
+        *,
+        loadable_variables: Iterable[str] | None = None,
+        decode_times: bool | None = None,
+    ) -> "xr.DataTree":
+        """
+        Create a "virtual" [xarray.DataTree][] containing the contents of a zarr group. Default is the root group and all sub-groups.
+
+        Will ignore the contents of any other groups in the store.
+
+        Requires xarray.
+
+        Parameters
+        ----------
+        group : Group to convert to a virtual DataTree
+        loadable_variables
+            Variables in the data source to load as Dask/NumPy arrays instead of as virtual arrays.
+        decode_times
+            Bool that is passed into [xarray.open_dataset][]. Allows time to be decoded into a datetime object.
+
+        Returns
+        -------
+        vdt : xarray.DataTree
+        """
+
+        from virtualizarr.xarray import construct_virtual_datatree
+
+        return construct_virtual_datatree(
+            manifest_store=self,
+            group=group,
+            loadable_variables=loadable_variables,
+            decode_times=decode_times,
+        )
+
 
 def _transform_byte_range(
     byte_range: ByteRequest | None, *, chunk_start: int, chunk_end_exclusive: int
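
Reviewer note: the bare `*` in the `to_virtual_datatree` signature makes `loadable_variables` and `decode_times` keyword-only, while `group` can still be passed positionally. A minimal sketch of that calling convention follows; `open_tree` is a hypothetical function written for illustration, not part of VirtualiZarr, and it only echoes its arguments.

```python
from __future__ import annotations

from collections.abc import Iterable


def open_tree(
    group: str = "",
    *,
    loadable_variables: Iterable[str] | None = None,
    decode_times: bool | None = None,
) -> dict:
    """Hypothetical function mirroring the parameter layout in the diff."""
    return {
        "group": group,
        "loadable_variables": list(loadable_variables or []),
        "decode_times": decode_times,
    }


# `group` may be positional; everything after `*` must be named.
ok = open_tree("subgroup1", decode_times=False)

try:
    # Passing loadable_variables positionally is rejected by Python itself.
    open_tree("subgroup1", ["time"])
    raised = False
except TypeError:
    raised = True

print(ok, raised)
```

Keeping the options keyword-only means later parameters can be added or reordered without silently changing the meaning of existing positional calls.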

virtualizarr/tests/test_parsers/test_tiff.py

Lines changed: 21 additions & 3 deletions
@@ -1,8 +1,8 @@
 import pytest
 from obstore.store import S3Store
-from xarray import Dataset
+from xarray import Dataset, DataTree
 
-from virtualizarr import open_virtual_dataset
+from virtualizarr import open_virtual_dataset, open_virtual_datatree
 from virtualizarr.registry import ObjectStoreRegistry
 from virtualizarr.tests import requires_network, requires_tiff
 
@@ -11,7 +11,25 @@
 
 @requires_tiff
 @requires_network
-def test_virtual_tiff() -> None:
+def test_virtual_tiff_datatree() -> None:
+    store = S3Store("sentinel-cogs", region="us-west-2", skip_signature=True)
+    registry = ObjectStoreRegistry({"s3://sentinel-cogs/": store})
+    url = "s3://sentinel-cogs/sentinel-s2-l2a-cogs/12/S/UF/2022/6/S2B_12SUF_20220609_0_L2A/B04.tif"
+    parser = virtual_tiff.VirtualTIFF(ifd_layout="nested")
+    with open_virtual_datatree(url=url, parser=parser, registry=registry) as vdt:
+        assert isinstance(vdt, DataTree)
+        assert list(vdt["0"].ds.variables) == ["0"]
+        var = vdt["0"].ds["0"].variable
+        assert var.sizes == {"y": 10980, "x": 10980}
+        assert var.dtype == "<u2"
+        var = vdt["1"].ds["1"].variable
+        assert var.sizes == {"y": 5490, "x": 5490}
+        assert var.dtype == "<u2"
+
+
+@requires_tiff
+@requires_network
+def test_virtual_tiff_dataset() -> None:
     store = S3Store("sentinel-cogs", region="us-west-2", skip_signature=True)
     registry = ObjectStoreRegistry({"s3://sentinel-cogs/": store})
     url = "s3://sentinel-cogs/sentinel-s2-l2a-cogs/12/S/UF/2022/6/S2B_12SUF_20220609_0_L2A/B04.tif"
