Merged
Changes from 167 commits
Commits
170 commits
26a94df
wip toward zarr v2 reader
norlandrhagen Oct 24, 2024
cfb7b8d
removed _ARRAY_DIMENSIONS and trimmed down attrs
norlandrhagen Oct 24, 2024
2f26f03
WIP for zarr reader
norlandrhagen Oct 24, 2024
eab87a6
adding in the key piece, the reader
norlandrhagen Oct 24, 2024
13db375
virtual dataset is returned! Now to deal with fill_value
norlandrhagen Oct 31, 2024
cc30ad7
Merge branch 'main' into zarr_reader
norlandrhagen Nov 12, 2024
a047ff9
Update virtualizarr/readers/zarr.py
norlandrhagen Nov 12, 2024
072bead
Merge branch 'zarr_reader' of https://github.com/zarr-developers/Virt…
norlandrhagen Nov 12, 2024
f7c9a3f
replace fsspec ls with zarr.getsize
norlandrhagen Nov 15, 2024
2024606
lint
norlandrhagen Nov 15, 2024
443435b
wip test_zarr
norlandrhagen Nov 15, 2024
50fd8b5
removed pdb
norlandrhagen Nov 15, 2024
d93c932
zarr import in type checking
norlandrhagen Nov 19, 2024
39be1c5
moved get_chunk_paths & get_chunk_size async funcs outside of constru…
norlandrhagen Nov 19, 2024
e718240
added a few notes from PR review.
norlandrhagen Nov 19, 2024
bbcd473
removed array encoding
norlandrhagen Nov 19, 2024
ed9f2b4
v2 passing, v3 skipped for now
norlandrhagen Nov 19, 2024
db89da7
added missed staged files
norlandrhagen Nov 19, 2024
e3d4318
fixed merge conflicts with main
norlandrhagen Nov 19, 2024
410b2a3
missing return
norlandrhagen Nov 19, 2024
8a69963
add network
norlandrhagen Nov 19, 2024
3fca8e6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Nov 19, 2024
34053b0
conftest fix
norlandrhagen Nov 19, 2024
5c26b1f
naming
norlandrhagen Nov 19, 2024
fb784dc
comment out integration test for now
norlandrhagen Nov 19, 2024
0444fd4
refactored test_dataset_from_zarr ZArray tests
norlandrhagen Nov 20, 2024
66fd456
adds zarr v3 req opt
norlandrhagen Nov 20, 2024
13fce09
zarr_v3 decorator
norlandrhagen Nov 20, 2024
c36962d
add more tests
norlandrhagen Nov 20, 2024
4be4906
wip
norlandrhagen Nov 21, 2024
ca5ff32
adds missing await
norlandrhagen Nov 21, 2024
88cbeca
more tests
norlandrhagen Nov 21, 2024
1fbdc9c
wip
norlandrhagen Nov 21, 2024
370621f
wip on v3
norlandrhagen Nov 21, 2024
9bb0653
add note + xfail v3
norlandrhagen Nov 21, 2024
7e03ea5
tmp run network
norlandrhagen Nov 21, 2024
5c1e331
revert
norlandrhagen Nov 21, 2024
9404625
update construct_virtual_array ordering
norlandrhagen Nov 22, 2024
1a5a960
merge
norlandrhagen Dec 3, 2024
cc7d68c
updated ABC after merge
norlandrhagen Dec 3, 2024
ac105ea
wip
norlandrhagen Dec 9, 2024
7b57bd0
Merge branch 'main' into zarr_reader
norlandrhagen Dec 9, 2024
ff01c92
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 9, 2024
4f2470a
working for v2 and v3, but only local
norlandrhagen Dec 10, 2024
0c1ff82
merge
norlandrhagen Dec 10, 2024
05d4050
cleanup test_zarr reader test
norlandrhagen Dec 11, 2024
f40ba28
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Dec 11, 2024
b5fb802
cleanup after zarr-python issue report
norlandrhagen Dec 12, 2024
be5280f
Merge branch 'zarr_reader' of https://github.com/zarr-developers/Virt…
norlandrhagen Dec 12, 2024
690ffee
temp disabled validate_and_normalize_path_to_uri due to issue in zarr…
norlandrhagen Dec 16, 2024
98600e7
Merge branch 'main' into zarr_reader
norlandrhagen Dec 16, 2024
31a1b94
marked zarr integration test skipped b/c of zarr-v3 and kerchunk inco…
norlandrhagen Dec 16, 2024
795c428
fixes some async behavior, reading from s3 seems to work
norlandrhagen Dec 17, 2024
c0004c6
lint + uri_fmt
norlandrhagen Dec 17, 2024
60b8912
adds to releases.rst
norlandrhagen Dec 17, 2024
8240997
nit
norlandrhagen Dec 17, 2024
816e696
cleanup, comments and nits
norlandrhagen Dec 17, 2024
31aacf9
progress on mypy
norlandrhagen Dec 17, 2024
5d14b20
make mypy happy
norlandrhagen Dec 17, 2024
fb844b6
adds option for AsyncArray to _is_zarr_array
norlandrhagen Dec 18, 2024
421f53f
big async rewrite
norlandrhagen Dec 19, 2024
cedad11
merge w/ main
norlandrhagen Dec 19, 2024
1c5e42d
fixes merge conflict
norlandrhagen Dec 19, 2024
89d8555
bit of restructure
norlandrhagen Dec 19, 2024
c1a5218
nit
norlandrhagen Dec 19, 2024
6af84b4
WIP on ChunkManifest.from_arrays
norlandrhagen Dec 20, 2024
349386f
v2/v3 c chunk fix + build ChunkManifest from numpy arrays
norlandrhagen Dec 21, 2024
c776ab9
removed method of creating ChunkManifests from dicts
norlandrhagen Dec 21, 2024
fb6fff7
cleanup
norlandrhagen Dec 21, 2024
87c74d4
adds xfails to TestOpenVirtualDatasetZarr due to local filesystem zar…
norlandrhagen Dec 21, 2024
9e44a8a
Merge branch 'main' into zarr_reader
norlandrhagen Jan 9, 2025
87dbdae
some nits after merging w/ main
norlandrhagen Jan 9, 2025
855fb5a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 9, 2025
29434f1
updates zarr v3 req
norlandrhagen Jan 9, 2025
0dcfc91
Merge branch 'zarr_reader' of https://github.com/zarr-developers/Virt…
norlandrhagen Jan 9, 2025
5f7040c
lint
norlandrhagen Jan 9, 2025
d3b0a92
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 9, 2025
4716114
remove build_chunk_manifest_from_dict_mapping function since manifest…
norlandrhagen Jan 9, 2025
32f7060
Merge branch 'zarr_reader' of https://github.com/zarr-developers/Virt…
norlandrhagen Jan 9, 2025
dc6e6f8
tmp ignore lint
norlandrhagen Jan 9, 2025
9db4339
remove zarr fill_value skip
norlandrhagen Jan 9, 2025
4e0fb99
fixes network req import in test_integration
norlandrhagen Jan 9, 2025
72ae8b0
bump xarray to 2025.1.1 and icechunk to 0.1.0a10 in upstream
norlandrhagen Jan 9, 2025
177f2cf
merge w/ dep bump
norlandrhagen Jan 9, 2025
d61e593
move zarr import into type checking
norlandrhagen Jan 9, 2025
9edf706
move zarr import in test_zarr
norlandrhagen Jan 9, 2025
3e68537
adding back in missing nbytes property
norlandrhagen Jan 9, 2025
594d4a8
typing
norlandrhagen Jan 9, 2025
3c6dc54
tmp testing & removing old xfail
norlandrhagen Jan 10, 2025
dd20c8a
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 10, 2025
9be9455
merge w/ main
norlandrhagen Jan 27, 2025
dac6c77
adds back in validate_and_normalize_path_to_uri after upstream zarr f…
norlandrhagen Jan 27, 2025
3d230dc
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 27, 2025
62ffe02
removing kerchunk from zarr integration test
norlandrhagen Jan 28, 2025
843c286
Merge branch 'zarr_reader' of https://github.com/zarr-developers/Virt…
norlandrhagen Jan 28, 2025
231b703
removed zarr manifest + lint
norlandrhagen Jan 28, 2025
0d4d653
wip on testing
norlandrhagen Jan 29, 2025
7724969
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jan 29, 2025
a8c498a
revert min-deps change
norlandrhagen Jan 29, 2025
9684dd3
merge
norlandrhagen Jan 29, 2025
9fb201d
Merge branch 'zarr_reader' of https://github.com/zarr-developers/Virt…
norlandrhagen Jan 29, 2025
e310f04
revert environment.yaml
norlandrhagen Jan 29, 2025
800a165
removed zarr manifest writing
norlandrhagen Jan 29, 2025
6609ae3
Merge branch 'main' into zarr_reader
norlandrhagen Jan 29, 2025
ac538c7
cleanup and consolidation in zarr reader
norlandrhagen Jan 29, 2025
3be76ff
typing
norlandrhagen Jan 29, 2025
a74047c
Merge branch 'main' into zarr_reader
norlandrhagen Jan 29, 2025
1e60835
test_unsupported_zarr_python to zarr v3
norlandrhagen Jan 29, 2025
5d91679
rel path issue?
norlandrhagen Jan 29, 2025
5084adf
revert accidental icechunk commit
norlandrhagen Jan 29, 2025
136cc2f
merge w/ main
norlandrhagen Feb 3, 2025
e7a36d7
wip on fixing codecs
norlandrhagen Feb 4, 2025
df5a19e
cleanup of tests + codecs
norlandrhagen Feb 4, 2025
5f88589
Merge branch 'main' into zarr_reader
norlandrhagen Feb 4, 2025
efd0064
removed test_zarr writer
norlandrhagen Feb 4, 2025
442e519
bumping icechunk for now
norlandrhagen Feb 4, 2025
54308fb
typing lint
norlandrhagen Feb 4, 2025
973f6b0
remove zarr writer test
norlandrhagen Feb 4, 2025
f1c6c7d
merge w/ main
norlandrhagen Feb 4, 2025
a17fb23
merge w/ develop branch
norlandrhagen Mar 10, 2025
024b020
adds Zarr V2 reader not supported exception
norlandrhagen Mar 15, 2025
db4e617
updates usage and releases and lints upstream.yaml
norlandrhagen Mar 15, 2025
1dc93c2
lint + clarified some todo/comments
norlandrhagen Mar 15, 2025
e45d953
quick nit, removed duplicated entry in ci
norlandrhagen Mar 15, 2025
6bb11fe
removed some comments and reverted pyproject
norlandrhagen Mar 19, 2025
e7b0544
Merge branch 'develop' into zarr_reader
norlandrhagen Mar 19, 2025
255ed37
pyproj de-dup
norlandrhagen Mar 19, 2025
c8d51c9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 19, 2025
42d79ef
util fpaht
norlandrhagen Mar 19, 2025
05354ad
Merge branch 'zarr_reader' of https://github.com/zarr-developers/Virt…
norlandrhagen Mar 19, 2025
001e09b
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 19, 2025
9f8c6f9
adding test to check zarr key format in manifest
norlandrhagen Mar 19, 2025
70b3796
Merge branch 'zarr_reader' of https://github.com/zarr-developers/Virt…
norlandrhagen Mar 19, 2025
5df848f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 19, 2025
03d019c
switched Manifest creation back to dict
norlandrhagen Mar 25, 2025
f0d1a9c
merge
norlandrhagen Mar 25, 2025
db167eb
update zarr reader with merge
norlandrhagen Mar 26, 2025
5f3ccc7
cleaned up zarr reader ArrayV3Metadata reading
norlandrhagen Mar 26, 2025
c55b905
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 26, 2025
9071e15
vendor cleanup
norlandrhagen Mar 26, 2025
1541f14
Merge branch 'zarr_reader' of https://github.com/zarr-developers/Virt…
norlandrhagen Mar 26, 2025
2ce1dff
merge
norlandrhagen Apr 11, 2025
b34ee09
merge w/ develop and update construct_virtual_dataset
norlandrhagen Apr 11, 2025
b07ec91
added _zstd_codec check in get_codec_config to fix numcodecs complaint
norlandrhagen Apr 11, 2025
78e7f9d
mypy lint
norlandrhagen Apr 11, 2025
fdef913
mypy lint 2
norlandrhagen Apr 11, 2025
6765a1d
lint
norlandrhagen Apr 11, 2025
6df8f73
typing
norlandrhagen Apr 11, 2025
864576e
adds check for filepath
norlandrhagen Apr 12, 2025
7df8ecf
Merge branch 'develop' into zarr_reader
norlandrhagen Apr 12, 2025
4349efd
spelling nit + revert hdf int
norlandrhagen Apr 12, 2025
32c97dd
removed virtualizarr.zarr + cleanup nits
norlandrhagen Apr 12, 2025
e8c6244
cleanup + note
norlandrhagen Apr 18, 2025
242e38b
updates docs/faq.md data table
norlandrhagen Apr 18, 2025
b951dcb
revert leading slash
norlandrhagen Apr 18, 2025
5b6afd6
Merge branch 'develop' into zarr_reader
maxrjones Apr 19, 2025
a262c7d
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 19, 2025
77074bd
Fix bad merge commit
maxrjones Apr 19, 2025
f28db33
Use ManifestStore in Zarr reader (#554)
maxrjones Apr 22, 2025
bdf4d20
filepath slash nit
norlandrhagen Apr 23, 2025
ad01521
Update docs/faq.md
norlandrhagen Apr 23, 2025
4d0151e
Update virtualizarr/readers/zarr.py
norlandrhagen Apr 23, 2025
243cd32
Update virtualizarr/readers/zarr.py
norlandrhagen Apr 23, 2025
dc2a266
Update virtualizarr/readers/zarr.py
norlandrhagen Apr 23, 2025
1591412
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 23, 2025
7805033
adds back in todo
norlandrhagen Apr 23, 2025
7d8f75a
adds wip test for scalar chunk testing
norlandrhagen Apr 23, 2025
ccc9279
adds test for scalar zarr + modifies get_chunk_mapping_prefix to acco…
norlandrhagen Apr 24, 2025
a238177
update localstore to memorystore
norlandrhagen Apr 24, 2025
9f851d1
Merge branch 'develop' into zarr_reader
norlandrhagen Apr 24, 2025
25 changes: 25 additions & 0 deletions conftest.py
@@ -44,6 +44,31 @@ def pytest_runtest_setup(item):
         pytest.skip("set --run-minio-tests to run tests requiring docker and minio")


+def _xarray_subset():
+    ds = xr.tutorial.open_dataset("air_temperature")
+    return ds.isel(time=slice(0, 10), lat=slice(0, 90), lon=slice(0, 180))
+
+
+@pytest.fixture(params=[2, 3])
+def zarr_store(tmpdir, request):
+    ds = _xarray_subset()
+    filepath = f"{tmpdir}/air.zarr"
+    ds.to_zarr(filepath, zarr_format=request.param)
+    ds.close()
+    return filepath
+
+
+@pytest.fixture()
+def zarr_store_scalar(tmpdir):
+    import zarr
+
+    # can/should we create a memorystore instead?
+    store = zarr.storage.LocalStore(str(tmpdir + "/tmp.zarr"))
+    zarr_store_scalar = zarr.create_array(store=store, shape=(), dtype="int8")
+    zarr_store_scalar[()] = 42
+    return zarr_store_scalar
+
+
 # Common codec configurations
 DELTA_CODEC = {"name": "numcodecs.delta", "configuration": {"dtype": "<i8"}}
 ARRAYBYTES_CODEC = {"name": "bytes", "configuration": {"endian": "little"}}
4 changes: 2 additions & 2 deletions docs/faq.md
@@ -77,7 +77,7 @@ vds.virtualize.to_icechunk(icechunkstore)

 ### I already have some data in Zarr, do I have to resave it?

-No! VirtualiZarr can (well, [soon will be able to](https://github.com/zarr-developers/VirtualiZarr/issues/262)) create virtual references pointing to existing Zarr stores in the same way as for other file formats.
+No! VirtualiZarr can create virtual references pointing to existing Zarr stores in the same way as for other file formats. Note: Currently only reading Zarr V3 is supported.

Review comment (Member): Do we have an issue to track learning to read zarr v2?


 ### Can I add a new reader for my custom file format?

@@ -119,7 +119,7 @@ Users of Kerchunk may find the following comparison table useful, which shows wh
 | From a netCDF3 file | `kerchunk.netCDF3.NetCDF3ToZarr` | `open_virtual_dataset(..., filetype='netcdf3')`, via `kerchunk.netCDF3.NetCDF3ToZarr` |
 | From a COG / tiff file | `kerchunk.tiff.tiff_to_zarr` | `open_virtual_dataset(..., filetype='tiff')`, via `kerchunk.tiff.tiff_to_zarr` or potentially `tifffile` (❌ Not yet implemented - see [issue #291](https://github.com/zarr-developers/VirtualiZarr/issues/291)) |
 | From a Zarr v2 store | `kerchunk.zarr.ZarrToZarr` | `open_virtual_dataset(..., filetype='zarr')` (❌ Not yet implemented - see [issue #262](https://github.com/zarr-developers/VirtualiZarr/issues/262)) |
-| From a Zarr v3 store | | `open_virtual_dataset(..., filetype='zarr')` (❌ Not yet implemented - see [issue #262](https://github.com/zarr-developers/VirtualiZarr/issues/262)) |
+| From a Zarr v3 store | | `open_virtual_dataset(..., filetype='zarr')` |
 | From a GRIB2 file | `kerchunk.grib2.scan_grib` | `open_virtual_datatree(..., filetype='grib')` (❌ Not yet implemented - see [issue #11](https://github.com/zarr-developers/VirtualiZarr/issues/11)) |
 | From a FITS file | `kerchunk.fits.process_file` | `open_virtual_dataset(..., filetype='fits')`, via `kerchunk.fits.process_file` |
 | From a HDF4 file | `kerchunk.hdf4.HDF4ToZarr` | `open_virtual_dataset(..., filetype='hdf4')`, via `kerchunk.hdf4.HDF4ToZarr` (❌ Not yet implemented - see [issue #216](https://github.com/zarr-developers/VirtualiZarr/issues/216)) |
2 changes: 2 additions & 0 deletions docs/releases.rst
@@ -9,6 +9,8 @@ v1.3.3 (unreleased)
 New Features
 ~~~~~~~~~~~~

+- Adds a Zarr reader to ``open_virtual_dataset``, which allows opening Zarr V3 stores as virtual datasets.
+  (:pull:`#271`) By `Raphael Hagen <https://github.com/norlandrhagen>`_.
 - Added experimental ManifestStore (:pull:`490`).
 - Added :py:meth:`ManifestStore.to_virtual_dataset()` method (:pull:`522`).
   By `Tom Nicholas <https://github.com/TomNicholas>`_.
6 changes: 4 additions & 2 deletions virtualizarr/backend.py
@@ -28,6 +28,7 @@
     KerchunkVirtualBackend,
     NetCDF3VirtualBackend,
     TIFFVirtualBackend,
+    ZarrVirtualBackend,
 )
 from virtualizarr.readers.api import VirtualBackend
 from virtualizarr.utils import _FsspecFSFromFilepath
@@ -43,6 +44,7 @@
 # TODO add entrypoint to allow external libraries to add to this mapping
 VIRTUAL_BACKENDS = {
     "kerchunk": KerchunkVirtualBackend,
+    "zarr": ZarrVirtualBackend,
     "dmrpp": DMRPPVirtualBackend,
     "hdf5": HDFVirtualBackend,
     "netcdf4": HDFVirtualBackend,  # note this is the same as for hdf5
@@ -70,6 +72,7 @@ class FileType(AutoName):
     fits = auto()
     dmrpp = auto()
     kerchunk = auto()
+    zarr = auto()


 def automatically_determine_filetype(
@@ -87,8 +90,7 @@ def automatically_determine_filetype(

     # TODO how do we handle kerchunk json / parquet here?
     if Path(filepath).suffix == ".zarr":
-        # TODO we could imagine opening an existing zarr store, concatenating it, and writing a new virtual one...
-        raise NotImplementedError()
+        return FileType.zarr

     # Read magic bytes from local or remote file
     fpath = _FsspecFSFromFilepath(
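
Note on the `.zarr` suffix check in the hunk above: it inspects only the final path component. A minimal sketch of that behaviour, assuming a hypothetical `looks_like_zarr_store` helper (not part of VirtualiZarr) and using `PurePosixPath` instead of `Path` so the result does not depend on the host platform:

```python
from pathlib import PurePosixPath


def looks_like_zarr_store(filepath: str) -> bool:
    """Hypothetical helper mirroring the `.suffix == ".zarr"` test in the diff."""
    # Only the last path component is examined, so a path that points
    # at a group inside a store is not recognised by this check.
    return PurePosixPath(filepath).suffix == ".zarr"


for candidate in (
    "s3://bucket/air.zarr",
    "/data/air.zarr/",
    "/data/air.zarr/group",
    "/data/air.nc",
):
    print(candidate, looks_like_zarr_store(candidate))
```

Under these assumptions, a store path with or without a trailing slash is detected, while a path to a nested group falls through to the magic-bytes branch.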
10 changes: 10 additions & 0 deletions virtualizarr/codecs.py
@@ -94,10 +94,20 @@ def get_codec_config(codec: ZarrCodec) -> dict[str, Any]:
     """
     Extract configuration from a codec, handling both zarr-python and numcodecs codecs.
     """
+
     if hasattr(codec, "codec_config"):
         return codec.codec_config
     elif hasattr(codec, "get_config"):
         return codec.get_config()
+    elif hasattr(codec, "_zstd_codec"):
+        # related issue: https://github.com/zarr-developers/VirtualiZarr/issues/514
+        # very silly workaround. codec.to_dict for zstd gives:
+        # {'name': 'zstd', 'configuration': {'level': 0, 'checksum': False}}
+        # which when passed through ArrayV2Metadata -> numcodecs.get_codec gives the error:
+        # *** numcodecs.errors.UnknownCodecError: codec not available: 'None'
+        # if codec._zstd_codec.get_config() : {'id': 'zstd', 'level': 0, 'checksum': False}
+        # is passed to numcodecs.get_codec. It works fine.
+        return codec._zstd_codec.get_config()
     elif hasattr(codec, "to_dict"):
         return codec.to_dict()
     else:
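
Note on the branch order in `get_codec_config`: the `_zstd_codec` test has to come before the generic `to_dict` test, because a codec that wraps a numcodecs object may expose both. A minimal sketch with hypothetical stand-in classes (the `Fake*` classes and `sketch_get_codec_config` are illustrative, not real zarr-python or numcodecs types, and the final `raise` is an assumption since the hunk is cut off at `else:`):

```python
from typing import Any


class FakeNumcodecsZstd:
    """Hypothetical stand-in for a numcodecs-style codec."""

    def get_config(self) -> dict[str, Any]:
        return {"id": "zstd", "level": 0, "checksum": False}


class FakeZarrZstd:
    """Hypothetical stand-in for a codec that wraps a numcodecs codec."""

    def __init__(self) -> None:
        self._zstd_codec = FakeNumcodecsZstd()

    def to_dict(self) -> dict[str, Any]:
        return {"name": "zstd", "configuration": {"level": 0, "checksum": False}}


class FakeBytesCodec:
    """Hypothetical stand-in for a codec that only offers to_dict()."""

    def to_dict(self) -> dict[str, Any]:
        return {"name": "bytes", "configuration": {"endian": "little"}}


def sketch_get_codec_config(codec: Any) -> dict[str, Any]:
    # Same probing order as the diff: most specific attribute first.
    if hasattr(codec, "codec_config"):
        return codec.codec_config
    elif hasattr(codec, "get_config"):
        return codec.get_config()
    elif hasattr(codec, "_zstd_codec"):
        # Unwrap to the numcodecs-style config ({'id': ...}) before the
        # generic to_dict() branch can return the {'name': ...} form.
        return codec._zstd_codec.get_config()
    elif hasattr(codec, "to_dict"):
        return codec.to_dict()
    # Assumed fallback; the original else-branch is not visible in the hunk.
    raise ValueError(f"cannot extract a config from {type(codec).__name__}")


print(sketch_get_codec_config(FakeZarrZstd()))
print(sketch_get_codec_config(FakeBytesCodec()))
```

If the two last branches were swapped, the wrapped zstd codec would return the `{'name': ..., 'configuration': ...}` form that the code comment reports as failing in `numcodecs.get_codec`.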
21 changes: 3 additions & 18 deletions virtualizarr/manifests/manifest.py
@@ -1,4 +1,3 @@
-import json
 import re
 from collections.abc import ItemsView, Iterable, Iterator, KeysView, ValuesView
 from pathlib import PosixPath
@@ -84,7 +83,8 @@ def validate_and_normalize_path_to_uri(path: str, fs_root: str | None = None) ->
         return urlunparse(components)

     elif any(path.startswith(prefix) for prefix in VALID_URI_PREFIXES):
-        if not PosixPath(path).suffix:
+        # Question: This feels fragile, is there a better way to ID a Zarr
+        if not PosixPath(path).suffix and "zarr" not in path:
             raise ValueError(
                 f"entries in the manifest must be paths to files, but this path has no file suffix: {path}"
             )
@@ -96,7 +96,7 @@ def validate_and_normalize_path_to_uri(path: str, fs_root: str | None = None) ->
         # using PosixPath here ensures a clear error would be thrown on windows (whose paths and platform are not officially supported)
         _path = PosixPath(path)

-        if not _path.suffix:
+        if not _path.suffix and "zarr" not in path:
             raise ValueError(
                 f"entries in the manifest must be paths to files, but this path has no file suffix: {path}"
             )
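
Note on the relaxed suffix check: Zarr V3 chunk objects such as `air.zarr/air/c/0/0/0` have no file suffix, which is why the substring test was added. A minimal sketch of the resulting predicate, assuming a hypothetical `passes_suffix_check` helper (not part of VirtualiZarr) and `PurePosixPath` so it runs on any platform:

```python
from pathlib import PurePosixPath


def passes_suffix_check(path: str) -> bool:
    """Hypothetical predicate: True when the check in the diff would NOT raise."""
    has_suffix = bool(PurePosixPath(path).suffix)
    # The diff raises when there is no suffix AND "zarr" is absent,
    # so a path passes when either condition holds.
    return has_suffix or "zarr" in path


examples = [
    "s3://bucket/air.zarr/air/c/0/0/0",  # suffix-less chunk inside a store
    "s3://bucket/data/chunk0",           # suffix-less, no "zarr" anywhere
    "s3://bucket/zarr_exports/chunk0",   # suffix-less, "zarr" only in a directory name
    "/data/file.nc",                     # ordinary file with a suffix
]
for example in examples:
    print(example, passes_suffix_check(example))
```

The third example illustrates the fragility raised in the code comment: any path that merely contains the substring `zarr` is accepted, whether or not it points into a Zarr store.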
@@ -436,21 +436,6 @@ def __eq__(self, other: Any) -> bool:
         lengths_equal = (self._lengths == other._lengths).all()
         return paths_equal and offsets_equal and lengths_equal

-    @classmethod
-    def from_zarr_json(cls, filepath: str) -> "ChunkManifest":
-        """Create a ChunkManifest from a Zarr manifest.json file."""
-
-        with open(filepath, "r") as manifest_file:
-            entries = json.load(manifest_file)
-
-        return cls(entries=entries)
-
-    def to_zarr_json(self, filepath: str) -> None:
-        """Write the manifest to a Zarr manifest.json file."""
-        entries = self.dict()
-        with open(filepath, "w") as json_file:
-            json.dump(entries, json_file, indent=4, separators=(", ", ": "))
-
     def rename_paths(
         self,
         new: str | Callable[[str], str],
2 changes: 1 addition & 1 deletion virtualizarr/manifests/store.py
@@ -18,7 +18,7 @@

 from virtualizarr.manifests.array import ManifestArray
 from virtualizarr.manifests.group import ManifestGroup
-from virtualizarr.vendor.zarr.metadata import dict_to_buffer
+from virtualizarr.vendor.zarr.core.metadata import dict_to_buffer

 if TYPE_CHECKING:
     from collections.abc import AsyncGenerator, Iterable, Mapping
4 changes: 4 additions & 0 deletions virtualizarr/readers/__init__.py
@@ -5,6 +5,9 @@
 from virtualizarr.readers.kerchunk import KerchunkVirtualBackend
 from virtualizarr.readers.netcdf3 import NetCDF3VirtualBackend
 from virtualizarr.readers.tiff import TIFFVirtualBackend
+from virtualizarr.readers.zarr import (
+    ZarrVirtualBackend,
+)

 __all__ = [
     "DMRPPVirtualBackend",
@@ -14,4 +17,5 @@
     "KerchunkVirtualBackend",
     "NetCDF3VirtualBackend",
     "TIFFVirtualBackend",
+    "ZarrVirtualBackend",
 ]
187 changes: 187 additions & 0 deletions virtualizarr/readers/zarr.py
@@ -0,0 +1,187 @@
from __future__ import annotations

import asyncio
from pathlib import Path # noqa
from typing import (
    Any,
    Hashable,
    Iterable,
    Mapping,
    Optional,
)

import numpy as np
from xarray import Dataset, Index
from zarr.api.asynchronous import open_group as open_group_async
from zarr.core.metadata import ArrayV3Metadata

from virtualizarr.manifests import (
    ChunkManifest,
    ManifestArray,
    ManifestGroup,
    ManifestStore,
)
from virtualizarr.manifests.manifest import validate_and_normalize_path_to_uri # noqa
from virtualizarr.readers.api import VirtualBackend
from virtualizarr.vendor.zarr.core.common import _concurrent_map

FillValueT = bool | str | float | int | list | None

ZARR_DEFAULT_FILL_VALUE: dict[str, FillValueT] = {
    # numpy dtypes's hierarchy lets us avoid checking for all the widths
    # https://numpy.org/doc/stable/reference/arrays.scalars.html
    np.dtype("bool").kind: False,
    np.dtype("int").kind: 0,
    np.dtype("float").kind: 0.0,
    np.dtype("complex").kind: [0.0, 0.0],
    np.dtype("datetime64").kind: 0,
}


import zarr


async def get_chunk_mapping_prefix(zarr_array: zarr.AsyncArray, filepath: str) -> dict:
    """Create a dictionary to pass into ChunkManifest __init__"""

    # TODO: For when we want to support reading V2 we should parse the /c/ and "/" between chunks

    prefix = zarr_array.name.lstrip("/") + "/c/"
    prefix_keys = [(x,) async for x in zarr_array.store.list_prefix(prefix)]
    _lengths = await _concurrent_map(prefix_keys, zarr_array.store.getsize)

    chunk_keys = [x[0].split(prefix)[1] for x in prefix_keys]
    _dict_keys = [key.replace("/", ".") for key in chunk_keys]
    _paths = [filepath + "/" + prefix + key for key in chunk_keys]

    _offsets = [0] * len(_lengths)
    return {
        key: {"path": path, "offset": offset, "length": length}
        for key, path, offset, length in zip(
            _dict_keys,
            _paths,
            _offsets,
            _lengths,
        )
    }
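
Note on the key handling in `get_chunk_mapping_prefix`: each store key under `<array>/c/` becomes one manifest entry, with the `/`-separated chunk index rewritten to the dotted form and the whole chunk object referenced at offset 0. A minimal synchronous sketch, assuming a hypothetical `sketch_chunk_mapping` helper and a plain dict of key sizes in place of the store's `list_prefix` and `getsize` calls:

```python
def sketch_chunk_mapping(
    array_name: str,
    key_sizes: dict[str, int],
    filepath: str,
) -> dict[str, dict]:
    """Hypothetical, synchronous stand-in for the async function in the diff."""
    prefix = array_name.lstrip("/") + "/c/"
    entries: dict[str, dict] = {}
    for store_key, length in key_sizes.items():
        # Stand-in for store.list_prefix(prefix): keep only chunk objects.
        if not store_key.startswith(prefix):
            continue
        # "air/c/0/0/1" -> "0/0/1" (the diff uses split(prefix)[1]).
        chunk_key = store_key[len(prefix):]
        entries[chunk_key.replace("/", ".")] = {
            "path": filepath + "/" + prefix + chunk_key,
            "offset": 0,  # each chunk is a whole object, so it starts at byte 0
            "length": length,
        }
    return entries


mapping = sketch_chunk_mapping(
    "/air",
    {"air/zarr.json": 512, "air/c/0/0/0": 100, "air/c/0/0/1": 96},
    "file:///tmp/air.zarr",
)
for key, entry in mapping.items():
    print(key, entry)
```

The byte sizes and the `file:///tmp/air.zarr` location are made-up illustration values.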


async def build_chunk_manifest(
    zarr_array: zarr.AsyncArray, filepath: str
) -> ChunkManifest:
    """Build a ChunkManifest from a dictionary"""
    chunk_map = await get_chunk_mapping_prefix(zarr_array=zarr_array, filepath=filepath)
    return ChunkManifest(chunk_map)


def get_metadata(zarr_array: zarr.AsyncArray[Any]) -> ArrayV3Metadata:
    fill_value = zarr_array.metadata.fill_value
    if fill_value is not None:
        fill_value = ZARR_DEFAULT_FILL_VALUE[zarr_array.metadata.fill_value.dtype.kind]

    zarr_format = zarr_array.metadata.zarr_format

    if zarr_format == 2:
        # TODO: Once we want to support V2, we will have to deconstruct the
        # zarr_array codecs etc. and reconstruct them with create_v3_array_metadata
        raise NotImplementedError("Reading Zarr V2 currently not supported.")

Codecov / codecov/patch warning on virtualizarr/readers/zarr.py#L87: added line #L87 was not covered by tests.

    elif zarr_format == 3:
        return zarr_array.metadata

    else:
        raise NotImplementedError("Zarr format is not recognized as v2 or v3.")

Codecov / codecov/patch warning on virtualizarr/readers/zarr.py#L93: added line #L93 was not covered by tests.


async def _construct_manifest_array(zarr_array: zarr.AsyncArray[Any], filepath: str):
    array_metadata = get_metadata(zarr_array=zarr_array)

    chunk_manifest = await build_chunk_manifest(zarr_array, filepath=filepath)
    return ManifestArray(metadata=array_metadata, chunkmanifest=chunk_manifest)


async def _construct_manifest_group(
    filepath: str,
    *,
    reader_options: Optional[dict] = None,
    drop_variables: str | Iterable[str] | None = None,
    group: str | None = None,
):
    reader_options = reader_options or {}
    zarr_group = await open_group_async(
        filepath,
        storage_options=reader_options.get("storage_options"),
        path=group,
        mode="r",
    )

    zarr_array_keys = [key async for key in zarr_group.array_keys()]

    _drop_vars: list[Hashable] = [] if drop_variables is None else list(drop_variables)

    zarr_arrays = await asyncio.gather(
        *[zarr_group.getitem(var) for var in zarr_array_keys if var not in _drop_vars]
    )

    manifest_arrays = await asyncio.gather(
        *[
            _construct_manifest_array(zarr_array=array, filepath=filepath)  # type: ignore[arg-type]
            for array in zarr_arrays
        ]
    )

    manifest_dict = {
        array.basename: result for array, result in zip(zarr_arrays, manifest_arrays)
    }
    return ManifestGroup(manifest_dict, attributes=zarr_group.attrs)


def _construct_manifest_store(
    filepath: str,
    *,
    reader_options: Optional[dict] = None,
    drop_variables: str | Iterable[str] | None = None,
    group: str | None = None,
) -> ManifestStore:
    import asyncio

    manifest_group = asyncio.run(
        _construct_manifest_group(
            filepath=filepath,
            group=group,
            drop_variables=drop_variables,
            reader_options=reader_options,
        )
    )
    return ManifestStore(manifest_group)


class ZarrVirtualBackend(VirtualBackend):
    @staticmethod
    def open_virtual_dataset(
        filepath: str,
        group: str | None = None,
        drop_variables: str | Iterable[str] | None = None,
        loadable_variables: Iterable[str] | None = None,
        decode_times: bool | None = None,
        indexes: Mapping[str, Index] | None = None,
        virtual_backend_kwargs: Optional[dict] = None,
        reader_options: Optional[dict] = None,
    ) -> Dataset:
        filepath = validate_and_normalize_path_to_uri(
            filepath, fs_root=Path.cwd().as_uri()
        )

        manifest_store = _construct_manifest_store(
            filepath=filepath,
            group=group,
            drop_variables=drop_variables,
            reader_options=reader_options,
        )

        ds = manifest_store.to_virtual_dataset(
            loadable_variables=loadable_variables,
            decode_times=decode_times,
            indexes=indexes,
        )
        return ds