Merged
Changes from 3 commits
2 changes: 1 addition & 1 deletion conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,14 +11,14 @@
import pytest
import xarray as xr
import zarr
from obspec_utils import ObjectStoreRegistry
from obstore.store import LocalStore
from xarray.core.variable import Variable

# Local imports
from virtualizarr.manifests import ChunkManifest, ManifestArray
from virtualizarr.manifests.manifest import join
from virtualizarr.manifests.utils import create_v3_array_metadata
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.utils import ceildiv


Expand Down
5 changes: 2 additions & 3 deletions docs/api/developer.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,9 +15,8 @@ See the page on data structures for more information.

## Registry

::: virtualizarr.registry.Url
[Urls][virtualizarr.registry.Url] should be parseable by [urllib.parse.urlparse][].
::: virtualizarr.registry.ObjectStoreRegistry
... note
`virtualizarr.registry.ObjectStoreRegistry` has been deprecated. Please use [obspec_utils.ObjectStoreRegistry][] instead.
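
For readers wondering what a deprecation like this can look like in practice, here is a minimal sketch using only the standard library. It is not taken from this PR and is not how VirtualiZarr implements the deprecation; the `deprecated_alias` helper, the `LegacyRegistry` name, and the empty stand-in `ObjectStoreRegistry` class are all hypothetical and exist only for illustration.

```python
import warnings


class ObjectStoreRegistry:
    """Hypothetical stand-in for the class that now lives in the new package."""


def deprecated_alias(new_cls: type, old_path: str, new_path: str) -> type:
    """Return a subclass of new_cls that warns whenever it is instantiated."""

    class _Deprecated(new_cls):
        def __init__(self, *args, **kwargs):
            # stacklevel=2 points the warning at the caller, not at this shim.
            warnings.warn(
                f"{old_path} is deprecated; use {new_path} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            super().__init__(*args, **kwargs)

    _Deprecated.__name__ = new_cls.__name__
    return _Deprecated


# Hypothetical legacy name that keeps working but nudges users to migrate.
LegacyRegistry = deprecated_alias(
    ObjectStoreRegistry,
    "virtualizarr.registry.ObjectStoreRegistry",
    "obspec_utils.ObjectStoreRegistry",
)

# Record warnings so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    registry = LegacyRegistry()
```

Because the alias subclasses the new class, existing `isinstance` checks against the new class would keep passing while callers are being migrated.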

## Array API

Expand Down
1 change: 0 additions & 1 deletion docs/api/serialization.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,5 +2,4 @@

::: virtualizarr.accessor.VirtualiZarrDatasetAccessor.to_icechunk
::: virtualizarr.accessor.VirtualiZarrDatasetAccessor.to_kerchunk

::: virtualizarr.accessor.VirtualiZarrDataTreeAccessor.to_icechunk
10 changes: 6 additions & 4 deletions docs/custom_parsers.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,11 +9,12 @@ This is advanced material intended for 3rd-party developers, and assumes you hav

## What is a VirtualiZarr parser?

All VirtualiZarr parsers are simply callables that accept the URL pointing to a data source and an [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] that may contain instantiated [ObjectStores][obstore.store.ObjectStore] that can read from that URL, and return an instance of the [`virtualizarr.manifests.ManifestStore`][] class containing information about the contents of the data source.
All VirtualiZarr parsers are simply callables that accept the URL pointing to a data source and an [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] that may contain instantiated [ObjectStores][obstore.store.ObjectStore] that can read from that URL, and return an instance of the [`virtualizarr.manifests.ManifestStore`][] class containing information about the contents of the data source.

```python
from obspec_utils import ObjectStoreRegistry

from virtualizarr.manifests import ManifestStore
from virtualizarr.registry import ObjectStoreRegistry


def custom_parser(url: str, registry: ObjectStoreRegistry) -> ManifestStore:
Expand Down Expand Up @@ -234,10 +235,11 @@ For example we could test the ability of VirtualiZarr's in-built [`HDFParser`][v

```python
import xarray.testing as xrt
from obspec_utils import ObjectStoreRegistry
from obstore.store import LocalStore

from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from obstore.store import LocalStore


project_directory = "/Users/user/my-project"
project_url = f"file://{project_directory}"
Expand Down
5 changes: 3 additions & 2 deletions docs/faq.md
Original file line number Diff line number Diff line change
Expand Up @@ -65,10 +65,11 @@ In general once the Icechunk specification reaches a stable v1.0, we would recom
No - you can simply open the Kerchunk-formatted references you already have into VirtualiZarr directly. Then you can manipulate them, or re-save them into a new format, such as [Icechunk](https://icechunk.io/):

```python
from obstore.store import LocalStore
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.parsers import KerchunkJSONParser, KerchunkParquetParser
from obstore.store import LocalStore

project_dir="/Users/user/project-dir"
project_url=f"file://{project_dir}"
Expand Down
5 changes: 3 additions & 2 deletions docs/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,9 +36,10 @@ First, import the necessary functions and classes:
import icechunk
import obstore

from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
```

Zarr can emit a lot of warnings about Numcodecs not being included in the Zarr version 3
Expand Down Expand Up @@ -67,7 +68,7 @@ path = "NEX-GDDP-CMIP6/ACCESS-CM2/ssp126/r1i1p1f1/tasmax/tasmax_day_ACCESS-CM2_s
store = obstore.store.from_url(bucket, region="us-west-2", skip_signature=True)
```

We also need to create an [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] that
We also need to create an [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] that
maps the URL structure to the ObjectStore.
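
To make the idea of mapping URL structure to a store concrete, the following is a toy sketch built only on `urllib.parse`. It is an illustration of the concept, not the `obspec_utils` implementation: the `ToyRegistry` class, its `register` and `resolve` methods, the placeholder store object, and the shortened file path are all assumptions made for this example.

```python
from urllib.parse import urlparse


class ToyRegistry:
    """Hypothetical illustration only; not the obspec_utils implementation."""

    def __init__(self) -> None:
        # Stores are keyed by (scheme, netloc), e.g. ("s3", "nex-gddp-cmip6").
        self._stores: dict[tuple[str, str], object] = {}

    def register(self, url: str, store: object) -> None:
        """Associate the scheme and netloc of url with a store."""
        parsed = urlparse(url)
        self._stores[(parsed.scheme, parsed.netloc)] = store

    def resolve(self, url: str) -> tuple[object, str]:
        """Return the store registered for url and the path within that store."""
        parsed = urlparse(url)
        key = (parsed.scheme, parsed.netloc)
        if key not in self._stores:
            raise ValueError(
                f"No store registered for {parsed.scheme}://{parsed.netloc}"
            )
        return self._stores[key], parsed.path.lstrip("/")


registry = ToyRegistry()
s3_store = object()  # placeholder standing in for a real ObjectStore instance
registry.register("s3://nex-gddp-cmip6", s3_store)

# Any URL under the registered bucket resolves to the same store.
store, path = registry.resolve("s3://nex-gddp-cmip6/NEX-GDDP-CMIP6/ACCESS-CM2/file.nc")
```

Keying on scheme and netloc means one registry can hold stores for several buckets or hosts at once, and a URL for an unregistered location fails loudly instead of silently reading from the wrong place.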

```python exec="on" source="above" session="homepage"
Expand Down
14 changes: 7 additions & 7 deletions docs/migration_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,19 +21,19 @@ vds = open_virtual_dataset("data1.nc")
```

To provide a more extensible and reliable API, VirtualiZarr V2 requires more explicit configuration by the user.
You now must pass in a valid [Parser][virtualizarr.parsers.typing.Parser] and a [virtualizarr.registry.ObjectStoreRegistry][] to [virtualizarr.open_virtual_dataset][].
You must now pass in a valid [Parser][virtualizarr.parsers.typing.Parser] and an [obspec_utils.ObjectStoreRegistry][] to [virtualizarr.open_virtual_dataset][].
This change adds a bit more verbosity, but is intended to make virtualizing datasets more robust. It is most common for the
[ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] to contain one or more [ObjectStores][obstore.store.ObjectStore]
for reading the original data, but some parsers may accept an empty [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry].
[ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] to contain one or more [ObjectStores][obstore.store.ObjectStore]
for reading the original data, but some parsers may accept an empty [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry].

=== "S3 Store"

```python exec="on" source="material-block" session="migration" result="code"
from obstore.store import S3Store
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

bucket = "nex-gddp-cmip6"
store = S3Store(
Expand All @@ -57,10 +57,10 @@ for reading the original data, but some parsers may accept an empty [ObjectStore


from obstore.store import LocalStore
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

from pathlib import Path

Expand Down Expand Up @@ -116,15 +116,15 @@ vds.vz.to_icechunk(icechunk_store)
In VirtualiZarr V1, if you wanted to access the underlying chunks of a dataset, you first had to write the references to disk. From there you could read those references back into Xarray and access the chunks like you would with a normal Xarray dataset.

In V2 you can now **directly read the chunks from a Parser into Xarray without writing them to disk first**. 🤯
Each `Parser` is now responsible for creating a [ManifestStore][virtualizarr.manifests.ManifestStore], and the [ManifestStore][virtualizarr.manifests.ManifestStore] can fetch data through any [ObjectStore][obstore.store.ObjectStore] in the [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry], so you
Each `Parser` is now responsible for creating a [ManifestStore][virtualizarr.manifests.ManifestStore], and the [ManifestStore][virtualizarr.manifests.ManifestStore] can fetch data through any [ObjectStore][obstore.store.ObjectStore] in the [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry], so you
can load data using the [ManifestStore][virtualizarr.manifests.ManifestStore] via either Zarr or Xarray. Here's an example using Xarray:

```python exec="on" source="material-block" session="migration" result="code"
import xarray as xr
from obstore.store import S3Store
from obspec_utils import ObjectStoreRegistry

from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

bucket = "nex-gddp-cmip6"
store = S3Store(
Expand Down
14 changes: 7 additions & 7 deletions docs/usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ that can access your data. Available ObjectStores are described in the [obstore

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from obspec_utils import ObjectStoreRegistry

bucket = "s3://nex-gddp-cmip6"
path = "NEX-GDDP-CMIP6/ACCESS-CM2/ssp126/r1i1p1f1/tasmax/tasmax_day_ACCESS-CM2_ssp126_r1i1p1f1_gn_2015_v2.0.nc"
Expand All @@ -42,7 +42,7 @@ that can access your data. Available ObjectStores are described in the [obstore

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from obspec_utils import ObjectStoreRegistry

bucket = "gs://data-bucket"
path = "file-path/data.nc"
Expand All @@ -55,13 +55,13 @@ that can access your data. Available ObjectStores are described in the [obstore
=== "Azure"

```python

import xarray as xr
from obspec_utils import ObjectStoreRegistry
from obstore.store import from_url


from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

bucket = "abfs://data-container"
path = "file-path/data.nc"
Expand All @@ -77,10 +77,10 @@ that can access your data. Available ObjectStores are described in the [obstore

import xarray as xr
from obstore.store import from_url
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

# This example uses a NetCDF file of CMIP6 from ESGF.
bucket = 'https://esgf-data.ucar.edu'
Expand All @@ -96,10 +96,10 @@ that can access your data. Available ObjectStores are described in the [obstore

import xarray as xr
from obstore.store import S3Store
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

endpoint = "https://nyu1.osn.mghpcc.org"
access_key_id = "<access_key_id>"
Expand All @@ -124,10 +124,10 @@ that can access your data. Available ObjectStores are described in the [obstore

import xarray as xr
from obstore.store import LocalStore
from obspec_utils import ObjectStoreRegistry

from virtualizarr import open_virtual_dataset, open_virtual_mfdataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry

from pathlib import Path

Expand Down
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -121,6 +121,7 @@ plugins:
- https://numpy.org/doc/stable/objects.inv
- https://numcodecs.readthedocs.io/en/stable/objects.inv
- https://zarr.readthedocs.io/en/stable/objects.inv
- https://obspec-utils.readthedocs.io/en/stable/objects.inv
- https://developmentseed.org/obstore/latest/objects.inv
- https://filesystem-spec.readthedocs.io/en/latest/objects.inv
- https://requests.readthedocs.io/en/latest/objects.inv
Expand Down
1 change: 1 addition & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ dependencies = [
"packaging",
"zarr>=3.1.0",
"obstore>=0.5.1",
"obspec_utils>=0.4.0",
]

# Dependency sets under optional-dependencies are available via PyPI
Expand Down
4 changes: 2 additions & 2 deletions virtualizarr/manifests/store.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
from typing import TYPE_CHECKING, Literal, TypeAlias
from urllib.parse import urlparse

from obspec_utils import ObjectStoreRegistry
from zarr.abc.store import (
ByteRequest,
OffsetByteRequest,
Expand All @@ -18,7 +19,6 @@
from virtualizarr.manifests.array import ManifestArray
from virtualizarr.manifests.group import ManifestGroup
from virtualizarr.manifests.utils import parse_manifest_index
from virtualizarr.registry import ObjectStoreRegistry

if TYPE_CHECKING:
from obstore.store import (
Expand Down Expand Up @@ -93,7 +93,7 @@ class ManifestStore(Store):
Root group of the store.
Contains group metadata, [ManifestArrays][virtualizarr.manifests.ManifestArray], and any subgroups.
registry : ObjectStoreRegistry
[ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] that maps the URL scheme and netloc to [ObjectStore][obstore.store.ObjectStore] instances,
[ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] that maps the URL scheme and netloc to [ObjectStore][obstore.store.ObjectStore] instances,
allowing ManifestStores to read from different ObjectStore instances.

Warnings
Expand Down
5 changes: 2 additions & 3 deletions virtualizarr/parsers/dmrpp.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
from xml.etree import ElementTree as ET

import numpy as np
from obspec_utils import ObjectStoreRegistry, ObstoreReader
from obstore.store import ObjectStore

from virtualizarr.manifests import (
Expand All @@ -15,9 +16,7 @@
)
from virtualizarr.manifests.utils import create_v3_array_metadata
from virtualizarr.parsers.utils import encode_cf_fill_value
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.types import ChunkKey
from virtualizarr.utils import ObstoreReader


class DMRPPParser:
Expand Down Expand Up @@ -54,7 +53,7 @@ def __call__(
url
The URL of the input DMR++ file (e.g., "s3://bucket/file.dmrpp").
registry
An [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] for resolving urls and reading data.
An [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] for resolving urls and reading data.

Returns
-------
Expand Down
5 changes: 3 additions & 2 deletions virtualizarr/parsers/fits.py
Original file line number Diff line number Diff line change
@@ -1,9 +1,10 @@
from pathlib import Path
from typing import Iterable, Optional

from obspec_utils import ObjectStoreRegistry

from virtualizarr.manifests import ManifestStore
from virtualizarr.parsers.kerchunk.translator import manifestgroup_from_kerchunk_refs
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.types.kerchunk import KerchunkStoreRefs


Expand Down Expand Up @@ -45,7 +46,7 @@ def __call__(
url
The URL of the input FITS file (e.g., "s3://bucket/file.fits").
registry
An [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] for resolving urls and reading data.
An [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] for resolving urls and reading data.

Returns
-------
Expand Down
6 changes: 3 additions & 3 deletions virtualizarr/parsers/hdf/hdf.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
)

import numpy as np
from obspec_utils import ObjectStoreRegistry, ObstoreReader

from virtualizarr.codecs import zarr_codec_config_to_v3
from virtualizarr.manifests import (
Expand All @@ -20,9 +21,8 @@
from virtualizarr.manifests.utils import create_v3_array_metadata
from virtualizarr.parsers.hdf.filters import codecs_from_dataset
from virtualizarr.parsers.utils import encode_cf_fill_value
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.types import ChunkKey
from virtualizarr.utils import ObstoreReader, soft_import
from virtualizarr.utils import soft_import

h5py = soft_import("h5py", "reading hdf files", strict=False)

Expand Down Expand Up @@ -169,7 +169,7 @@ def __call__(
url
The URL of the input HDF5/NetCDF4 file (e.g., `"s3://bucket/file.nc"`).
registry
An [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] for resolving urls and reading data.
An [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] for resolving urls and reading data.

Returns
-------
Expand Down
4 changes: 2 additions & 2 deletions virtualizarr/parsers/kerchunk/json.py
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
from collections.abc import Iterable

import ujson
from obspec_utils import ObjectStoreRegistry

from virtualizarr.manifests import ManifestStore
from virtualizarr.parsers.kerchunk.translator import manifestgroup_from_kerchunk_refs
from virtualizarr.registry import ObjectStoreRegistry


class KerchunkJSONParser:
Expand Down Expand Up @@ -46,7 +46,7 @@ def __call__(
url
The URL of the input Kerchunk JSON (e.g., "s3://bucket/kerchunk.json").
registry
An [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] for resolving urls and reading data.
An [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] for resolving urls and reading data.

Returns
-------
Expand Down
5 changes: 3 additions & 2 deletions virtualizarr/parsers/kerchunk/parquet.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,10 @@
from dataclasses import dataclass, field
from typing import TYPE_CHECKING

from obspec_utils import ObjectStoreRegistry

from virtualizarr.manifests import ManifestStore
from virtualizarr.parsers.kerchunk.translator import manifestgroup_from_kerchunk_refs
from virtualizarr.registry import ObjectStoreRegistry
from virtualizarr.types.kerchunk import (
KerchunkStoreRefs,
)
Expand Down Expand Up @@ -68,7 +69,7 @@ def __call__(
url
The URL of the input parquet directory (e.g., "s3://bucket/my-kerchunk-references.parq").
registry
An [ObjectStoreRegistry][virtualizarr.registry.ObjectStoreRegistry] for resolving urls and reading data.
An [ObjectStoreRegistry][obspec_utils.ObjectStoreRegistry] for resolving urls and reading data.

Returns
-------
Expand Down