Open
Description
The NISAR test currently fails because the dataset has an attribute whose value is `inf` (a float), which leads to `ValueError: Out of range float values are not JSON compliant: inf` when trying to write to either Icechunk or Kerchunk. I wonder how we should handle non-JSON-serializable attributes with Zarr V3? Some options:
- Add a parameter to `to_icechunk` and `to_kerchunk` that gives the user the option to raise an error, drop the attribute, or cast it to a string
- Catch the upstream error and raise a more informative error about which variable / attribute is causing the issue
- Defer to parsers and provide documentation about the requirement for objects to be JSON serializable
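To make the first two options concrete, here is a minimal sketch using only the standard library. Everything named here (`sanitize_attrs`, `on_invalid`, `owner`, and the `valid_max` attribute in the usage example) is hypothetical and not part of the VirtualiZarr API; it only illustrates what a raise / drop / stringify policy with an informative error could look like, assuming strict JSON compliance is checked with `json.dumps(..., allow_nan=False)`.

```python
import json
from typing import Any

# Hypothetical policy names; not an existing VirtualiZarr parameter.
_POLICIES = ("raise", "drop", "stringify")


def is_json_compliant(value: Any) -> bool:
    """Return True if `value` can be encoded as strict JSON (no inf/nan)."""
    try:
        json.dumps(value, allow_nan=False)
    except (TypeError, ValueError):
        return False
    return True


def sanitize_attrs(
    attrs: dict[str, Any],
    *,
    on_invalid: str = "raise",
    owner: str = "<dataset>",
) -> dict[str, Any]:
    """Return a copy of `attrs` with non-JSON-compliant values handled.

    on_invalid="raise":     raise a ValueError naming the attribute and its owner
    on_invalid="drop":      omit the offending attribute
    on_invalid="stringify": replace the value with str(value)
    """
    if on_invalid not in _POLICIES:
        raise ValueError(f"on_invalid must be one of {_POLICIES}, got {on_invalid!r}")

    clean: dict[str, Any] = {}
    for name, value in attrs.items():
        if is_json_compliant(value):
            clean[name] = value
        elif on_invalid == "drop":
            continue
        elif on_invalid == "stringify":
            clean[name] = str(value)
        else:
            raise ValueError(
                f"Attribute {name!r} on {owner!r} has value {value!r}, "
                "which is not JSON serializable"
            )
    return clean


# Usage with a made-up attribute name; the real offending attribute is not
# identified in this issue.
attrs = {"units": "m", "valid_max": float("inf")}
dropped = sanitize_attrs(attrs, on_invalid="drop")
stringified = sanitize_attrs(attrs, on_invalid="stringify")
```

With `on_invalid="raise"` the error names both the attribute and the variable it belongs to, which is the extra context the second option asks for. Casting to a string is lossy: a reader would get `"inf"` back rather than a float, so round-tripping would need a documented convention.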
Relevant Zarr spec discussion: zarr-developers/zarr-specs#351
It's slow to debug over the network, so a recommended approach for an MVCE is to download https://nisar.asf.earthdatacloud.nasa.gov/NISAR-SAMPLE-DATA/GCOV/ALOS1_Rosamond_20081012/NISAR_L2_PR_GCOV_001_005_A_219_4020_SHNA_A_20081012T060910_20081012T060926_P01101_F_N_J_001.h5 and reproduce locally:
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "earthaccess",
#     "obstore",
#     "virtualizarr[hdf, icechunk]",
#     "xarray[io]",
#     "zarr>=3.1.3"
# ]
# ///
import xarray as xr
from obstore.store import LocalStore
from virtualizarr import open_virtual_dataset
from virtualizarr.parsers import HDFParser
from virtualizarr.registry import ObjectStoreRegistry
from icechunk import Repository, Storage, local_filesystem_storage, RepositoryConfig, VirtualChunkContainer, local_filesystem_store


def main():
    data_dir = "/Users/max/Documents/Code/zarr-developers/VirtualiZarr/.data/"
    file = "NISAR_L2_PR_GCOV_001_005_A_219_4020_SHNA_A_20081012T060910_20081012T060926_P01101_F_N_J_001.h5"
    config = RepositoryConfig.default()
    config.set_virtual_chunk_container(
        VirtualChunkContainer(
            url_prefix=f"file://{data_dir}",
            store=local_filesystem_store(data_dir),
        ),
    )
    storage = Storage.new_in_memory()
    # create an in-memory icechunk repository that includes the virtual chunk containers
    repo = Repository.create(storage, config)
    session = repo.writable_session("main")
    hdf_group = "science/LSAR/GCOV/grids/frequencyA"
    store = LocalStore()
    registry = ObjectStoreRegistry()
    registry.register("file://", store)
    drop_variables = ["listOfCovarianceTerms", "listOfPolarizations"]
    parser = HDFParser(group=hdf_group, drop_variables=drop_variables)
    with (
        xr.open_dataset(
            f"{data_dir}{file}",
            engine="h5netcdf",
            group=hdf_group,
            drop_variables=drop_variables,
            phony_dims="access",
        ) as dsXR,
        open_virtual_dataset(
            url=f"file://{data_dir}{file}",
            registry=registry,
            parser=parser,
        ) as vds,
    ):
        vds.vz.to_icechunk(session.store)
        with xr.open_zarr(session.store, zarr_format=3, consolidated=False) as dsV:
            xr.testing.assert_equal(dsXR, dsV)


if __name__ == "__main__":
    main()