Dependency Issue for Kerchunk -> Icechunk via Virtualizarr #321

@dwest77a

Hi all, I'm relatively new to Virtualizarr but have been developing tools with Kerchunk for some time, specifically a package for large-scale parallel conversion of thousands of datasets in our data archives.

I'm attempting to use the Virtualizarr library to concatenate some NetCDF4 data into a virtual dataset, then write it out to disk as an Icechunk store. The problem is that Icechunk seems to require the new Zarr v3 pre-release, while Kerchunk (used to create the virtual dataset) needs Zarr < 3. So far I've been unable to resolve this dependency conflict. Any suggestions for how to go about solving this would be appreciated, thanks!

My example code:

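import xarray as xr
from icechunk import IcechunkStore, StorageConfig, StoreConfig, VirtualRefConfig
from virtualizarr import open_virtual_dataset

# `files` is the list of NetCDF4 file paths to combine.
# Open each file as a virtual dataset without loading any indexes.
vds = [open_virtual_dataset(f, indexes={}) for f in files]

# Concatenate the virtual datasets along the time dimension.
combined_vds = xr.concat(vds, dim='time', coords='minimal', compat='override')

# Create a local Icechunk store in the 'combined' directory.
storage = StorageConfig.filesystem('combined')
store = IcechunkStore.create(storage=storage, mode="w", config=StoreConfig(
    virtual_ref_config=VirtualRefConfig.s3_anonymous(region='us-east-1'),
))

# Write the virtual references into the Icechunk store.
combined_vds.virtualize.to_icechunk(store)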

I either get an error importing Kerchunk (if I uninstall it in order to reinstall the Zarr v3 pre-release) or an error from Zarr when trying to create the Icechunk store.
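To make the conflict easier to see in a single environment, here is a rough diagnostic sketch that lists which versions of the relevant packages are installed. It uses only the standard library; the helper names (`installed_version`, `major_version`, `report`) are made up for illustration, and the package names are assumed to match their PyPI distribution names.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string):
    """Return the leading integer of a version string such as '3.0.0b1'."""
    match = re.match(r"\d+", version_string)
    if match is None:
        raise ValueError(f"no leading number in {version_string!r}")
    return int(match.group())


def report(packages=("zarr", "kerchunk", "icechunk", "virtualizarr")):
    """Collect installed versions so the conflict is visible in one place."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    versions = report()
    for name, found in versions.items():
        print(f"{name}: {found or 'not installed'}")
    zarr_version = versions["zarr"]
    if zarr_version is not None:
        # As described above: Icechunk wants Zarr v3, Kerchunk wants Zarr < 3.
        side = "Icechunk" if major_version(zarr_version) >= 3 else "Kerchunk"
        print(f"Installed zarr matches the {side} requirement only.")
```

Running this before and after swapping the Zarr version should show which of the two libraries the current environment can satisfy.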

Labels: Kerchunk (Relating to the kerchunk library / specification itself), upstream issue
