Description
Hi all, I'm relatively new to VirtualiZarr but have been developing tools with Kerchunk for some time, specifically a package for large-scale parallel conversion of thousands of datasets in our data archives.
I'm trying to use VirtualiZarr to concatenate some NetCDF4 files into a single virtual dataset and then write it to disk as an Icechunk store. The problem is that Icechunk appears to require the new Zarr v3 pre-release, while Kerchunk (used to create the virtual dataset) requires Zarr < 3. So far I haven't been able to find a set of versions that satisfies both. Any suggestions on how to resolve this would be appreciated, thanks!
My example code:
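import xarray as xr
from icechunk import IcechunkStore, StorageConfig, StoreConfig, VirtualRefConfig
from virtualizarr import open_virtual_dataset

# `files` is a list of NetCDF4 file paths (defined elsewhere)
# indexes={} skips creating in-memory indexes, so no coordinate data is loaded
vds = [open_virtual_dataset(f, indexes={}) for f in files]

# Concatenate along time without comparing or loading non-dimension coordinates
combined_vds = xr.concat(vds, dim='time', coords='minimal', compat='override')

# Icechunk store on the local filesystem in ./combined; virtual chunk references are read from S3 anonymously
storage = StorageConfig.filesystem('combined')
store = IcechunkStore.create(storage=storage, mode="w", config=StoreConfig(
    virtual_ref_config=VirtualRefConfig.s3_anonymous(region='us-east-1'),
))

# Write the virtual references into the Icechunk store
combined_vds.virtualize.to_icechunk(store)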
Either Kerchunk fails to import (if I uninstall Zarr 2 and install the Zarr v3 pre-release), or Zarr raises an error when I try to create the Icechunk store (if I keep Zarr < 3).
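To pin down which side of the conflict fails in a given environment, a small standard-library script can report which of the relevant packages are installed, whether each one imports cleanly, and the installed Zarr major version. This is a minimal diagnostic sketch: the package list and the helper names (`installed_version`, `zarr_major`, `try_import`, `report`) are hypothetical and not part of VirtualiZarr, Kerchunk or Icechunk.

```python
import importlib
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list of the packages involved in the conflict (distribution name == module name here)
PACKAGES = ["zarr", "kerchunk", "icechunk", "virtualizarr", "xarray"]


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def zarr_major(version_str):
    """Return the major version number from a version string such as '3.0.0b1'."""
    return int(version_str.split(".")[0])


def try_import(module_name):
    """Import a module; return None on success, otherwise a short error description."""
    try:
        importlib.import_module(module_name)
        return None
    except Exception as exc:  # catch any import-time failure, not only ImportError
        return f"{type(exc).__name__}: {exc}"


def report(packages=PACKAGES):
    """Build one status line per package, plus the Zarr major version if Zarr is installed."""
    lines = []
    for name in packages:
        ver = installed_version(name)
        if ver is None:
            lines.append(f"{name}: not installed")
            continue
        err = try_import(name)
        status = "imports OK" if err is None else f"import failed ({err})"
        lines.append(f"{name} {ver}: {status}")
    zarr_ver = installed_version("zarr")
    if zarr_ver is not None:
        lines.append(
            f"zarr major version: {zarr_major(zarr_ver)} "
            "(Kerchunk: needs < 3, Icechunk: needs the v3 pre-release)"
        )
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Running it once with Zarr 2 and once with the Zarr v3 pre-release should show which package fails to import in each environment.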