Build virtual Zarr store using Xarray's dataset.to_zarr(region="...") model #308

@maxrjones

I'm wondering how we can build virtualization pipelines as a map process rather than a map-reduce process. The map paradigm would follow the structure below, from Earthmover's excellent blog post on serverless datacube pipelines: the virtual dataset from each worker would be written directly to an Icechunk Virtual Store rather than being transferred back to the coordination node for a concatenation step. As in their post, the parallelization across workers during virtualization could use a serverless approach like lithops or coiled, and Dask concurrency could speed up virtualization within each worker. I think the main feature needed for this to work is an analogue to xr.Dataset.to_zarr(region="..."), complementary to xr.Dataset.to_zarr(append_dim="...") (both documented at https://docs.xarray.dev/en/latest/generated/xarray.Dataset.to_zarr.html) (xref #21, #272).

Image
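
To make the idea a bit more concrete, here is a rough, standard-library-only sketch of the map-only pattern I have in mind. It is not VirtualiZarr or Icechunk code: `ChunkRef`, `VirtualStore`, `write_region`, `virtualize`, and the file names are all hypothetical stand-ins, and a thread pool stands in for lithops/coiled workers. The point is only that the coordinator fixes the final shape up front and each worker writes its own region of chunk references directly, so nothing is sent back for concatenation.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """A virtual chunk reference: a byte range inside an existing file."""
    path: str
    offset: int
    length: int


class VirtualStore:
    """Hypothetical stand-in for a virtual store (not Icechunk's API)."""

    def __init__(self, n_chunks: int) -> None:
        # Coordinator step: fix the final size before any region is written,
        # similar in spirit to writing Zarr metadata first.
        self.n_chunks = n_chunks
        self.refs: dict[int, ChunkRef] = {}

    def write_region(self, region: slice, refs: list[ChunkRef]) -> None:
        # Worker step: write references for one region only.
        start, stop, _ = region.indices(self.n_chunks)
        if stop - start != len(refs):
            raise ValueError("region length does not match number of references")
        for i, ref in enumerate(refs, start=start):
            self.refs[i] = ref


def virtualize(path: str, chunks_per_file: int, chunk_nbytes: int) -> list[ChunkRef]:
    """Pretend to scan one file and return its chunk references."""
    return [ChunkRef(path, i * chunk_nbytes, chunk_nbytes) for i in range(chunks_per_file)]


def worker(store: VirtualStore, file_index: int, path: str,
           chunks_per_file: int, chunk_nbytes: int) -> None:
    # Each worker derives its region from its position along the concat axis
    # and writes there directly; regions are disjoint by construction.
    refs = virtualize(path, chunks_per_file, chunk_nbytes)
    start = file_index * chunks_per_file
    store.write_region(slice(start, start + chunks_per_file), refs)


CHUNKS_PER_FILE = 2    # assumed to be identical for every file
CHUNK_NBYTES = 1024    # assumed fixed chunk size in bytes
paths = ["file_0.nc", "file_1.nc", "file_2.nc"]  # hypothetical inputs

store = VirtualStore(n_chunks=len(paths) * CHUNKS_PER_FILE)
with ThreadPoolExecutor() as pool:
    # list(...) forces completion and re-raises any worker exception.
    list(pool.map(
        lambda item: worker(store, item[0], item[1], CHUNKS_PER_FILE, CHUNK_NBYTES),
        enumerate(paths),
    ))
```

This assumes every file contributes the same number of chunks, so a worker can compute its region from its index alone. With uneven files, the coordinator would need to hand out region offsets before the map step.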

I tried to check that this feature request doesn't already exist, but apologies if I missed something.
