Build virtual Zarr store using Xarray's dataset.to_zarr(region="...") model #308

I'm wondering how we could build virtualization pipelines as a map rather than a map-reduce process. The map paradigm would follow the structure below from [Earthmover's excellent blog post on serverless datacube pipelines](https://earthmover.io/blog/serverless-datacube-pipeline): the virtual dataset from each worker would be written directly to an Icechunk virtual store rather than being transferred back to the coordinating node for a concatenation step. As in their post, the parallelization across workers during virtualization could use a serverless framework such as Lithops or Coiled, and Dask concurrency could speed up virtualization within each worker. I think the main feature needed for this to work is an analogue of `xr.Dataset.to_zarr(region="...")`, complementary to `xr.Dataset.to_zarr(append_dim="...")` (documented in https://docs.xarray.dev/en/latest/generated/xarray.Dataset.to_zarr.html; xref https://github.com/zarr-developers/VirtualiZarr/issues/21, https://github.com/zarr-developers/VirtualiZarr/pull/272).

![Image](https://github.com/user-attachments/assets/4886d738-dfba-4c3a-83f3-24640aef2607)
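The difference between the two patterns can be sketched with a toy, stdlib-only model. Everything in it is hypothetical: `virtualize`, `ToyVirtualStore`, `write_region`, the file paths and the byte offsets are made up for illustration and are not VirtualiZarr or Icechunk API. The only idea borrowed from Xarray's region writes is that the store's shape and chunking are fixed up front, so a worker writing a chunk-aligned slice can translate its local chunk indices into global chunk keys without going through the coordinator.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

# Hypothetical inputs: one file per time step, each holding a single (1, 4, 4)
# chunk at a made-up byte range. None of this is real VirtualiZarr/Icechunk API.
N_TIMES = 6
SOURCE_FILES = [f"data/file_{t:03d}.nc" for t in range(N_TIMES)]
CHUNK_OFFSET, CHUNK_LENGTH = 1024, 64


def virtualize(path):
    """Hypothetical stand-in for reading one file's metadata: return its chunk
    references keyed by chunk index *within that file*."""
    return {(0, 0, 0): {"path": path, "offset": CHUNK_OFFSET, "length": CHUNK_LENGTH}}


class ToyVirtualStore:
    """Hypothetical shared store: a fixed template (shape + chunk shape) plus a
    manifest mapping chunk key -> byte-range reference."""

    def __init__(self, shape, chunks):
        self.shape, self.chunks = shape, chunks
        self.manifest = {}
        self._lock = threading.Lock()  # stands in for the store's own conflict handling

    def write_region(self, refs, time_slice):
        """Toy analogue of to_zarr(region={"time": time_slice}): shift local chunk
        indices by the region's chunk offset and write them directly."""
        start, stop = time_slice.start, time_slice.stop
        if start % self.chunks[0] or not 0 <= start < stop <= self.shape[0]:
            raise ValueError("region must be chunk-aligned and inside the template")
        t0 = start // self.chunks[0]
        with self._lock:
            for (t, y, x), ref in refs.items():
                self.manifest[f"{t0 + t}.{y}.{x}"] = ref


def map_only(store):
    """Map: each worker virtualizes its own file and writes straight to the store."""
    def worker(t):
        store.write_region(virtualize(SOURCE_FILES[t]), slice(t, t + 1))

    with ThreadPoolExecutor(max_workers=3) as pool:
        list(pool.map(worker, range(N_TIMES)))  # list() re-raises worker errors


def map_reduce():
    """Map-reduce: all references travel back and are concatenated along time."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        per_file = list(pool.map(virtualize, SOURCE_FILES))
    combined = {}
    for t, refs in enumerate(per_file):  # reduce step on the coordinator
        for (ti, y, x), ref in refs.items():
            combined[f"{t + ti}.{y}.{x}"] = ref
    return combined


store = ToyVirtualStore(shape=(N_TIMES, 4, 4), chunks=(1, 4, 4))
map_only(store)
assert store.manifest == map_reduce()  # same result, no reduce step needed
```

Both versions end with the same manifest; the difference is that in the map-only version the coordinator only has to create the template before launching workers, and nothing flows back to it afterwards. The lock is a placeholder for whatever concurrency control a real store would provide (for example transactional commits in Icechunk); how that would work for virtual references is an assumption here, not something this sketch settles.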

I tried to check that this feature request doesn't already exist, but apologies if I missed something.
