Planning explicit dependency on Zarr v3 #392

@sharkinsspatial

We have a few interdependent issues that discuss the next steps for completely migrating to Zarr v3, but I thought it might be helpful to outline a rough plan of attack that consolidates them in one spot so everyone can contribute to the discussion on how to achieve this. I'll list things sequentially, but several of these may need to be tackled in a single PR due to dependency conflicts.

  • Replace VirtualiZarr.ZArray with zarr ArrayMetadata #175: Replace our existing ZArray metadata representation with the new zarr-python v3 class. This change will touch a significant portion of the codebase, as our serialization/deserialization logic and readers all rely on the existing metadata structure.
  • Use in-memory icechunk stores in roundtrip tests #376: Replace kerchunk in roundtrip tests with in-memory Icechunk stores. This would likely be simpler if we first update our ZArray representation, which would let us remove some of the v3-specific logic that was needed when Icechunk support was introduced.
  • Switch tests to use HDF reader instead of kerchunk-based HDF5 reader #374: Switch our HDF5 roundtrip test to use the HDFVirtualBackend. Along with the previous steps, this would remove the bulk of our dependence on kerchunk and free us to depend on Zarr v3 explicitly with only a limited loss of existing functionality. There are several outstanding issues (and likely more) with the HDFVirtualBackend that were discovered and raised during the Pangeo hack day in December and that will need to be addressed. I'll link to a separate issue with the plan for tackling these.
  • Vendor kerchunk readers? #377: Copy the remaining kerchunk readers into VirtualiZarr. To avoid losing support for netCDF3 and FITS, we can temporarily copy and paste these kerchunk readers into VirtualiZarr and use the dataset -> kerchunk reader -> VirtualiZarr kerchunk reader approach until dedicated VirtualiZarr readers can be developed for these formats.
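
To make the first step more concrete, here is a minimal sketch of the kind of translation involved in moving from a v2-style metadata record to a Zarr v3 array metadata document. It uses only the standard library and plain dicts. `V2ArrayMeta`, `to_v3_metadata`, and the small dtype table are hypothetical names invented for illustration; they are not VirtualiZarr's `ZArray` or the zarr-python metadata class, and compressors and filters are left out entirely.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-in for a v2-style metadata record (not VirtualiZarr's ZArray).
@dataclass(frozen=True)
class V2ArrayMeta:
    shape: tuple[int, ...]
    chunks: tuple[int, ...]
    dtype: str  # NumPy-style string with a byte-order prefix, e.g. "<f4"
    fill_value: float | int | None = None


# Deliberately incomplete mapping from NumPy type codes to Zarr v3 data type names.
_V3_DTYPES = {
    "i1": "int8", "i2": "int16", "i4": "int32", "i8": "int64",
    "u1": "uint8", "u2": "uint16", "u4": "uint32", "u8": "uint64",
    "f4": "float32", "f8": "float64",
}


def to_v3_metadata(meta: V2ArrayMeta) -> dict:
    """Build a Zarr v3 style array metadata dict from a v2-style record."""
    byteorder, code = meta.dtype[0], meta.dtype[1:]
    if byteorder not in "<>|" or code not in _V3_DTYPES:
        raise ValueError(f"unsupported dtype string: {meta.dtype!r}")

    # In v3 the byte order is a property of the "bytes" codec, not of the data type.
    bytes_codec: dict = {"name": "bytes"}
    if byteorder != "|":
        endian = "big" if byteorder == ">" else "little"
        bytes_codec["configuration"] = {"endian": endian}

    return {
        "zarr_format": 3,
        "node_type": "array",
        "shape": list(meta.shape),
        "data_type": _V3_DTYPES[code],
        "chunk_grid": {
            "name": "regular",
            "configuration": {"chunk_shape": list(meta.chunks)},
        },
        "chunk_key_encoding": {
            "name": "default",
            "configuration": {"separator": "/"},
        },
        # Assumption: fall back to 0 when the v2 record has no fill value,
        # since v3 metadata requires one.
        "fill_value": 0 if meta.fill_value is None else meta.fill_value,
        "codecs": [bytes_codec],
        "attributes": {},
    }


v3_doc = to_v3_metadata(V2ArrayMeta(shape=(10, 20), chunks=(5, 10), dtype="<f4"))
```

The sketch shows why the change reaches into serialization and the readers: chunk shape, byte order, and codecs live in different places in the two layouts, so any code that reads those fields directly has to change along with the metadata class.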

@TomNicholas, it would be great to get your feedback on how these steps should be organized into PRs so that we can make manageable changes while still running the necessary parts of the test suite. @abarciauskas-bgse and I have some availability now to start tackling #17, and I'm going to begin working to stabilize HDFVirtualBackend, with the aim of making it robust enough for the majority of current use cases.

Labels: dependencies (Updates a dependency), zarr-python (Relevant to zarr-python upstream)