Support Zarr v3 format #419

@tomwhite

In #288 we allowed bio2zarr to use zarr-python v2 or v3, while using the Zarr v2 format in both cases. This issue is to support using the v3 format.

I have written a draft of what changes are needed in VCF Zarr in sgkit-dev/vcf-zarr-spec#42.

The main changes that affect bio2zarr are:

  • String types are encoded differently (this may also interact with NumPy 2's new StringDType).
  • Dimension names are no longer written as a special _ARRAY_DIMENSIONS attribute, but are stored in the Zarr metadata as dimension_names.
  • Zarr filters and compressors have been replaced with codecs.
  • The directory layout has changed, with chunks now stored under a c prefix. This would affect some low-level bio2zarr code, e.g.
    # This is Zarr v2 specific. Chunks in v3 with start with "c" prefix.
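
To make the layout and dimension-name changes concrete, here is a minimal sketch of how a chunk's store key and an array's dimension names differ between the two formats. The helper names `chunk_key` and `read_dimension_names` are hypothetical and are not part of bio2zarr or zarr-python. The sketch assumes the default settings only: the "." separator in v2 and the default chunk key encoding with "/" in v3. Stores configured differently would need extra handling.

```python
def chunk_key(chunk_index, zarr_format):
    """Return a chunk's key relative to its array directory (defaults only)."""
    parts = [str(i) for i in chunk_index]
    if zarr_format == 2:
        # v2 default: indices joined by ".", e.g. "3.0"; a 0-d array uses "0".
        return ".".join(parts) if parts else "0"
    if zarr_format == 3:
        # v3 default chunk key encoding: "c" prefix, "/" separator, e.g. "c/3/0".
        return "/".join(["c", *parts])
    raise ValueError(f"unsupported Zarr format: {zarr_format}")


def read_dimension_names(metadata_docs, zarr_format):
    """Look up dimension names for one array.

    metadata_docs is a hypothetical mapping from metadata file name to its
    parsed JSON content for a single array directory.
    """
    if zarr_format == 2:
        # v2: a user attribute (xarray convention) stored in .zattrs.
        return metadata_docs[".zattrs"]["_ARRAY_DIMENSIONS"]
    # v3: a first-class field of the array metadata in zarr.json.
    return metadata_docs["zarr.json"]["dimension_names"]


v2_docs = {".zattrs": {"_ARRAY_DIMENSIONS": ["variants", "samples"]}}
v3_docs = {"zarr.json": {"dimension_names": ["variants", "samples"]}}

print(chunk_key((3, 0), 2))  # 3.0
print(chunk_key((3, 0), 3))  # c/3/0
print(read_dimension_names(v2_docs, 2) == read_dimension_names(v3_docs, 3))  # True
```

Code that enumerates or counts chunk files by listing an array directory would need a branch of this kind, since v3 chunks sit one level down under `c/` next to `zarr.json`.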
