Allowing to add / Adding a sharding spec #127

@jstriebel

As discussed in zarr-developers/zarr-python#877, it would be great to have sharding support in zarr. The current prototype translates chunk access into partial reads and writes of shards, where a shard is one unit in a store. This requires translation logic between an array chunk and a storage unit, and that logic also has to be efficient, e.g. by bundling multiple chunks into one access request. At the moment there are prototypes for this logic to be either

I think it would be great to extend the specification to allow sharding, by either

  • adding a specific sharding spec and terminology, or
  • adding a general "translation" interface, with the specific extension for sharding.

The second approach would hopefully lead to a standard way to include such extensions. These extensions would still need broad compatibility and support across implementations, but having common terminology and specs for similar extensions might be worthwhile. Related issues seem to be #62, #115, #82, #49, #76.
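
To make the chunk-to-shard translation mentioned above more concrete, here is a minimal sketch of what such logic might look like. It assumes a regular grid in which every shard holds a fixed number of chunks along each dimension. The names `chunk_to_shard`, `group_by_shard`, and `chunks_per_shard` are hypothetical and are not taken from the prototypes or from any spec.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Coord = Tuple[int, ...]


def chunk_to_shard(chunk: Coord, chunks_per_shard: Coord) -> Tuple[Coord, Coord]:
    """Map a chunk's grid coordinate to (shard coordinate, position inside the shard).

    Assumption: shards tile the chunk grid regularly, with
    chunks_per_shard[i] chunks along dimension i.
    """
    shard = tuple(c // n for c, n in zip(chunk, chunks_per_shard))
    offset = tuple(c % n for c, n in zip(chunk, chunks_per_shard))
    return shard, offset


def group_by_shard(
    chunks: Iterable[Coord], chunks_per_shard: Coord
) -> Dict[Coord, List[Coord]]:
    """Bundle requested chunks by shard so each shard is accessed only once."""
    groups: Dict[Coord, List[Coord]] = defaultdict(list)
    for chunk in chunks:
        shard, offset = chunk_to_shard(chunk, chunks_per_shard)
        groups[shard].append(offset)
    return dict(groups)
```

With grouping like this, a request that touches several chunks could issue one partial read or write per shard instead of one per chunk. How the position inside a shard maps to a byte range is left open here, since that depends on the shard layout the spec would define.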

I'm not very familiar with the process around the specs. Besides opinions on the approaches and terminology, I'd appreciate suggestions for practical next steps to update the specs for sharding support.
