Skip to content

ideas for improving the store APIΒ #3637

@d-v-b

Description

@d-v-b

I've encountered a few pain points in our store API, which I will demonstrate with the following code snippet:

import zarr
from zarr.core.sync import sync
from zarr.storage import MemoryStore
import json

store = MemoryStore({})
zg = zarr.create_group(store)

# read zarr.json as JSON
# this process sucks
zmeta = json.loads(sync(store.get('zarr.json')).to_bytes())

IMO something like this would be much better:

import zarr
# stores are top-level exports
store = zarr.MemoryStore({})
zg = zarr.create_group(store)
# equivalent
zmeta = store.get_json('zarr.json')
zmeta = (store / 'zarr.json').get_json()

Lack of synchronous methods for IO

Problem:
Store IO is async, with no sync methods. Users must use private API (zarr.core.sync.sync) to invoke store.get in a synchronous context like a debugger or interpreter.

Solution:
Add synchronous methods for all the async methods. Ideally we would have get and get_async but this would be breaking as get is async today. A non-breaking change could be to add get_sync. But I think the proposed get_bytes and get_json methods described below can also solve this problem.

Buffer abstraction is a stumbling block

Problem:
Stores return buffer objects, which means you have to call to_bytes() to get the bytes.

Solution:
Add a get_bytes(*, path: str = '') method to the store that synchronously retrieves the object at path, and decodes the result as bytes. I think we would also get a lot of value from a get_json method.

StorePath / Store distinction is needless

Problem:
We have both Store and StorePath classes. Two classes instead of one adds needless complexity to the codebase.

Solution:
Fold all the StorePath logic into the Store class definition, and deprecate StorePath.

The StorePath definition is very simple -- it's just a store instance, and a string, and some methods that combine these 2 things. There is no reason for two separate classes here.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions