-
-
Notifications
You must be signed in to change notification settings - Fork 373
Description
I've encountered a few pain points in our store API, which I will demonstrate with the following code snippet:
import zarr
from zarr.core.sync import sync
from zarr.storage import MemoryStore
import json
store = MemoryStore({})
zg = zarr.create_group(store)
# read zarr.json as JSON
# this process sucks
zmeta = json.loads(sync(store.get('zarr.json')).to_bytes())IMO something like this would be much better:
import zarr
# stores are top-level exports
store = zarr.MemoryStore({})
zg = zarr.create_group(store)
# equivalent
zmeta = store.get_json('zarr.json')
zmeta = (store / 'zarr.json').get_json()Lack of synchronous methods for IO
Problem:
Store IO is async, with no sync methods. Users must use private API (zarr.core.sync.sync) to invoke store.get in a synchronous context like a debugger or interpreter.
Solution:
Add synchronous methods for all the async methods. Ideally we would have get and get_async but this would be breaking as get is async today. A non-breaking change could be to add get_sync. But I think the proposed get_bytes and get_json methods described below can also solve this problem.
Buffer abstraction is a stumbling block
Problem:
Stores return buffer objects, which means you have to call to_bytes() to get the bytes.
Solution:
Add a get_bytes(*, path: str = '') method to the store that synchronously retrieves the object at path, and decodes the result as bytes. I think we would also get a lot of value from a get_json method.
StorePath / Store distinction is needless
Problem:
We have both Store and StorePath classes. Two classes instead of one adds needless complexity to the codebase.
Solution:
Fold all the StorePath logic into the Store class definition, and deprecate StorePath.
The StorePath definition is very simple -- it's just a store instance, and a string, and some methods that combine these 2 things. There is no reason for two separate classes here.