Commit 00695ad

Merge branch 'v3' of https://github.com/zarr-developers/zarr-python into fix/remote-store-empty-speedup-walk
2 parents d3e8733 + b8f6cb9 commit 00695ad

44 files changed: 2498 additions and 374 deletions

.pre-commit-config.yaml

Lines changed: 4 additions & 0 deletions
@@ -49,3 +49,7 @@ repos:
     hooks:
       - id: rst-directive-colons
       - id: rst-inline-touching-normal
+  - repo: https://github.com/numpy/numpydoc
+    rev: v1.8.0
+    hooks:
+      - id: numpydoc-validation

docs/consolidated_metadata.rst

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+Consolidated Metadata
+=====================
+
+Zarr-Python implements the `Consolidated Metadata`_ extension to the Zarr Spec.
+Consolidated metadata can reduce the time needed to load the metadata for an
+entire hierarchy, especially when the metadata is being served over a network.
+Consolidated metadata essentially stores all the metadata for a hierarchy in the
+metadata of the root Group.
+
+Usage
+-----
+
+If consolidated metadata is present in a Zarr Group's metadata then it is used
+by default. The initial read to open the group will need to communicate with
+the store (reading from a file for a :class:`zarr.store.LocalStore`, making a
+network request for a :class:`zarr.store.RemoteStore`). After that, any subsequent
+metadata reads to get child Group or Array nodes will *not* require reads from the store.
+
+In Python, the consolidated metadata is available on the ``.consolidated_metadata``
+attribute of the ``GroupMetadata`` object.
+
+.. code-block:: python
+
+   >>> import zarr
+   >>> store = zarr.store.MemoryStore({}, mode="w")
+   >>> group = zarr.open_group(store=store)
+   >>> group.create_array(shape=(1,), name="a")
+   >>> group.create_array(shape=(2, 2), name="b")
+   >>> group.create_array(shape=(3, 3, 3), name="c")
+   >>> zarr.consolidate_metadata(store)
+
+If we open that group, the Group's metadata has a :class:`zarr.ConsolidatedMetadata`
+that can be used.
+
+.. code-block:: python
+
+   >>> consolidated = zarr.open_group(store=store)
+   >>> consolidated.metadata.consolidated_metadata.metadata
+   {'b': ArrayV3Metadata(shape=(2, 2), fill_value=np.float64(0.0), ...),
+    'a': ArrayV3Metadata(shape=(1,), fill_value=np.float64(0.0), ...),
+    'c': ArrayV3Metadata(shape=(3, 3, 3), fill_value=np.float64(0.0), ...)}
+
+Operations on the group to get children automatically use the consolidated metadata.
+
+.. code-block:: python
+
+   >>> consolidated["a"]  # no read / HTTP request to the Store is required
+   <Array memory://.../a shape=(1,) dtype=float64>
+
+With nested groups, the consolidated metadata is available on the children, recursively.
+
+.. code-block:: python
+
+   >>> child = group.create_group("child", attributes={"kind": "child"})
+   >>> grandchild = child.create_group("child", attributes={"kind": "grandchild"})
+   >>> consolidated = zarr.consolidate_metadata(store)
+
+   >>> consolidated["child"].metadata.consolidated_metadata
+   ConsolidatedMetadata(metadata={'child': GroupMetadata(attributes={'kind': 'grandchild'}, zarr_format=3, )}, ...)
+
+Synchronization and Concurrency
+-------------------------------
+
+Consolidated metadata is intended for read-heavy use cases on slowly changing
+hierarchies. For hierarchies where new nodes are constantly being added,
+removed, or modified, consolidated metadata may not be desirable.
+
+1. It will add some overhead to each update operation, since the metadata
+   would need to be re-consolidated to keep it in sync with the store.
+2. Readers using consolidated metadata will regularly see a "past" version
+   of the metadata, at the time they read the root node with its consolidated
+   metadata.
+
+.. _Consolidated Metadata: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#consolidated-metadata
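
Note on the "Synchronization and Concurrency" section added above: the trade-off it describes can be illustrated without zarr at all. The following is only a toy model under stated assumptions, not zarr's implementation. `ToyStore`, `consolidate`, and the `reads` counter are hypothetical names, and a plain dict stands in for the store.

```python
import copy


class ToyStore:
    """Hypothetical stand-in for a store: maps node path -> metadata dict."""

    def __init__(self):
        self.nodes = {}           # per-node metadata documents
        self.consolidated = None  # snapshot stored alongside the root
        self.reads = 0            # number of round trips to the "store"

    def read_consolidated(self):
        # One round trip returns the metadata for the whole hierarchy.
        self.reads += 1
        return copy.deepcopy(self.consolidated)


def consolidate(store):
    # Copy every node's metadata into the root-level snapshot.
    store.consolidated = copy.deepcopy(store.nodes)


store = ToyStore()
store.nodes["a"] = {"shape": (1,)}
store.nodes["b"] = {"shape": (2, 2)}
consolidate(store)

# A reader opens the root once and then resolves children locally.
snapshot = store.read_consolidated()
a_meta = snapshot["a"]
b_meta = snapshot["b"]

# A writer adds a node without re-consolidating: the reader's view is stale.
store.nodes["c"] = {"shape": (3, 3, 3)}
reader_sees_c = "c" in snapshot

# Re-consolidating is the extra work each update would need to pay for.
consolidate(store)
fresh_sees_c = "c" in store.read_consolidated()
```

In this model the reader pays one read for any number of children, while every write needs a follow-up `consolidate` call before new readers can see it.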

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ Zarr-Python
 
     getting_started
     tutorial
+    consolidated_metadata
     api/index
     spec
     release

pyproject.toml

Lines changed: 4 additions & 0 deletions
@@ -319,3 +319,7 @@ ignore = [
     "PC111", # fix Python code in documentation - enable later
     "PC180", # for JavaScript - not interested
 ]
+
+[tool.numpydoc_validation]
+# See https://numpydoc.readthedocs.io/en/latest/validation.html#built-in-validation-checks for list of checks
+checks = ["GL06", "GL07", "GL10", "PR03", "PR05", "PR06"]
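
For orientation, here is a sketch of a docstring that should satisfy the six enabled checks, going by the numpydoc documentation linked in the config comment: known sections in the expected order (GL06, GL07), parameters documented in signature order (PR03), and parameter types that neither end in a period (PR05) nor use spelled-out names such as "string" or "integer" (PR06). GL10 concerns reST directives, which this docstring does not use. The function `scale` is hypothetical and is not part of zarr-python.

```python
import inspect


def scale(values, factor=2.0):
    """
    Multiply every element of a sequence by a constant.

    Parameters
    ----------
    values : list of float
        Numbers to scale.
    factor : float, optional
        Multiplier applied to each element.

    Returns
    -------
    list of float
        The scaled values, in their original order.
    """
    return [v * factor for v in values]


# PR03 compares the documented order against the signature order,
# which for this function is simply:
signature_order = list(inspect.signature(scale).parameters)
```

The snippet does not invoke numpydoc itself; the pre-commit hook added in `.pre-commit-config.yaml` is what runs the validation.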

src/zarr/abc/codec.py

Lines changed: 1 addition & 1 deletion
@@ -20,11 +20,11 @@
     from zarr.core.indexing import SelectorTuple
 
 __all__ = [
-    "BaseCodec",
     "ArrayArrayCodec",
     "ArrayBytesCodec",
     "ArrayBytesCodecPartialDecodeMixin",
     "ArrayBytesCodecPartialEncodeMixin",
+    "BaseCodec",
     "BytesBytesCodec",
     "CodecInput",
     "CodecOutput",

src/zarr/abc/store.py

Lines changed: 5 additions & 8 deletions
@@ -43,7 +43,7 @@ class Store(ABC):
     _mode: AccessMode
     _is_open: bool
 
-    def __init__(self, mode: AccessModeLiteral = "r", *args: Any, **kwargs: Any) -> None:
+    def __init__(self, *args: Any, mode: AccessModeLiteral = "r", **kwargs: Any) -> None:
         self._is_open = False
         self._mode = AccessMode.from_literal(mode)
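
Note on the signature change in this hunk: placing `mode` after `*args` makes it keyword-only, so a positional argument can no longer be bound to `mode` by accident. A minimal sketch of the difference follows. `OldStore` and `NewStore` are hypothetical classes that only record their arguments; they are not the real `Store`.

```python
class OldStore:
    # Old ordering: the first positional argument is bound to ``mode``.
    def __init__(self, mode="r", *args, **kwargs):
        self.mode = mode
        self.args = args


class NewStore:
    # New ordering: ``mode`` can only be passed by keyword.
    def __init__(self, *args, mode="r", **kwargs):
        self.mode = mode
        self.args = args


old = OldStore("some/path")                 # the path is swallowed as the mode
new = NewStore("some/path")                 # the path stays in *args
explicit = NewStore("some/path", mode="w")  # mode must be named explicitly
```

With the old ordering a subclass that forwarded its own positional arguments to `super().__init__` could silently overwrite the access mode; the new ordering rules that out.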

@@ -69,13 +69,10 @@ def __exit__(
     async def _open(self) -> None:
         if self._is_open:
             raise ValueError("store is already open")
-        if not await self.empty():
-            if self.mode.update or self.mode.readonly:
-                pass
-            elif self.mode.overwrite:
-                await self.clear()
-            else:
-                raise FileExistsError("Store already exists")
+        if self.mode.str == "w":
+            await self.clear()
+        elif self.mode.str == "w-" and not await self.empty():
+            raise FileExistsError("Store already exists")
         self._is_open = True
 
     async def _ensure_open(self) -> None:
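
Note on the `_open` rewrite: as the diff reads, the old code called `empty()` on every open, whereas the new code calls it only for mode `"w-"`. That is presumably the point of the change, given that the branch name mentions speeding up `empty` for the remote store, but the commit itself does not say so. The sketch below mirrors the new branch logic on a toy in-memory store and counts the `empty()` calls. `ToyStore` and `try_open` are hypothetical, and the mode strings other than `"w"` and `"w-"` are assumed to be the remaining `AccessModeLiteral` values.

```python
import asyncio


class ToyStore:
    """Hypothetical in-memory store that counts calls to empty()."""

    def __init__(self, data, mode):
        self.data = dict(data)
        self.mode = mode
        self.is_open = False
        self.empty_calls = 0

    async def empty(self):
        self.empty_calls += 1
        return not self.data

    async def clear(self):
        self.data.clear()

    async def open(self):
        # Same branching as the new Store._open in the diff above.
        if self.is_open:
            raise ValueError("store is already open")
        if self.mode == "w":
            await self.clear()
        elif self.mode == "w-" and not await self.empty():
            raise FileExistsError("Store already exists")
        self.is_open = True


async def try_open(mode):
    # Start from a non-empty store and report what opening it did.
    store = ToyStore({"zarr.json": b"{}"}, mode)
    try:
        await store.open()
        outcome = "opened"
    except FileExistsError:
        outcome = "FileExistsError"
    return outcome, len(store.data), store.empty_calls


# Maps mode -> (outcome, number of keys left, number of empty() calls).
results = {mode: asyncio.run(try_open(mode)) for mode in ("r", "r+", "a", "w", "w-")}
```

Only `"w-"` pays for an emptiness check, and `"w"` clears unconditionally instead of first asking whether there is anything to clear.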
