Commit 731c22a

Merge branch 'v3' of https://github.com/zarr-developers/zarr-python into doc/storage
2 parents f2137fb + cef4552 commit 731c22a

52 files changed: 3171 additions and 526 deletions

.pre-commit-config.yaml

Lines changed: 6 additions & 2 deletions
@@ -7,7 +7,7 @@ default_language_version:
   python: python3
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.6.8
+    rev: v0.6.9
     hooks:
       - id: ruff
         args: ["--fix", "--show-fixes"]
@@ -18,7 +18,7 @@ repos:
       - id: codespell
         args: ["-L", "ba,ihs,kake,nd,noe,nwo,te,fo,zar", "-S", "fixture"]
   - repo: https://github.com/pre-commit/pre-commit-hooks
-    rev: v4.6.0
+    rev: v5.0.0
     hooks:
       - id: check-yaml
   - repo: https://github.com/pre-commit/mirrors-mypy
@@ -49,3 +49,7 @@ repos:
     hooks:
       - id: rst-directive-colons
       - id: rst-inline-touching-normal
+  - repo: https://github.com/numpy/numpydoc
+    rev: v1.8.0
+    hooks:
+      - id: numpydoc-validation

docs/consolidated_metadata.rst

Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
+Consolidated Metadata
+=====================
+
+Zarr-Python implements the `Consolidated Metadata`_ extension to the Zarr Spec.
+Consolidated metadata can reduce the time needed to load the metadata for an
+entire hierarchy, especially when the metadata is being served over a network.
+Consolidated metadata essentially stores all the metadata for a hierarchy in the
+metadata of the root Group.
+
+Usage
+-----
+
+If consolidated metadata is present in a Zarr Group's metadata then it is used
+by default. The initial read to open the group will need to communicate with
+the store (reading from a file for a :class:`zarr.store.LocalStore`, making a
+network request for a :class:`zarr.store.RemoteStore`). After that, any subsequent
+metadata reads to get child Group or Array nodes will *not* require reads from the store.
+
+In Python, the consolidated metadata is available on the ``.consolidated_metadata``
+attribute of the ``GroupMetadata`` object.
+
+.. code-block:: python
+
+   >>> import zarr
+   >>> store = zarr.store.MemoryStore({}, mode="w")
+   >>> group = zarr.open_group(store=store)
+   >>> group.create_array(shape=(1,), name="a")
+   >>> group.create_array(shape=(2, 2), name="b")
+   >>> group.create_array(shape=(3, 3, 3), name="c")
+   >>> zarr.consolidate_metadata(store)
+
+If we open that group, the Group's metadata has a :class:`zarr.ConsolidatedMetadata`
+that can be used.
+
+.. code-block:: python
+
+   >>> consolidated = zarr.open_group(store=store)
+   >>> consolidated.metadata.consolidated_metadata.metadata
+   {'b': ArrayV3Metadata(shape=(2, 2), fill_value=np.float64(0.0), ...),
+    'a': ArrayV3Metadata(shape=(1,), fill_value=np.float64(0.0), ...),
+    'c': ArrayV3Metadata(shape=(3, 3, 3), fill_value=np.float64(0.0), ...)}
+
+Operations on the group to get children automatically use the consolidated metadata.
+
+.. code-block:: python
+
+   >>> consolidated["a"]  # no read / HTTP request to the Store is required
+   <Array memory://.../a shape=(1,) dtype=float64>
+
+With nested groups, the consolidated metadata is available on the children, recursively.
+
+.. code-block:: python
+
+   >>> child = group.create_group("child", attributes={"kind": "child"})
+   >>> grandchild = child.create_group("child", attributes={"kind": "grandchild"})
+   >>> consolidated = zarr.consolidate_metadata(store)
+
+   >>> consolidated["child"].metadata.consolidated_metadata
+   ConsolidatedMetadata(metadata={'child': GroupMetadata(attributes={'kind': 'grandchild'}, zarr_format=3, )}, ...)
+
+Synchronization and Concurrency
+-------------------------------
+
+Consolidated metadata is intended for read-heavy use cases on slowly changing
+hierarchies. For hierarchies where new nodes are constantly being added,
+removed, or modified, consolidated metadata may not be desirable.
+
+1. It will add some overhead to each update operation, since the metadata
+   would need to be re-consolidated to keep it in sync with the store.
+2. Readers using consolidated metadata will regularly see a "past" version
+   of the metadata, as of the time they read the root node with its consolidated
+   metadata.
+
+.. _Consolidated Metadata: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#consolidated-metadata
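
The trade-off described in the new "Synchronization and Concurrency" section can be modelled without zarr at all. The following is a minimal sketch under stated assumptions, not zarr's implementation: `CountingStore` and `consolidate` are hypothetical names, and metadata documents are plain dicts. It shows that child lookups cost a single root read, and that a reader's snapshot goes stale until the metadata is consolidated again.

```python
import copy


class CountingStore:
    """Hypothetical dict-backed store that counts metadata reads."""

    def __init__(self):
        self._docs = {}
        self.reads = 0

    def write(self, key, doc):
        self._docs[key] = copy.deepcopy(doc)

    def read(self, key):
        self.reads += 1
        return copy.deepcopy(self._docs[key])

    def keys(self):
        return list(self._docs)


def consolidate(store, root="root"):
    """Copy every non-root metadata document into the root document."""
    root_doc = store.read(root)
    root_doc["consolidated_metadata"] = {
        key: store.read(key) for key in store.keys() if key != root
    }
    store.write(root, root_doc)


store = CountingStore()
store.write("root", {"node_type": "group"})
store.write("a", {"node_type": "array", "shape": [1]})
store.write("b", {"node_type": "array", "shape": [2, 2]})
consolidate(store)

# A reader opens the root once and answers child lookups from the snapshot.
store.reads = 0
snapshot = store.read("root")["consolidated_metadata"]
shape_a = snapshot["a"]["shape"]
shape_b = snapshot["b"]["shape"]
reads_used = store.reads  # only the root read was needed

# A later write is invisible to that reader until metadata is re-consolidated.
store.write("c", {"node_type": "array", "shape": [3, 3, 3]})
sees_c_before = "c" in snapshot
consolidate(store)
sees_c_after = "c" in store.read("root")["consolidated_metadata"]
```

In this toy model every write to the hierarchy needs a follow-up `consolidate` call to stay visible to new readers, which is the per-update overhead the first list item refers to.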

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ Zarr-Python
 
     getting_started
     tutorial
+    consolidated_metadata
     api/index
     spec
     release

pyproject.toml

Lines changed: 4 additions & 0 deletions
@@ -319,3 +319,7 @@ ignore = [
     "PC111", # fix Python code in documentation - enable later
     "PC180", # for JavaScript - not interested
 ]
+
+[tool.numpydoc_validation]
+# See https://numpydoc.readthedocs.io/en/latest/validation.html#built-in-validation-checks for list of checks
+checks = ["GL06", "GL07", "GL10", "PR03", "PR05", "PR06"]
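
Among the enabled codes, GL07 is numpydoc's check for docstring sections that appear in the wrong order, which seems to be what motivates moving Notes after Raises in `Store._open` later in this commit. Below is a minimal sketch of a docstring laid out in numpydoc's section order. `open_store` and its parameters are hypothetical and not part of zarr.

```python
def open_store(path, mode="r"):
    """
    Open a hypothetical store.

    Parameters
    ----------
    path : str
        Location of the store.
    mode : str, optional
        Access mode.

    Returns
    -------
    dict
        A description of the opened store.

    Raises
    ------
    ValueError
        If ``mode`` is not recognized.

    Notes
    -----
    This function only illustrates docstring layout.
    """
    if mode not in {"r", "r+", "a", "w", "w-"}:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"path": path, "mode": mode}


doc = open_store.__doc__
# Position of each section heading within the docstring, in order of appearance.
section_positions = [
    doc.index(name) for name in ("Parameters", "Returns", "Raises", "Notes")
]
```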

src/zarr/abc/codec.py

Lines changed: 1 addition & 1 deletion
@@ -20,11 +20,11 @@
     from zarr.core.indexing import SelectorTuple
 
 __all__ = [
-    "BaseCodec",
     "ArrayArrayCodec",
     "ArrayBytesCodec",
     "ArrayBytesCodecPartialDecodeMixin",
     "ArrayBytesCodecPartialEncodeMixin",
+    "BaseCodec",
     "BytesBytesCodec",
     "CodecInput",
     "CodecOutput",

src/zarr/abc/metadata.py

Lines changed: 0 additions & 1 deletion
@@ -22,7 +22,6 @@ def to_dict(self) -> dict[str, JSON]:
         are instances of `Metadata`. Sequences of `Metadata` are similarly recursed into, and
         the output of that recursion is collected in a list.
         """
-        ...
         out_dict = {}
         for field in fields(self):
             key = field.name

src/zarr/abc/store.py

Lines changed: 11 additions & 23 deletions
@@ -62,19 +62,12 @@ def from_literal(cls, mode: AccessModeLiteral) -> Self:
 class Store(ABC):
     """
     Abstract base class for Zarr stores.
-
-    Attributes
-    ----------
-    _mode : AccessMode
-        Access mode flags.
-    _is_open : bool
-        Whether the store is open.
     """
 
     _mode: AccessMode
     _is_open: bool
 
-    def __init__(self, mode: AccessModeLiteral = "r", *args: Any, **kwargs: Any) -> None:
+    def __init__(self, *args: Any, mode: AccessModeLiteral = "r", **kwargs: Any) -> None:
         self._is_open = False
         self._mode = AccessMode.from_literal(mode)
 
@@ -92,7 +85,7 @@ async def open(cls, *args: Any, **kwargs: Any) -> Self:
 
         Returns
         -------
-        store
+        Store
             The opened store instance.
         """
         store = cls(*args, **kwargs)
@@ -116,26 +109,23 @@ async def _open(self) -> None:
         """
         Open the store.
 
-        Notes
-        -----
-        * When `mode='w'` and the store already exists, it will be cleared.
-
         Raises
         ------
         ValueError
             If the store is already open.
         FileExistsError
-            If `mode='w-'` and the store already exists.
+            If ``mode='w-'`` and the store already exists.
+
+        Notes
+        -----
+        * When ``mode='w'`` and the store already exists, it will be cleared.
         """
         if self._is_open:
             raise ValueError("store is already open")
-        if not await self.empty():
-            if self.mode.update or self.mode.readonly:
-                pass
-            elif self.mode.overwrite:
-                await self.clear()
-            else:
-                raise FileExistsError("Store already exists")
+        if self.mode.str == "w":
+            await self.clear()
+        elif self.mode.str == "w-" and not await self.empty():
+            raise FileExistsError("Store already exists")
         self._is_open = True
 
     async def _ensure_open(self) -> None:
async def _ensure_open(self) -> None:
@@ -161,7 +151,6 @@ async def clear(self) -> None:
         Clear the store.
 
         Remove all keys and values from the store.
-
         """
         ...
 
@@ -430,7 +419,6 @@ async def set_or_delete(byte_setter: ByteSetter, value: Buffer | None) -> None:
     Notes
     -----
     If value is None, the key will be deleted.
-
     """
     if value is None:
         await byte_setter.delete()
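
Two changes in this file affect callers: `mode` becomes keyword-only in `Store.__init__`, and `Store._open` now branches directly on the mode string. Here is a minimal synchronous sketch of both, assuming a dict-backed stand-in. `ToyStore` and its `data` keyword are hypothetical and not part of zarr; only the parameter order and the branch structure are taken from the diff.

```python
class ToyStore:
    """Hypothetical stand-in for a Store subclass; not part of zarr."""

    def __init__(self, *args, mode="r", **kwargs):
        # Same parameter order as the revised Store.__init__:
        # ``mode`` follows ``*args`` and is therefore keyword-only.
        self.args = args
        self.mode = mode
        self.data = dict(kwargs.get("data", {}))
        self.is_open = False

    def open(self):
        # Mirrors the branch structure of the revised Store._open.
        if self.is_open:
            raise ValueError("store is already open")
        if self.mode == "w":
            self.data.clear()
        elif self.mode == "w-" and self.data:
            raise FileExistsError("Store already exists")
        self.is_open = True
        return self


existing = {"zarr.json": b"{}"}

# "w" clears existing contents; "a" keeps them.
cleared = ToyStore(mode="w", data=existing).open().data
kept = ToyStore(mode="a", data=existing).open().data

# "w-" refuses to open a non-empty store.
try:
    ToyStore(mode="w-", data=existing).open()
    refused = False
except FileExistsError:
    refused = True

# A positional argument no longer binds to ``mode``; it lands in ``*args``.
positional = ToyStore("w")
```

Going by the diff alone, code that passed the mode positionally to `Store.__init__` would now silently get the default `"r"`, so such call sites likely need `mode=...`. Note also that the revised `_open` calls `clear()` for `"w"` without first checking whether the store is empty.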
