Consolidated Metadata
=====================

Zarr-Python implements the `Consolidated Metadata`_ extension to the Zarr Spec.
Consolidated metadata can reduce the time needed to load the metadata for an
entire hierarchy, especially when the metadata is being served over a network.
Consolidated metadata essentially stores all the metadata for a hierarchy in the
metadata of the root Group.
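
As a rough mental model, the sketch below treats a store as a plain Python ``dict``
whose keys are node paths ending in ``zarr.json`` and whose values are JSON documents.
It copies every non-root document into the root one. The ``consolidate`` helper and the
trimmed-down metadata documents are hypothetical, and the ``kind`` / ``must_understand`` /
``metadata`` layout is an assumption modelled on what Zarr-Python writes; the
specification remains the authority on the exact format.

.. code-block:: python

    import json

    # Hypothetical in-memory "store": keys are paths, values are JSON documents.
    # Real array metadata has many more required fields; these are trimmed down.
    store = {
        "zarr.json": json.dumps({"zarr_format": 3, "node_type": "group", "attributes": {}}),
        "a/zarr.json": json.dumps({"zarr_format": 3, "node_type": "array", "shape": [1]}),
        "b/zarr.json": json.dumps({"zarr_format": 3, "node_type": "array", "shape": [2, 2]}),
        "child/zarr.json": json.dumps(
            {"zarr_format": 3, "node_type": "group", "attributes": {"kind": "child"}}
        ),
    }


    def consolidate(store):
        """Copy every non-root node document into the root document (hypothetical)."""
        root = json.loads(store["zarr.json"])
        children = {}
        for key, value in store.items():
            if key == "zarr.json" or not key.endswith("/zarr.json"):
                continue
            path = key[: -len("/zarr.json")]  # "child/zarr.json" -> "child"
            children[path] = json.loads(value)
        # Assumed layout, modelled on Zarr-Python's output; not normative.
        root["consolidated_metadata"] = {
            "kind": "inline",
            "must_understand": False,
            "metadata": children,
        }
        store["zarr.json"] = json.dumps(root)  # write the enlarged root back
        return root


    root = consolidate(store)
    print(sorted(root["consolidated_metadata"]["metadata"]))  # ['a', 'b', 'child']

After this step a reader only needs the root ``zarr.json`` to learn the shape and kind
of every node in the hierarchy.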

Usage
-----

If consolidated metadata is present in a Zarr Group's metadata, then it is used
by default. The initial read to open the group will need to communicate with
the store (reading from a file for a :class:`zarr.store.LocalStore`, making a
network request for a :class:`zarr.store.RemoteStore`). After that, any subsequent
metadata reads to get child Group or Array nodes will *not* require reads from the store.
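
The read pattern can be sketched without Zarr at all. In this hypothetical model,
``CountingStore`` stands in for a local or remote store and counts every ``get``, while
``ConsolidatedGroup`` reads the root document once and then serves child lookups from
memory. Neither class is part of Zarr-Python, and the root document layout is an
assumption made for illustration.

.. code-block:: python

    import json


    class CountingStore:
        """Hypothetical dict-backed store that counts the reads it serves."""

        def __init__(self, data):
            self._data = data
            self.reads = 0

        def get(self, key):
            self.reads += 1  # stands in for a file read or an HTTP request
            return json.loads(self._data[key])


    class ConsolidatedGroup:
        """Hypothetical group: opens the root once, then answers from memory."""

        def __init__(self, store):
            self._root = store.get("zarr.json")  # the single store read
            self._children = self._root["consolidated_metadata"]["metadata"]

        def __getitem__(self, name):
            return self._children[name]  # no store access here


    data = {
        "zarr.json": json.dumps({
            "zarr_format": 3,
            "node_type": "group",
            "consolidated_metadata": {
                "kind": "inline",
                "must_understand": False,
                "metadata": {
                    "a": {"zarr_format": 3, "node_type": "array", "shape": [1]},
                    "b": {"zarr_format": 3, "node_type": "array", "shape": [2, 2]},
                },
            },
        }),
    }
    store = CountingStore(data)
    group = ConsolidatedGroup(store)
    print(group["a"]["shape"], group["b"]["shape"], store.reads)  # [1] [2, 2] 1

Only opening the group touches the store; ``store.reads`` stays at 1 no matter how
many children are looked up afterwards.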

In Python, the consolidated metadata is available on the ``.consolidated_metadata``
attribute of the ``GroupMetadata`` object.

.. code-block:: python

    >>> import zarr
    >>> store = zarr.store.MemoryStore({}, mode="w")
    >>> group = zarr.open_group(store=store)
    >>> group.create_array(shape=(1,), name="a")
    >>> group.create_array(shape=(2, 2), name="b")
    >>> group.create_array(shape=(3, 3, 3), name="c")
    >>> zarr.consolidate_metadata(store)

If we open that group, the Group's metadata has a :class:`zarr.ConsolidatedMetadata`
that can be used.

.. code-block:: python

    >>> consolidated = zarr.open_group(store=store)
    >>> consolidated.metadata.consolidated_metadata.metadata
    {'b': ArrayV3Metadata(shape=(2, 2), fill_value=np.float64(0.0), ...),
     'a': ArrayV3Metadata(shape=(1,), fill_value=np.float64(0.0), ...),
     'c': ArrayV3Metadata(shape=(3, 3, 3), fill_value=np.float64(0.0), ...)}

Operations on the group to get children automatically use the consolidated metadata.

.. code-block:: python

    >>> consolidated["a"]  # no read / HTTP request to the Store is required
    <Array memory://.../a shape=(1,) dtype=float64>

With nested groups, the consolidated metadata is available on the children, recursively.

.. code-block:: python

    >>> child = group.create_group("child", attributes={"kind": "child"})
    >>> grandchild = child.create_group("child", attributes={"kind": "grandchild"})
    >>> consolidated = zarr.consolidate_metadata(store)

    >>> consolidated["child"].metadata.consolidated_metadata
    ConsolidatedMetadata(metadata={'child': GroupMetadata(attributes={'kind': 'grandchild'}, zarr_format=3, )}, ...)

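
One way to picture how each child exposes its own consolidated metadata is to start
from a flat mapping keyed by path and filter it by prefix. The ``child_view`` function
below is a hypothetical sketch of that idea, reusing the node names from the example
above; it is not a claim about how Zarr-Python stores or nests the data internally.

.. code-block:: python

    def child_view(flat, child):
        """Return the entries below ``child``, with paths made relative to it."""
        prefix = child + "/"
        return {
            path[len(prefix):]: meta
            for path, meta in flat.items()
            if path.startswith(prefix)
        }


    # Hypothetical flat, path-keyed metadata for the whole hierarchy.
    flat = {
        "a": {"node_type": "array", "shape": [1]},
        "child": {"node_type": "group", "attributes": {"kind": "child"}},
        "child/child": {"node_type": "group", "attributes": {"kind": "grandchild"}},
    }

    print(child_view(flat, "child"))
    # {'child': {'node_type': 'group', 'attributes': {'kind': 'grandchild'}}}

    # The result has the same shape as the input, so the view can be applied again.
    print(child_view(child_view(flat, "child"), "child"))  # {}

Because each view has the same structure as the root mapping, walking down the
hierarchy is just a repeated application of the same filter.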

Synchronization and Concurrency
-------------------------------

Consolidated metadata is intended for read-heavy use cases on slowly changing
hierarchies. For hierarchies where new nodes are constantly being added,
removed, or modified, consolidated metadata may not be desirable:

1. It adds some overhead to each update operation, since the metadata
   must be re-consolidated to keep it in sync with the store.
2. Readers using consolidated metadata will regularly see a "past" version
   of the metadata: the hierarchy as it was when they read the root node
   and its consolidated metadata.
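
Both drawbacks show up in a small, Zarr-free simulation. Here a hypothetical
dict-based store maps node paths to metadata (the root uses the empty path ``""``),
``consolidate`` snapshots the children into the root entry, and ``open_root`` plays the
role of a reader opening the group. The names and the data layout are assumptions made
for illustration only.

.. code-block:: python

    import copy

    # Hypothetical store: node path -> metadata. The root lives at "".
    store = {"": {"node_type": "group"}, "a": {"node_type": "array"}}


    def consolidate(store):
        """Snapshot every non-root node into the root entry (hypothetical)."""
        children = {path: copy.deepcopy(meta) for path, meta in store.items() if path}
        store[""]["consolidated_metadata"] = children


    def open_root(store):
        """A reader takes a private copy of the root and its snapshot."""
        return copy.deepcopy(store[""])


    consolidate(store)
    reader = open_root(store)

    store["b"] = {"node_type": "array"}  # a writer adds a node
    print(sorted(reader["consolidated_metadata"]))  # ['a']: the reader's view is stale

    consolidate(store)  # the writer pays for re-consolidation
    print(sorted(open_root(store)["consolidated_metadata"]))  # ['a', 'b']

The existing reader keeps its snapshot until it reopens the root, and new readers only
see ``b`` after the writer has run ``consolidate`` a second time.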

.. _Consolidated Metadata: https://zarr-specs.readthedocs.io/en/latest/v3/core/v3.0.html#consolidated-metadata