Commit 7c56b87

docs
1 parent fdc1c8f commit 7c56b87

5 files changed

+108
-27
lines changed

docs/quickstart.rst

Lines changed: 22 additions & 0 deletions
@@ -119,6 +119,28 @@ Zarr allows you to create hierarchical groups, similar to directories::
119119

120120
This creates a group with two datasets: ``foo`` and ``bar``.
121121

122+
Batch Hierarchy Creation
123+
~~~~~~~~~~~~~~~~~~~~~~~~
124+
125+
Zarr provides tools for creating a collection of arrays and groups with a single function call.
126+
Suppose we want to copy existing groups and arrays into a new storage backend:
127+
128+
>>> # Create nested groups and add arrays
129+
>>> root = zarr.group("data/example-3.zarr", attributes={'name': 'root'})
130+
>>> foo = root.create_group(name="foo")
131+
>>> bar = root.create_array(
132+
... name="bar", shape=(100, 10), chunks=(10, 10), dtype="f4"
133+
... )
134+
>>> nodes = {'': root.metadata} | {k: v.metadata for k, v in root.members()}
135+
>>> print(nodes)
136+
>>> from zarr.storage import MemoryStore
137+
>>> new_nodes = dict(zarr.create_hierarchy(store=MemoryStore(), nodes=nodes))
138+
>>> new_root = new_nodes['']
139+
>>> assert new_root.attrs == root.attrs
140+
141+
Note that :func:`zarr.create_hierarchy` will only initialize arrays and groups -- copying array data must
142+
be done in a separate step.
143+
122144
Persistent Storage
123145
------------------
124146

docs/user-guide/groups.rst

Lines changed: 25 additions & 0 deletions
@@ -75,6 +75,31 @@ For more information on groups see the :class:`zarr.Group` API docs.
7575

7676
.. _user-guide-diagnostics:
7777

78+
Batch Group Creation
79+
--------------------
80+
81+
You can also create multiple groups concurrently with a single function call. :func:`zarr.create_hierarchy` takes
82+
a :class:`zarr.storage.Store` instance and a dict of ``key : metadata`` pairs, parses that dict, and
83+
writes metadata documents to storage:
84+
85+
>>> from zarr import create_hierarchy
86+
>>> from zarr.core.group import GroupMetadata
87+
>>> from zarr.storage import LocalStore
88+
>>> node_spec = {'a/b/c': GroupMetadata()}
89+
>>> nodes_created = dict(create_hierarchy(store=LocalStore(root='data'), nodes=node_spec))
90+
>>> print(sorted(nodes_created.items(), key=lambda kv: len(kv[0])))
91+
[('', <Group file://data>), ('a', <Group file://data/a>), ('a/b', <Group file://data/a/b>), ('a/b/c', <Group file://data/a/b/c>)]
92+
93+
Note that we only specified a single group named ``a/b/c``, but four groups were created. These additional groups
94+
were created to ensure that the desired node ``a/b/c`` is connected to the root group ``''`` by a sequence
95+
of intermediate groups. :func:`zarr.create_hierarchy` normalizes the ``nodes`` keyword argument to
96+
ensure that the resulting hierarchy is complete, i.e. all groups or arrays are connected to the root
97+
of the hierarchy via intermediate groups.
98+
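The completion step described above can be sketched in plain Python. This is only an illustration of the idea, not zarr's implementation: ``fill_implicit_groups`` is a hypothetical helper, and the strings ``"group"`` and ``"array"`` stand in for real metadata objects. It does not check that no node descends from an array.

```python
def fill_implicit_groups(nodes: dict[str, str]) -> dict[str, str]:
    """Return a copy of `nodes` with every missing ancestor path added as a group."""
    completed = dict(nodes)
    for key in nodes:
        parts = key.split("/")
        # Build each proper prefix of the path: '' (the root), 'a', 'a/b', ...
        for i in range(len(parts)):
            parent = "/".join(parts[:i])
            # Only add a group where the caller did not already define a node.
            completed.setdefault(parent, "group")
    return completed


# A single explicit node three levels deep.
spec = {"a/b/c": "group"}
completed = fill_implicit_groups(spec)
print(sorted(completed, key=len))
# ['', 'a', 'a/b', 'a/b/c']
```

The helper uses ``setdefault`` so that a node the caller supplied is never replaced by an implicit group.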
99+
Because :func:`zarr.create_hierarchy` concurrently creates metadata documents, it's more efficient
100+
than repeated calls to :func:`zarr.create_group` or :func:`zarr.create_array`, provided you can statically define
101+
the metadata for the groups and arrays you want to create.
102+
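To see why issuing the writes concurrently helps, and why the order of the yielded nodes should not be relied on, here is a minimal simulation using only ``asyncio``. Nothing in it comes from zarr: ``write_metadata`` and ``create_all`` are hypothetical stand-ins, and the latencies are invented for illustration.

```python
import asyncio


async def write_metadata(path: str, delay: float) -> str:
    """Stand-in for writing one metadata document; `delay` simulates store latency."""
    await asyncio.sleep(delay)
    return path


async def create_all(latencies: dict[str, float]) -> list[str]:
    """Start every write at once and collect the paths in completion order."""
    tasks = [asyncio.create_task(write_metadata(p, d)) for p, d in latencies.items()]
    created = []
    for finished in asyncio.as_completed(tasks):
        created.append(await finished)
    return created


# Invented per-node latencies in seconds (assumption, for illustration only).
latencies = {"": 0.03, "a": 0.01, "a/b": 0.02}
order = asyncio.run(create_all(latencies))
print(order)
```

All three writes are in flight together, so the total wait is roughly the slowest single write instead of the sum of all three. The printed order reflects which write finished first, not the order of the input dict.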
78103
Array and group diagnostics
79104
---------------------------
80105

src/zarr/__init__.py

Lines changed: 0 additions & 2 deletions
@@ -52,8 +52,6 @@
5252
"create_array",
5353
"create_group",
5454
"create_hierarchy",
55-
"create_nodes",
56-
"create_rooted_hierarchy",
5755
"empty",
5856
"empty_like",
5957
"full",

src/zarr/core/group.py

Lines changed: 27 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -2866,15 +2866,10 @@ async def create_hierarchy(
28662866
"""
28672867
Create a complete zarr hierarchy from a collection of metadata objects.
28682868
2869-
This function will parse its input to ensure that the hierarchy is valid. In this context,
2870-
"valid" means that the following requirements are met:
2871-
* All nodes have the same zarr format.
2872-
* There are no nodes descending from arrays.
2873-
* There are no implicit groups. Any implicit groups will be inserted as needed. For example,
2874-
an input like ```{'a': GroupMetadata, 'a/b/c': GroupMetadata}``` defines an implicit group at
2875-
the path ```a/b```, and also at the root of the hierarchy, which we denote with the empty string.
2876-
After parsing, that group will be added and the input will be:
2877-
```{'': GroupMetadata, 'a': GroupMetadata, 'a/b': GroupMetadata, 'a/b/c': GroupMetadata}```
2869+
This function will parse its input to ensure that the hierarchy is complete. Any implicit groups
2870+
will be inserted as needed. For example, an input like
2871+
```{'a/b': GroupMetadata}``` will be parsed to
2872+
```{'': GroupMetadata, 'a': GroupMetadata, 'a/b': GroupMetadata}```
28782873
28792874
After input parsing, this function then creates all the nodes in the hierarchy concurrently.
28802875
@@ -2886,22 +2881,38 @@ async def create_hierarchy(
28862881
store : Store
28872882
The storage backend to use.
28882883
nodes : dict[str, GroupMetadata | ArrayV3Metadata | ArrayV2Metadata]
2889-
A dictionary defining the hierarchy. The keys are the paths of the nodes
2890-
in the hierarchy, and the values are the metadata of the nodes. The
2891-
metadata must be either an instance of GroupMetadata, ArrayV3Metadata
2892-
or ArrayV2Metadata.
2884+
A dictionary defining the hierarchy. The keys are the paths of the nodes in the hierarchy,
2885+
relative to the root of the ``Store``. The root of the store can be specified with the empty
2886+
string ``''``. The values are instances of ``GroupMetadata`` or ``ArrayMetadata``. Note that
2887+
all values must have the same ``zarr_format`` -- it is an error to mix zarr versions in the
2888+
same hierarchy.
28932889
overwrite : bool
28942890
Whether to overwrite existing nodes. Defaults to ``False``, in which case an error is
28952891
raised instead of overwriting an existing array or group.
28962892
2893+
This function will not erase an existing group unless that group is explicitly named in
2894+
``nodes``. If ``nodes`` defines implicit groups, e.g. ``{'a/b/c': GroupMetadata}``, and a
2895+
group already exists at path ``a``, then this function will leave the group at ``a`` as-is.
2896+
28972897
Yields
28982898
------
2899-
AsyncGroup | AsyncArray
2900-
The created nodes in the order they are created.
2899+
tuple[str, AsyncGroup | AsyncArray]
2900+
This function yields (path, node) pairs, in the order the nodes were created.
29012901
29022902
Examples
29032903
--------
2904-
2904+
from zarr.api.asynchronous import create_hierarchy
2905+
from zarr.storage import MemoryStore
2906+
from zarr.core.group import GroupMetadata
2907+
import asyncio
2908+
store = MemoryStore()
2909+
nodes = {'a': GroupMetadata(attributes={'name': 'leaf'})}
2910+
2911+
async def run():
2912+
print(dict([x async for x in create_hierarchy(store=store, nodes=nodes)]))
2913+
2914+
asyncio.run(run())
2915+
# {'a': <AsyncGroup memory://140345143770112/a>, '': <AsyncGroup memory://140345143770112>}
29052916
"""
29062917
# normalize the keys to be valid paths
29072918
nodes_normed_keys = _normalize_path_keys(nodes)

src/zarr/core/sync_group.py

Lines changed: 34 additions & 9 deletions
@@ -57,25 +57,50 @@ def create_hierarchy(
5757
"""
5858
Create a complete zarr hierarchy from a collection of metadata objects.
5959
60-
Groups that are implicitly defined by the input will be created as needed.
60+
This function will parse its input to ensure that the hierarchy is complete. Any implicit groups
61+
will be inserted as needed. For example, an input like
62+
```{'a/b': GroupMetadata}``` will be parsed to
63+
```{'': GroupMetadata, 'a': GroupMetadata, 'a/b': GroupMetadata}```
6164
62-
This function takes a parsed hierarchy dictionary and creates all the nodes in the hierarchy
63-
concurrently. Arrays and Groups are yielded in the order they are created.
65+
After input parsing, this function then creates all the nodes in the hierarchy concurrently.
66+
67+
Arrays and Groups are yielded in the order they are created. This order is not stable and
68+
should not be relied on.
6469
6570
Parameters
6671
----------
6772
store : Store
6873
The storage backend to use.
6974
nodes : dict[str, GroupMetadata | ArrayV3Metadata | ArrayV2Metadata]
70-
A dictionary defining the hierarchy. The keys are the paths of the nodes
71-
in the hierarchy, and the values are the metadata of the nodes. The
72-
metadata must be either an instance of GroupMetadata, ArrayV3Metadata
73-
or ArrayV2Metadata.
75+
A dictionary defining the hierarchy. The keys are the paths of the nodes in the hierarchy,
76+
relative to the root of the ``Store``. The root of the store can be specified with the empty
77+
string ``''``. The values are instances of ``GroupMetadata`` or ``ArrayMetadata``. Note that
78+
all values must have the same ``zarr_format`` -- it is an error to mix zarr versions in the
79+
same hierarchy.
80+
overwrite : bool
81+
Whether to overwrite existing nodes. Defaults to ``False``, in which case an error is
82+
raised instead of overwriting an existing array or group.
83+
84+
This function will not erase an existing group unless that group is explicitly named in
85+
``nodes``. If ``nodes`` defines implicit groups, e.g. ``{'a/b/c': GroupMetadata}``, and a
86+
group already exists at path ``a``, then this function will leave the group at ``a`` as-is.
7487
7588
Yields
7689
------
77-
Group | Array
78-
The created nodes in the order they are created.
90+
tuple[str, Group | Array]
91+
This function yields (path, node) pairs, in the order the nodes were created.
92+
93+
Examples
94+
--------
95+
from zarr import create_hierarchy
96+
from zarr.storage import MemoryStore
97+
from zarr.core.group import GroupMetadata
98+
99+
store = MemoryStore()
100+
nodes = {'a': GroupMetadata(attributes={'name': 'leaf'})}
101+
nodes_created = dict(create_hierarchy(store=store, nodes=nodes))
102+
print(nodes)
103+
# {'a': GroupMetadata(attributes={'name': 'leaf'}, zarr_format=3, consolidated_metadata=None, node_type='group')}
79104
"""
80105
coro = create_hierarchy_async(store=store, nodes=nodes, overwrite=overwrite)
81106
