Commit 3c883a3

K-Meech and d-v-b authored
Add CLI for converting v2 metadata to v3 (#3257)
* add rough cli converter structure
* allow zstd, gzip and numcodecs zarr 3 compression
* convert filters to v3
* create BytesCodec with correct endian
* handle C vs F order in v2 metadata
* save group and array metadata to file
* create overall conversion functions for store, array or group
* add minimal typer cli
* add initial tests for converter
* add tests for conversion of groups and nested groups and arrays
* add tests for conversion of compressors and filters
* test conversion of order and endianness
* add tests for edge cases of incorrect codecs
* add tests for / separator
* draft of metadata remover and add test for internal paths
* add clear command to cli with tests
* add test for metadata removal with path#
* add verbose logging option
* add dry run option to cli
* add test for dry-run
* add zarr-converter script and enable cli dep in tests
* use v2 chunk key encoding type
* update endianness of test data type
* check converted arrays can be accessed
* remove uses of pathlib walk, as it didn't exist in python 3.11
* include tags in checkout for gpu test, to avoid numcodecs.zarr3 requesting a zarr version greater than 3
* rename cli commands from review comments
* remove path option
* allow metadata to be written to a separate store location
* add overwrite and remove-v2-metadata options
* add force option
* use v2, v3 format for CLI
* split into convert_group and convert_array functions
* update command names in converter tests
* update test filename to reflect command name change
* fix tests for sub-groups
* add tests for --force
* add test for migrating to separate output location
* add test for remove-v2-metadata option
* update test names to match command name
* add test for --remove-v2-metadata with separate output location
* separate cli fixtures from the tests
* add test for overwrite option in separate location
* fix failing test
* small fixes to tests
* fix pre-commit errors
* update docstrings with review comments
* pass filters and compressors to processing functions, rather than full metadata
* use Store as input rather than StoreLike
* move conversion functions into public api
* fail on discovery of consolidated metadata
* minor changes from review
* use same logger throughout zarr-python
* add release notes and docs for the cli
* tidy up formatting of zarr.metadata api docs
* fix failing tests
* add a section about --verbose to the docs
* update docstrings to reference zarr.codecs.numcodecs

---------

Co-authored-by: Davis Bennett <[email protected]>
1 parent 9eb2d28 commit 3c883a3

16 files changed: +1599 -90 lines changed

.github/workflows/gpu_test.yml

Lines changed: 2 additions & 0 deletions
@@ -30,6 +30,8 @@ jobs:
 
     steps:
       - uses: actions/checkout@v5
+        with:
+          fetch-depth: 0 # grab all branches and tags
       # - name: cuda-toolkit
       # uses: Jimver/[email protected]
       # id: cuda-toolkit

changes/1798.feature.rst

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Add a command-line interface to migrate v2 Zarr metadata to v3. Corresponding functions are also
+provided under zarr.metadata.

docs/user-guide/cli.rst

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
+.. _user-guide-cli:
+
+Command-line interface
+========================
+
+Zarr-Python provides a command-line interface that enables:
+
+- migration of Zarr v2 metadata to v3
+- removal of v2 or v3 metadata
+
+To see available commands run the following in a terminal:
+
+.. code-block:: bash
+
+    $ zarr --help
+
+or to get help on individual commands:
+
+.. code-block:: bash
+
+    $ zarr migrate --help
+
+    $ zarr remove-metadata --help
+
+
+Migrate metadata from v2 to v3
+------------------------------
+
+Migrate to a separate location
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To migrate a Zarr array/group's metadata from v2 to v3 run:
+
+.. code-block:: bash
+
+    $ zarr migrate v3 path/to/input.zarr path/to/output.zarr
+
+This will write new ``zarr.json`` files to ``output.zarr``, leaving ``input.zarr`` un-touched.
+Note - this will migrate the entire Zarr hierarchy, so if ``input.zarr`` contains multiple groups/arrays,
+new ``zarr.json`` will be made for all of them.
+
+Migrate in-place
+~~~~~~~~~~~~~~~~
+
+If you'd prefer to migrate the metadata in-place run:
+
+.. code-block:: bash
+
+    $ zarr migrate v3 path/to/input.zarr
+
+This will write new ``zarr.json`` files to ``input.zarr``, leaving the existing v2 metadata un-touched.
+
+To open the array/group using the new metadata use:
+
+.. code-block:: python
+
+    >>> import zarr
+    >>> zarr_with_v3_metadata = zarr.open('path/to/input.zarr', zarr_format=3)
+
+Once you are happy with the conversion, you can run the following to remove the old v2 metadata:
+
+.. code-block:: bash
+
+    $ zarr remove-metadata v2 path/to/input.zarr
+
+Note there is also a shortcut to migrate and remove v2 metadata in one step:
+
+.. code-block:: bash
+
+    $ zarr migrate v3 path/to/input.zarr --remove-v2-metadata
+
+
+Remove metadata
+----------------
+
+Remove v2 metadata using:
+
+.. code-block:: bash
+
+    $ zarr remove-metadata v2 path/to/input.zarr
+
+or v3 with:
+
+.. code-block:: bash
+
+    $ zarr remove-metadata v3 path/to/input.zarr
+
+By default, this will only allow removal of metadata if a valid alternative exists. For example, you can't
+remove v2 metadata unless v3 metadata exists at that location.
+
+To override this behaviour use ``--force``:
+
+.. code-block:: bash
+
+    $ zarr remove-metadata v3 path/to/input.zarr --force
+
+
+Dry run
+--------
+All commands provide a ``--dry-run`` option that will log changes that would be made on a real run, without creating
+or modifying any files.
+
+.. code-block:: bash
+
+    $ zarr migrate v3 path/to/input.zarr --dry-run
+
+    Dry run enabled - no new files will be created or changed. Log of files that would be created on a real run:
+    Saving metadata to path/to/input.zarr/zarr.json
+
+
+Verbose
+--------
+You can also add ``--verbose`` **before** any command, to see a full log of its actions:
+
+.. code-block:: bash
+
+    $ zarr --verbose migrate v3 path/to/input.zarr
+
+    $ zarr --verbose remove-metadata v2 path/to/input.zarr
+
+
+Equivalent functions
+--------------------
+All features of the command-line interface are also available via functions under
+:mod:`zarr.metadata`.
+
+
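
The "Remove metadata" section of the new docs says that removal is only allowed when metadata in the other format exists at the same location, unless ``--force`` is passed. A minimal sketch of that rule for a single node on local disk is given below. The helper names ``has_metadata`` and ``can_remove`` are hypothetical and are not part of the zarr API; the file names assume the usual conventions (``.zarray``/``.zgroup`` for v2, ``zarr.json`` for v3), and the real implementation works on stores rather than plain directories.

```python
import tempfile
from pathlib import Path

# Assumed metadata document names: v2 nodes use .zarray/.zgroup, v3 nodes use zarr.json.
V2_FILES = (".zarray", ".zgroup")
V3_FILES = ("zarr.json",)


def has_metadata(node: Path, zarr_format: int) -> bool:
    """Return True if `node` holds a metadata document of the given format."""
    names = V2_FILES if zarr_format == 2 else V3_FILES
    return any((node / name).is_file() for name in names)


def can_remove(node: Path, zarr_format: int, force: bool = False) -> bool:
    """Hypothetical guard: allow removal only if the other format exists, or if forced."""
    other_format = 3 if zarr_format == 2 else 2
    return force or has_metadata(node, other_format)


with tempfile.TemporaryDirectory() as tmp:
    node = Path(tmp)
    (node / ".zarray").write_text("{}")
    only_v2 = can_remove(node, 2)             # no zarr.json yet, so removal is refused
    forced = can_remove(node, 2, force=True)  # force bypasses the check
    (node / "zarr.json").write_text("{}")
    after_migration = can_remove(node, 2)     # v3 metadata now exists
```

The point of the guard is that a store is never left without any readable metadata by accident; ``--force`` is the explicit opt-out.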

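The docs require ``--verbose`` to appear before the command name, which is the usual behaviour of an option defined on the top-level command rather than on each subcommand. The sketch below illustrates that layout with the standard library's ``argparse``. This is a swapped-in illustration only: the actual CLI is built with typer (see the ``cli`` extra in pyproject.toml), and the program name ``demo`` and the argument layout are assumptions.

```python
import argparse

# Top-level parser owns the global --verbose flag.
parser = argparse.ArgumentParser(prog="demo")
parser.add_argument("--verbose", action="store_true", help="log every action")

# Subcommands are registered separately and do not define --verbose themselves.
subcommands = parser.add_subparsers(dest="command", required=True)
migrate = subcommands.add_parser("migrate")
migrate.add_argument("path")

# The global flag is given before the subcommand name.
args = parser.parse_args(["--verbose", "migrate", "path/to/input.zarr"])
```

Because the flag belongs to the top-level parser, it applies uniformly to every subcommand without being redeclared on each one.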
docs/user-guide/index.rst

Lines changed: 1 addition & 0 deletions
@@ -13,6 +13,7 @@ User guide
    storage
    config
    v3_migration
+   cli
 
 Advanced Topics
 ---------------

pyproject.toml

Lines changed: 5 additions & 1 deletion
@@ -68,6 +68,7 @@ remote = [
 gpu = [
     "cupy-cuda12x",
 ]
+cli = ["typer"]
 # Development extras
 test = [
     "coverage>=7.10",
@@ -114,6 +115,9 @@ docs = [
     'pytest'
 ]
 
+[project.scripts]
+zarr = "zarr._cli.cli:app"
+
 
 [project.urls]
 issues = "https://github.com/zarr-developers/zarr-python/issues"
@@ -164,7 +168,7 @@ deps = ["minimal", "optional"]
 
 [tool.hatch.envs.test.overrides]
 matrix.deps.dependencies = [
-  {value = "zarr[remote, remote_tests, test, optional]", if = ["optional"]}
+  {value = "zarr[remote, remote_tests, test, optional, cli]", if = ["optional"]}
 ]
 
 [tool.hatch.envs.test.scripts]

src/zarr/__init__.py

Lines changed: 58 additions & 0 deletions
@@ -1,3 +1,7 @@
+import functools
+import logging
+from typing import Literal
+
 from zarr._version import version as __version__
 from zarr.api.synchronous import (
     array,
@@ -37,6 +41,8 @@
 # in case setuptools scm screw up and find version to be 0.0.0
 assert not __version__.startswith("0.0.0")
 
+_logger = logging.getLogger(__name__)
+
 
 def print_debug_info() -> None:
     """
@@ -85,6 +91,58 @@ def print_packages(packages: list[str]) -> None:
     print_packages(optional)
 
 
+# The decorator ensures this always returns the same handler (and it is only
+# attached once).
+@functools.cache
+def _ensure_handler() -> logging.Handler:
+    """
+    The first time this function is called, attach a `StreamHandler` using the
+    same format as `logging.basicConfig` to the Zarr-Python root logger.
+
+    Return this handler every time this function is called.
+    """
+    handler = logging.StreamHandler()
+    handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
+    _logger.addHandler(handler)
+    return handler
+
+
+def set_log_level(
+    level: Literal["NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
+) -> None:
+    """Set the logging level for Zarr-Python.
+
+    Zarr-Python uses the standard library `logging` framework under the root
+    logger 'zarr'. This is a helper function to:
+
+    - set Zarr-Python's root logger level
+    - set the root logger handler's level, creating the handler
+      if it does not exist yet
+
+    Parameters
+    ----------
+    level : str
+        The logging level to set.
+    """
+    _logger.setLevel(level)
+    _ensure_handler().setLevel(level)
+
+
+def set_format(log_format: str) -> None:
+    """Set the format of logging messages from Zarr-Python.
+
+    Zarr-Python uses the standard library `logging` framework under the root
+    logger 'zarr'. This sets the format of log messages from the root logger's StreamHandler.
+
+    Parameters
+    ----------
+    log_format : str
+        A string determining the log format (as defined in the standard library's `logging` module
+        for logging.Formatter)
+    """
+    _ensure_handler().setFormatter(logging.Formatter(fmt=log_format))
+
+
 __all__ = [
     "Array",
     "AsyncArray",

src/zarr/_cli/__init__.py

Whitespace-only changes.
