
Commit dcb2e39

Merge branch 'doc/3.0-updates' of github.com:jhamman/zarr-python into doc/3.0-updates
2 parents 61b4477 + a829fbb commit dcb2e39


19 files changed (+540, -111 lines)


docs/user-guide/config.rst

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
Runtime configuration
2+
=====================
3+
4+
The :mod:`zarr.core.config` module is responsible for managing the configuration of zarr
5+
and is based on the `donfig <https://github.com/pytroll/donfig>`_ Python library.
6+
7+
Configuration values can be set using code like the following:
8+
9+
.. code-block:: python
10+
11+
import zarr
12+
zarr.config.set({"array.order": "F"})
13+
14+
Alternatively, configuration values can be set using environment variables, e.g.
15+
``ZARR_ARRAY__ORDER=F``.
16+
17+
The configuration can also be read from a YAML file in standard locations.
18+
For more information, see the
19+
`donfig documentation <https://donfig.readthedocs.io/en/latest/>`_.
20+
21+
Configuration options include the following:
22+
23+
- Default Zarr format ``default_zarr_version``
24+
- Default array order in memory ``array.order``
25+
- Default codecs ``array.v3_default_codecs`` and ``array.v2_default_compressor``
26+
- Whether empty chunks are written to storage ``array.write_empty_chunks``
27+
- Async and threading options, e.g. ``async.concurrency`` and ``threading.max_workers``
28+
- Selections of implementations of codecs, codec pipelines and buffers
29+
30+
For selecting custom implementations of codecs, pipelines, buffers and ndbuffers,
31+
first register the implementations in the registry and then select them in the config.
32+
For example, an implementation of the bytes codec in a class "custompackage.NewBytesCodec",
33+
requires the value of ``codecs.bytes.name`` to be "custompackage.NewBytesCodec".
34+
35+
This is the current default configuration:
36+
37+
.. ipython:: python
38+
39+
import zarr
40+
41+
zarr.config.pprint()
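The ``ZARR_ARRAY__ORDER=F`` example relies on a naming convention: a ``ZARR_`` prefix, with a double underscore marking each level of nesting. The sketch below shows that mapping in plain Python. ``env_to_config`` is a hypothetical helper written for illustration, not part of zarr or donfig, and it keeps every value as a string rather than interpreting types.

```python
from __future__ import annotations


def env_to_config(environ: dict[str, str], prefix: str = "ZARR_") -> dict[str, str]:
    """Translate prefixed environment variables into dotted config keys.

    Hypothetical helper: ZARR_ARRAY__ORDER=F becomes {"array.order": "F"}.
    Values are kept as strings; no type conversion is attempted.
    """
    result = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # ignore unrelated variables such as HOME or PATH
        # Drop the prefix, lowercase the rest, and turn "__" into a nesting dot.
        key = name[len(prefix):].lower().replace("__", ".")
        result[key] = value
    return result


if __name__ == "__main__":
    env = {
        "ZARR_ARRAY__ORDER": "F",
        "ZARR_ARRAY__WRITE_EMPTY_CHUNKS": "True",
        "HOME": "/home/user",
    }
    print(env_to_config(env))
    # {'array.order': 'F', 'array.write_empty_chunks': 'True'}
```

In a real process the mapping would be ``os.environ`` instead of a literal dictionary.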

docs/user-guide/extending.rst

Lines changed: 72 additions & 2 deletions
@@ -6,8 +6,78 @@ Zarr-Python 3 was designed to be extensible. This means that you can extend
66
the library by writing custom classes and plugins. Currently, Zarr can be extended
77
in the following ways:
88

9-
1. Writing custom stores
10-
2. Writing custom codecs
9+
Custom stores
10+
-------------
11+
12+
13+
Custom codecs
14+
-------------
15+
16+
There are three types of codecs in Zarr: array-to-array, array-to-bytes, and bytes-to-bytes.
17+
Array-to-array codecs are used to transform the n-dimensional array data before serializing
18+
to bytes. Examples include delta encoding or scaling codecs. Array-to-bytes codecs are used
19+
for serializing the array data to bytes. In Zarr, the main codec to use for numeric arrays
20+
is the :class:`zarr.codecs.BytesCodec`. Bytes-to-bytes codecs transform the serialized bytestreams
21+
of the array data. Examples include compression codecs, such as
22+
:class:`zarr.codecs.GzipCodec`, :class:`zarr.codecs.BloscCodec` or
23+
:class:`zarr.codecs.ZstdCodec`, and codecs that add a checksum to the bytestream, such as
24+
:class:`zarr.codecs.Crc32cCodec`.
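To make the three codec types concrete, the following self-contained sketch chains one stage of each kind using only the standard library: delta encoding as the array-to-array step, little-endian ``int32`` packing as the array-to-bytes step (playing the role of :class:`zarr.codecs.BytesCodec`), and ``zlib`` as the bytes-to-bytes step. It does not use Zarr, all function names are hypothetical, and a flat Python list stands in for an n-dimensional chunk.

```python
import struct
import zlib


def delta_encode(values):
    """Array-to-array: keep the first value, then successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Invert delta_encode with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def to_bytes(values):
    """Array-to-bytes: serialize as little-endian 32-bit signed integers."""
    return struct.pack(f"<{len(values)}i", *values)


def from_bytes(data):
    return list(struct.unpack(f"<{len(data) // 4}i", data))


def encode(values):
    # Encoding applies array->array, then array->bytes, then bytes->bytes.
    return zlib.compress(to_bytes(delta_encode(values)))


def decode(blob):
    # Decoding runs the same stages in reverse order.
    return delta_decode(from_bytes(zlib.decompress(blob)))


if __name__ == "__main__":
    data = list(range(1000, 1100))
    blob = encode(data)
    assert decode(blob) == data
    print(len(to_bytes(data)), "raw bytes ->", len(blob), "compressed bytes")
```

Delta encoding turns the slowly increasing sequence into a run of identical small values, which is what makes the final compression step effective.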
25+
26+
Custom codecs for Zarr are implemented by subclassing the relevant base class, see
27+
:class:`zarr.abc.codec.ArrayArrayCodec`, :class:`zarr.abc.codec.ArrayBytesCodec` and
28+
:class:`zarr.abc.codec.BytesBytesCodec`. Most custom codecs should implement the
29+
``_encode_single`` and ``_decode_single`` methods. These methods operate on single chunks
30+
of the array data. Alternatively, custom codecs can implement the ``encode`` and ``decode``
31+
methods, which operate on batches of chunks, in case the codec is intended to implement
32+
its own batch processing.
33+
34+
Custom codecs should also implement the following methods:
35+
36+
- ``compute_encoded_size``, which returns the byte size of the encoded data given the byte
37+
size of the original data. It should raise ``NotImplementedError`` for codecs with
38+
variable-sized outputs, such as compression codecs.
39+
- ``validate``, which can be used to check that the codec metadata is compatible with the
40+
array metadata and raise an error if it is not.
41+
- ``resolve_metadata`` (optional), which is important for codecs that change the shape,
42+
dtype or fill value of a chunk.
43+
- ``evolve_from_array_spec`` (optional), which can be useful for automatically filling in
44+
codec configuration metadata from the array metadata.
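The role of ``compute_encoded_size`` is easiest to see by contrasting a fixed-size codec with a variable-size one. The sketch below is hypothetical: the classes do not subclass the ``zarr.abc.codec`` base classes, and they work on plain ``bytes`` instead of Zarr buffers (the real ``_encode_single`` and ``_decode_single`` take additional arguments; see the base classes). Only the method names follow the text above.

```python
import zlib


class Crc32Append:
    """Hypothetical bytes-to-bytes codec that appends a 4-byte CRC-32 checksum."""

    checksum_size = 4

    def _encode_single(self, data: bytes) -> bytes:
        crc = zlib.crc32(data).to_bytes(self.checksum_size, "little")
        return data + crc

    def _decode_single(self, data: bytes) -> bytes:
        payload = data[: -self.checksum_size]
        stored = data[-self.checksum_size :]
        if zlib.crc32(payload).to_bytes(self.checksum_size, "little") != stored:
            raise ValueError("CRC-32 checksum mismatch")
        return payload

    def compute_encoded_size(self, input_byte_length: int) -> int:
        # The output is always exactly 4 bytes longer than the input.
        return input_byte_length + self.checksum_size


class ZlibCompress:
    """Hypothetical bytes-to-bytes codec whose output size depends on the data."""

    def _encode_single(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def _decode_single(self, data: bytes) -> bytes:
        return zlib.decompress(data)

    def compute_encoded_size(self, input_byte_length: int) -> int:
        # Compressed size cannot be derived from the input size alone.
        raise NotImplementedError


if __name__ == "__main__":
    codec = Crc32Append()
    encoded = codec._encode_single(b"hello")
    assert len(encoded) == codec.compute_encoded_size(len(b"hello"))
    assert codec._decode_single(encoded) == b"hello"
```

A production checksum codec such as :class:`zarr.codecs.Crc32cCodec` uses CRC-32C rather than the ``zlib`` CRC-32 shown here.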
45+
46+
To use custom codecs in Zarr, they need to be registered using the
47+
`entrypoint mechanism <https://packaging.python.org/en/latest/specifications/entry-points/>`_.
48+
Commonly, entrypoints are declared in the ``pyproject.toml`` of your package under the
49+
``[project.entry-points."zarr.codecs"]`` section. Zarr will automatically discover and
50+
load all codecs registered through the entrypoint mechanism by installed packages.
51+
52+
.. code-block:: toml
53+
54+
[project.entry-points."zarr.codecs"]
55+
"custompackage.fancy_codec" = "custompackage:FancyCodec"
56+
57+
New codecs need to have their own unique identifier. To avoid naming collisions, it is
58+
strongly recommended to prefix the codec identifier with a unique name. For example,
59+
the codecs from ``numcodecs`` are prefixed with ``numcodecs.``, e.g. ``numcodecs.delta``.
60+
61+
.. note::
62+
The extension mechanism for Zarr version 3 is still under development.
63+
Requirements for custom codecs including the choice of codec identifiers might
64+
change in the future.
65+
66+
It is also possible to register codecs as replacements for existing codecs. This might be
67+
useful for providing specialized implementations, such as GPU-based codecs. In case of
68+
multiple implementations of the same codec, the :mod:`zarr.core.config` mechanism can be used to select the preferred
69+
implementation.
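Selecting among several registered implementations can be sketched with a toy registry. ``REGISTRY``, ``register_codec`` and ``get_codec_class`` below are hypothetical and are not Zarr's registry API; the only detail taken from this commit is a ``codecs.<name>.name`` key holding a fully qualified class name, as in the ``codecs.bytes.name`` example from the configuration docs.

```python
from __future__ import annotations


class DefaultBytesCodec:
    """Stand-in for the default implementation of a codec."""


class NewBytesCodec:
    """Stand-in for an alternative implementation, e.g. a GPU-based one."""


def fully_qualified_name(cls: type) -> str:
    return f"{cls.__module__}.{cls.__qualname__}"


# Hypothetical registry: codec identifier -> every registered implementation.
REGISTRY: dict[str, list[type]] = {}


def register_codec(key: str, cls: type) -> None:
    REGISTRY.setdefault(key, []).append(cls)


def get_codec_class(key: str, config: dict[str, str]) -> type:
    """Return the implementation named by config["codecs.<key>.name"].

    Falls back to the first registered class when the config has no entry.
    """
    candidates = REGISTRY[key]
    wanted = config.get(f"codecs.{key}.name")
    if wanted is None:
        return candidates[0]
    for cls in candidates:
        if fully_qualified_name(cls) == wanted:
            return cls
    raise KeyError(f"no registered implementation {wanted!r} for codec {key!r}")


if __name__ == "__main__":
    register_codec("bytes", DefaultBytesCodec)
    register_codec("bytes", NewBytesCodec)
    print(get_codec_class("bytes", {}).__name__)  # DefaultBytesCodec
    cfg = {"codecs.bytes.name": fully_qualified_name(NewBytesCodec)}
    print(get_codec_class("bytes", cfg).__name__)  # NewBytesCodec
```

If ``NewBytesCodec`` lived in a module named ``custompackage``, its fully qualified name would be ``custompackage.NewBytesCodec``, matching the value used in the configuration docs.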
70+
71+
.. note::
72+
This section explains how custom codecs can be created for Zarr version 3. For Zarr
73+
version 2, codecs should subclass the
74+
`numcodecs.abc.Codec <https://numcodecs.readthedocs.io/en/stable/abc.html#numcodecs.abc.Codec>`_
75+
base class and register through
76+
`numcodecs.registry.register_codec <https://numcodecs.readthedocs.io/en/stable/registry.html#numcodecs.registry.register_codec>`_.
77+
78+
79+
Other extensions
80+
----------------
1181

1282
In the future, Zarr will support writing custom data types and chunk grids.
1383

docs/user-guide/index.rst

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ User Guide
1010
arrays
1111
groups
1212
storage
13+
config
1314
v3_migration
1415
todo
1516

docs/user-guide/v3_migration.rst

Lines changed: 27 additions & 0 deletions
@@ -93,6 +93,9 @@ The Array class
9393
1. Disallow direct construction - use :func:`zarr.open_array` or :func:`zarr.create_array`
9494
instead of directly constructing the :class:`zarr.Array` class.
9595

96+
2. Defaulting to ``zarr_format=3`` - newly created arrays will use version 3 of the
97+
Zarr specification. To continue using version 2, set ``zarr_format=2`` when creating arrays.
98+
9699
The Group class
97100
~~~~~~~~~~~~~~~
98101

@@ -131,6 +134,30 @@ Dependencies Changes
131134
- The ``jupyter`` optional dependency group has been removed, since v3 contains no
132135
jupyter specific functionality.
133136

137+
Configuration
138+
~~~~~~~~~~~~~
139+
140+
There is a new configuration system based on `donfig <https://github.com/pytroll/donfig>`_,
141+
which can be accessed via :mod:`zarr.core.config`.
142+
Configuration values can be set using code like the following:
143+
144+
.. code-block:: python
145+
146+
import zarr
147+
zarr.config.set({"array.order": "F"})
148+
149+
Alternatively, configuration values can be set using environment variables,
150+
e.g. ``ZARR_ARRAY__ORDER=F``.
151+
152+
Configuration options include the following:
153+
154+
- Default Zarr format ``default_zarr_version``
155+
- Default array order in memory ``array.order``
156+
- Default codecs ``array.v3_default_codecs`` and ``array.v2_default_compressor``
157+
- Whether empty chunks are written to storage ``array.write_empty_chunks``
158+
- Async and threading options, e.g. ``async.concurrency`` and ``threading.max_workers``
159+
- Selections of implementations of codecs, codec pipelines and buffers
160+
134161
Miscellaneous
135162
~~~~~~~~~~~~~
136163

src/zarr/api/asynchronous.py

Lines changed: 55 additions & 27 deletions
@@ -10,13 +10,16 @@
1010
from typing_extensions import deprecated
1111

1212
from zarr.core.array import Array, AsyncArray, get_array_metadata
13+
from zarr.core.array_spec import ArrayConfig, ArrayConfigParams
1314
from zarr.core.buffer import NDArrayLike
1415
from zarr.core.common import (
1516
JSON,
1617
AccessModeLiteral,
1718
ChunkCoords,
1819
MemoryOrder,
1920
ZarrFormat,
21+
_warn_order_kwarg,
22+
_warn_write_empty_chunks_kwarg,
2023
parse_dtype,
2124
)
2225
from zarr.core.config import config
@@ -794,7 +797,7 @@ async def create(
794797
read_only: bool | None = None,
795798
object_codec: Codec | None = None, # TODO: type has changed
796799
dimension_separator: Literal[".", "/"] | None = None,
797-
write_empty_chunks: bool = False, # TODO: default has changed
800+
write_empty_chunks: bool | None = None,
798801
zarr_version: ZarrFormat | None = None, # deprecated
799802
zarr_format: ZarrFormat | None = None,
800803
meta_array: Any | None = None, # TODO: need type
@@ -810,6 +813,7 @@ async def create(
810813
codecs: Iterable[Codec | dict[str, JSON]] | None = None,
811814
dimension_names: Iterable[str] | None = None,
812815
storage_options: dict[str, Any] | None = None,
816+
config: ArrayConfig | ArrayConfigParams | None = None,
813817
**kwargs: Any,
814818
) -> AsyncArray[ArrayV2Metadata] | AsyncArray[ArrayV3Metadata]:
815819
"""Create an array.
@@ -856,8 +860,10 @@ async def create(
856860
These defaults can be changed by modifying the value of ``array.v2_default_compressor`` in :mod:`zarr.core.config`. fill_value : object
857861
Default value to use for uninitialized portions of the array.
858862
order : {'C', 'F'}, optional
863+
Deprecated in favor of the ``config`` keyword argument.
864+
Pass ``config={'order': <value>}`` to ``create`` instead of using this parameter.
859865
Memory layout to be used within each chunk.
860-
If not specified, default is taken from the Zarr config ```array.order```.
866+
If not specified, the ``array.order`` parameter in the global config will be used.
861867
store : Store or str
862868
Store or path to directory in file system or name of zip file.
863869
synchronizer : object, optional
@@ -891,30 +897,26 @@ async def create(
891897
Separator placed between the dimensions of a chunk.
892898
V2 only. V3 arrays should use ``chunk_key_encoding`` instead.
893899
Default is ".".
894-
.. versionadded:: 2.8
895-
896900
write_empty_chunks : bool, optional
897-
If True (default), all chunks will be stored regardless of their
901+
Deprecated in favor of the ``config`` keyword argument.
902+
Pass ``config={'write_empty_chunks': <value>}`` to ``create`` instead of using this parameter.
903+
If True, all chunks will be stored regardless of their
898904
contents. If False, each chunk is compared to the array's fill value
899905
prior to storing. If a chunk is uniformly equal to the fill value, then
900906
that chunk is not stored, and the store entry for that chunk's key
901-
is deleted. This setting enables sparser storage, as only chunks with
902-
non-fill-value data are stored, at the expense of overhead associated
903-
with checking the data of each chunk.
904-
905-
.. versionadded:: 2.11
906-
907+
is deleted.
907908
zarr_format : {2, 3, None}, optional
908909
The zarr format to use when saving.
909910
Default is 3.
910911
meta_array : array-like, optional
911912
An array instance to use for determining arrays to create and return
912913
to users. Use `numpy.empty(())` by default.
913-
914-
.. versionadded:: 2.13
915914
storage_options : dict
916915
If using an fsspec URL to create the store, these will be passed to
917916
the backend implementation. Ignored otherwise.
917+
config : ArrayConfig or ArrayConfigParams, optional
918+
Runtime configuration of the array. If provided, will override the
919+
default values from ``zarr.config.array``.
918920
919921
Returns
920922
-------
@@ -951,26 +953,47 @@ async def create(
951953
warnings.warn("object_codec is not yet implemented", RuntimeWarning, stacklevel=2)
952954
if read_only is not None:
953955
warnings.warn("read_only is not yet implemented", RuntimeWarning, stacklevel=2)
954-
if dimension_separator is not None:
955-
if zarr_format == 3:
956-
raise ValueError(
957-
"dimension_separator is not supported for zarr format 3, use chunk_key_encoding instead"
958-
)
959-
else:
960-
warnings.warn(
961-
"dimension_separator is not yet implemented",
962-
RuntimeWarning,
963-
stacklevel=2,
964-
)
965-
if write_empty_chunks:
966-
warnings.warn("write_empty_chunks is not yet implemented", RuntimeWarning, stacklevel=2)
956+
if dimension_separator is not None and zarr_format == 3:
957+
raise ValueError(
958+
"dimension_separator is not supported for zarr format 3, use chunk_key_encoding instead"
959+
)
960+
961+
if order is not None:
962+
_warn_order_kwarg()
963+
if write_empty_chunks is not None:
964+
_warn_write_empty_chunks_kwarg()
965+
967966
if meta_array is not None:
968967
warnings.warn("meta_array is not yet implemented", RuntimeWarning, stacklevel=2)
969968

970969
mode = kwargs.pop("mode", None)
971970
if mode is None:
972971
mode = "a"
973972
store_path = await make_store_path(store, path=path, mode=mode, storage_options=storage_options)
973+
974+
config_dict: ArrayConfigParams = {}
975+
976+
if write_empty_chunks is not None:
977+
if config is not None:
978+
msg = (
979+
"Both write_empty_chunks and config keyword arguments are set. "
980+
"This is redundant. When both are set, write_empty_chunks will be ignored and "
981+
"config will be used."
982+
)
983+
warnings.warn(UserWarning(msg), stacklevel=1)
984+
config_dict["write_empty_chunks"] = write_empty_chunks
985+
if order is not None:
986+
if config is not None:
987+
msg = (
988+
"Both order and config keyword arguments are set. "
989+
"This is redundant. When both are set, order will be ignored and "
990+
"config will be used."
991+
)
992+
warnings.warn(UserWarning(msg), stacklevel=1)
993+
config_dict["order"] = order
994+
995+
config_parsed = ArrayConfig.from_dict(config_dict)
996+
974997
return await AsyncArray.create(
975998
store_path,
976999
shape=shape,
@@ -987,7 +1010,7 @@ async def create(
9871010
codecs=codecs,
9881011
dimension_names=dimension_names,
9891012
attributes=attributes,
990-
order=order,
1013+
config=config_parsed,
9911014
**kwargs,
9921015
)
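The warning messages in ``create`` state a precedence rule: when a deprecated keyword (``order`` or ``write_empty_chunks``) and ``config`` are both given, the keyword is ignored and ``config`` is used. A minimal sketch of that rule on plain dictionaries follows. ``resolve_array_config`` is hypothetical, and emitting a ``DeprecationWarning`` for the deprecation notice is an assumption of the sketch, not necessarily what ``_warn_order_kwarg`` and ``_warn_write_empty_chunks_kwarg`` emit.

```python
import warnings


def resolve_array_config(order=None, write_empty_chunks=None, config=None):
    """Return the runtime-config dict implied by the arguments.

    Hypothetical helper: an explicit ``config`` always wins; deprecated
    keywords are only used when ``config`` is None. Keys left out of the
    result would fall back to the global ``array.*`` defaults.
    """
    legacy = {"order": order, "write_empty_chunks": write_empty_chunks}
    resolved = {}
    for name, value in legacy.items():
        if value is None:
            continue  # keyword not passed, nothing to warn about
        warnings.warn(
            f"{name} is deprecated; pass it through the config argument instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if config is not None:
            warnings.warn(
                f"Both {name} and config are set; {name} will be ignored "
                "and config will be used.",
                UserWarning,
                stacklevel=2,
            )
        else:
            resolved[name] = value
    if config is not None:
        resolved.update(config)
    return resolved


if __name__ == "__main__":
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        print(resolve_array_config(order="F"))  # {'order': 'F'}
        print(resolve_array_config(order="F", config={"order": "C"}))  # {'order': 'C'}
```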
9931016

@@ -1163,6 +1186,11 @@ async def open_array(
11631186

11641187
zarr_format = _handle_zarr_version_or_format(zarr_version=zarr_version, zarr_format=zarr_format)
11651188

1189+
if "order" in kwargs:
1190+
_warn_order_kwarg()
1191+
if "write_empty_chunks" in kwargs:
1192+
_warn_write_empty_chunks_kwarg()
1193+
11661194
try:
11671195
return await AsyncArray.open(store_path, zarr_format=zarr_format)
11681196
except FileNotFoundError:
