
Commit 2bf2069

docs: add docs on extending zarr 3
1 parent 01bc352 commit 2bf2069


2 files changed (+88 -1 lines changed)

docs/user-guide/extending.rst

Lines changed: 87 additions & 0 deletions
@@ -0,0 +1,87 @@

Extending Zarr
==============

Zarr-Python 3 was designed to be extensible. This means that you can extend
the library by writing custom classes and plugins. Currently, Zarr can be extended
in the following ways:

Custom codecs
-------------

There are three types of codecs in Zarr: array-to-array, array-to-bytes, and bytes-to-bytes.
Array-to-array codecs transform the n-dimensional array data before it is serialized
to bytes. Examples include delta encoding or scaling codecs. Array-to-bytes codecs
serialize the array data to bytes. In Zarr, the main codec to use for numeric arrays
is the :class:`zarr.codecs.BytesCodec`. Bytes-to-bytes codecs transform the serialized
bytestreams of the array data. Examples include compression codecs, such as
:class:`zarr.codecs.GzipCodec`, :class:`zarr.codecs.BloscCodec` or
:class:`zarr.codecs.ZstdCodec`, and codecs that add a checksum to the bytestream, such as
:class:`zarr.codecs.Crc32cCodec`.

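To make the three stages concrete, here is a minimal, self-contained sketch that chains one toy codec of each kind using only the Python standard library. It is not Zarr's codec API: all function names are hypothetical, the delta step stands in for an array-to-array codec, little-endian packing with :mod:`struct` stands in for :class:`zarr.codecs.BytesCodec`, and :mod:`zlib` compression stands in for a bytes-to-bytes codec.

.. code-block:: python

    import struct
    import zlib

    # Hypothetical toy codecs illustrating the three codec kinds.
    # None of these functions are part of Zarr's API.


    def delta_encode(values: list[int]) -> list[int]:
        """Array-to-array: keep the first value, then successive differences."""
        if not values:
            return []
        return [values[0]] + [b - a for a, b in zip(values, values[1:])]


    def delta_decode(deltas: list[int]) -> list[int]:
        """Inverse of delta_encode: running sum of the differences."""
        out: list[int] = []
        total = 0
        for d in deltas:
            total += d
            out.append(total)
        return out


    def pack_int32_le(values: list[int]) -> bytes:
        """Array-to-bytes: serialize as little-endian signed 32-bit integers."""
        return struct.pack(f"<{len(values)}i", *values)


    def unpack_int32_le(data: bytes) -> list[int]:
        """Inverse of pack_int32_le (4 bytes per element)."""
        return list(struct.unpack(f"<{len(data) // 4}i", data))


    def encode_chunk(values: list[int]) -> bytes:
        # Encoding order: array-to-array, then array-to-bytes, then bytes-to-bytes.
        return zlib.compress(pack_int32_le(delta_encode(values)))


    def decode_chunk(data: bytes) -> list[int]:
        # Decoding applies the inverse steps in reverse order.
        return delta_decode(unpack_int32_le(zlib.decompress(data)))


    chunk = [100, 101, 103, 106, 110]
    assert decode_chunk(encode_chunk(chunk)) == chunk

Decoding applies the inverse of each step in reverse order, so each codec only needs to know how to undo its own transformation.
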
Custom codecs for Zarr are implemented by subclassing the relevant base class: see
:class:`zarr.abc.codec.ArrayArrayCodec`, :class:`zarr.abc.codec.ArrayBytesCodec` and
:class:`zarr.abc.codec.BytesBytesCodec`. Most custom codecs should implement the
``_encode_single`` and ``_decode_single`` methods. These methods operate on single chunks
of the array data. Alternatively, custom codecs can implement the ``encode`` and ``decode``
methods, which operate on batches of chunks, if the codec needs to implement its own
batch processing.

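The relationship between the per-chunk and the batch methods can be sketched as follows. The classes below are hypothetical and do not subclass Zarr's base classes; they only illustrate the idea that a default batch ``encode``/``decode`` can be built by applying ``_encode_single``/``_decode_single`` to every chunk concurrently. The signatures are simplified assumptions: a chunk is plain ``bytes`` and no chunk metadata is passed.

.. code-block:: python

    import asyncio
    import zlib
    from collections.abc import Iterable


    class SketchBytesBytesCodec:
        """Hypothetical stand-in for a bytes-to-bytes codec base class.

        Subclasses implement the per-chunk methods; the batch methods
        below simply fan out over the chunks concurrently.
        """

        async def _encode_single(self, chunk: bytes) -> bytes:
            raise NotImplementedError

        async def _decode_single(self, chunk: bytes) -> bytes:
            raise NotImplementedError

        async def encode(self, chunks: Iterable[bytes]) -> list[bytes]:
            # Default batch behaviour: encode every chunk independently.
            # asyncio.gather preserves the order of the inputs.
            return list(await asyncio.gather(*(self._encode_single(c) for c in chunks)))

        async def decode(self, chunks: Iterable[bytes]) -> list[bytes]:
            return list(await asyncio.gather(*(self._decode_single(c) for c in chunks)))


    class ToyZlibCodec(SketchBytesBytesCodec):
        """Overrides only the per-chunk methods; batching is inherited."""

        async def _encode_single(self, chunk: bytes) -> bytes:
            return zlib.compress(chunk)

        async def _decode_single(self, chunk: bytes) -> bytes:
            return zlib.decompress(chunk)


    async def roundtrip(chunks: list[bytes]) -> list[bytes]:
        codec = ToyZlibCodec()
        return await codec.decode(await codec.encode(chunks))


    sample = [b"a" * 100, b"hello zarr", b""]
    assert asyncio.run(roundtrip(sample)) == sample

Overriding the batch methods instead would only pay off when a codec can do better than chunk-by-chunk work, for example by handing a whole batch to a GPU in a single call.
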
Custom codecs should also implement the following methods:

- ``compute_encoded_size``, which returns the byte size of the encoded data given the byte
  size of the original data. It should raise ``NotImplementedError`` for codecs with
  variable-sized outputs, such as compression codecs.
- ``validate``, which can be used to check that the codec metadata is compatible with the
  array metadata. It should raise an error if they are incompatible.
- ``resolve_metadata`` (optional), which is important for codecs that change the shape,
  dtype or fill value of a chunk.
- ``evolve_from_array_spec`` (optional), which can be useful for automatically filling in
  codec configuration metadata from the array metadata.

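The intent of ``compute_encoded_size``, ``validate`` and ``resolve_metadata`` can be illustrated with toy classes. This is a sketch under simplifying assumptions: ``ToySpec`` is a hypothetical stand-in for Zarr's chunk metadata, the signatures are simplified, and the checksum codec uses :func:`zlib.crc32`, which is not the CRC32C algorithm used by :class:`zarr.codecs.Crc32cCodec`.

.. code-block:: python

    from __future__ import annotations

    import zlib
    from dataclasses import dataclass, replace


    @dataclass(frozen=True)
    class ToySpec:
        """Hypothetical, simplified chunk metadata (not Zarr's real spec class)."""

        shape: tuple[int, ...]
        dtype: str
        fill_value: int | float


    class ToyChecksumCodec:
        """Bytes-to-bytes: append a 4-byte CRC32 (zlib.crc32, not CRC32C)."""

        def encode(self, data: bytes) -> bytes:
            return data + zlib.crc32(data).to_bytes(4, "little")

        def decode(self, data: bytes) -> bytes:
            payload, stored = data[:-4], data[-4:]
            if zlib.crc32(payload).to_bytes(4, "little") != stored:
                raise ValueError("checksum mismatch")
            return payload

        def compute_encoded_size(self, input_byte_length: int, spec: ToySpec) -> int:
            # Fixed overhead, so the output size is known in advance.
            return input_byte_length + 4


    class ToyCompressorCodec:
        """Bytes-to-bytes compressor: the output size depends on the data."""

        def compute_encoded_size(self, input_byte_length: int, spec: ToySpec) -> int:
            raise NotImplementedError("compressed size is not known in advance")


    class ToyScaleToInt16Codec:
        """Array-to-array: store float data as scaled 16-bit integers."""

        def __init__(self, scale: float) -> None:
            self.scale = scale

        def validate(self, spec: ToySpec) -> None:
            # Reject array metadata this codec cannot handle.
            if not spec.dtype.startswith("float"):
                raise TypeError(f"expected a float dtype, got {spec.dtype!r}")

        def resolve_metadata(self, spec: ToySpec) -> ToySpec:
            # The encoded chunk has a different dtype and fill value.
            return replace(spec, dtype="int16", fill_value=round(spec.fill_value * self.scale))


    spec = ToySpec(shape=(10,), dtype="float64", fill_value=0.5)
    checksum = ToyChecksumCodec()
    assert checksum.compute_encoded_size(80, spec) == 84
    assert checksum.decode(checksum.encode(b"abc")) == b"abc"

    scaler = ToyScaleToInt16Codec(scale=100.0)
    scaler.validate(spec)  # passes for a float dtype
    assert scaler.resolve_metadata(spec).dtype == "int16"
    assert scaler.resolve_metadata(spec).fill_value == 50
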
To use custom codecs in Zarr, they need to be registered using the
`entrypoint mechanism <https://packaging.python.org/en/latest/specifications/entry-points/>`_.
Commonly, entrypoints are declared in the ``pyproject.toml`` of your package under the
``[project.entry-points."zarr.codecs"]`` section. Zarr will automatically discover and
load all codecs registered with the entrypoint mechanism by installed packages.

.. code-block:: toml

    [project.entry-points."zarr.codecs"]
    "custompackage.fancy_codec" = "custompackage:FancyCodec"

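To check which codecs the packages in the current environment advertise, the registered entry points can be inspected with the standard library. This sketch only lists what is declared under the ``zarr.codecs`` group (the ``group=`` keyword needs Python 3.10 or newer); it does not reproduce how Zarr itself loads and registers the classes.

.. code-block:: python

    from importlib.metadata import entry_points

    # Every entry point declared under the "zarr.codecs" group by the
    # packages installed in the current environment.
    codec_entry_points = entry_points(group="zarr.codecs")

    for ep in codec_entry_points:
        # ep.name is the key from pyproject.toml, ep.value the
        # "module:attribute" reference; ep.load() would import the class.
        print(ep.name, "->", ep.value)

With the ``pyproject.toml`` above installed, the output would include ``custompackage.fancy_codec -> custompackage:FancyCodec``.
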
Each new codec needs its own unique identifier. To avoid naming collisions, it is
strongly recommended to prefix the codec identifier with a unique name. For example,
the codecs from ``numcodecs`` use the prefix ``numcodecs.``, as in ``numcodecs.delta``.

.. note::
   The extension mechanism for Zarr version 3 is still under development.
   Requirements for custom codecs, including the choice of codec identifiers, might
   change in the future.

It is also possible to register codecs as replacements for existing codecs. This might be
useful for providing specialized implementations, such as GPU-based codecs. If multiple
implementations are registered for the same codec, the :mod:`zarr.core.config` mechanism
can be used to select the preferred implementation.

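The selection step can be pictured as a lookup keyed by codec identifier. The sketch below is purely illustrative: the registry, the configuration dictionary, both classes and the fallback rule are hypothetical and do not reproduce the internals of :mod:`zarr.core.config`.

.. code-block:: python

    class CpuGzipCodec:
        """Hypothetical default implementation."""


    class GpuGzipCodec:
        """Hypothetical specialized replacement."""


    # Hypothetical registry: codec identifier -> {qualified class name: class}.
    REGISTRY: dict[str, dict[str, type]] = {
        "gzip": {
            "cpu_impl.CpuGzipCodec": CpuGzipCodec,
            "gpu_impl.GpuGzipCodec": GpuGzipCodec,
        }
    }


    def select_codec(codec_id: str, config: dict[str, str]) -> type:
        """Return the implementation named in config, else the first registered one."""
        candidates = REGISTRY[codec_id]
        preferred = config.get(codec_id)
        if preferred is not None:
            return candidates[preferred]
        # No preference configured: this sketch falls back to the first entry.
        return next(iter(candidates.values()))


    assert select_codec("gzip", {}) is CpuGzipCodec
    assert select_codec("gzip", {"gzip": "gpu_impl.GpuGzipCodec"}) is GpuGzipCodec
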
.. note::
   This section explains how custom codecs can be created for Zarr version 3. For Zarr
   version 2, codecs should subclass the
   `numcodecs.abc.Codec <https://numcodecs.readthedocs.io/en/stable/abc.html#numcodecs.abc.Codec>`_
   base class and be registered with
   `numcodecs.registry.register_codec <https://numcodecs.readthedocs.io/en/stable/registry.html#numcodecs.registry.register_codec>`_.

Custom stores
-------------

Coming soon.

Custom array buffers
--------------------

Coming soon.

Other extensions
----------------

In the future, Zarr will support writing custom data types and chunk grids.

docs/user-guide/index.rst

Lines changed: 1 addition & 1 deletion
@@ -24,10 +24,10 @@ Advanced Topics

  performance
  consolidated_metadata
+ extending
  whatsnew_v3
  v3_todos


  .. Coming soon
  async
- extending
