Commit e5e5fb3

add docs for extending codecs
1 parent 7cfb8f8 commit e5e5fb3

1 file changed

+67
-2
lines changed
docs/user-guide/extending.rst

Lines changed: 67 additions & 2 deletions
Zarr-Python 3 was designed to be extensible. This means that you can extend
the library by writing custom classes and plugins. Currently, Zarr can be extended
in the following ways:

Writing custom stores
---------------------

Writing custom codecs
---------------------

There are three types of codecs in Zarr: array-to-array, array-to-bytes, and bytes-to-bytes.
Array-to-array codecs transform the n-dimensional array data before it is serialized
to bytes. Examples include delta encoding or scaling codecs. Array-to-bytes codecs
serialize the array data to bytes. In Zarr, the main codec to use for numeric arrays
is the :class:`zarr.codecs.BytesCodec`. Bytes-to-bytes codecs transform the serialized
bytestreams of the array data. Examples include compression codecs, such as
:class:`zarr.codecs.GzipCodec`, :class:`zarr.codecs.BloscCodec` or
:class:`zarr.codecs.ZstdCodec`, and codecs that add a checksum to the bytestream, such as
:class:`zarr.codecs.Crc32cCodec`.
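The three stages can be illustrated with a minimal, standard-library-only sketch. It does not use the Zarr API: all function names are hypothetical, a plain list stands in for the array, and ``zlib.crc32`` stands in for the CRC32C checksum.

```python
import struct
import zlib


def delta_encode(values):
    """Array-to-array step: store differences between neighbouring values."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Inverse of delta_encode: running sum of the stored differences."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def to_bytes(values):
    """Array-to-bytes step: serialize as little-endian 32-bit integers."""
    return struct.pack(f"<{len(values)}i", *values)


def from_bytes(raw):
    return list(struct.unpack(f"<{len(raw) // 4}i", raw))


def add_checksum(raw):
    """Bytes-to-bytes step: append a 4-byte checksum (CRC32 here, not CRC32C)."""
    return raw + struct.pack("<I", zlib.crc32(raw))


def strip_checksum(raw):
    payload, stored = raw[:-4], struct.unpack("<I", raw[-4:])[0]
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch")
    return payload


def encode_chunk(values):
    # array-to-array -> array-to-bytes -> bytes-to-bytes (compress, then checksum)
    return add_checksum(zlib.compress(to_bytes(delta_encode(values))))


def decode_chunk(raw):
    # Decoding applies the inverse steps in reverse order.
    return delta_decode(from_bytes(zlib.decompress(strip_checksum(raw))))
```

Decoding reverses the order of the encoding steps, which is why each stage needs both an encode and a decode direction.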

Custom codecs for Zarr are implemented as classes that inherit from the relevant base class:
:class:`zarr.abc.codec.ArrayArrayCodec`, :class:`zarr.abc.codec.ArrayBytesCodec` or
:class:`zarr.abc.codec.BytesBytesCodec`. Most custom codecs should implement the
``_encode_single`` and ``_decode_single`` methods. These methods operate on single chunks
of the array data. Custom codecs can also implement the ``encode`` and ``decode`` methods,
which operate on batches of chunks, in case the codec is intended to implement its own
batch processing.
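The relationship between the single-chunk and the batch methods can be sketched as follows. ``SketchCodec`` is a hypothetical stand-in written for this example, not one of the Zarr base classes, and writing the methods as coroutines is an assumption of the sketch. The real base classes take additional arguments.

```python
import asyncio


class SketchCodec:
    """Hypothetical stand-in for a codec base class (not the Zarr API)."""

    async def _encode_single(self, chunk):
        raise NotImplementedError

    async def _decode_single(self, chunk):
        raise NotImplementedError

    async def encode(self, chunks):
        # Default batch behaviour: encode every chunk independently.
        return list(await asyncio.gather(*(self._encode_single(c) for c in chunks)))

    async def decode(self, chunks):
        return list(await asyncio.gather(*(self._decode_single(c) for c in chunks)))


class OffsetCodec(SketchCodec):
    """Per-chunk codec: only the single-chunk methods are implemented."""

    def __init__(self, offset):
        self.offset = offset

    async def _encode_single(self, chunk):
        return [v - self.offset for v in chunk]

    async def _decode_single(self, chunk):
        return [v + self.offset for v in chunk]


class BatchedOffsetCodec(OffsetCodec):
    """Same codec, but with its own batch processing for encoding."""

    async def encode(self, chunks):
        # Handle the whole batch in one pass instead of one task per chunk.
        return [[v - self.offset for v in chunk] for chunk in chunks]
```

In this sketch a codec that only defines the single-chunk methods still gets working batch methods from the base class, while ``BatchedOffsetCodec`` overrides ``encode`` to control how the batch is processed.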

Custom codecs should also implement the following methods:

- ``compute_encoded_size``, which returns the byte size of the encoded data given the byte
  size of the original data. It should raise ``NotImplementedError`` for codecs with
  variable-sized outputs, such as compression codecs.
- ``validate``, which can be used to check that the codec metadata is compatible with the
  array metadata. It should raise an error if it is not.
- ``resolve_metadata`` (optional), which is important for codecs that change the shape,
  dtype or fill value of a chunk.
- ``evolve_from_array_spec`` (optional), which can be useful for automatically filling in
  codec configuration metadata from the array metadata.
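The intent of these methods can be sketched with two hypothetical classes. The signatures are simplified assumptions for illustration (a plain dict stands in for the chunk specification and dtypes are plain strings), so they do not match the Zarr base classes exactly.

```python
ITEM_SIZES = {"float32": 4, "float64": 8}  # bytes per element


class NarrowingSketch:
    """Hypothetical array-to-array codec that stores float64 data as float32."""

    source = "float64"
    target = "float32"

    def validate(self, array_dtype):
        # Reject arrays that this codec configuration cannot handle.
        if array_dtype != self.source:
            raise ValueError(f"expected dtype {self.source!r}, got {array_dtype!r}")

    def resolve_metadata(self, chunk_spec):
        # The encoded chunk has a different dtype; shape and fill value are unchanged.
        return {**chunk_spec, "dtype": self.target}

    def compute_encoded_size(self, input_byte_length):
        # Fixed-size output: every 8-byte element becomes a 4-byte element.
        return input_byte_length * ITEM_SIZES[self.target] // ITEM_SIZES[self.source]


class CompressionSketch:
    """Hypothetical compression codec with a data-dependent output size."""

    def compute_encoded_size(self, input_byte_length):
        # The compressed size cannot be known without looking at the data.
        raise NotImplementedError
```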

To use custom codecs in Zarr, they need to be registered using the
`entrypoint mechanism <https://packaging.python.org/en/latest/specifications/entry-points/>`_.
Commonly, entrypoints are declared in the ``pyproject.toml`` of your package under the
``[project.entry-points."zarr.codecs"]`` section. Zarr will automatically discover and load
all codecs registered with the entrypoint mechanism from imported modules.

.. code-block:: toml

    [project.entry-points."zarr.codecs"]
    "custompackage.fancy_codec" = "custompackage:FancyCodec"

New codecs need to have their own unique identifier. To avoid naming collisions, it is
strongly recommended to prefix the codec identifier with a unique name. For example,
the codecs from ``numcodecs`` are prefixed with ``numcodecs.``, e.g. ``numcodecs.delta``.

.. note::
    The extension mechanism for the Zarr specification version 3 is still
    under development. Requirements for custom codecs, including the choice of codec
    identifiers, might change in the future.

It is also possible to register codecs as replacements for existing codecs. This might be
useful for providing specialized implementations, such as GPU-based codecs. If multiple
implementations are registered for the same codec, the :mod:`zarr.core.config` mechanism
can be used to select the preferred implementation.
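The idea of choosing between several implementations of one codec can be sketched with a small registry. Everything here is hypothetical and independent of Zarr: the class names, the ``preferred`` mapping, and the fallback to the most recently registered class are assumptions made for illustration, not the behaviour of :mod:`zarr.core.config`.

```python
def qualified_name(cls):
    """Fully qualified name of a class, e.g. "package.module.ClassName"."""
    return f"{cls.__module__}.{cls.__qualname__}"


class CodecRegistrySketch:
    """Hypothetical registry holding several implementations per codec name."""

    def __init__(self):
        self._implementations = {}  # codec name -> list of classes
        self.preferred = {}  # codec name -> fully qualified class name

    def register(self, name, cls):
        self._implementations.setdefault(name, []).append(cls)

    def get(self, name):
        candidates = self._implementations[name]
        wanted = self.preferred.get(name)
        if wanted is None:
            # Assumption of this sketch: without a preference, the latest wins.
            return candidates[-1]
        for cls in candidates:
            if qualified_name(cls) == wanted:
                return cls
        raise KeyError(f"no implementation {wanted!r} registered for {name!r}")


class CpuGzip:
    """Placeholder for a default implementation."""


class GpuGzip:
    """Placeholder for a specialized replacement."""


registry = CodecRegistrySketch()
registry.register("gzip", CpuGzip)
registry.register("gzip", GpuGzip)
```

Setting ``registry.preferred["gzip"] = qualified_name(CpuGzip)`` makes ``registry.get("gzip")`` return ``CpuGzip`` even though ``GpuGzip`` was registered later.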

TODO: Link to documentation of :mod:`zarr.core.config`

.. note::
    This section explains how custom codecs can be created for Zarr version 3. For Zarr
    version 2, codecs should implement the
    `numcodecs.abc.Codec <https://numcodecs.readthedocs.io/en/stable/abc.html>`_
    base class.

In the future, Zarr will support writing custom data types and chunk grids.
