@@ -6,8 +6,73 @@ Zarr-Python 3 was designed to be extensible. This means that you can extend
66the library by writing custom classes and plugins. Currently, Zarr can be extended
77in the following ways:
88
9- 1. Writing custom stores
10- 2. Writing custom codecs
9+ Writing custom stores
10+ ---------------------
11+
12+
13+ Writing custom codecs
14+ ---------------------
15+
16+ There are three types of codecs in Zarr: array-to-array, array-to-bytes, and bytes-to-bytes.
17+ Array-to-array codecs are used to transform the n-dimensional array data before serializing
18+ to bytes. Examples include delta encoding or scaling codecs. Array-to-bytes codecs are used
19+ for serializing the array data to bytes. In Zarr, the main codec to use for numeric arrays
20+ is the :class:`zarr.codecs.BytesCodec`. Bytes-to-bytes codecs transform the serialized bytestreams
21+ of the array data. Examples include compression codecs, such as
22+ :class:`zarr.codecs.GzipCodec`, :class:`zarr.codecs.BloscCodec` or
23+ :class:`zarr.codecs.ZstdCodec`, and codecs that add a checksum to the bytestream, such as
24+ :class:`zarr.codecs.Crc32cCodec`.
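
The three stages can be illustrated without Zarr at all. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and do not correspond to the Zarr codec API. It applies a delta transform (array-to-array), packs the values as little-endian 32-bit integers (array-to-bytes), and compresses the result with gzip (bytes-to-bytes), then reverses each stage to decode.

```python
import gzip
import struct


def delta_encode(values: list[int]) -> list[int]:
    """Array-to-array stage: keep the first value, then store differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    """Inverse of delta_encode: a running sum restores the original values."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def to_bytes(values: list[int]) -> bytes:
    """Array-to-bytes stage: little-endian signed 32-bit integers."""
    return struct.pack(f"<{len(values)}i", *values)


def from_bytes(data: bytes) -> list[int]:
    """Inverse of to_bytes."""
    return list(struct.unpack(f"<{len(data) // 4}i", data))


def encode_chunk(values: list[int]) -> bytes:
    """Run the stages in order; the last one is the bytes-to-bytes stage."""
    return gzip.compress(to_bytes(delta_encode(values)))


def decode_chunk(blob: bytes) -> list[int]:
    """Decoding applies the inverse stages in reverse order."""
    return delta_decode(from_bytes(gzip.decompress(blob)))
```

Decoding undoes the stages in the opposite order from encoding, which is why each stage needs both an encode and a decode direction.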
25+
26+ Custom codecs for Zarr are implemented as classes that inherit from the relevant base class;
27+ see :class:`zarr.abc.codecs.ArrayArrayCodec`, :class:`zarr.abc.codecs.ArrayBytesCodec` and
28+ :class:`zarr.abc.codecs.BytesBytesCodec`. Most custom codecs should implement the
29+ ``_encode_single`` and ``_decode_single`` methods. These methods operate on single chunks
30+ of the array data. Custom codecs can also implement the ``encode`` and ``decode`` methods,
31+ which operate on batches of chunks, in case the codec is intended to implement its own
32+ batch processing.
33+
34+ Custom codecs should also implement these methods:
35+ - ``compute_encoded_size``, which returns the byte size of the encoded data given the byte
36+   size of the original data. It should raise ``NotImplementedError`` for codecs with
37+   variable-sized outputs, such as compression codecs.
38+ - ``validate``, which can be used to check that the codec metadata is compatible with the
39+   array metadata. It should raise errors if not.
40+ - ``resolve_metadata`` (optional), which is important for codecs that change the shape,
41+   dtype or fill value of a chunk.
42+ - ``evolve_from_array_spec`` (optional), which can be useful for automatically filling in
43+   codec configuration metadata from the array metadata.
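
The contract of ``compute_encoded_size`` described above can be sketched with two standalone classes. This is an illustration only: the class names are hypothetical, the classes do not inherit from the Zarr base classes, and the plain ``encode``/``decode`` methods are simplified stand-ins rather than the real codec method signatures. The checksum here is the standard library's CRC-32, which is not the same algorithm as CRC-32C.

```python
import zlib


class ChecksumCodecSketch:
    """Hypothetical codec that appends a 4-byte CRC-32 to the payload."""

    def encode(self, data: bytes) -> bytes:
        return data + zlib.crc32(data).to_bytes(4, "little")

    def decode(self, data: bytes) -> bytes:
        payload, stored = data[:-4], data[-4:]
        if zlib.crc32(payload).to_bytes(4, "little") != stored:
            raise ValueError("checksum mismatch")
        return payload

    def compute_encoded_size(self, input_byte_length: int) -> int:
        # Fixed overhead: the output is always 4 bytes longer than the input.
        return input_byte_length + 4


class CompressionCodecSketch:
    """Hypothetical codec whose output size depends on the data."""

    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)

    def compute_encoded_size(self, input_byte_length: int) -> int:
        # The compressed size cannot be known from the input length alone.
        raise NotImplementedError
```

A codec with a predictable output size can report it up front, whereas a compressor can only signal that the size is not known in advance.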
44+
45+ To use custom codecs in Zarr, they need to be registered using the
46+ `entrypoint mechanism <https://packaging.python.org/en/latest/specifications/entry-points/>`_.
47+ Commonly, entrypoints are declared in the ``pyproject.toml`` of your package under the
48+ ``[project.entry-points]`` section. Zarr will automatically discover and load all codecs
49+ registered with the entrypoint mechanism from imported modules.
50+
51+ [project.entry-points."zarr.codecs"]
52+ "custompackage.fancy_codec" = "custompackage:FancyCodec"
53+
54+ New codecs need to have their own unique identifier. To avoid naming collisions, it is
55+ strongly recommended to prefix the codec identifier with a unique name. For example,
56+ the codecs from ``numcodecs`` are prefixed with ``numcodecs.``, e.g. ``numcodecs.delta``.
57+
58+ .. note::
59+     The extension mechanism for the Zarr specification version 3 is still
60+     under development. Requirements for custom codecs, including the choice of codec
61+     identifiers, might change in the future.
62+
63+ It is also possible to register codecs as replacements for existing codecs. This might be
64+ useful for providing specialized implementations, such as GPU-based codecs. If multiple
65+ implementations are registered for the same codec, the :mod:`zarr.core.config` mechanism can
66+ be used to select the preferred implementation.
67+
68+ TODO: Link to documentation of :mod:`zarr.core.config`
69+
70+ .. note::
71+     This section explains how custom codecs can be created for Zarr version 3. For Zarr
72+     version 2, codecs should implement the
73+     `numcodecs.abc.Codec <https://numcodecs.readthedocs.io/en/stable/abc.html>`_
74+     base class.
75+
1176
1277In the future, Zarr will support writing custom data types and chunk grids.
1378