
Commit fe07904

refactor codecs into a single registry spec
1 parent 6d6565d commit fe07904


3 files changed (+203, -129 lines)


docs/codecs.rst

Lines changed: 202 additions & 6 deletions

==============
Codec registry
==============
------------------------------
Editor's Draft 21 October 2020
------------------------------

Specification URI:
    https://purl.org/zarr/specs/codecs
Issue tracking:
    `GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/codec>`_
Suggest an edit for this spec:
    `GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/master/docs/codecs.rst>`_

Copyright 2020 `Zarr core development team
<https://github.com/orgs/zarr-developers/teams/core-devs>`_. This work
is licensed under a `Creative Commons Attribution 3.0 Unported License
<https://creativecommons.org/licenses/by/3.0/>`_.

----


Abstract
========

This document defines codecs for use as compressors and/or filters as
part of a Zarr implementation.


Status of this document
=======================

This document is a **Work in Progress**. It may be updated, replaced
or obsoleted by other documents at any time. It is inappropriate to
cite this document as other than work in progress.

Comments, questions or contributions to this document are very
welcome. Comments and questions should be raised via `GitHub issues
<https://github.com/zarr-developers/zarr-specs/labels/codec>`_.

This document is maintained by the `Zarr core development team
<https://github.com/orgs/zarr-developers/teams/core-devs>`_.


Document conventions
====================

This document lists a collection of codecs. For each codec, the
following information is provided:

* A URI which can be used to uniquely identify the codec in Zarr array
  metadata.
* Any configuration parameters which can be set in Zarr array
  metadata.
* A definition of the encoding/decoding algorithm and the encoded
  format, or a citation to an existing specification where this is
  defined.
* Any additional headers added to the encoded data.
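
For example, an implementation could resolve a codec by looking up its
URI in a registry and passing the configuration parameters to a
factory. The Python sketch below is illustrative only: ``REGISTRY``,
``register``, ``codec_from_metadata`` and ``make_gzip`` are hypothetical
names, treating a missing ``configuration`` as empty is an assumption,
and the Gzip entry is a placeholder that records its configuration
instead of compressing anything.

.. code-block:: python

    import json
    from typing import Any, Callable, Dict

    # Hypothetical registry: codec URI -> factory building a codec from its configuration.
    CodecFactory = Callable[[Dict[str, Any]], Any]
    REGISTRY: Dict[str, CodecFactory] = {}


    def register(uri: str) -> Callable[[CodecFactory], CodecFactory]:
        """Decorator recording a factory under a codec URI."""
        def wrap(factory: CodecFactory) -> CodecFactory:
            if uri in REGISTRY:
                raise ValueError(f"codec URI already registered: {uri}")
            REGISTRY[uri] = factory
            return factory
        return wrap


    def codec_from_metadata(entry: Dict[str, Any]) -> Any:
        """Build a codec from a {"codec": URI, "configuration": {...}} object."""
        uri = entry["codec"]
        if uri not in REGISTRY:
            raise ValueError(f"unknown codec URI: {uri}")
        # Assumption: a missing configuration is treated as an empty one.
        return REGISTRY[uri](entry.get("configuration", {}))


    # Placeholder entry: records the configuration instead of compressing anything.
    @register("https://purl.org/zarr/spec/codecs/gzip")
    def make_gzip(config: Dict[str, Any]) -> Dict[str, Any]:
        return {"name": "gzip", "level": config["level"]}


    metadata = json.loads(
        '{"compressor": {"codec": "https://purl.org/zarr/spec/codecs/gzip",'
        ' "configuration": {"level": 1}}}'
    )
    codec = codec_from_metadata(metadata["compressor"])
    # codec == {"name": "gzip", "level": 1}

Keying the registry on the full URI, rather than on a short name,
matches how each codec in this document is identified in array
metadata.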

Conformance requirements are expressed with a combination of
descriptive assertions and [RFC2119]_ terminology. The key words
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
parts of this document are to be interpreted as described in
[RFC2119]_. However, for readability, these words do not appear in all
uppercase letters in this specification.

All of the text of this specification is normative except sections
explicitly marked as non-normative, examples, and notes. Examples in
this specification are introduced with the words "for example".


Codecs
======


Gzip
----

Codec URI:
    https://purl.org/zarr/spec/codecs/gzip


Configuration parameters
~~~~~~~~~~~~~~~~~~~~~~~~

level:
    An integer from 0 to 9 which controls the speed and level of
    compression. A level of 1 is the fastest compression method and
    produces the least compression, while 9 is the slowest and produces
    the most compression. Compression is turned off completely when
    level is 0.

For example, the array metadata below specifies that the compressor is
the Gzip codec configured with a compression level of 1::

    {
        "compressor": {
            "codec": "https://purl.org/zarr/spec/codecs/gzip",
            "configuration": {
                "level": 1
            }
        }
    }
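
For example, the following Python sketch reads a metadata document
shaped like the one above with the standard ``json`` module and checks
that ``level`` is an integer from 0 to 9. ``gzip_level_from_metadata``
is a hypothetical helper, and it assumes that the ``configuration``
object and its ``level`` key are always present.

.. code-block:: python

    import json

    GZIP_URI = "https://purl.org/zarr/spec/codecs/gzip"


    def gzip_level_from_metadata(document: str) -> int:
        """Return the validated Gzip ``level`` from an array metadata document."""
        compressor = json.loads(document)["compressor"]
        if compressor["codec"] != GZIP_URI:
            raise ValueError("compressor is not the Gzip codec")
        # Assumption: the configuration object and its "level" key are present.
        level = compressor["configuration"]["level"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(level, bool) or not isinstance(level, int):
            raise ValueError(f"level must be an integer, got {level!r}")
        if not 0 <= level <= 9:
            raise ValueError(f"level must be from 0 to 9, got {level}")
        return level


    document = """
    {
        "compressor": {
            "codec": "https://purl.org/zarr/spec/codecs/gzip",
            "configuration": {
                "level": 1
            }
        }
    }
    """
    level = gzip_level_from_metadata(document)  # 1

Strict JSON parsers, including Python's ``json`` module, reject
trailing commas, so metadata documents need to be strictly valid JSON.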


Format and algorithm
~~~~~~~~~~~~~~~~~~~~

Encoding and decoding are performed using the DEFLATE algorithm
defined in [RFC1951]_.

Encoded data should conform to the Gzip file format [RFC1952]_.
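
For example, Python's standard ``gzip`` module produces and consumes
this format: its output is DEFLATE-compressed data wrapped in a Gzip
header and trailer. The sketch below is a minimal illustration rather
than a reference implementation; ``gzip_encode`` and ``gzip_decode``
are hypothetical names, and the ``level`` parameter is passed through
unchanged as ``compresslevel``.

.. code-block:: python

    import gzip


    def gzip_encode(data: bytes, level: int) -> bytes:
        """Encode ``data`` in the Gzip file format using the configured ``level`` (0-9)."""
        return gzip.compress(data, compresslevel=level)


    def gzip_decode(buf: bytes) -> bytes:
        """Decode a Gzip-format buffer back to the original bytes."""
        return gzip.decompress(buf)


    data = b"zarr chunk " * 1000
    stored = gzip_encode(data, level=0)  # level 0: uncompressed DEFLATE blocks
    fast = gzip_encode(data, level=1)
    restored = gzip_decode(fast)

Level 0 still yields a valid Gzip stream: the data are stored in
uncompressed DEFLATE blocks, so the output is slightly larger than the
input.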


Blosc
-----

Codec URI:
    https://purl.org/zarr/spec/codecs/blosc


Configuration parameters
~~~~~~~~~~~~~~~~~~~~~~~~

cname:
    A string identifying the internal compression algorithm to be
    used. At the time of writing, the following values are supported
    by the c-blosc library: "lz4", "lz4hc", "blosclz", "zstd",
    "snappy", "zlib".

clevel:
    An integer between 1 and 9 indicating the compression level.

shuffle:
    An integer value in the set {0, 1, 2, -1}. A value of 1
    indicates that byte-wise shuffling is performed in addition to
    compression. A value of 2 indicates that bit-wise shuffling is
    performed in addition to compression. If a value of -1 is given,
    then default shuffling is used: bit-wise shuffling for buffers
    with an item size of 1 byte, byte-wise shuffling otherwise.
    Shuffling is turned off completely when the value is 0.

blocksize:
    An integer giving the size in bytes of blocks into which a
    buffer is divided before compression. A value of 0 selects the
    default block size.
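
For example, the ``shuffle`` parameter can be modelled as below: the
value -1 is resolved to an effective mode from the item size, and
byte-wise shuffling emits the first byte of every item, then the
second byte of every item, and so on. This is a simplified pure-Python
model written for illustration, not c-blosc's implementation, which
uses its own optimized routines; the function names are hypothetical
and bit-wise shuffling is not shown.

.. code-block:: python

    import struct


    def resolve_shuffle(shuffle: int, itemsize: int) -> int:
        """Map ``shuffle`` to the effective mode: 0 none, 1 byte-wise, 2 bit-wise."""
        if shuffle not in (0, 1, 2, -1):
            raise ValueError(f"invalid shuffle value: {shuffle}")
        if shuffle == -1:
            # Default: bit-wise for 1-byte items, byte-wise otherwise.
            return 2 if itemsize == 1 else 1
        return shuffle


    def byte_shuffle(buf: bytes, itemsize: int) -> bytes:
        """Group byte 0 of every item, then byte 1, ...; leftover bytes are kept as-is."""
        n = len(buf) // itemsize
        out = bytearray()
        for j in range(itemsize):
            out += buf[j:n * itemsize:itemsize]
        return bytes(out) + buf[n * itemsize:]


    def byte_unshuffle(buf: bytes, itemsize: int) -> bytes:
        """Inverse of ``byte_shuffle``."""
        n = len(buf) // itemsize
        out = bytearray(n * itemsize)
        for j in range(itemsize):
            out[j::itemsize] = buf[j * n:(j + 1) * n]
        return bytes(out) + buf[n * itemsize:]


    # Three little-endian uint32 values.
    buf = struct.pack("<3I", 1, 2, 3)
    shuffled = byte_shuffle(buf, itemsize=4)
    # shuffled == b"\x01\x02\x03" + b"\x00" * 9

Grouping the high-order bytes of neighbouring items together tends to
produce long runs of identical bytes, which the internal compressor
can then encode compactly.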

For example, the array metadata document below specifies that the
compressor is the Blosc codec configured with a compression level of
1, byte-wise shuffling, the ``lz4`` compression algorithm and the
default block size::

    {
        "compressor": {
            "codec": "https://purl.org/zarr/spec/codecs/blosc",
            "configuration": {
                "cname": "lz4",
                "clevel": 1,
                "shuffle": 1,
                "blocksize": 0
            }
        }
    }
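
For example, a Blosc configuration like the one above could be checked
against the parameter descriptions as sketched below. The function name
is hypothetical; treating all four parameters as required and rejecting
a negative ``blocksize`` are assumptions of this sketch rather than
statements of this specification, and a given c-blosc build may
support fewer compressors than the listed ``cname`` values.

.. code-block:: python

    from typing import Any, Dict

    # cname values listed in the parameter description.
    BLOSC_CNAMES = {"lz4", "lz4hc", "blosclz", "zstd", "snappy", "zlib"}
    REQUIRED = {"cname", "clevel", "shuffle", "blocksize"}  # assumption: all required


    def validate_blosc_config(config: Dict[str, Any]) -> None:
        """Raise ValueError if ``config`` violates the Blosc parameter rules."""
        missing = REQUIRED - set(config)
        if missing:
            raise ValueError(f"missing parameters: {sorted(missing)}")
        if config["cname"] not in BLOSC_CNAMES:
            raise ValueError(f"unsupported cname: {config['cname']!r}")
        if not isinstance(config["clevel"], int) or not 1 <= config["clevel"] <= 9:
            raise ValueError("clevel must be an integer between 1 and 9")
        if config["shuffle"] not in (0, 1, 2, -1):
            raise ValueError("shuffle must be one of 0, 1, 2, -1")
        if not isinstance(config["blocksize"], int) or config["blocksize"] < 0:
            raise ValueError("blocksize must be a non-negative integer")


    example = {"cname": "lz4", "clevel": 1, "shuffle": 1, "blocksize": 0}
    validate_blosc_config(example)  # passes without raising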


Format and algorithm
~~~~~~~~~~~~~~~~~~~~

Blosc is a meta-compressor: it divides an input buffer into blocks,
applies an internal compression algorithm to each block, and packs the
encoded blocks together with a header into a single output buffer.
The format of the encoded buffer is defined in [BLOSC]_. The
reference implementation is provided by the `c-blosc library
<https://github.com/Blosc/c-blosc>`_.
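
For example, the fixed-size header at the start of a Blosc-encoded
buffer can be decoded with Python's ``struct`` module. This sketch
assumes the 16-byte layout described in [BLOSC]_ (four unsigned 8-bit
fields followed by three little-endian unsigned 32-bit fields); the
class and function names are hypothetical, the ``flags`` byte is
returned undecoded, and no blocks are decompressed, so a real
implementation should rely on the c-blosc library.

.. code-block:: python

    import struct
    from typing import NamedTuple


    class BloscHeader(NamedTuple):
        version: int    # Blosc format version
        versionlz: int  # format version of the internal compressor
        flags: int      # bit flags, left undecoded in this sketch
        typesize: int   # item size in bytes
        nbytes: int     # size of the uncompressed buffer
        blocksize: int  # size of the internal blocks
        ctbytes: int    # size of the compressed buffer, header included


    HEADER = struct.Struct("<BBBBIII")  # 4 x uint8 + 3 x little-endian uint32 = 16 bytes


    def parse_blosc_header(buf: bytes) -> BloscHeader:
        """Decode the 16-byte header at the start of a Blosc-encoded buffer."""
        if len(buf) < HEADER.size:
            raise ValueError("buffer too short for a Blosc header")
        return BloscHeader(*HEADER.unpack_from(buf))


    # Synthetic header (values chosen for illustration only) plus a fake payload.
    raw = HEADER.pack(2, 1, 0x01, 4, 4000, 4000, 16 + 84) + bytes(84)
    header = parse_blosc_header(raw)
    # header.typesize == 4, header.nbytes == 4000, header.ctbytes == 100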


Deprecated codecs
=================

There are no deprecated codecs at this time.


References
==========

.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
   Requirement Levels. March 1997. Best Current Practice. URL:
   https://tools.ietf.org/html/rfc2119

.. [RFC1951] P. Deutsch. DEFLATE Compressed Data Format Specification
   version 1.3. May 1996. Informational. URL:
   https://tools.ietf.org/html/rfc1951

.. [RFC1952] P. Deutsch. GZIP file format specification version 4.3.
   May 1996. Informational. URL:
   https://tools.ietf.org/html/rfc1952

.. [BLOSC] F. Alted. Blosc Chunk Format. URL:
   https://github.com/Blosc/c-blosc/blob/master/README_CHUNK_FORMAT.rst


Change log
==========

Editor's Draft 21 October 2020
------------------------------

* Added Gzip codec.
* Added Blosc codec.

docs/codecs/gzip/v1.0.rst

Lines changed: 0 additions & 122 deletions
This file was deleted.

docs/index.rst

Lines changed: 1 addition & 1 deletion

:caption: Contents:

protocol
codecs
stores


Indices and tables
