Commit c69da91

Merge pull request #102 from alimanfoo/refactor-codecs-20201021
Refactor codec specs into a single doc
2 parents c0728dc + 2314192 commit c69da91

3 files changed

+210
-129
lines changed

docs/codecs.rst

Lines changed: 209 additions & 6 deletions
@@ -1,11 +1,214 @@
1-
======
1+
==============
2+
Codec registry
3+
==============
4+
------------------------------
5+
Editor's Draft 21 October 2020
6+
------------------------------
7+
8+
Specification URI:
9+
https://purl.org/zarr/specs/codec
10+
Issue tracking:
11+
`GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/codec>`_
12+
Suggest an edit for this spec:
13+
`GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/master/docs/codecs.rst>`_
14+
15+
Copyright 2020 `Zarr core development team
16+
<https://github.com/orgs/zarr-developers/teams/core-devs>`_. This work
17+
is licensed under a `Creative Commons Attribution 3.0 Unported License
18+
<https://creativecommons.org/licenses/by/3.0/>`_.
19+
20+
----
21+
22+
23+
Abstract
24+
========
25+
26+
This document defines codecs for use as compressors and/or filters as
27+
part of a Zarr implementation.
28+
29+
30+
Status of this document
31+
========================
32+
33+
This document is a **Work in Progress**. It may be updated, replaced
34+
or obsoleted by other documents at any time. It is inappropriate to
35+
cite this document as other than work in progress.
36+
37+
Comments, questions or contributions to this document are very
38+
welcome. Comments and questions should be raised via `GitHub issues
39+
<https://github.com/zarr-developers/zarr-specs/labels/codec>`_.
40+
41+
This document is maintained by the `Zarr core development team
42+
<https://github.com/orgs/zarr-developers/teams/core-devs>`_.
43+
44+
45+
Document conventions
46+
====================
47+
48+
This document lists a collection of codecs. For each codec, the
49+
following information is provided:
50+
51+
* A URI which can be used to uniquely identify the codec in Zarr array
52+
metadata.
53+
* Any configuration parameters which can be set in Zarr array
54+
metadata.
55+
* A definition of the encoding/decoding algorithm and the encoded format,
56+
or a citation to an existing specification where this is defined.
57+
* Any additional headers added to the encoded data.
58+
59+
Conformance requirements are expressed with a combination of
60+
descriptive assertions and [RFC2119]_ terminology. The key words
61+
"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
62+
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative
63+
parts of this document are to be interpreted as described in
64+
[RFC2119]_. However, for readability, these words do not appear in all
65+
uppercase letters in this specification.
66+
67+
All of the text of this specification is normative except sections
68+
explicitly marked as non-normative, examples, and notes. Examples in
69+
this specification are introduced with the words "for example".
70+
71+
2 72
Codecs
3 73
======
4 74

5-
Under construction.
75+
Gzip
76+
----
77+
78+
Codec URI:
79+
https://purl.org/zarr/spec/codec/gzip
80+
81+
82+
Configuration parameters
83+
~~~~~~~~~~~~~~~~~~~~~~~~
84+
85+
level:
86+
An integer from 0 to 9 which controls the speed and level of
87+
compression. A level of 1 is the fastest compression method and
88+
produces the least compression, while 9 is the slowest and produces
89+
the most compression. Compression is turned off completely when
90+
level is 0.
91+
92+
For example, the array metadata below specifies that the compressor is
93+
the Gzip codec configured with a compression level of 1::
94+
95+
{
96+
"compressor": {
97+
"codec": "https://purl.org/zarr/spec/codec/gzip",
98+
"configuration": {
99+
"level": 1
100+
}
101+
}
102+
}
103+
104+
105+
Format and algorithm
106+
~~~~~~~~~~~~~~~~~~~~
107+
108+
Encoding and decoding are performed using the algorithm defined in
109+
[RFC1951]_.
110+
111+
Encoded data should conform to the Gzip file format [RFC1952]_.
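
For example, the following non-normative Python sketch reads the metadata fragment shown above, checks the codec URI, and round-trips a chunk through the standard library ``gzip`` module. The helper names ``gzip_encode`` and ``gzip_decode`` and the sample chunk are illustrative assumptions; they are not defined by this specification or by any particular Zarr implementation.

```python
import gzip
import json

GZIP_URI = "https://purl.org/zarr/spec/codec/gzip"

# The array metadata fragment from the example above.
metadata = json.loads("""
{
    "compressor": {
        "codec": "https://purl.org/zarr/spec/codec/gzip",
        "configuration": {"level": 1}
    }
}
""")


def gzip_encode(chunk: bytes, level: int) -> bytes:
    """Encode a chunk as a Gzip stream (hypothetical helper)."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(level, bool) or not isinstance(level, int):
        raise ValueError("level must be an integer")
    if not 0 <= level <= 9:
        raise ValueError("level must be in the range 0 to 9")
    return gzip.compress(chunk, compresslevel=level)


def gzip_decode(encoded: bytes) -> bytes:
    """Decode a Gzip stream back to the original chunk (hypothetical helper)."""
    return gzip.decompress(encoded)


compressor = metadata["compressor"]
if compressor["codec"] != GZIP_URI:
    raise ValueError("unexpected codec URI")

level = compressor["configuration"]["level"]
chunk = bytes(1000)  # sample data: 1000 zero bytes
encoded = gzip_encode(chunk, level)
decoded = gzip_decode(encoded)
```

The decoding step needs no ``level`` argument, because the compression level affects only how the encoder searches for matches, not how the stream is read.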
112+
113+
114+
Blosc
115+
-----
116+
117+
Codec URI:
118+
https://purl.org/zarr/spec/codec/blosc
119+
120+
121+
Configuration parameters
122+
~~~~~~~~~~~~~~~~~~~~~~~~
123+
124+
cname:
125+
A string identifying the internal compression algorithm to be
126+
used. At the time of writing, the following values are supported
127+
by the c-blosc library: "lz4", "lz4hc", "blosclz", "zstd",
128+
"snappy", "zlib".
129+
130+
clevel:
131+
An integer from 0 to 9 which controls the speed and level of
132+
compression. A level of 1 is the fastest compression method and
133+
produces the least compression, while 9 is the slowest and produces
134+
the most compression. Compression is turned off completely when
135+
clevel is 0.
136+
137+
shuffle:
138+
An integer value in the set {0, 1, 2, -1} indicating the way
139+
bytes or bits are rearranged, which can lead to faster
140+
and/or greater compression. A value of 1
141+
indicates that byte-wise shuffling is performed prior to
142+
compression. A value of 2 indicates that bit-wise shuffling is
143+
performed prior to compression. If a value of -1 is given,
144+
then default shuffling is used: bit-wise shuffling for buffers
145+
with item size of 1 byte, byte-wise shuffling otherwise.
146+
Shuffling is turned off completely when the value is 0.
147+
148+
blocksize:
149+
An integer giving the size in bytes of blocks into which a
150+
buffer is divided before compression. A value of 0
151+
indicates that an automatic size will be used.
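
For example, the idea behind byte-wise shuffling (``shuffle`` value 1) can be sketched in Python as below. This is a non-normative illustration only: the function names are hypothetical, and the sketch assumes the buffer length is an exact multiple of the item size. It is not the c-blosc implementation, which operates per block and handles further cases.

```python
import struct


def byte_shuffle(buf: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    if itemsize <= 0 or len(buf) % itemsize != 0:
        raise ValueError("buffer length must be a multiple of itemsize")
    # buf[i::itemsize] selects the i-th byte of each item.
    return b"".join(buf[i::itemsize] for i in range(itemsize))


def byte_unshuffle(buf: bytes, itemsize: int) -> bytes:
    """Invert byte_shuffle, restoring the original item layout."""
    if itemsize <= 0 or len(buf) % itemsize != 0:
        raise ValueError("buffer length must be a multiple of itemsize")
    nitems = len(buf) // itemsize
    out = bytearray(len(buf))
    for i in range(itemsize):
        # The i-th group of nitems bytes holds the i-th byte of every item.
        out[i::itemsize] = buf[i * nitems:(i + 1) * nitems]
    return bytes(out)


# Four little-endian 32-bit integers: 1, 2, 3, 4.
data = struct.pack("<4I", 1, 2, 3, 4)
shuffled = byte_shuffle(data, 4)
restored = byte_unshuffle(shuffled, 4)
```

After shuffling, the low-order bytes of the four integers sit next to each other and the twelve zero bytes form one long run, which is the kind of redundancy a compressor can exploit.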
152+
153+
For example, the array metadata document below specifies that the
154+
compressor is the Blosc codec configured with a compression level of
155+
1, byte-wise shuffling, the ``lz4`` compression algorithm and the
156+
default block size::
157+
158+
{
159+
"compressor": {
160+
"codec": "https://purl.org/zarr/spec/codec/blosc",
161+
"configuration": {
162+
"cname": "lz4",
163+
"clevel": 1,
164+
"shuffle": 1,
165+
"blocksize": 0
166+
}
167+
}
168+
}
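
For example, an implementation might check a Blosc configuration against the parameter descriptions above before using it. The Python sketch below is non-normative and rests on assumptions: the function name is hypothetical, all four parameters are assumed to be present, ``blocksize`` is assumed to be non-negative, and an unrecognised ``cname`` is not rejected because the list of names is only stated as current at the time of writing.

```python
# Names supported by c-blosc at the time of writing, per the text above.
KNOWN_CNAMES = frozenset({"lz4", "lz4hc", "blosclz", "zstd", "snappy", "zlib"})
SHUFFLE_VALUES = frozenset({0, 1, 2, -1})


def _is_int(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def check_blosc_configuration(config: dict) -> None:
    """Raise ValueError if a parameter is outside its described range."""
    if not isinstance(config["cname"], str):
        raise ValueError("cname must be a string")
    clevel = config["clevel"]
    if not _is_int(clevel) or not 0 <= clevel <= 9:
        raise ValueError("clevel must be an integer from 0 to 9")
    shuffle = config["shuffle"]
    if not _is_int(shuffle) or shuffle not in SHUFFLE_VALUES:
        raise ValueError("shuffle must be one of 0, 1, 2, -1")
    blocksize = config["blocksize"]
    # Assumption: a negative block size is treated as invalid.
    if not _is_int(blocksize) or blocksize < 0:
        raise ValueError("blocksize must be a non-negative integer")


# The configuration from the example above.
example = {"cname": "lz4", "clevel": 1, "shuffle": 1, "blocksize": 0}
check_blosc_configuration(example)
cname_is_known = example["cname"] in KNOWN_CNAMES
```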
169+
170+
171+
Format and algorithm
172+
~~~~~~~~~~~~~~~~~~~~
173+
174+
Blosc is a meta-compressor, which divides an input buffer into blocks,
175+
then applies an internal compression algorithm to each block, then
176+
packs the encoded blocks together into a single output buffer with a
177+
header. The format of the encoded buffer is defined in [BLOSC]_. The
178+
reference implementation is provided by the `c-blosc library
179+
<https://github.com/Blosc/c-blosc>`_.
180+
181+
182+
Deprecated codecs
183+
=================
184+
185+
There are no deprecated codecs at this time.
186+
187+
188+
References
189+
==========
190+
191+
.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate
192+
Requirement Levels. March 1997. Best Current Practice. URL:
193+
https://tools.ietf.org/html/rfc2119
194+
195+
.. [RFC1951] P. Deutsch. DEFLATE Compressed Data Format Specification version
196+
1.3. May 1996. Informational. URL:
197+
https://tools.ietf.org/html/rfc1951
198+
199+
.. [RFC1952] P. Deutsch. GZIP file format specification version 4.3.
200+
May 1996. Informational. URL:
201+
https://tools.ietf.org/html/rfc1952
202+
203+
.. [BLOSC] F. Alted. Blosc Chunk Format. URL:
204+
https://github.com/Blosc/c-blosc/blob/master/README_CHUNK_FORMAT.rst
205+
206+
207+
Change log
208+
==========
6 209

7-
.. toctree::
8-
:maxdepth: 1
9-
:caption: Contents:
210+
Editor's Draft 21 October 2020
211+
------------------------------
10 212

11-
codecs/gzip/v1.0
213+
* Added Gzip codec.
214+
* Added Blosc codec.

docs/codecs/gzip/v1.0.rst

Lines changed: 0 additions & 122 deletions
This file was deleted.

docs/index.rst

Lines changed: 1 addition & 1 deletion
@@ -10,8 +10,8 @@ Under construction.
10 10
:caption: Contents:
11 11

12 12
protocol
13-
stores
14 13
codecs
14+
stores
15 15

16 16

17 17
Indices and tables
