|
1 | | -====== |
| 1 | +============== |
| 2 | +Codec registry |
| 3 | +============== |
| 4 | +------------------------------ |
| 5 | +Editor's Draft 21 October 2020 |
| 6 | +------------------------------ |
| 7 | + |
| 8 | +Specification URI: |
| 9 | + https://purl.org/zarr/specs/codec |
| 10 | +Issue tracking: |
| 11 | + `GitHub issues <https://github.com/zarr-developers/zarr-specs/labels/codec>`_ |
| 12 | +Suggest an edit for this spec: |
| 13 | + `GitHub editor <https://github.com/zarr-developers/zarr-specs/blob/master/docs/codecs.rst>`_ |
| 14 | + |
| 15 | +Copyright 2020 `Zarr core development team |
| 16 | +<https://github.com/orgs/zarr-developers/teams/core-devs>`_. This work |
| 17 | +is licensed under a `Creative Commons Attribution 3.0 Unported License |
| 18 | +<https://creativecommons.org/licenses/by/3.0/>`_. |
| 19 | + |
| 20 | +---- |
| 21 | + |
| 22 | + |
| 23 | +Abstract |
| 24 | +======== |
| 25 | + |
| 26 | +This document defines codecs for use as compressors and/or filters as |
| 27 | +part of a Zarr implementation. |
| 28 | + |
| 29 | + |
| 30 | +Status of this document |
| 31 | +======================== |
| 32 | + |
| 33 | +This document is a **Work in Progress**. It may be updated, replaced |
| 34 | +or obsoleted by other documents at any time. It is inappropriate to |
| 35 | +cite this document as other than work in progress. |
| 36 | + |
| 37 | +Comments, questions or contributions to this document are very |
| 38 | +welcome. Comments and questions should be raised via `GitHub issues |
| 39 | +<https://github.com/zarr-developers/zarr-specs/labels/codec>`_. |
| 40 | + |
| 41 | +This document is maintained by the `Zarr core development team |
| 42 | +<https://github.com/orgs/zarr-developers/teams/core-devs>`_. |
| 43 | + |
| 44 | + |
| 45 | +Document conventions |
| 46 | +==================== |
| 47 | + |
| 48 | +This document lists a collection of codecs. For each codec, the |
| 49 | +following information is provided: |
| 50 | + |
| 51 | +* A URI which can be used to uniquely identify the codec in Zarr array |
| 52 | + metadata. |
| 53 | +* Any configuration parameters which can be set in Zarr array |
| 54 | + metadata. |
| 55 | +* A definition of the encoding/decoding algorithm and the encoded format, |
| 56 | + or a citation to an existing specification where this is defined. |
| 57 | +* Any additional headers added to the encoded data. |
| 58 | + |
| 59 | +Conformance requirements are expressed with a combination of |
| 60 | +descriptive assertions and [RFC2119]_ terminology. The key words |
| 61 | +"MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", |
| 62 | +"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative |
| 63 | +parts of this document are to be interpreted as described in |
| 64 | +[RFC2119]_. However, for readability, these words do not appear in all |
| 65 | +uppercase letters in this specification. |
| 66 | + |
| 67 | +All of the text of this specification is normative except sections |
| 68 | +explicitly marked as non-normative, examples, and notes. Examples in |
| 69 | +this specification are introduced with the words "for example". |
| 70 | + |
| 71 | + |
2 | 72 | Codecs |
3 | 73 | ====== |
4 | 74 |
|
5 | | -Under construction. |
| 75 | +Gzip |
| 76 | +---- |
| 77 | + |
| 78 | +Codec URI: |
| 79 | + https://purl.org/zarr/spec/codec/gzip |
| 80 | + |
| 81 | + |
| 82 | +Configuration parameters |
| 83 | +~~~~~~~~~~~~~~~~~~~~~~~~ |
| 84 | + |
| 85 | +level: |
| 86 | + An integer from 0 to 9 which controls the speed and level of |
| 87 | + compression. A level of 1 is the fastest compression method and |
| 88 | + produces the least compression, while 9 is the slowest and produces |
| 89 | + the most compression. Compression is turned off completely when |
| 90 | + level is 0. |
| 91 | + |
| 92 | +For example, the array metadata below specifies that the compressor is |
| 93 | +the Gzip codec configured with a compression level of 1:: |
| 94 | + |
| 95 | + { |
| 96 | + "compressor": { |
| 97 | + "codec": "https://purl.org/zarr/spec/codec/gzip", |
| 98 | + "configuration": { |
| 99 | + "level": 1 |
| 100 | + } |
| 101 | + } |
| 102 | + } |
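
As a rough sketch of how an implementation might read this configuration, the Python snippet below parses a metadata fragment like the one above and checks that ``level`` is an integer from 0 to 9. The helper name ``read_gzip_level`` is hypothetical and is not defined by this specification; the fragment is written here as strict JSON so that ``json.loads`` accepts it.

```python
import json

GZIP_CODEC_URI = "https://purl.org/zarr/spec/codec/gzip"


def read_gzip_level(metadata_text):
    """Return the Gzip level from a metadata fragment (hypothetical helper)."""
    compressor = json.loads(metadata_text)["compressor"]
    if compressor["codec"] != GZIP_CODEC_URI:
        raise ValueError("compressor is not the Gzip codec")
    level = compressor["configuration"]["level"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(level, bool) or not isinstance(level, int):
        raise TypeError("level must be an integer")
    if not 0 <= level <= 9:
        raise ValueError("level must be between 0 and 9")
    return level


example = """
{
    "compressor": {
        "codec": "https://purl.org/zarr/spec/codec/gzip",
        "configuration": {"level": 1}
    }
}
"""
print(read_gzip_level(example))
```

Rejecting out-of-range values when the metadata is read, rather than at encoding time, is one possible design choice; this document does not mandate when validation happens.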
| 103 | + |
| 104 | + |
| 105 | +Format and algorithm |
| 106 | +~~~~~~~~~~~~~~~~~~~~ |
| 107 | + |
| 108 | +Encoding and decoding are performed using the algorithm defined in |
| 109 | +[RFC1951]_. |
| 110 | + |
| 111 | +Encoded data should conform to the Gzip file format [RFC1952]_. |
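
A minimal round-trip sketch using the ``gzip`` module from the Python standard library, which produces data in the Gzip file format. It assumes that the ``level`` parameter maps directly onto the module's ``compresslevel`` argument; the input bytes are arbitrary illustrative data.

```python
import gzip

data = b"zarr chunk bytes " * 64

# compresslevel plays the role of the "level" parameter (0 to 9).
encoded = gzip.compress(data, compresslevel=1)

# A Gzip member starts with the magic bytes 0x1f 0x8b (RFC 1952).
assert encoded[:2] == b"\x1f\x8b"

# Decoding does not need the level; it is only an encoder setting.
decoded = gzip.decompress(encoded)
assert decoded == data

# Level 0 stores the data uncompressed, so the Gzip header and
# trailer make the output larger than the input.
stored = gzip.compress(data, compresslevel=0)
assert len(stored) > len(data)
```

Because the level is not needed to decode, a reader can decode data written with any level from 0 to 9.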
| 112 | + |
| 113 | + |
| 114 | +Blosc |
| 115 | +----- |
| 116 | + |
| 117 | +Codec URI: |
| 118 | + https://purl.org/zarr/spec/codec/blosc |
| 119 | + |
| 120 | + |
| 121 | +Configuration parameters |
| 122 | +~~~~~~~~~~~~~~~~~~~~~~~~ |
| 123 | + |
| 124 | +cname: |
| 125 | + A string identifying the internal compression algorithm to be |
| 126 | + used. At the time of writing, the following values are supported |
| 127 | + by the c-blosc library: "lz4", "lz4hc", "blosclz", "zstd", |
| 128 | + "snappy", "zlib". |
| 129 | + |
| 130 | +clevel: |
| 131 | + An integer from 0 to 9 which controls the speed and level of |
| 132 | + compression. A level of 1 is the fastest compression method and |
| 133 | + produces the least compression, while 9 is the slowest and produces |
| 134 | + the most compression. Compression is turned off completely when |
| 135 | + clevel is 0. |
| 136 | + |
| 137 | +shuffle: |
| 138 | + An integer value in the set {0, 1, 2, -1} indicating the way |
| 139 | + bytes or bits are rearranged, which can lead to faster |
| 140 | + and/or greater compression. A value of 1 |
| 141 | + indicates that byte-wise shuffling is performed prior to |
| 142 | + compression. A value of 2 indicates that bit-wise shuffling is |
| 143 | + performed prior to compression. If a value of -1 is given, |
| 144 | + then default shuffling is used: bit-wise shuffling for buffers |
| 145 | + with item size of 1 byte, byte-wise shuffling otherwise. |
| 146 | + Shuffling is turned off completely when the value is 0. |
| 147 | + |
| 148 | +blocksize: |
| 149 | + An integer giving the size in bytes of blocks into which a |
| 150 | + buffer is divided before compression. A value of 0 |
| 151 | + indicates that an automatic size will be used. |
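
To illustrate the ``shuffle`` parameter, the pure-Python sketch below resolves the ``-1`` default and performs a byte-wise shuffle on a small buffer. The function names are hypothetical. The sketch treats the whole buffer as a single block and requires its length to be a multiple of the item size, which is a simplification and not how the c-blosc library is implemented.

```python
import struct


def resolve_shuffle(shuffle, itemsize):
    """Map the -1 default to a concrete mode (hypothetical helper)."""
    if shuffle not in (0, 1, 2, -1):
        raise ValueError("shuffle must be one of 0, 1, 2, -1")
    if shuffle == -1:
        # Bit-wise for 1-byte items, byte-wise otherwise.
        return 2 if itemsize == 1 else 1
    return shuffle


def byte_shuffle(buf, itemsize):
    """Group byte 0 of every item, then byte 1 of every item, and so on."""
    if len(buf) % itemsize:
        raise ValueError("buffer length must be a multiple of itemsize")
    return b"".join(buf[i::itemsize] for i in range(itemsize))


def byte_unshuffle(buf, itemsize):
    """Inverse of byte_shuffle."""
    if len(buf) % itemsize:
        raise ValueError("buffer length must be a multiple of itemsize")
    n_items = len(buf) // itemsize
    return b"".join(buf[i::n_items] for i in range(n_items))


# Four little-endian 2-byte integers: 1, 2, 3, 4.
items = struct.pack("<4H", 1, 2, 3, 4)
shuffled = byte_shuffle(items, 2)
print(shuffled.hex())  # 0102030400000000
```

After shuffling, the high-order zero bytes sit next to each other, which is why shuffling can help the internal compressor find longer runs.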
| 152 | + |
| 153 | +For example, the array metadata document below specifies that the |
| 154 | +compressor is the Blosc codec configured with a compression level of |
| 155 | +1, byte-wise shuffling, the ``lz4`` compression algorithm and the |
| 156 | +default block size:: |
| 157 | + |
| 158 | + { |
| 159 | + "compressor": { |
| 160 | + "codec": "https://purl.org/zarr/spec/codec/blosc", |
| 161 | + "configuration": { |
| 162 | + "cname": "lz4", |
| 163 | + "clevel": 1, |
| 164 | + "shuffle": 1, |
| 165 | + "blocksize": 0 |
| 166 | + } |
| 167 | + } |
| 168 | + } |
| 169 | + |
| 170 | + |
| 171 | +Format and algorithm |
| 172 | +~~~~~~~~~~~~~~~~~~~~ |
| 173 | + |
| 174 | +Blosc is a meta-compressor, which divides an input buffer into blocks, |
| 175 | +then applies an internal compression algorithm to each block, then |
| 176 | +packs the encoded blocks together into a single output buffer with a |
| 177 | +header. The format of the encoded buffer is defined in [BLOSC]_. The |
| 178 | +reference implementation is provided by the `c-blosc library |
| 179 | +<https://github.com/Blosc/c-blosc>`_. |
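
As a sketch of what the header contains, the snippet below unpacks the first 16 bytes of an encoded buffer. It assumes the Blosc 1 layout of four one-byte fields followed by three little-endian 32-bit unsigned integers; the field names and the helper ``parse_blosc_header`` are hypothetical, and [BLOSC]_ remains the authoritative description. The example header is built by hand with illustrative values and is not output from c-blosc.

```python
import struct
from collections import namedtuple

# Assumed field order; see the cited chunk format document.
BloscHeader = namedtuple(
    "BloscHeader",
    "version versionlz flags typesize nbytes blocksize cbytes",
)

HEADER_FORMAT = "<BBBBIII"  # 4 single bytes + 3 little-endian uint32
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 16 bytes


def parse_blosc_header(buf):
    """Unpack the fixed-size header from an encoded buffer."""
    if len(buf) < HEADER_SIZE:
        raise ValueError("buffer is shorter than the header")
    return BloscHeader(*struct.unpack_from(HEADER_FORMAT, buf))


# Hand-built header with made-up values, for illustration only.
fake = struct.pack(HEADER_FORMAT, 2, 1, 1, 4, 4096, 1024, 900)
header = parse_blosc_header(fake)
print(header.typesize, header.nbytes, header.blocksize, header.cbytes)
```

Under this assumed layout, ``nbytes`` is the uncompressed size and ``cbytes`` the total encoded size, so a reader can allocate the output buffer before decoding.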
| 180 | + |
| 181 | + |
| 182 | +Deprecated codecs |
| 183 | +================= |
| 184 | + |
| 185 | +There are no deprecated codecs at this time. |
| 186 | + |
| 187 | + |
| 188 | +References |
| 189 | +========== |
| 190 | + |
| 191 | +.. [RFC2119] S. Bradner. Key words for use in RFCs to Indicate |
| 192 | + Requirement Levels. March 1997. Best Current Practice. URL: |
| 193 | + https://tools.ietf.org/html/rfc2119 |
| 194 | +
|
| 195 | +.. [RFC1951] P. Deutsch. DEFLATE Compressed Data Format Specification version |
| 196 | + 1.3. May 1996. Informational. URL: |
| 197 | + https://tools.ietf.org/html/rfc1951 |
| 198 | +
|
| 199 | +.. [RFC1952] P. Deutsch. GZIP file format specification version 4.3. |
| 200 | + May 1996. Informational. URL: |
| 201 | + https://tools.ietf.org/html/rfc1952 |
| 202 | +
|
| 203 | +.. [BLOSC] F. Alted. Blosc Chunk Format. URL: |
| 204 | + https://github.com/Blosc/c-blosc/blob/master/README_CHUNK_FORMAT.rst |
| 205 | +
|
| 206 | +
|
| 207 | +Change log |
| 208 | +========== |
6 | 209 |
|
7 | | -.. toctree:: |
8 | | - :maxdepth: 1 |
9 | | - :caption: Contents: |
| 210 | +Editor's Draft 21 October 2020 |
| 211 | +------------------------------ |
10 | 212 |
|
11 | | - codecs/gzip/v1.0 |
| 213 | +* Added Gzip codec. |
| 214 | +* Added Blosc codec. |