Commit 486db6c

First implementation of variable length blocks in chunks
1 parent 3122c09 commit 486db6c

12 files changed, with 1222 additions and 56 deletions.

README_CFRAME_FORMAT.rst

Lines changed: 3 additions & 1 deletion
@@ -119,7 +119,9 @@ frame. It is mandatory the use of the msgpack format for storing them, although
 :``6``:
 Chunks of fixed length (0) or variable length (1)
 :``7``:
-Reserved
+All chunks in the frame use variable-length blocks (1) or regular blocks (0)
+
+Frames must not mix regular chunks and variable-length-block chunks.

 :frame_type:
 (``uint8``) The type of frame.
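
As a reading aid for the flag change in this hunk, here is a minimal Python sketch that tests bits 6 and 7 of the frame flags byte. It assumes bit 0 is the least significant bit, so bit 7 corresponds to the mask ``0x80`` (the same convention as the ``bit 7 (0x80)`` notation in README_CHUNK_FORMAT.rst). The constant and function names are hypothetical and are not part of any Blosc2 API.

```python
# Hypothetical names; bit numbering assumes bit 0 is the least significant bit.
FRAME_VARIABLE_CHUNKS = 1 << 6  # bit 6: chunks of variable length
FRAME_VL_BLOCKS = 1 << 7        # bit 7: all chunks use variable-length blocks


def describe_frame_flags(flags: int) -> dict:
    """Decode bits 6 and 7 of a one-byte frame flags field."""
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must fit in one byte")
    return {
        "variable_length_chunks": bool(flags & FRAME_VARIABLE_CHUNKS),
        "variable_length_blocks": bool(flags & FRAME_VL_BLOCKS),
    }
```

Because frames must not mix regular and variable-length-block chunks, a reader could check this one bit up front instead of inspecting every chunk header.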

README_CHUNK_FORMAT.rst

Lines changed: 37 additions & 6 deletions
@@ -43,7 +43,7 @@ for encoding blocks with a filter pipeline::
 | filters | ^ | ^ | filters_meta | ^ | ^ |
 | | | |
 | +- compcode_meta | +-blosc2_flags
-+- user-defined codec +-reserved
++- user-defined codec +-blosc2_flags2

 :version:
 (``uint8``) Blosc format version.
@@ -100,7 +100,9 @@ for encoding blocks with a filter pipeline::
 (``int32``) Uncompressed size of the buffer (this header is not included).

 :blocksize:
-(``int32``) Size of internal blocks.
+(``int32``) Size of internal blocks for regular chunks.
+
+When `blosc2_flags2` bit 0 is set, this field stores the number of blocks in the chunk instead.

 :cbytes:
 (``int32``) Compressed size of the buffer (including this header).
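
The dual meaning of `blocksize` introduced in this hunk can be illustrated with a short Python sketch. It only takes already-decoded header values as arguments, since the byte offsets of the header fields are not shown in this diff. The function name is hypothetical. For regular chunks, the block count is derived as a ceiling division, which follows from the rule that only the last block may be shorter.

```python
VL_BLOCKS = 0x01  # blosc2_flags2 bit 0: chunk uses variable-length blocks


def block_count(nbytes: int, blocksize: int, flags2: int) -> int:
    """Return the number of blocks implied by already-decoded header fields."""
    if flags2 & VL_BLOCKS:
        # Variable-length blocks: the field holds the block count directly.
        return blocksize
    if blocksize <= 0:
        raise ValueError("regular chunks need a positive blocksize")
    # Regular blocks: equal-sized, with a possibly shorter last block.
    return -(-nbytes // blocksize)
```

For example, a regular chunk with `nbytes = 1000` and `blocksize = 256` has 4 blocks, while a variable-length-block chunk whose `blocksize` field reads 7 has 7 blocks regardless of `nbytes`.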
@@ -174,6 +176,14 @@ for encoding blocks with a filter pipeline::
 :bit 7 (``0x80``):
 Indicate whether codec has been instrumented or not.

+:blosc2_flags2:
+(``bitfield``) Secondary flags for a Blosc2 buffer.
+
+:bit 0 (``0x01``):
+Whether the chunk uses variable-length blocks or not.
+:bits 1 to 7:
+Reserved.
+

 Blocks
 ------
@@ -185,8 +195,14 @@ compression, and finally a list of compressed data streams::
 | bstarts | dict | streams |
 +=========+======+=========+

-Each block is equal-sized as specified by the `blocksize` header field. The size of the last block can be shorter
-or equal to the rest.
+For regular chunks, each block is equal-sized as specified by the `blocksize` header field. The size of the last
+block can be shorter or equal to the rest.
+
+When `blosc2_flags2` bit 0 is set, the chunk uses variable-length blocks instead:
+
+- `blocksize` in the header stores the number of blocks
+- each block still has one entry in `bstarts`
+- each block is stored in a single compressed stream

 **Block starts**

@@ -211,8 +227,9 @@ The dictionary section contains the size of the dictionary `int32_t dsize` follo

 **Compressed Data Streams**

-Compressed data streams are the compressed set of bytes that are passed to codecs for decompression. Each compressed
-data stream (`uint8_t* cdata`) is stored with the size of the stream (`int32_t csize`) preceding it::
+Compressed data streams are the compressed set of bytes that are passed to codecs for decompression. For regular
+chunks, each compressed data stream (`uint8_t* cdata`) is stored with the size of the stream (`int32_t csize`)
+preceding it::

 +=======+=======+
 | csize | cdata |
@@ -255,6 +272,20 @@ If bit 4 of the `flags` header is *not* set, each block can be stored using mult
 The uncompressed size for each block is equivalent to the `blocksize` field in the header, with the exception
 of the last block which may be equal to or less than the `blocksize`.

+For variable-length-block chunks (`blosc2_flags2` bit 0 set), each block is always stored in a single stream::
+
++=========+
+| stream0 |
++=========+
+| block0 |
++=========+
+
+In this variant:
+
+- `csize` stores the uncompressed size of the block
+- the compressed size is derived from adjacent entries in `bstarts` and the end of the chunk
+- the special `csize == 0` and `csize < 0` encodings are not used
+
 Trailer
 -------
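
The rule that compressed sizes are derived from adjacent `bstarts` entries and the end of the chunk can be sketched in Python as follows. This is an illustration under stated assumptions, not the library's implementation: it assumes `bstarts` holds ascending byte offsets from the start of the chunk, and that `chunk_end` (a hypothetical parameter) is the offset at which the last stream ends. The diff does not say whether the 4-byte `csize` field is counted inside each span, so the sketch returns raw spans and leaves that adjustment to the caller.

```python
def vl_block_spans(bstarts: list, chunk_end: int) -> list:
    """Return the byte span of each block's stream.

    Each span is the distance from a block's start offset to the next
    block's start offset, or to chunk_end for the last block.
    Assumes the offsets are ascending (an assumption, see the text above).
    """
    bounds = list(bstarts) + [chunk_end]
    spans = []
    for start, stop in zip(bounds, bounds[1:]):
        if stop < start:
            raise ValueError("offsets must be non-decreasing")
        spans.append(stop - start)
    return spans
```

With `bstarts = [32, 100, 250]` and `chunk_end = 400`, the three streams span 68, 150 and 150 bytes respectively.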
