Bitstream Format
This document specifies the bitstream format of the Kanzi lossless data compressor, that is, how the compressed data it produces is serialized into a binary representation suitable for storage or transmission. A Kanzi bitstream is either standard (it contains a global header) or headless.
A bitstream is an ordered sequence of bits that is interpreted sequentially from beginning to end. In Kanzi, the bitstream represents a complete compressed data stream and is decoded strictly in forward order.
A Kanzi bitstream is composed of the following elements, in order:
- An optional global header, present only in standard mode
- A sequence of one or more compressed blocks
- A mandatory End of Stream marker
In standard mode, the global header appears once at the beginning of the bitstream and fully specifies all parameters required to decode the subsequent blocks, including the entropy codec, the list of transforms, the block size, and the checksum type.
In headless mode, the global header is omitted. In this case, the decoder must be configured externally with the same parameters that would otherwise be provided by the header. The structure and interpretation of the compressed blocks are identical in both modes.
Each block is compressed independently and can be decoded without reference to other blocks, except for the shared configuration defined by the global header or the external parameters in headless mode. This block-based design enables streaming operation, parallel decoding, and partial recovery in the presence of data corruption.
Unless explicitly stated otherwise, all multi-byte numeric fields in the bitstream are stored in Big-Endian order, and all numeric values are unsigned.
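
Header and block fields are packed at bit granularity. For example, the 4-bit bsVersion immediately follows the 32-bit magic number, so a decoder needs a reader that extracts n-bit values. The C++ sketch below assumes that Big-Endian order also applies at the bit level: the first bit of the stream is the most significant bit of the first byte. `BitReader` is a hypothetical helper, not Kanzi's API.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical MSB-first bit reader over an in-memory buffer.
class BitReader {
public:
    explicit BitReader(std::vector<uint8_t> data) : data_(std::move(data)) {}

    // Reads 'count' bits (0..64). The first bit read becomes the most
    // significant bit of the returned value (Big-Endian bit order).
    uint64_t readBits(unsigned count) {
        if (count > 64)
            throw std::invalid_argument("count must be <= 64");
        uint64_t value = 0;
        for (unsigned i = 0; i < count; i++) {
            if (pos_ >= data_.size() * 8)
                throw std::out_of_range("read past end of bitstream");
            const unsigned bit = (data_[pos_ >> 3] >> (7 - (pos_ & 7))) & 1u;
            value = (value << 1) | bit;
            pos_++;
        }
        return value;
    }

    std::size_t bitPosition() const { return pos_; }

private:
    std::vector<uint8_t> data_;
    std::size_t pos_ = 0; // index of the next bit to read
};
```

With this reader, the 36 bits `0x4B414E5A` followed by `0110` decode as the magic number and then bsVersion 6.
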

In standard mode, the global header appears once at the beginning of the stream. It contains the following fields, in this order:

| Bits | Name | Value | Description |
|---|---|---|---|
| 32 | Magic Number | 0x4B414E5A | 'KANZ' constant |
| 4 | bsVersion | 6 | Bitstream format version |
| 2 | chkSize | 0..2 | Block checksum: 0 = none, 1 = 32-bit, 2 = 64-bit, 3 = reserved |
| 5 | entropyType | 0..31 | Entropy codec identifier (see Section 5.1) |
| 48 | transformType | - | Eight 6-bit transform identifiers (see Section 5.2) |
| 28 | blockSize | - | Block size in bytes divided by 16 (block sizes range from 1 KiB to 1 GiB) |
| 2 | szMask | 0..3 | Width selector for the optional outputSize field |
| 0, 16, 32 or 48 | outputSize | - | Original uncompressed size in bytes (present only if szMask != 0) |
| 15 | Padding | 0 | Reserved for future use (must be 0) |
| 24 | Checksum | - | Header verification checksum |
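
The outputSize field is present only if szMask is non-zero. Its width is 16 × szMask bits:

- szMask = 0: no output size field
- szMask = 1: 16-bit output size
- szMask = 2: 32-bit output size
- szMask = 3: 48-bit output size

When present, the output size is the total uncompressed size of the stream in bytes. The global header is therefore 160 + 16 × szMask bits long, which is always a whole number of bytes.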
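
As an illustration, the C++ sketch below parses the global header fields from an in-memory buffer. It assumes the fields are packed most significant bit first, with no alignment between them. `GlobalHeader` and `parseGlobalHeader` are hypothetical names. The 24-bit header checksum is extracted but not verified, and the padding is read and ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical container for the decoded global header fields.
struct GlobalHeader {
    uint32_t bsVersion, chkSize, entropyType, szMask, checksum;
    uint64_t transformType, blockSize, outputSize;
};

// Reads 'count' bits MSB-first starting at bit offset 'pos', advancing 'pos'.
static uint64_t readBits(const std::vector<uint8_t>& buf, std::size_t& pos, unsigned count) {
    uint64_t v = 0;
    for (unsigned i = 0; i < count; i++, pos++) {
        if (pos >= buf.size() * 8)
            throw std::out_of_range("truncated header");
        v = (v << 1) | ((buf[pos >> 3] >> (7 - (pos & 7))) & 1u);
    }
    return v;
}

GlobalHeader parseGlobalHeader(const std::vector<uint8_t>& buf) {
    std::size_t pos = 0;
    if (readBits(buf, pos, 32) != 0x4B414E5A)
        throw std::runtime_error("bad magic number");
    GlobalHeader h{};
    h.bsVersion = uint32_t(readBits(buf, pos, 4));
    h.chkSize = uint32_t(readBits(buf, pos, 2));
    if (h.chkSize == 3)
        throw std::runtime_error("reserved checksum size");
    h.entropyType = uint32_t(readBits(buf, pos, 5));
    h.transformType = readBits(buf, pos, 48);
    h.blockSize = readBits(buf, pos, 28) << 4;        // stored divided by 16
    h.szMask = uint32_t(readBits(buf, pos, 2));
    h.outputSize = readBits(buf, pos, 16 * h.szMask); // 0, 16, 32 or 48 bits
    readBits(buf, pos, 15);                           // reserved padding (expected to be 0)
    h.checksum = uint32_t(readBits(buf, pos, 24));
    return h;
}
```

For example, the 20-byte header `4B 41 4E 5A 64 20 00 00 00 00 00 00 02 00 00 00 00 00 00 00` decodes as bsVersion 6, chkSize 1, entropyType 1 (HUFFMAN), no transforms, a 1 MiB block size and no output size. Its checksum field is zero here, so the header would not pass real checksum verification.
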
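
The 24-bit header checksum is computed from the decoded field values as shown below. Here, blockSize is the block size in bytes (the stored 28-bit value multiplied by 16), and `~` denotes bitwise complement:

```cpp
#include <cstdint>

uint32_t headerChecksum(uint32_t bsVersion, uint32_t chkSize, uint32_t entropyType,
                        uint64_t transformType, uint32_t blockSize,
                        uint32_t szMask, uint64_t outputSize)
{
    const uint32_t HASH = 0x1E35A7BD;
    const uint32_t seed = 0x01030507u * bsVersion;
    uint32_t cksum = HASH * seed;
    cksum ^= HASH * ~chkSize;
    cksum ^= HASH * ~entropyType;
    cksum ^= HASH * uint32_t(~transformType >> 32); // upper 32 bits
    cksum ^= HASH * uint32_t(~transformType);       // lower 32 bits
    cksum ^= HASH * ~blockSize;

    if (szMask != 0) {
        cksum ^= HASH * uint32_t(~outputSize >> 32);
        cksum ^= HASH * uint32_t(~outputSize);
    }

    cksum = (cksum >> 23) ^ (cksum >> 3);
    return cksum & 0xFFFFFF; // keep the low 24 bits
}
```

After the global header, one or more blocks follow.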
Each block is independently encoded and decoded.

| Bits | Name | Description |
|---|---|---|
| 5 | logSize | L = (number of significant bits in cbs) - 3, with a minimum of 0, so any cbs below 8 gives L = 0 |
| L + 3 | cbs | Compressed block size in bits. A value of 0 signals End of Stream |
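
An encoder must choose L so that cbs fits in L + 3 bits. The minimal sketch below assumes that the encoder always uses the narrowest width that can hold cbs, but never fewer than 3 bits (the width implied by L = 0). The function names are hypothetical.

```cpp
#include <cstdint>

// Number of significant bits in v (0 for v == 0).
unsigned bitLength(uint64_t v) {
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

// Value of the 5-bit logSize field for a compressed block of 'cbs' bits.
// cbs must be below 2^34 so that the result fits in 5 bits.
unsigned logSizeFor(uint64_t cbs) {
    const unsigned width = bitLength(cbs) < 3 ? 3 : bitLength(cbs);
    return width - 3;
}

// Width in bits of the cbs field, as a decoder derives it from logSize.
unsigned cbsWidth(unsigned logSize) { return logSize + 3; }
```

For example, a 1000-bit block needs 10 bits for cbs, so logSize is 7.
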
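
Each block header is followed by the block payload (cbs bits), which begins with the following fields:

| Bits | Name | Description |
|---|---|---|
| 8 | Mode | Block flags and pre-transform size descriptor (see 3.3) |
| 0 or 8 | SkipFlags | Present only if bit 4 of Mode is 1 |
| 8 * ps | DataSize | Size in bytes of the block data before the inverse transforms are applied; ps = 1 + ((Mode >> 5) & 0x03) |
| 0, 32 or 64 | BlockChecksum | XXHash32 (chkSize = 1) or XXHash64 (chkSize = 2) of the decompressed block data, seed 0x4B414E5A; absent if chkSize = 0 |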
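
BlockChecksum relies on XXHash. The sketch below assumes that chkSize = 1 selects the standard 32-bit xxHash algorithm (XXH32) seeded with 0x4B414E5A. It is written from the public xxHash specification rather than taken from Kanzi, and the 64-bit variant is omitted.

```cpp
#include <cstddef>
#include <cstdint>

namespace {
// XXH32 primes from the xxHash specification.
constexpr uint32_t P1 = 0x9E3779B1u;
constexpr uint32_t P2 = 0x85EBCA77u;
constexpr uint32_t P3 = 0xC2B2AE3Du;
constexpr uint32_t P4 = 0x27D4EB2Fu;
constexpr uint32_t P5 = 0x165667B1u;

uint32_t rotl32(uint32_t x, int r) { return (x << r) | (x >> (32 - r)); }

// XXH32 consumes 32-bit words in little-endian order.
uint32_t read32LE(const uint8_t* p) {
    return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
}

uint32_t round32(uint32_t acc, uint32_t lane) { return rotl32(acc + lane * P2, 13) * P1; }
} // namespace

uint32_t xxh32(const uint8_t* data, std::size_t len, uint32_t seed) {
    const uint8_t* p = data;
    const uint8_t* const end = data + len;
    uint32_t h;

    if (len >= 16) {
        // Four accumulators consume 16-byte stripes.
        uint32_t v1 = seed + P1 + P2, v2 = seed + P2, v3 = seed, v4 = seed - P1;
        while (end - p >= 16) {
            v1 = round32(v1, read32LE(p));
            v2 = round32(v2, read32LE(p + 4));
            v3 = round32(v3, read32LE(p + 8));
            v4 = round32(v4, read32LE(p + 12));
            p += 16;
        }
        h = rotl32(v1, 1) + rotl32(v2, 7) + rotl32(v3, 12) + rotl32(v4, 18);
    } else {
        h = seed + P5;
    }

    h += uint32_t(len);

    // Remaining 4-byte words, then remaining single bytes.
    while (end - p >= 4) {
        h = rotl32(h + read32LE(p) * P3, 17) * P4;
        p += 4;
    }
    while (p < end) {
        h = rotl32(h + uint32_t(*p) * P5, 11) * P1;
        p++;
    }

    // Final avalanche.
    h ^= h >> 15;
    h *= P2;
    h ^= h >> 13;
    h *= P3;
    h ^= h >> 16;
    return h;
}
```

Under these assumptions, a decoder compares `xxh32(decompressed, length, 0x4B414E5A)` with the 32-bit BlockChecksum read from the payload.
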

The Mode byte is bit-packed as follows:

- Bit 7: Copy block flag. If set, the block data is stored verbatim (no entropy coding or transforms).
- Bits 6-5: Length of the DataSize field in bytes, minus 1 (values 0 to 3)
- Bit 4: Skip flags location
  - 1: the skip flags are read as the next 8 bits in the bitstream
  - 0: the skip flags are derived from the lower nibble of the Mode byte
- Bits 3-0: Inline skip flags for the first four transforms

A skip flag set to 1 indicates that the corresponding transform is skipped.

Skip flags resolution:

- If bit 4 is 0: FinalSkipFlags = ((Mode << 4) | 0x0F) & 0xFF (transforms 5 to 8 are implicitly skipped)
- If bit 4 is 1: FinalSkipFlags is the explicitly read 8-bit SkipFlags value
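
The Mode layout maps directly onto mask-and-shift operations. The following sketch decodes the Mode byte and resolves FinalSkipFlags. `BlockMode`, `decodeMode` and `finalSkipFlags` are hypothetical names used only for illustration.

```cpp
#include <cstdint>

// Fields carried by the Mode byte (hypothetical struct).
struct BlockMode {
    bool copyBlock;         // bit 7
    unsigned dataSizeBytes; // ps = 1 + bits 6-5
    bool explicitSkipFlags; // bit 4: skip flags stored in the next 8 bits
};

BlockMode decodeMode(uint8_t mode) {
    return BlockMode{(mode & 0x80) != 0, 1u + ((mode >> 5) & 0x03u), (mode & 0x10) != 0};
}

// Resolves FinalSkipFlags; 'nextByte' is only used when bit 4 of Mode is set.
uint8_t finalSkipFlags(uint8_t mode, uint8_t nextByte) {
    if (mode & 0x10)
        return nextByte;
    // The lower nibble of Mode becomes the upper nibble; transforms 5 to 8 are skipped.
    return uint8_t((mode << 4) | 0x0F);
}
```

For example, Mode = 0x25 describes a compressed block with a 2-byte DataSize field and inline skip flags, giving FinalSkipFlags = 0x5F.
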

For each block:

1. Read the block header (logSize, then cbs).
2. If cbs == 0, stop decoding (End of Stream).
3. Read the Mode byte and determine FinalSkipFlags, then read DataSize and, if chkSize > 0, BlockChecksum.
4. If bit 7 of Mode is set (copy block), copy the block payload directly to the output.
5. Otherwise (compressed block):
   - Entropy-decode the data using the codec specified in the header (or configured externally in headless mode).
   - Apply the inverse transforms sequentially from last to first, skipping every transform whose bit in FinalSkipFlags is set.
6. If chkSize > 0, compute the XXHash checksum of the decompressed block and compare it with BlockChecksum. A mismatch indicates corrupted data.
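
The order in which the inverse transforms run can be made concrete. The sketch below lists the transform IDs a decoder would invert for one block, starting with the last slot. It relies on two assumptions: slot i (counting from 0) of transformType occupies bits 47 - 6i down to 42 - 6i, and bit 7 - i of FinalSkipFlags belongs to slot i. The second assumption is consistent with transforms 5 to 8 mapping to the lower nibble. Slots holding NONE (ID 0) are dropped because they are identities. The function name is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Returns the transform IDs to invert, in the order they must be applied.
std::vector<unsigned> inverseTransformOrder(uint64_t transformType, uint8_t skipFlags) {
    std::vector<unsigned> order;
    for (int i = 7; i >= 0; i--) { // last transform first
        const unsigned id = unsigned((transformType >> (42 - 6 * i)) & 0x3F);
        const bool skipped = ((skipFlags >> (7 - i)) & 1) != 0;
        if (!skipped && id != 0)   // ID 0 (NONE) is the identity
            order.push_back(id);
    }
    return order;
}
```

With the chain BWT, MTFT, ZRLT (IDs 1, 7, 6) and FinalSkipFlags = 0x1F, the decoder inverts ZRLT, then MTFT, then BWT.
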

Entropy codec identifiers (entropyType):

| ID | Name |
|---|---|
| 0 | NONE |
| 1 | HUFFMAN |
| 2 | FPAQ |
| 3 | PAQ (obsolete) |
| 4 | RANGE |
| 5 | ANS0 |
| 6 | CM |
| 7 | TPAQ |
| 8 | ANS1 |
| 9 | TPAQX |

Transform identifiers (each 6-bit slot of transformType):

| ID | Name |
|---|---|
| 0 | NONE |
| 1 | BWT |
| 2 | BWTS |
| 3 | LZ |
| 4 | Snappy (obsolete) |
| 5 | RLT |
| 6 | ZRLT |
| 7 | MTFT |
| 8 | RANK |
| 9 | EXE |
| 10 | DICT |
| 11 | ROLZ |
| 12 | ROLZX |
| 13 | SRT |
| 14 | LZP |
| 15 | MM |
| 16 | LZX |
| 17 | UTF |
| 18 | PACK |
| 19 | DNA |
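
To report a stream's configuration, a decoder can map the identifiers back to the names in these tables. The sketch below assumes, as for decoding, that the first transform occupies the most significant 6 bits of transformType. Unassigned IDs are reported as UNKNOWN, and the helpers are hypothetical rather than Kanzi's API.

```cpp
#include <cstdint>
#include <string>

namespace {
const char* const kEntropyNames[] = {"NONE", "HUFFMAN", "FPAQ", "PAQ",  "RANGE",
                                     "ANS0", "CM",      "TPAQ", "ANS1", "TPAQX"};
const char* const kTransformNames[] = {"NONE", "BWT",  "BWTS", "LZ",   "Snappy", "RLT",   "ZRLT",
                                       "MTFT", "RANK", "EXE",  "DICT", "ROLZ",   "ROLZX", "SRT",
                                       "LZP",  "MM",   "LZX",  "UTF",  "PACK",   "DNA"};
constexpr uint32_t kEntropyCount = sizeof(kEntropyNames) / sizeof(kEntropyNames[0]);
constexpr uint32_t kTransformCount = sizeof(kTransformNames) / sizeof(kTransformNames[0]);
} // namespace

// Returns the codec name for an entropyType value, or "UNKNOWN".
std::string entropyName(uint32_t id) {
    return id < kEntropyCount ? kEntropyNames[id] : "UNKNOWN";
}

// Renders the transform chain, e.g. "BWT+MTFT+ZRLT", omitting NONE slots.
std::string transformChain(uint64_t transformType) {
    std::string chain;
    for (int i = 0; i < 8; i++) {
        const uint32_t id = uint32_t((transformType >> (42 - 6 * i)) & 0x3F);
        if (id == 0)
            continue; // NONE slot
        if (!chain.empty())
            chain += '+';
        chain += id < kTransformCount ? kTransformNames[id] : "UNKNOWN";
    }
    return chain.empty() ? std::string("NONE") : chain;
}
```
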

The bitstream must terminate with an empty block (the End of Stream marker). It is encoded as 8 zero bits, the equivalent of one null byte (0x00): 5 zero bits for logSize followed by 3 zero bits for cbs.
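
On the encoder side, writing a block header and the End of Stream marker only requires an MSB-first bit writer. The sketch below assumes the narrowest cbs width that fits is used, with a minimum of 3 bits. `BitWriter` and `writeBlockHeader` are hypothetical helpers.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical MSB-first bit writer; unused trailing bits of the last byte stay 0.
class BitWriter {
public:
    void writeBits(uint64_t value, unsigned count) {
        for (unsigned i = count; i-- > 0;) { // most significant bit first
            if (bits_ % 8 == 0)
                bytes_.push_back(0);
            if ((value >> i) & 1)
                bytes_.back() |= uint8_t(0x80u >> (bits_ % 8));
            bits_++;
        }
    }
    const std::vector<uint8_t>& bytes() const { return bytes_; }

private:
    std::vector<uint8_t> bytes_;
    uint64_t bits_ = 0;
};

// Writes a block header: the 5-bit logSize, then cbs in logSize + 3 bits.
void writeBlockHeader(BitWriter& w, uint64_t cbs) {
    unsigned width = 0;
    for (uint64_t v = cbs; v != 0; v >>= 1)
        width++;
    if (width < 3)
        width = 3;
    w.writeBits(width - 3, 5);
    w.writeBits(cbs, width);
}
```

Writing a header with cbs = 0 at a byte boundary produces exactly one 0x00 byte. A 1000-bit block produces the 15 bits `00111 1111101000`, stored as the bytes 0x3F 0xD0.
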