Commit 453a671

Update README.md
Release 2.4 numbers
1 parent 7f2b6da commit 453a671

1 file changed

+57
-114
lines changed

README.md

Lines changed: 57 additions & 114 deletions
Original file line number | Diff line number | Diff line change
@@ -1,17 +1,18 @@
11
# Kanzi
22

3+
Kanzi is a modern, modular, portable, and efficient lossless data compressor written in C++.
34

4-
Kanzi is a modern, modular, portable and efficient lossless data compressor implemented in C++.
5+
* Modern: Kanzi implements state-of-the-art compression algorithms and is built to fully utilize multi-core CPUs via built-in multi-threading.
6+
* Modular: Entropy codecs and data transforms can be selected and combined at runtime to best suit the specific data being compressed.
7+
* Portable: Supports a wide range of operating systems, compilers, and C++ standards (details below).
8+
* Expandable: A clean, interface-driven design—with no external dependencies—makes Kanzi easy to integrate, extend, and customize.
9+
* Efficient: Carefully optimized to balance compression ratio and speed for practical, high-performance usage.
510

6-
* modern: state-of-the-art algorithms are implemented and multi-core CPUs can take advantage of the built-in multi-threading.
7-
* modular: entropy codec and a combination of transforms can be provided at runtime to best match the kind of data to compress.
8-
* portable: many OSes, compilers and C++ versions are supported (see below).
9-
* expandable: clean design with heavy use of interfaces as contracts makes integrating and expanding the code easy. No dependencies.
10-
* efficient: the code is optimized for efficiency (trade-off between compression ratio and speed).
11+
Unlike most mainstream lossless compressors, Kanzi is not limited to a single compression paradigm. By combining multiple algorithms and techniques, it supports a broader range of compression ratios and adapts better to diverse data types.
1112

12-
Unlike the most common lossless data compressors, Kanzi uses a variety of different compression algorithms and supports a wider range of compression ratios as a result. Most usual compressors do not take advantage of the many cores and threads available on modern CPUs (what a waste!). Kanzi is concurrent by design and uses threads to compress several blocks in parallel. It is not compatible with standard compression formats.
13+
Most traditional compressors underutilize modern hardware by running single-threaded—even on machines with many cores. Kanzi, in contrast, is concurrent by design, compressing multiple blocks in parallel across threads for significant performance gains. However, it is not compatible with standard compression formats.
1314

14-
Kanzi is a lossless data compressor, not an archiver. It uses checksums (optional but recommended) to validate data integrity but does not have a mechanism for data recovery. It also lacks data deduplication across files. However, Kanzi generates a bitstream that is seekable (one or several consecutive blocks can be decompressed without the need for the whole bitstream to be decompressed).
15+
It’s important to note that Kanzi is a data compressor, not an archiver. It includes optional checksums for verifying data integrity, but does not provide features like cross-file deduplication or data recovery mechanisms. That said, it produces a seekable bitstream—meaning one or more consecutive blocks can be decompressed independently, without needing to process the entire stream.
1516

1617
For more details, see [Wiki](https://github.com/flanglet/kanzi/wiki), [Q&A](https://github.com/flanglet/kanzi/wiki/q&a) and [DeepWiki](https://deepwiki.com/flanglet/kanzi-cpp/1-overview)
1718

@@ -34,44 +35,37 @@ There is a Go implementation available here: https://github.com/flanglet/kanzi-g
3435

3536
## Why Kanzi
3637

37-
There are many excellent, open-source lossless data compressors available already.
38+
There are already many excellent, open-source lossless data compressors available.
3839

39-
If gzip is starting to show its age, zstd and brotli are open-source, standardized and used
40-
daily by millions of people. Zstd is incredibly fast and probably the best choice in many cases.
41-
There are a few scenarios where Kanzi can be a better choice:
40+
If gzip is beginning to show its age, modern alternatives like **zstd** and **brotli** offer compelling replacements. Both are open-source, standardized, and used daily by millions. **Zstd** is especially notable for its exceptional speed and is often the best choice in general-purpose compression.
4241

43-
- gzip, lzma, brotli, zstd are all LZ based. It means that they can reach certain compression
44-
ratios only. Kanzi also makes use of BWT and CM which can compress beyond what LZ can do.
42+
However, there are scenarios where **Kanzi** may offer superior performance:
4543

46-
- These LZ based compressors are well suited for software distribution (one compression / many decompressions)
47-
due to their fast decompression (but low compression speed at high compression ratios).
48-
There are other scenarios where compression speed is critical: when data is generated before being compressed and consumed
49-
(one compression / one decompression) or during backups (many compressions / one decompression).
44+
gzip, LZMA, brotli, and zstd are all based on LZ (Lempel-Ziv) compression, which inherently limits the compression ratios they can achieve. **Kanzi** goes further by incorporating **BWT (Burrows-Wheeler Transform)** and **CM (Context Modeling)**, which can outperform traditional LZ-based methods in certain cases.
5045

51-
- Kanzi has built-in customized data transforms (multimedia, utf, text, dna, ...) that can be chosen and combined
52-
at compression time to better compress specific kinds of data.
46+
LZ-based compressors are ideal for software distribution, where data is compressed once and decompressed many times, thanks to their fast decompression speeds—though they tend to be slower when compressing at higher ratios. But in other scenarios—such as real-time data generation, one-off data transfers, or backups—**compression speed becomes critical**. Here, Kanzi can shine.
5347

54-
- Kanzi can take advantage of the multiple cores of a modern CPU to improve performance
48+
**Kanzi** also features a suite of built-in, customizable data transforms tailored for specific data types (e.g., multimedia, UTF text, DNA), which can be selectively applied during compression for better efficiency.
49+
50+
Furthermore, Kanzi is designed to **leverage modern multi-core CPUs** to boost performance.
51+
52+
Finally, **extensibility** is a key strength: implementing new transforms or entropy codecs—whether for experimentation or to improve performance on niche data types—is straightforward and developer-friendly.
5553

56-
- Implementing a new transform or entropy codec (to either test an idea or improve compression ratio on specific kinds of data) is simple.
5754

5855

5956
## Benchmarks
6057

6158
Test machine:
6259

63-
AWS c5a8xlarge: AMD EPYC 7R32 (32 vCPUs), 64 GB RAM
64-
65-
Ubuntu clang++ version 15.0.7 + tcmalloc
60+
Apple M3, 24 GB RAM, macOS Sonoma 14.6.1
6661

67-
Ubuntu 24.04 LTS
62+
Kanzi version 2.4.0 C++ implementation
6863

69-
Kanzi version 2.3.0 C++ implementation
64+
On this machine, Kanzi uses 4 threads (half of CPUs by default).
7065

71-
On this machine, Kanzi uses up to 16 threads (half of CPUs by default).
66+
bzip3 runs with 4 threads.
7267

73-
bzip3 and zpaq use 16 threads.
74-
zstd uses 16 threads for compression and 1 for decompression, other compressors are single threaded.
68+
zstd and lz4 use 4 threads for compression and 1 for decompression; the other compressors are single-threaded.
7569

7670
The default block size at level 9 is 32MB, severely limiting the number of threads
7771
in use, especially with enwik8, but all tests are performed with default values.
@@ -84,100 +78,49 @@ Download at http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip
8478
| Compressor | Encoding (sec) | Decoding (sec) | Size |
8579
|---------------------------------|-----------------|-----------------|------------------|
8680
|Original | | | 211,957,760 |
87-
|**Kanzi -l 1** | **0.263** | **0.231** | **80,277,212** |
88-
|Lz4 1.9.5 -4 | 0.321 | 0.330 | 79,912,419 |
89-
|Zstd 1.5.6 -2 -T16 | 0.151 | 0.271 | 69,556,157 |
90-
|**Kanzi -l 2** | **0.267** | **0.253** | **68,195,845** |
91-
|Brotli 1.1.0 -2 | 1.749 | 0.761 | 68,041,629 |
92-
|Gzip 1.12 -9 | 20.09 | 1.403 | 67,652,449 |
93-
|**Kanzi -l 3** | **0.446** | **0.287** | **65,613,695** |
94-
|Zstd 1.5.6 -5 -T16 | 0.356 | 0.289 | 63,131,656 |
95-
|**Kanzi -l 4** | **0.543** | **0.373** | **61,249,959** |
96-
|Zstd 1.5.5 -9 -T16 | 0.690 | 0.278 | 59,429,335 |
97-
|Brotli 1.1.0 -6 | 8.388 | 0.677 | 58,571,909 |
98-
|Zstd 1.5.6 -13 -T16 | 3.244 | 0.272 | 58,041,112 |
99-
|Brotli 1.1.0 -9 | 70.07 | 0.761 | 56,376,419 |
100-
|Bzip2 1.0.8 -9 | 16.94 | 6.734 | 54,572,500 |
101-
|**Kanzi -l 5** | **1.627** | **0.883** | **54,039,773** |
102-
|Zstd 1.5.6 -19 -T16 | 20.87 | 0.303 | 52,889,925 |
103-
|**Kanzi -l 6** | **2.312** | **1.227** | **49,567,817** |
104-
|Lzma 5.4.5 -9 | 95.97 | 3.172 | 48,745,354 |
105-
|**Kanzi -l 7** | **2.686** | **2.553** | **47,520,629** |
106-
|bzip3 1.3.2.r4-gb2d61e8 -j 16 | 2.682 | 3.221 | 47,237,088 |
107-
|**Kanzi -l 8** | **7.260** | **8.021** | **43,167,429** |
108-
|**Kanzi -l 9** | **18.99** | **21.07** | **41,497,835** |
109-
|zpaq 7.15 -m5 -t16 | 213.8 | 213.8 | 40,050,429 |
81+
|**kanzi -l 1** | **461** | **252** | 80,245,856 |
82+
|lz4 1.1.10 -T4 -4 | 527 | 121 | 79,919,901 |
83+
|zstd 1.5.8 -T4 -2 | 147 | 150 | 69,410,383 |
84+
|**kanzi -l 2** | **326** | **270** | 68,860,099 |
85+
|brotli 1.1.0 -2 | 907 | 402 | 68,039,159 |
86+
|Apple gzip 430.140.2 -9 | 10406 | 273 | 67,648,481 |
87+
|**kanzi -l 3** | **684** | **344** | 64,266,936 |
88+
|zstd 1.5.8 -T4 -5 | 300 | 154 | 62,851,716 |
89+
|**kanzi -l 4** | **802** | **463** | 61,131,554 |
90+
|zstd 1.5.8 -T4 -9 | 752 | 137 | 59,190,090 |
91+
|brotli 1.1.0 -6 | 3596 | 340 | 58,557,128 |
92+
|zstd 1.5.8 -T4 -13 | 4537 | 138 | 57,814,719 |
93+
|brotli 1.1.0 -9 | 19809 | 329 | 56,414,012 |
94+
|bzip2 1.0.8 -9 | 9673 | 3140 | 54,602,583 |
95+
|**kanzi -l 5** | **2087** | **1248** | 54,025,588 |
96+
|zstd 1.5.8 -T4 -19 | 20482 | 151 | 52,858,610 |
97+
|**kanzi -l 6** | **3065** | **2329** | 49,521,392 |
98+
|xz 5.8.1 -9 | 48516 | 1594 | 48,774,000 |
99+
|bzip3 1.5.1.r3-g428f422 | 8559 | 3948 | 47,256,794 |
100+
|**kanzi -l 7** | **3798** | **3298** | 47,312,772 |
101+
|**kanzi -l 8** | **15272** | **16419** | 43,260,254 |
102+
|**kanzi -l 9** | **20972** | **22375** | 41,858,886 |
110103

111104

112105
### enwik8
113106

114107
Download at https://mattmahoney.net/dc/enwik8.zip
115108

116-
Tested on Ubuntu 22.04.4 LTS, i7-7700K CPU @ 4.20GHz, 32 GB RAM, clang-15, 4 threads (default)
109+
Apple M3, 24 GB RAM, macOS Sonoma 14.6.1
117110

118111
| Compressor | Encoding (ms) | Decoding (ms) | Size |
119112
|-----------------|----------------|----------------|--------------|
120113
|Original | | | 100,000,000 |
121-
|Kanzi -l 1 | 251 | 87 | 43,746,017 |
122-
|Kanzi -l 2 | 268 | 114 | 37,816,913 |
123-
|Kanzi -l 3 | 512 | 175 | 33,865,383 |
124-
|Kanzi -l 4 | 546 | 249 | 29,597,577 |
125-
|Kanzi -l 5 | 1030 | 500 | 26,528,023 |
126-
|Kanzi -l 6 | 1537 | 799 | 24,076,674 |
127-
|Kanzi -l 7 | 2695 | 2045 | 22,817,373 |
128-
|Kanzi -l 8 | 7217 | 7314 | 21,181,983 |
129-
|Kanzi -l 9 | 11336 | 11574 | 20,035,138 |
130-
131-
132-
133-
### Round-trip scores for LZ
134-
135-
Below is a table showing silesia.tar compressed using different LZ compressors (no entropy) in single-threaded mode.
136-
137-
The efficiency score is computed as such: score(lambda) = compTime + 2 x decompTime + 10^-lambda x compSize
138-
139-
A lower score is better. Best scores are in bold.
140-
141-
Tested on Ubuntu 22.04.4 LTS, i7-7700K CPU @ 4.20GHz, 32 GB RAM, clang-15
142-
143-
| Compressor | Encoding (sec) | Decoding (sec) | Size | Score(5) | Score(6) | Score(7) |
144-
|--------------------------|----------------|-----------------|------------------|------------|------------|------------|
145-
|FastLZ -2 | 1.85 | 0.84 | 101114153 | 1014.66 | 104.63 | 13.63 |
146-
|Lizard 1.1.0 -11 | 0.76 | 0.24 | 93967850 | 940.91 | 95.20 | 10.63 |
147-
|Lz4 1.9.5 -2 -T1 | 0.81 | 0.21 | 89208908 | 893.32 | 90.44 | 10.15 |
148-
|Lzturbo 1.2 -11 -p0 | 1.09 | 0.34 | 88657053 | 888.35 | 90.43 | 10.64 |
149-
|lzav (1) | 0.52 | 0.19 | 88221200 | 883.12 | 89.13 | 9.73 |
150-
|s2 -cpu 1 | 0.81 | 0.40 | 86646819 | 868.08 | 88.25 | 10.27 |
151-
|LZ4x 1.60 -2 | 1.13 | 0.22 | 87883674 | 880.40 | 89.44 | 10.35 |
152-
|lzav (2) | 0.67 | TBD | 86505609 | | | |
153-
|Lizard 1.1.0 -12 | 1.48 | 0.23 | 86340434 | 865.35 | 88.29 | 10.58 |
154-
|LZ4x 1.60 -3 | 1.36 | 0.24 | 85483806 | 856.67 | 87.32 | 10.38 |
155-
|Kanzi 2.3 -t lz -j 1 (1) | 0.83 | 0.24 | 83355862 | 834.87 | 84.67 | ***9.65*** |
156-
|Lzturbo 1.2 -12 -p0 | 2.40 | 0.22 | 83179291 | 834.63 | 86.02 | 11.16 |
157-
|Kanzi 2.3 -t lz -j 1 (2) | 0.99 | 0.35 | 82652955 | 828.22 | 84.34 | 9.96 |
158-
|Kanzi 2.3 -t lzx -j 1 (1) | 1.09 | 0.22 | 81485228 | 816.39 | 83.02 | 9.68 |
159-
|Lz4 1.9.5 -3 -T1 | 2.33 | 0.21 | 81441623 | 817.17 | 84.19 | 10.90 |
160-
|Kanzi 2.3 -t lzx -j 1 (2) | 1.52 | 0.35 | 79014650 |***792.37***|***81.23*** | 10.12 |
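
The efficiency score defined in this section can be reproduced with a few lines of code. The sketch below is illustrative only: the function name `efficiency_score` and its parameter names are hypothetical, and it assumes times in seconds and sizes in bytes, as the table headers suggest. It recomputes the three scores for the FastLZ -2 row.

```python
def efficiency_score(comp_time_s, decomp_time_s, comp_size_bytes, lam):
    """Round-trip score: compTime + 2 x decompTime + 10^-lambda x compSize.

    A larger lambda gives less weight to the compressed size and more
    to the timings. Lower scores are better.
    """
    return comp_time_s + 2.0 * decomp_time_s + 10.0 ** (-lam) * comp_size_bytes


# Values copied from the FastLZ -2 row: encoding time, decoding time, size.
fastlz = (1.85, 0.84, 101114153)

for lam in (5, 6, 7):
    print(f"Score({lam}) = {efficiency_score(*fastlz, lam):.2f}")
```

The results land within about 0.01 of the published Score(5), Score(6) and Score(7) columns. The small gap is presumably because the timings shown in the table are rounded to two decimals.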
161-
162-
References:
163-
164-
[FastLZ](https://github.com/ariya/FastLZ)
165-
[Lizard](https://github.com/inikep/lizard)
166-
[LZ4](https://github.com/lz4/lz4)
167-
[S2](https://github.com/klauspost/compress)
168-
[LZAV](https://github.com/avaneev/lzav)
169-
[LZ4x](https://github.com/tomsim/lz4x)
170-
[LZTurbo](https://sites.google.com/site/powturbo)
171-
172-
lz4@97291fc50
173-
174-
kanzi@af12d07f2
175-
176-
lzav@10f7e2ac
177-
178-
(1) processing 4MB blocks
114+
|kanzi -l 1 | 271 | 135 | 43,644,013 |
115+
|kanzi -l 2 | 196 | 142 | 37,570,404 |
116+
|kanzi -l 3 | 350 | 200 | 32,466,232 |
117+
|kanzi -l 4 | 372 | 249 | 29,536,517 |
118+
|kanzi -l 5 | 720 | 478 | 26,523,940 |
119+
|kanzi -l 6 | 1053 | 807 | 24,076,765 |
120+
|kanzi -l 7 | 1704 | 1416 | 22,817,360 |
121+
|kanzi -l 8 | 6544 | 6988 | 21,181,992 |
122+
|kanzi -l 9 | 8621 | 9090 | 20,035,840 |
179123

180-
(2) processing whole file at once
181124

182125

183126
### More benchmarks
@@ -218,7 +161,7 @@ targets. Build successfully on MacOs with several versions of clang++.
218161
Multithreading is supported.
219162

220163
### BSD
221-
The makefile uses the gnu-make syntax. First, make sure gmake is present (or install it: 'pkg_add gmake').
164+
The makefile uses the gnu-make syntax. First, make sure gmake is present (or install it: 'pkg install gmake').
222165
Go to the source directory and run 'gmake clean && gmake kanzi'. The Makefile contains all the necessary
223166
targets. Multithreading is supported.
224167
