Kanzi is a modern, modular, portable, and efficient lossless data compressor written in C++.
-Kanzi is a modern, modular, portable and efficient lossless data compressor implemented in C++.
+* Modern: Kanzi implements state-of-the-art compression algorithms and is built to fully utilize multi-core CPUs via built-in multi-threading.
+* Modular: Entropy codecs and data transforms can be selected and combined at runtime to best suit the specific data being compressed.
+* Portable: Supports a wide range of operating systems, compilers, and C++ standards (details below).
+* Expandable: A clean, interface-driven design—with no external dependencies—makes Kanzi easy to integrate, extend, and customize.
+* Efficient: Carefully optimized to balance compression ratio and speed for practical, high-performance usage.
-* modern: state-of-the-art algorithms are implemented and multi-core CPUs can take advantage of the built-in multi-threading.
-* modular: entropy codec and a combination of transforms can be provided at runtime to best match the kind of data to compress.
-* portable: many OSes, compilers and C++ versions are supported (see below).
-* expandable: clean design with heavy use of interfaces as contracts makes integrating and expanding the code easy. No dependencies.
-* efficient: the code is optimized for efficiency (trade-off between compression ratio and speed).
+Unlike most mainstream lossless compressors, Kanzi is not limited to a single compression paradigm. By combining multiple algorithms and techniques, it supports a broader range of compression ratios and adapts better to diverse data types.
-Unlike the most common lossless data compressors, Kanzi uses a variety of different compression algorithms and supports a wider range of compression ratios as a result. Most usual compressors do not take advantage of the many cores and threads available on modern CPUs (what a waste!). Kanzi is concurrent by design and uses threads to compress several blocks in parallel. It is not compatible with standard compression formats.
+Most traditional compressors underutilize modern hardware by running single-threaded—even on machines with many cores. Kanzi, in contrast, is concurrent by design, compressing multiple blocks in parallel across threads for significant performance gains. However, it is not compatible with standard compression formats.
-Kanzi is a lossless data compressor, not an archiver. It uses checksums (optional but recommended) to validate data integrity but does not have a mechanism for data recovery. It also lacks data deduplication across files. However, Kanzi generates a bitstream that is seekable (one or several consecutive blocks can be decompressed without the need for the whole bitstream to be decompressed).
+It’s important to note that Kanzi is a data compressor, not an archiver. It includes optional checksums for verifying data integrity, but does not provide features like cross-file deduplication or data recovery mechanisms. That said, it produces a seekable bitstream—meaning one or more consecutive blocks can be decompressed independently, without needing to process the entire stream.
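The seekable-bitstream property described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the run-length codec is a toy stand-in, and the function names (`rleEncode`, `rleDecode`, `encodeBlocks`, `decodeBlocks`) are hypothetical. None of this is Kanzi's API or bitstream format. It only shows why encoding blocks independently lets a reader decode a consecutive range of blocks without touching the rest.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in codec (NOT Kanzi's): run-length encoding as (count, byte) pairs.
std::string rleEncode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<char>(run));  // run length, 1..255
        out.push_back(in[i]);                   // the repeated byte
        i += run;
    }
    return out;
}

std::string rleDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2)
        out.append(static_cast<unsigned char>(in[i]), in[i + 1]);
    return out;
}

// Encode each fixed-size block on its own; no block refers to another block.
std::vector<std::string> encodeBlocks(const std::string& data, std::size_t blockSize) {
    std::vector<std::string> blocks;
    if (blockSize == 0) return blocks;  // guard against a non-terminating loop
    for (std::size_t pos = 0; pos < data.size(); pos += blockSize)
        blocks.push_back(rleEncode(data.substr(pos, blockSize)));
    return blocks;
}

// Decode only blocks [first, last); the other blocks are never read.
std::string decodeBlocks(const std::vector<std::string>& blocks,
                         std::size_t first, std::size_t last) {
    std::string out;
    for (std::size_t i = first; i < last && i < blocks.size(); ++i)
        out += rleDecode(blocks[i]);
    return out;
}
```

Because no block depends on another in this sketch, the same split would also allow each block to be handed to a separate thread, which is the general idea behind block-parallel compression.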
For more details, see [Wiki](https://github.com/flanglet/kanzi/wiki), [Q&A](https://github.com/flanglet/kanzi/wiki/q&a) and [DeepWiki](https://deepwiki.com/flanglet/kanzi-cpp/1-overview)
@@ -34,44 +35,37 @@ There is a Go implementation available here: https://github.com/flanglet/kanzi-g
## Why Kanzi
-There are many excellent, open-source lossless data compressors available already.
+There are already many excellent, open-source lossless data compressors available.
-If gzip is starting to show its age, zstd and brotli are open-source, standardized and used
-daily by millions of people. Zstd is incredibly fast and probably the best choice in many cases.
-There are a few scenarios where Kanzi can be a better choice:
+If gzip is beginning to show its age, modern alternatives like **zstd** and **brotli** offer compelling replacements. Both are open-source, standardized, and used daily by millions. **Zstd** is especially notable for its exceptional speed and is often the best choice in general-purpose compression.
-- gzip, lzma, brotli, zstd are all LZ based. It means that they can reach certain compression
-ratios only. Kanzi also makes use of BWT and CM which can compress beyond what LZ can do.
+However, there are scenarios where **Kanzi** may offer superior performance:
-- These LZ based compressors are well suited for software distribution (one compression / many decompressions)
-due to their fast decompression (but low compression speed at high compression ratios).
-There are other scenarios where compression speed is critical: when data is generated before being compressed and consumed
-(one compression / one decompression) or during backups (many compressions / one decompression).
+While gzip, LZMA, brotli, and zstd are all based on LZ (Lempel-Ziv) compression, they are inherently limited in the compression ratios they can achieve. **Kanzi** goes further by incorporating **BWT (Burrows-Wheeler Transform)** and **CM (Context Modeling)**, which can outperform traditional LZ-based methods in certain cases.
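For readers unfamiliar with the BWT mentioned above, here is a textbook sketch of the transform and its inverse. It is a naive version that sorts all rotations of the input, written for clarity only. It is not Kanzi's implementation, and the names `bwt` and `inverseBwt` are hypothetical. The transform does not compress anything by itself; it tends to group identical bytes together so that a later modeling or entropy coding stage has an easier job.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <string>
#include <utility>
#include <vector>

// Naive forward BWT: sort all rotations, keep the last column and the row
// index of the original string (needed to invert the transform).
std::pair<std::string, std::size_t> bwt(const std::string& s) {
    const std::size_t n = s.size();
    std::vector<std::size_t> rot(n);
    std::iota(rot.begin(), rot.end(), 0);  // rot[i] = start offset of a rotation
    std::sort(rot.begin(), rot.end(), [&](std::size_t a, std::size_t b) {
        for (std::size_t k = 0; k < n; ++k) {
            const unsigned char ca = s[(a + k) % n];
            const unsigned char cb = s[(b + k) % n];
            if (ca != cb) return ca < cb;
        }
        return false;  // identical rotations compare equal
    });
    std::string last(n, '\0');
    std::size_t primary = 0;
    for (std::size_t i = 0; i < n; ++i) {
        last[i] = s[(rot[i] + n - 1) % n];  // byte preceding the rotation start
        if (rot[i] == 0) primary = i;       // row holding the original string
    }
    return {last, primary};
}

// Inverse BWT: a stable sort of the last column gives, for each sorted row,
// the position in the last column of that row's first byte.
std::string inverseBwt(const std::string& last, std::size_t primary) {
    const std::size_t n = last.size();
    std::vector<std::size_t> next(n);
    std::iota(next.begin(), next.end(), 0);
    std::stable_sort(next.begin(), next.end(), [&](std::size_t a, std::size_t b) {
        return static_cast<unsigned char>(last[a]) < static_cast<unsigned char>(last[b]);
    });
    std::string out;
    std::size_t row = primary;
    for (std::size_t i = 0; i < n; ++i) {
        row = next[row];
        out.push_back(last[row]);
    }
    return out;
}
```

On the input `banana`, the last column comes out as `nnbaaa`: the three `a` bytes end up adjacent, which is the kind of clustering that later stages can exploit.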
-- Kanzi has built-in customized data transforms (multimedia, utf, text, dna, ...) that can be chosen and combined
-at compression time to better compress specific kinds of data.
+LZ-based compressors are ideal for software distribution, where data is compressed once and decompressed many times, thanks to their fast decompression speeds—though they tend to be slower when compressing at higher ratios. But in other scenarios—such as real-time data generation, one-off data transfers, or backups—**compression speed becomes critical**. Here, Kanzi can shine.
-- Kanzi can take advantage of the multiple cores of a modern CPU to improve performance
+**Kanzi** also features a suite of built-in, customizable data transforms tailored for specific data types (e.g., multimedia, UTF text, DNA, etc.), which can be selectively applied during compression for better efficiency.
+
+Furthermore, Kanzi is designed to **leverage modern multi-core CPUs** to boost performance.
+
+Finally, **extensibility** is a key strength: implementing new transforms or entropy codecs—whether for experimentation or to improve performance on niche data types—is straightforward and developer-friendly.
-- Implementing a new transform or entropy codec (to either test an idea or improve compression ratio on specific kinds of data) is simple.