Quantile Compression/PCodec claims 35%-71% better compression than zstd.
For comparison, I've integrated the Rust library into TurboPFor using its FFI bindings.
The tests use the synthetic dataset provided in the Quantile Compression repository, plus real data with large integers.
Because real data with values larger than 32 bits is uncommon, we use 32-bit integers where possible instead of 64-bit integers for all files. Note that some files compress better with delta or the integrated zigzag delta in conjunction with TurboTranspose. Download icapp, test with your own data, and see for yourself.
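To make the delta, zigzag delta, and byte transpose preprocessing steps concrete, here is a minimal Python sketch of the general ideas. It is not TurboPFor's or TurboTranspose's actual implementation or API (those are optimized C), and all function names are hypothetical. It assumes unsigned 32-bit values stored little-endian.

```python
import struct

BITS = 32
MASK = (1 << BITS) - 1

def zigzag_delta_encode(values):
    """Delta against the previous value, then map the signed delta to unsigned (zigzag)."""
    out, prev = [], 0
    for v in values:
        d = (v - prev) & MASK              # wrap-around delta in 32 bits
        if d >> (BITS - 1):                # reinterpret as a signed 32-bit delta
            d -= 1 << BITS
        out.append(((d << 1) ^ (d >> (BITS - 1))) & MASK)  # 0,-1,1,-2 -> 0,1,2,3
        prev = v
    return out

def zigzag_delta_decode(codes):
    """Inverse of zigzag_delta_encode."""
    out, prev = [], 0
    for z in codes:
        d = (z >> 1) ^ -(z & 1)            # undo zigzag
        prev = (prev + d) & MASK           # undo delta
        out.append(prev)
    return out

def byte_transpose(values):
    """Group byte 0 of every value, then byte 1, and so on (little-endian)."""
    raw = struct.pack(f"<{len(values)}I", *values)
    return b"".join(raw[i::4] for i in range(4))

def byte_untranspose(data):
    """Inverse of byte_transpose."""
    n = len(data) // 4
    raw = bytearray(len(data))
    for i in range(4):
        raw[i::4] = data[i * n:(i + 1) * n]
    return list(struct.unpack(f"<{n}I", bytes(raw)))
```

Small deltas produce values whose upper bytes are mostly zero. After the transpose those bytes sit next to each other in long runs, which is the kind of input a general-purpose LZ codec such as zstd handles well.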
- 32-bit integers:
In total, TurboTranspose+zstd compresses better (16261417 vs 16731437 bytes) and decompresses several times faster than Quantile Compression
icapp i64*.txt -Ftu -e81 -Ezstd,22
size ratio E MB/s D MB/s function integer size=32 bits (lz=zstd,22)
450889 11.27% 19 6070 81:Lztp Byte Transpose +zstd,22 i64_cents.txt
182 0.0046% 239 28485 81:Lztp Byte Transpose +zstd,22 i64_constant.txt
631400 15.79% 5 5837 81:Lztp Byte Transpose +zstd,22 i64_dollars.txt
2693750 67.34% 12 4575 81:Lztp Byte Transpose +zstd,22 i64_geo1M.txt
251570 6.29% 14 7335 81:Lztp Byte Transpose +zstd,22 i64_geo2.txt
1028913 25.72% 12 2728 81:Lztp Byte Transpose +zstd,22 i64_interleaved.txt
1640375 41.01% 4 2774 81:Lztp Byte Transpose +zstd,22 i64_lomax15.txt
1592074 39.80% 12 3271 81:Lztp Byte Transpose +zstd,22 i64_lomax25.txt
2006291 50.16% 5 983 81:Lztp Byte Transpose +zstd,22 i64_misordered.txt
419053 10.48% 6 3915 81:Lztp Byte Transpose +zstd,22 i64_normal1.txt
815743 20.39% 8 3719 81:Lztp Byte Transpose +zstd,22 i64_normal10.txt
2898888 72.47% 6 3349 81:Lztp Byte Transpose +zstd,22 i64_normal1M.txt
404996 10.12% 5 3687 81:Lztp Byte Transpose +zstd,22 i64_slow_cosine.txt
16027 0.40% 8 20169 81:Lztp Byte Transpose +zstd,22 i64_sparse.txt
1411267 35.28% 5 2033 81:Lztp Byte Transpose +zstd,22 i64_total_cents.txt
16261417 Total
icapp i64*.txt -Ftu -e173
size ratio E MB/s D MB/s function integer size=32 bits
450451 11.26% 189 431 173:qcomp quantile compress i64_cents.txt
44 0.0011% 744 549 173:qcomp quantile compress i64_constant.txt
620064 15.50% 159 448 173:qcomp quantile compress i64_dollars.txt
2676957 66.92% 76 324 173:qcomp quantile compress i64_geo1M.txt
250467 6.26% 212 571 173:qcomp quantile compress i64_geo2.txt
2253101 56.33% 92 400 173:qcomp quantile compress i64_interleaved.txt
1575073 39.38% 98 373 173:qcomp quantile compress i64_lomax15.txt
1545171 38.63% 102 398 173:qcomp quantile compress i64_lomax25.txt
2253103 56.33% 78 313 173:qcomp quantile compress i64_misordered.txt
282581 7.06% 233 451 173:qcomp quantile compress i64_normal1.txt
676116 16.90% 161 452 173:qcomp quantile compress i64_normal10.txt
2754383 68.86% 74 336 173:qcomp quantile compress i64_normal1M.txt
221218 5.53% 269 534 173:qcomp quantile compress i64_slow_cosine.txt
14323 0.36% 718 2614 173:qcomp quantile compress i64_sparse.txt
1158386 28.96% 95 229 173:qcomp quantile compress i64_total_cents.txt
16731437 Total
- Floating point (64 bits):
Quantile Compression/PCodec is slightly better in total (29221134 vs 29604584 bytes), but decompression is a lot slower (up to 2-3x) than zstd
icapp f64*.txt -Ftd -e80 -Ezstd,22
size ratio E MB/s D MB/s function floating point size=64 bits (lz=zstd,22) unsorted -1
2412121 30.15% 3 1621 80:Lz zstd,22 f64_decimal_long.txt
9111 37.96% 7 913 80:Lz zstd,22 f64_decimal_short.txt
4970116 62.13% 5 1377 80:Lz zstd,22 f64_edge_cases.txt
4247812 53.10% 3 729 80:Lz zstd,22 f64_integers.txt
7670370 95.88% 8 1348 80:Lz zstd,22 f64_normal_at_0.txt
6221137 77.76% 3 1212 80:Lz zstd,22 f64_normal_at_1000.txt
4073918 50.92% 6 992 80:Lz zstd,22 f64_slow_cosine.txt
29604584 Total
icapp f64*.txt -Ftd -e173
size ratio E MB/s D MB/s function floating point size=64 bits
6686504 83.58% 189 605 173:qcomp quantile compress f64_decimal_long.txt
20134 83.89% 8 652 173:qcomp quantile compress f64_decimal_short.txt
4364570 54.56% 172 540 173:qcomp quantile compress f64_edge_cases.txt
3754251 46.93% 133 675 173:qcomp quantile compress f64_integers.txt
6943689 86.80% 131 518 173:qcomp quantile compress f64_normal_at_0.txt
5638910 70.49% 138 551 173:qcomp quantile compress f64_normal_at_1000.txt
1813077 22.66% 155 493 173:qcomp quantile compress f64_slow_cosine.txt
29221134 Total
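- Timestamps (64 bits):
Total sizes are within 1% of each other (Quantile Compression is much smaller on micros_millis, TurboTranspose+zstd on micros_near_linear and slightly in total), but Quantile Compression decompression is a lot slower (about 4-6x) than TurboTranspose+zstd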
icapp micro*.* -FtT -e173
size ratio E MB/s D MB/s function integer size=64 bits
2497182 31.21% 140 640 173:qcomp quantile compress micros_millis.txt.ts
3742368 46.78% 195 793 173:qcomp quantile compress micros_near_linear.txt.ts
6239549 Total
icapp micro*.* -FtT -e81 -Ezstd,22
size ratio E MB/s D MB/s function integer size=64 bits
3385201 42.32% 16 4089 81:Lztp Byte Transpose +zstd,22 micros_millis.txt.ts
2800155 35.00% 21 3367 81:Lztp Byte Transpose +zstd,22 micros_near_linear.txt.ts
6185355 Total
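As a worked check of the three synthetic groups, the sketch below recomputes the relative difference between the "Total" lines reported above. The helper name is hypothetical. The baseline is the zstd-based run of each group (81:Lztp for integers and timestamps, 80:Lz for floating point).

```python
def percent_vs_baseline(candidate_bytes: int, baseline_bytes: int) -> float:
    """Signed size difference of candidate relative to baseline, in percent."""
    return 100.0 * (candidate_bytes - baseline_bytes) / baseline_bytes

# (qcomp total, zstd-based total), copied from the "Total" lines above
TOTALS = {
    "i64 files as 32-bit integers": (16731437, 16261417),
    "f64 files": (29221134, 29604584),
    "timestamps": (6239549, 6185355),
}

for name, (qcomp, zstd_based) in TOTALS.items():
    diff = percent_vs_baseline(qcomp, zstd_based)
    print(f"{name}: qcomp total is {diff:+.2f}% relative to the zstd-based total")
```

With these totals, qcomp comes out roughly 2.9% larger on the integer files, 1.3% smaller on the floating-point files, and 0.9% larger on the timestamps.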
- Non-synthetic datasets + LZ77 offsets output: test1_demo (text) + test3_demo (binary). This is typical data with mixed small, medium and large integers.
As iccodec we use "zstd,15" and TurboVLC+"turborc,56" (entropy coding only, with adaptive Asymmetric Numeral Systems)
Quantile Compression is not competitive here, and its decompression is several (7-60) times slower.
TurboVLC+rANS compresses better and compresses/decompresses faster.
icapp -Ezstd,15 CCNEWS-RLZ-D64-FLENS.txt -Ftu -e81,96,80,173,3
size ratio E MB/s D MB/s function integer size=32 bits
22145289 5.54% 29 3525 81:Lztp Byte Transpose +zstd,15
23693811 5.92% 32 2743 96:vlccomp TurboVLC +zstd,15
29382157 7.35% 9 3536 80:Lz zstd,15
59957497 14.99% 367 692 96:vlccomp TurboVLC +turborc,56 (=rANS)
62529619 15.63% 164 345 173:qcomp quantile compress
77585707 19.40% 1820 11324 3:p4nenc256v32 TurboPFor256
icapp -Ezstd,15 CCNEWS-RLZ-D64-FOFFSETS.txt -Ftu -e81,96,80,173,3
93751603 23.44% 19 2745 80:Lz zstd,15
283069853 70.77% 56 2622 96:vlccomp TurboVLC +zstd,15
322425616 80.61% 338 651 96:vlccomp TurboVLC +turborc,56 (=rANS)
323345103 80.84% 73 219 173:qcomp quantile compress
325331435 81.33% 2444 10740 3:p4nenc256v32 TurboPFor256
icapp -Ezstd,15 news-docs.2016-WORD.txt -Ftu -e81,96,80,173,3
142677882 35.67% 4 1444 80:Lz zstd,15
145450083 36.36% 37 1546 96:vlccomp TurboVLC +zstd,15
148119568 37.03% 11 1550 81:Lztp Byte Transpose +zstd,15
151616778 37.90% 313 605 96:vlccomp TurboVLC +turborc,56
189513565 47.38% 82 212 173:qcomp quantile compress
181946393 45.49% 1580 7641 3:p4nenc256v32 TurboPFor256
icapp -Ezstd,15 news-docs.2016-WORD-BWTMTF.txt -Ftu -e81,96,80,173,3
103706209 25.93% 306 558 96:vlccomp TurboVLC +turborc,56
105855336 26.46% 127 303 173:qcomp quantile compress
105872416 26.47% 29 1251 96:vlccomp TurboVLC +zstd,15
116101605 29.03% 11 1745 81:Lztp Byte Transpose +zstd,15
136893715 34.22% 4 1319 80:Lz zstd,15
135115053 33.78% 1561 9292 3:p4nenc256v32 TurboPFor256
icapp -Ezstd,15 test1_demo_o.u32 -e81,96,80,173,3
71858387 65.93% 332 650 96:vlccomp TurboVLC +turborc,56
72044472 66.10% 214 2036 96:vlccomp TurboVLC +zstd,15
72142814 66.19% 74 242 173:qcomp quantile compress
77852927 71.43% 7 1324 81:Lztp Byte Transpose +zstd,15
78282925 71.82% 1364 8745 3:p4nenc256v32 TurboPFor256
84333237 77.37% 6 1007 80:Lz zstd,15
icapp -Ezstd,15 test3_demo_o.u32 -e81,96,80,173,3
15946736 34.18% 11 13588 81:Lztp Byte Transpose +zstd,15
16182167 34.68% 8 1293 80:Lz zstd,15
17707120 37.95% 41 1807 96:vlccomp TurboVLC +zstd,15
17734852 38.01% 321 637 96:vlccomp TurboVLC +turborc,56
20344975 43.60% 97 226 173:qcomp quantile compress
22870847 49.01% 1500 8905 3:p4nenc256v32 TurboPFor256
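For readers who want to reproduce this kind of table on their own data without icapp, here is a rough Python sketch of how the size, ratio, E MB/s and D MB/s columns can be measured for any codec. It is an assumption that throughput is counted against the uncompressed size; icapp's exact method is not described here. The stdlib `zlib` codec is only a stand-in for zstd or qcomp, and `benchmark` is a hypothetical helper.

```python
import time
import zlib

def benchmark(data: bytes, compress, decompress, repeats: int = 3):
    """Return (compressed_size, ratio_percent, enc_MBps, dec_MBps) for non-empty data."""
    def best_time(fn, arg):
        best, out = float("inf"), None
        for _ in range(repeats):               # keep the fastest of several runs
            start = time.perf_counter()
            out = fn(arg)
            best = min(best, time.perf_counter() - start)
        return out, max(best, 1e-9)            # guard against a zero timer reading

    packed, enc_s = best_time(compress, data)
    unpacked, dec_s = best_time(decompress, packed)
    if unpacked != data:
        raise ValueError("codec did not round-trip the input")
    mb = len(data) / 1e6                       # assumption: MB/s of uncompressed bytes
    return len(packed), 100.0 * len(packed) / len(data), mb / enc_s, mb / dec_s

if __name__ == "__main__":
    sample = bytes(range(256)) * 4096          # placeholder input, replace with real data
    size, ratio, enc, dec = benchmark(sample, lambda b: zlib.compress(b, 9), zlib.decompress)
    print(f"{size} {ratio:.2f}% {enc:.0f} {dec:.0f} zlib,9 (stand-in codec)")
```

Python timing overhead makes the absolute numbers incomparable with the C benchmark above; the sketch only shows how the columns relate to each other.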