Benchmark: TurboTranspose+iccodecs vs Quantile Compression

[Quantile Compression/PCodec](https://github.com/mwlon/quantile-compression) claims 35%-71% better compression than zstd.

I've integrated the Rust library into TurboPFor through its FFI bindings for comparison purposes.
We use the synthetic [dataset](https://github.com/mwlon/pcodec/tree/main/quantile-compression/q_compress/assets) provided in the Quantile Compression repository, plus other real data with large integers.
As real data with values larger than 32 bits is uncommon, we use 32-bit integers wherever the values fit instead of 64-bit integers for all files. Note that some files compress better with delta or the integrated zigzag delta in conjunction with TurboTranspose. Download icapp, test with your own data, and convince yourself.
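
To make the two preprocessing steps named above concrete, here is a minimal Python sketch of a byte transpose and a zigzag delta on 32-bit integers. It is an illustration only, not TurboTranspose's implementation: the function names are hypothetical, `zlib` stands in for zstd (which is not in the Python standard library), and the test data is made up.

```python
import struct
import zlib  # stand-in for zstd, which is not in the Python standard library

MASK32 = 0xFFFFFFFF

def zigzag_delta_encode(values):
    """Delta against the previous value, then zigzag-map the signed delta to unsigned."""
    out, prev = [], 0
    for v in values:
        d = ((v - prev + (1 << 31)) & MASK32) - (1 << 31)  # signed 32-bit delta
        out.append(((d << 1) ^ (d >> 31)) & MASK32)        # small |d| -> small code
        prev = v
    return out

def zigzag_delta_decode(codes):
    """Inverse of zigzag_delta_encode."""
    out, prev = [], 0
    for z in codes:
        d = (z >> 1) ^ -(z & 1)      # undo the zigzag mapping
        prev = (prev + d) & MASK32   # undo the delta
        out.append(prev)
    return out

def byte_transpose(values, width=4):
    """Group byte 0 of every value, then byte 1, and so on (little-endian)."""
    raw = struct.pack(f"<{len(values)}I", *values)
    return b"".join(raw[i::width] for i in range(width))

def byte_untranspose(data, width=4):
    """Inverse of byte_transpose."""
    n = len(data) // width
    raw = bytes(data[j * n + i] for i in range(n) for j in range(width))
    return list(struct.unpack(f"<{n}I", raw))

# Hypothetical test data: a slowly growing sequence with a little noise.
values = [1_000_000 + 3 * i + (i % 7) for i in range(100_000)]
plain = struct.pack(f"<{len(values)}I", *values)
candidates = {
    "zlib only": plain,
    "transpose + zlib": byte_transpose(values),
    "zigzag delta + transpose + zlib": byte_transpose(zigzag_delta_encode(values)),
}
for name, payload in candidates.items():
    print(f"{name:34s} {len(zlib.compress(payload, 9)):8d} bytes")
```

Which variant wins depends on the data, which is why it is worth trying each one on your own files.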

- 32-bit integers:
TurboTranspose+zstd compresses better in total (16,261,417 vs 16,731,437 bytes) and decompresses several times faster.
<pre>				
icapp i64*.txt -Ftu -e81 -Ezstd,22 
      size   ratio     E MB/s   D MB/s  function integer size=32 bits (lz=zstd,22)
    450889  11.27%         19     6070  81:Lztp   Byte      Transpose  +zstd,22         i64_cents.txt
       182   0.0046%      239    28485  81:Lztp   Byte      Transpose  +zstd,22         i64_constant.txt
    631400  15.79%          5     5837  81:Lztp   Byte      Transpose  +zstd,22         i64_dollars.txt
   2693750  67.34%         12     4575  81:Lztp   Byte      Transpose  +zstd,22         i64_geo1M.txt
    251570   6.29%         14     7335  81:Lztp   Byte      Transpose  +zstd,22         i64_geo2.txt
   1028913  25.72%         12     2728  81:Lztp   Byte      Transpose  +zstd,22         i64_interleaved.txt
   1640375  41.01%          4     2774  81:Lztp   Byte      Transpose  +zstd,22         i64_lomax15.txt
   1592074  39.80%         12     3271  81:Lztp   Byte      Transpose  +zstd,22         i64_lomax25.txt
   2006291  50.16%          5      983  81:Lztp   Byte      Transpose  +zstd,22         i64_misordered.txt
    419053  10.48%          6     3915  81:Lztp   Byte      Transpose  +zstd,22         i64_normal1.txt
    815743  20.39%          8     3719  81:Lztp   Byte      Transpose  +zstd,22         i64_normal10.txt
   2898888  72.47%          6     3349  81:Lztp   Byte      Transpose  +zstd,22         i64_normal1M.txt
    404996  10.12%          5     3687  81:Lztp   Byte      Transpose  +zstd,22         i64_slow_cosine.txt
     16027   0.40%          8    20169  81:Lztp   Byte      Transpose  +zstd,22         i64_sparse.txt
   1411267  35.28%          5     2033  81:Lztp   Byte      Transpose  +zstd,22         i64_total_cents.txt
  16261417  Total

icapp i64*.txt -Ftu -e173 
      size   ratio     E MB/s   D MB/s  function integer size=32 bits
    450451  11.26%        189      431  173:qcomp            quantile compress           i64_cents.txt
        44   0.0011%      744      549  173:qcomp            quantile compress           i64_constant.txt
    620064  15.50%        159      448  173:qcomp            quantile compress           i64_dollars.txt
   2676957  66.92%         76      324  173:qcomp            quantile compress           i64_geo1M.txt
    250467   6.26%        212      571  173:qcomp            quantile compress           i64_geo2.txt
   2253101  56.33%         92      400  173:qcomp            quantile compress           i64_interleaved.txt
   1575073  39.38%         98      373  173:qcomp            quantile compress           i64_lomax15.txt
   1545171  38.63%        102      398  173:qcomp            quantile compress           i64_lomax25.txt
   2253103  56.33%         78      313  173:qcomp            quantile compress           i64_misordered.txt
    282581   7.06%        233      451  173:qcomp            quantile compress           i64_normal1.txt
    676116  16.90%        161      452  173:qcomp            quantile compress           i64_normal10.txt
   2754383  68.86%         74      336  173:qcomp            quantile compress           i64_normal1M.txt
    221218   5.53%        269      534  173:qcomp            quantile compress           i64_slow_cosine.txt
     14323   0.36%        718     2614  173:qcomp            quantile compress           i64_sparse.txt
   1158386  28.96%         95      229  173:qcomp            quantile compress           i64_total_cents.txt
  16731437 Total
</pre>
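
As a quick check of the summary for the 32-bit integer results, the following sketch recomputes the total size difference and the per-file decompression speed ratios. The numbers are copied from the two tables above; the variable names are only illustrative.

```python
# Totals in bytes, copied from the two 32-bit integer tables.
total_transpose_zstd = 16_261_417
total_qcomp = 16_731_437

# Fraction by which the TurboTranspose+zstd total is smaller than the qcomp total.
saving = 1 - total_transpose_zstd / total_qcomp
print(f"TurboTranspose+zstd total is {saving:.2%} smaller")

# Decompression speed in MB/s per file: (TurboTranspose+zstd, qcomp).
d_speed = {
    "i64_cents": (6070, 431),
    "i64_constant": (28485, 549),
    "i64_dollars": (5837, 448),
    "i64_geo1M": (4575, 324),
    "i64_geo2": (7335, 571),
    "i64_interleaved": (2728, 400),
    "i64_lomax15": (2774, 373),
    "i64_lomax25": (3271, 398),
    "i64_misordered": (983, 313),
    "i64_normal1": (3915, 451),
    "i64_normal10": (3719, 452),
    "i64_normal1M": (3349, 336),
    "i64_slow_cosine": (3687, 534),
    "i64_sparse": (20169, 2614),
    "i64_total_cents": (2033, 229),
}
speedup = {name: tp / qc for name, (tp, qc) in d_speed.items()}
slowest = min(speedup, key=speedup.get)
fastest = max(speedup, key=speedup.get)
print(f"smallest speed-up: {slowest} ({speedup[slowest]:.1f}x)")
print(f"largest speed-up:  {fastest} ({speedup[fastest]:.1f}x)")
```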
- Floating point (64 bits):
Quantile Compression/PCodec compresses slightly better in total, but its decompression is a lot slower (2-3x) than zstd.
<pre>
icapp f64*.txt -Ftd -e80 -Ezstd,22  
      size   ratio     E MB/s   D MB/s  function floating point size=64 bits (lz=zstd,22) unsorted -1
   2412121  30.15%          3     1621  80:Lz               zstd,22                     f64_decimal_long.txt
      9111  37.96%          7      913  80:Lz               zstd,22                     f64_decimal_short.txt
   4970116  62.13%          5     1377  80:Lz               zstd,22                     f64_edge_cases.txt
   4247812  53.10%          3      729  80:Lz               zstd,22                     f64_integers.txt
   7670370  95.88%          8     1348  80:Lz               zstd,22                     f64_normal_at_0.txt
   6221137  77.76%          3     1212  80:Lz               zstd,22                     f64_normal_at_1000.txt
   4073918  50.92%          6      992  80:Lz               zstd,22                     f64_slow_cosine.txt
  29604584 Total

icapp f64*.txt -Ftd -e173 
      size   ratio     E MB/s   D MB/s  function floating point size=64 bits
   6686504  83.58%        189      605  173:qcomp            quantile compress           f64_decimal_long.txt
     20134  83.89%          8      652  173:qcomp            quantile compress           f64_decimal_short.txt
   4364570  54.56%        172      540  173:qcomp            quantile compress           f64_edge_cases.txt
   3754251  46.93%        133      675  173:qcomp            quantile compress           f64_integers.txt
   6943689  86.80%        131      518  173:qcomp            quantile compress           f64_normal_at_0.txt
   5638910  70.49%        138      551  173:qcomp            quantile compress           f64_normal_at_1000.txt
   1813077  22.66%        155      493  173:qcomp            quantile compress           f64_slow_cosine.txt
  29221134 Total
</pre>
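- Timestamps (64 bits):
Results are mixed: Quantile Compression is better on micros_millis, TurboTranspose+zstd is better on micros_near_linear, and the totals are nearly equal. Quantile Compression decompression is a lot slower (4-6x) than TurboTranspose+zstd.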
<pre>
icapp micro*.* -FtT -e173
      size   ratio     E MB/s   D MB/s  function integer size=64 bits
   2497182  31.21%        140      640  173:qcomp            quantile compress           micros_millis.txt.ts
   3742368  46.78%        195      793  173:qcomp            quantile compress           micros_near_linear.txt.ts
   6239549 Total

icapp micro*.* -FtT -e81 -Ezstd,22
      size   ratio     E MB/s   D MB/s  function integer size=64 bits
   3385201  42.32%         16     4089  81:Lztp   Byte      Transpose  +zstd,22         micros_millis.txt.ts
   2800155  35.00%         21     3367  81:Lztp   Byte      Transpose  +zstd,22         micros_near_linear.txt.ts
   6185355 Total
</pre>

---

- Non-synthetic [dataset](https://github.com/mpetri/ans-large-alphabet) + lz77 offsets output: test1_demo (text) + test3_demo (binary). These files are typical of data that mixes small, medium and large integers.
As iccodecs we use "zstd,15" and TurboVLC+"turborc,56" (entropy coding only, with an adaptive Asymmetric Numeral System).
Quantile Compression is not competitive, and its decompression is several (7-60) times slower.
TurboVLC+rANS compresses better than Quantile Compression and compresses/decompresses faster.
<pre>
icapp -Ezstd,15  CCNEWS-RLZ-D64-FLENS.txt -Ftu -e81,96,80,173,3
      size   ratio     E MB/s   D MB/s   function integer size=32 bits
  22145289   5.54%         29     3525    81:Lztp   Byte      Transpose  +zstd,15         
  23693811   5.92%         32     2743    96:vlccomp          TurboVLC  +zstd,15          
  29382157   7.35%          9     3536    80:Lz               zstd,15                     
  59957497  14.99%        367      692    96:vlccomp          TurboVLC  +turborc,56 (=rANS)
  62529619  15.63%        164      345   173:qcomp            quantile compress          
  77585707  19.40%       1820    11324     3:p4nenc256v32     TurboPFor256                

icapp -Ezstd,15  CCNEWS-RLZ-D64-FOFFSETS.txt -Ftu -e81,96,80,173,3
  93751603  23.44%         19     2745    80:Lz               zstd,15                     
 283069853  70.77%         56     2622    96:vlccomp          TurboVLC  +zstd,15         
 322425616  80.61%        338      651    96:vlccomp          TurboVLC  +turborc,56 (=rANS)
 323345103  80.84%         73      219   173:qcomp            quantile compress           
 325331435  81.33%       2444    10740     3:p4nenc256v32     TurboPFor256                

icapp -Ezstd,15  news-docs.2016-WORD.txt -Ftu -e81,96,80,173,3
 142677882  35.67%          4     1444    80:Lz               zstd,15                     
 145450083  36.36%         37     1546    96:vlccomp          TurboVLC  +zstd,15          
 148119568  37.03%         11     1550    81:Lztp   Byte      Transpose  +zstd,15        
 151616778  37.90%        313      605    96:vlccomp          TurboVLC  +turborc,56
 189513565  47.38%         82      212   173:qcomp            quantile compress           
 181946393  45.49%       1580     7641     3:p4nenc256v32     TurboPFor256                

icapp -Ezstd,15  news-docs.2016-WORD-BWTMTF.txt -Ftu -e81,96,80,173,3
 103706209  25.93%        306      558    96:vlccomp          TurboVLC  +turborc,56
 105855336  26.46%        127      303   173:qcomp            quantile compress           
 105872416  26.47%         29     1251    96:vlccomp          TurboVLC  +zstd,15          
 116101605  29.03%         11     1745    81:Lztp   Byte      Transpose  +zstd,15         
 136893715  34.22%          4     1319    80:Lz               zstd,15                    
 135115053  33.78%       1561     9292     3:p4nenc256v32     TurboPFor256                

icapp -Ezstd,15  test1_demo_o.u32 -e81,96,80,173,3
  71858387  65.93%        332      650   96:vlccomp          TurboVLC  +turborc,56
  72044472  66.10%        214     2036   96:vlccomp          TurboVLC  +zstd,15         
  72142814  66.19%         74      242  173:qcomp            quantile compress           
  77852927  71.43%          7     1324   81:Lztp   Byte      Transpose  +zstd,15         
  78282925  71.82%       1364     8745    3:p4nenc256v32     TurboPFor256                
  84333237  77.37%          6     1007   80:Lz               zstd,15                    

icapp -Ezstd,15  test3_demo_o.u32 -e81,96,80,173,3
  15946736  34.18%         11    13588   81:Lztp   Byte      Transpose  +zstd,15         
  16182167  34.68%          8     1293   80:Lz               zstd,15                     
  17707120  37.95%         41     1807   96:vlccomp          TurboVLC  +zstd,15          
  17734852  38.01%        321      637   96:vlccomp          TurboVLC  +turborc,56
  20344975  43.60%         97      226  173:qcomp            quantile compress          
  22870847  49.01%       1500     8905    3:p4nenc256v32     TurboPFor256                
</pre>
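
To illustrate why a variable-length code suits data that mixes small, medium and large integers, here is a minimal sketch of a generic variable-byte (LEB128-style) coder. This is an assumption-laden stand-in and not TurboVLC's actual format; the function names and sample values are hypothetical.

```python
def varint_encode(values):
    """LEB128-style: 7 payload bits per byte, high bit set when more bytes follow."""
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)  # low 7 bits, continuation flag set
            v >>= 7
        out.append(v)                      # last byte, continuation flag clear
    return bytes(out)

def varint_decode(data):
    """Inverse of varint_encode."""
    values, cur, shift = [], 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7                     # more bytes belong to this value
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values

# Hypothetical sample mixing small, medium and large unsigned values.
mixed = [3, 200, 70_000, 5, 4_000_000_000, 17]
encoded = varint_encode(mixed)
print(len(encoded), "bytes, versus", 4 * len(mixed), "bytes as fixed 32-bit integers")
```

Small values cost one byte and only the rare large values pay for their full width. An entropy coder such as rANS can then be applied to the resulting byte stream.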
