feat: Use previous chunks to inform compressions decisions (#2724)
Follow-up to #2723: re-encodes array children according to a known
well-compressed chunk from the `BtrBlockCompressor`.
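
To illustrate the idea, here is a minimal sketch of the control flow (not the actual vortex/`BtrBlockCompressor` API): the encoding tree chosen for the previous, well-compressed chunk is reused to encode the next chunk's children, and the full compressor search only runs again when the compression ratio drifts past the tolerance. `EncodingTree`, `Compressed`, and the closure parameters are hypothetical stand-ins.

```rust
struct EncodingTree; // per-child encoding choices captured from a prior chunk

struct Compressed {
    tree: EncodingTree,
    ratio: f64, // compressed_bytes / uncompressed_bytes
}

/// Allowed compression-ratio drift before falling back to a full search
/// (hard-coded at 20% in this PR).
const RATIO_DRIFT_TOLERANCE: f64 = 0.20;

fn compress_chunk(
    chunk: &[i64],
    previous: Option<&Compressed>,
    compress_with_tree: impl Fn(&[i64], &EncodingTree) -> Compressed,
    full_search: impl Fn(&[i64]) -> Compressed,
) -> Compressed {
    if let Some(prev) = previous {
        // Cheap path: re-encode this chunk's children with the tree that
        // compressed the previous chunk well, skipping the sampling search.
        let reused = compress_with_tree(chunk, &prev.tree);
        if reused.ratio <= prev.ratio * (1.0 + RATIO_DRIFT_TOLERANCE) {
            return reused;
        }
    }
    // No previous chunk, or the ratio drifted too far: run the full search.
    full_search(chunk)
}
```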
Current
[benchmarks](#2724 (comment))
seem to show that compression throughput is up, while decompression is
slightly slower and query times increase by up to 10%.
Two interesting points: wide-columned arrays are basically always
slower to compress, but their decompression held up better. Arade is
also the one dataset where compressed files shrank significantly,
regardless of throughput.
I think these tradeoffs should be configurable, and I expect that
exposing the tolerance for compression-ratio drift (currently
hard-coded at 20%) will let users tune this knob in a way that better
fits their use cases.
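
As a sketch of what that knob could look like (hypothetical names and option surface, not an existing vortex API), the tolerance could move from a constant into the compressor's options:

```rust
/// Hypothetical options struct; the field name is an assumption, not the
/// real vortex configuration surface.
#[derive(Clone, Copy, Debug)]
pub struct CompressorOptions {
    /// How much worse (relatively) the reused encoding tree may compress a
    /// new chunk before the compressor falls back to a full search.
    /// 0.20 matches the value hard-coded in this PR; larger values favour
    /// compression throughput, smaller values favour compression ratio.
    pub ratio_drift_tolerance: f64,
}

impl Default for CompressorOptions {
    fn default() -> Self {
        Self { ratio_drift_tolerance: 0.20 }
    }
}

/// Reuse the previous chunk's tree only while the ratio stays within the
/// configured drift tolerance.
fn should_reuse_tree(opts: &CompressorOptions, prev_ratio: f64, reused_ratio: f64) -> bool {
    reused_ratio <= prev_ratio * (1.0 + opts.ratio_drift_tolerance)
}
```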
Other benchmarks (because there are so many commits/comments on this PR):
- [Clickbench on NVME](#2724 (comment))
- [TPC-H on NVME](#2724 (comment))
- [Random Access](#2724 (comment))
- [TPC-H on S3](#2724 (comment))
  - Note that it doesn't actually generate new files here, so the results aren't useful.
---------
Co-authored-by: Nicholas Gates <[email protected]>
Co-authored-by: Robert Kruszewski <[email protected]>