Description
Lucene, and by extension OpenSearch, currently support only the zlib and lz4 compression algorithms, which offer the best compression ratio and the best speed, respectively. This has a couple of limitations:
- Use cases that prefer a middle ground between best compression ratio and best speed, such as that offered by zstd, are not supported.
- The lz4 implementation is written entirely in Java and, based on our benchmark results, is less performant than a native lz4 implementation.
To address these limitations, we have implemented a small library that adds support for both zstd and native lz4 compression to OpenSearch. The library requires changes to two OpenSearch source files, EngineConfig.java and CodecService.java, but does not require any code changes to Lucene.
The charts below summarize the relative performance gain we measured for the different compression algorithms and implementations using the StoredFieldsBenchmark.
Indexing Time (relative, less is better):
Retrieval Time (relative, less is better):
Stored Size (relative, less is better):
We also ran OpenSearch-Benchmark with the nyc_taxis workload to compare indexing performance across the same compression algorithms and implementations, and measured gains there as well.
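For readers who want a feel for the ratio-versus-speed trade-off discussed above, here is a minimal sketch that compresses a synthetic payload with the JDK's built-in zlib binding (`java.util.zip.Deflater`) at its fastest and its smallest settings and prints the relative size and elapsed time. It is only an illustration: it does not use zstd, native lz4, or our library, and the class name `CompressionTradeoff`, its helper methods, and the generated sample documents are all hypothetical. It is also not the StoredFieldsBenchmark, so its numbers are not comparable to the charts.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, for illustration only.
public class CompressionTradeoff {

    // Compress with zlib at the given level: 1 favors speed, 9 favors size.
    public static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a zlib stream whose uncompressed length is known up front.
    public static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        int off = 0;
        try {
            while (off < originalLength && !inflater.finished()) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported stream
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt zlib stream", e);
        } finally {
            inflater.end();
        }
        return result;
    }

    // Synthetic JSON-like documents; made up for this sketch.
    public static byte[] sampleData(int docs) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < docs; i++) {
            sb.append("{\"id\":").append(i)
              .append(",\"vendor\":\"").append(i % 3)
              .append("\",\"fare\":").append((i * 37) % 500)
              .append("}\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData(50_000);
        int[] levels = {Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            long start = System.nanoTime();
            byte[] packed = compress(data, level);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("level=%d relative size=%.3f time=%d us%n",
                    level, (double) packed.length / data.length, micros);
        }
    }
}
```

A single timed pass like this is noisy (JIT warm-up, GC), so treat the printed times as a rough indication only; a proper comparison needs a benchmark harness and realistic data.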
We'll be upstreaming our source code and patches in a couple of weeks, and we would like to discuss how best to integrate this work into OpenSearch and make it available to users. This post is intended to start that discussion.