Merged
Changes from 13 commits
@@ -32,6 +32,7 @@
 "CRUD",
 "Client",
 "Cluster Coordination",
+"Codec",
 "Data streams",
 "DLM",
 "Discovery-Plugins",
13 changes: 13 additions & 0 deletions docs/changelog/112665.yaml
@@ -0,0 +1,13 @@
pr: 112665
summary: Remove zstd feature flag for index codec best compression
area: Codec
type: enhancement
issues: []
highlight:
title: Enable ZStandard compression for indices with index.codec set to best_compression
body: |-
Previously, DEFLATE was used to compress stored fields in indices with the index.codec index setting set to
best_compression. With this change, ZStandard is used to compress stored fields for indices with the
index.codec index setting set to best_compression. Experiments have shown that ZStandard offers lower storage usage
(up to ~12%) and higher indexing throughput (up to 14%) compared to DEFLATE.
Contributor
should we also update the note on higher indexing throughput here?

Member Author
yes, I reworded this to: 58e4898

notable: true
2 changes: 1 addition & 1 deletion docs/reference/ilm/actions/ilm-forcemerge.asciidoc
@@ -49,7 +49,7 @@ Number of segments to merge to. To fully merge the index, set to `1`.
 `index_codec`::
 (Optional, string)
 Codec used to compress the document store. The only accepted value is
-`best_compression`, which uses {wikipedia}/DEFLATE[DEFLATE] for a higher
+`best_compression`, which uses {wikipedia}/Zstd[ZSTD] for a higher
 compression ratio but slower stored fields performance. To use the default LZ4
 codec, omit this argument.
 +
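For readers unfamiliar with the `index_codec` option documented in this hunk, the following is a minimal sketch of an ILM policy body that sets it on the `forcemerge` action. The class name, the method name, and the choice of the `warm` phase are assumptions made for illustration; only `max_num_segments`, `index_codec`, and the `best_compression` value come from the documentation text above.

```java
public class ForceMergePolicySketch {

    // Builds the JSON body for a policy whose warm phase force merges the
    // index and rewrites stored fields with the best_compression codec.
    // Hypothetical helper: not part of Elasticsearch or of this PR.
    static String policyBody(int maxNumSegments) {
        return "{\"policy\":{\"phases\":{\"warm\":{\"actions\":{\"forcemerge\":{"
            + "\"max_num_segments\":" + maxNumSegments + ","
            + "\"index_codec\":\"best_compression\""
            + "}}}}}}";
    }

    public static void main(String[] args) {
        // Setting max_num_segments to 1 fully merges the index, as the docs note.
        System.out.println(policyBody(1));
    }
}
```

Omitting the `index_codec` entry would leave the index on the default LZ4 codec, per the documentation above.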
12 changes: 7 additions & 5 deletions docs/reference/index-modules.asciidoc
@@ -76,14 +76,16 @@ breaking change].
 
 The +default+ value compresses stored data with LZ4
 compression, but this can be set to +best_compression+
-which uses {wikipedia}/DEFLATE[DEFLATE] for a higher
-compression ratio, at the expense of slower stored fields performance.
+which uses {wikipedia}/Zstd[ZSTD] for a higher
+compression ratio, at the expense of slower stored fields read performance.
 If you are updating the compression type, the new one will be applied
 after segments are merged. Segment merging can be forced using
 <<indices-forcemerge,force merge>>. Experiments with indexing log datasets
-have shown that `best_compression` gives up to ~18% lower storage usage in
-the most ideal scenario compared to `default` while only minimally affecting
-indexing throughput (~2%).
+have shown that `best_compression` gives up to ~28% lower storage usage and
+better indexing throughput (up to ~10%) in the most ideal scenario compared
Contributor
I imagine that it can be faster at indexing than default in some cases via more batching, but also slower in other cases, e.g. when index sorting is involved, as some merging optimizations that copy data directly would no longer be applicable. I would rather rephrase to suggest similar indexing rates, occasionally a bit slower or a bit faster depending on other options configured on the index?

Member Author
👍 - I will reword this. This is based on comparing the elastic/logs track without logsdb with the default codec (lz4) to the elastic/logs track without logsdb with best_compression (zstd).

+to `default` while affecting get by id latencies up to 10%. The higher get
Contributor
only 10% sounds surprising?

Member Author
Right, maybe too optimistic. Based on the tsdb get by id visualizations, this is higher (at least 12%). Let me update this as well to be more realistic.

+by id latencies are not a concern for many use cases like logging or metrics, since
+these don't really rely on get by id functionality (Get APIs or searching by _id).
 
 [[index-mode-setting]] `index.mode`::
 +
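The documentation paragraph in this hunk describes two steps: setting `index.codec` to `best_compression`, then forcing a segment merge so that existing segments are rewritten with the new codec. A minimal sketch of the two REST requests involved, built with the JDK's `java.net.http` API, might look like the following. The host `localhost:9200`, the index name `logs-demo`, and the class and method names are assumptions for illustration. The requests are only constructed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class BestCompressionRequests {

    // Assumption: a local node listening on the default HTTP port.
    static final String BASE_URL = "http://localhost:9200";

    // Request that creates an index whose stored fields use best_compression.
    static HttpRequest createIndex(String index) {
        String body = "{\"settings\":{\"index.codec\":\"best_compression\"}}";
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + index))
            .header("Content-Type", "application/json")
            .PUT(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    // Request that force merges the index down to a single segment, so that
    // all segments are rewritten with the currently configured codec.
    static HttpRequest forceMerge(String index) {
        URI uri = URI.create(BASE_URL + "/" + index + "/_forcemerge?max_num_segments=1");
        return HttpRequest.newBuilder(uri)
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
    }

    public static void main(String[] args) {
        HttpRequest create = createIndex("logs-demo");
        HttpRequest merge = forceMerge("logs-demo");
        System.out.println(create.method() + " " + create.uri());
        System.out.println(merge.method() + " " + merge.uri());
    }
}
```

To execute the requests, they could be passed to `java.net.http.HttpClient#send` against a running cluster; authentication and TLS are left out of this sketch.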
@@ -53,15 +53,11 @@ public CodecService(@Nullable MapperService mapperService, BigArrays bigArrays)
         }
         codecs.put(LEGACY_DEFAULT_CODEC, legacyBestSpeedCodec);
 
+        codecs.put(
+            BEST_COMPRESSION_CODEC,
+            new PerFieldMapperCodec(Zstd814StoredFieldsFormat.Mode.BEST_COMPRESSION, mapperService, bigArrays)
+        );
         Codec legacyBestCompressionCodec = new LegacyPerFieldMapperCodec(Lucene99Codec.Mode.BEST_COMPRESSION, mapperService, bigArrays);
-        if (ZSTD_STORED_FIELDS_FEATURE_FLAG.isEnabled()) {
-            codecs.put(
-                BEST_COMPRESSION_CODEC,
-                new PerFieldMapperCodec(Zstd814StoredFieldsFormat.Mode.BEST_COMPRESSION, mapperService, bigArrays)
-            );
-        } else {
-            codecs.put(BEST_COMPRESSION_CODEC, legacyBestCompressionCodec);
-        }
         codecs.put(LEGACY_BEST_COMPRESSION_CODEC, legacyBestCompressionCodec);
 
         codecs.put(LUCENE_DEFAULT_CODEC, Codec.getDefault());
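The effect of the hunk above on codec registration can be illustrated with a simplified, self-contained model. This is a sketch only: it replaces the real `Codec` instances with plain string labels, the class and method names are hypothetical, and the setting name `legacy_best_compression` is assumed by analogy with `legacy_default`, which appears in the tests in this PR.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CodecRegistrySketch {

    // Hypothetical labels standing in for the real codec objects.
    static final String ZSTD_BEST_COMPRESSION = "zstd(best_compression)";
    static final String DEFLATE_BEST_COMPRESSION = "deflate(best_compression)";

    // Registration while the feature flag still existed: best_compression
    // resolved to ZSTD only when the flag was enabled.
    static Map<String, String> beforeChange(boolean zstdFlagEnabled) {
        Map<String, String> codecs = new LinkedHashMap<>();
        codecs.put("best_compression", zstdFlagEnabled ? ZSTD_BEST_COMPRESSION : DEFLATE_BEST_COMPRESSION);
        codecs.put("legacy_best_compression", DEFLATE_BEST_COMPRESSION);
        return codecs;
    }

    // Registration after this change: best_compression always resolves to
    // ZSTD, and the DEFLATE-based codec stays reachable under the legacy name.
    static Map<String, String> afterChange() {
        Map<String, String> codecs = new LinkedHashMap<>();
        codecs.put("best_compression", ZSTD_BEST_COMPRESSION);
        codecs.put("legacy_best_compression", DEFLATE_BEST_COMPRESSION);
        return codecs;
    }

    public static void main(String[] args) {
        System.out.println("flag off: " + beforeChange(false));
        System.out.println("flag on:  " + beforeChange(true));
        System.out.println("after:    " + afterChange());
    }
}
```

In this model, `afterChange()` returns the same mapping as `beforeChange(true)`, which is the sense in which the PR removes the feature flag rather than introducing new behavior.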
@@ -17,8 +17,6 @@
 public class CodecIntegrationTests extends ESSingleNodeTestCase {
 
     public void testCanConfigureLegacySettings() {
-        assumeTrue("Only when zstd_stored_fields feature flag is enabled", CodecService.ZSTD_STORED_FIELDS_FEATURE_FLAG.isEnabled());
-
         createIndex("index1", Settings.builder().put("index.codec", "legacy_default").build());
         var codec = client().admin().indices().prepareGetSettings("index1").execute().actionGet().getSetting("index1", "index.codec");
         assertThat(codec, equalTo("legacy_default"));
@@ -29,8 +27,6 @@ public void testCanConfigureLegacySettings() {
     }
 
     public void testDefaultCodecLogsdb() {
-        assumeTrue("Only when zstd_stored_fields feature flag is enabled", CodecService.ZSTD_STORED_FIELDS_FEATURE_FLAG.isEnabled());
-
         var indexService = createIndex("index1", Settings.builder().put("index.mode", "logsdb").build());
         var storedFieldsFormat = (Zstd814StoredFieldsFormat) indexService.getShard(0)
             .getEngineOrNull()
@@ -64,7 +64,6 @@ public void testDefault() throws Exception {
     }
 
     public void testBestCompression() throws Exception {
-        assumeTrue("Only when zstd_stored_fields feature flag is enabled", CodecService.ZSTD_STORED_FIELDS_FEATURE_FLAG.isEnabled());
         Codec codec = createCodecService().codec("best_compression");
         assertEquals(
             "Zstd814StoredFieldsFormat(compressionMode=ZSTD(level=3), chunkSize=245760, maxDocsPerChunk=2048, blockShift=10)",