Skip to content

Conversation

martijnvg
Copy link
Member

@martijnvg martijnvg commented Sep 9, 2024

ZStandard was added via #103374 a few months ago to snapshot builds of Elasticsearch only and benchmark results have shown that using zstd is a better trade off compared to deflate for when index.codec is set to best_compression.

This change removes the feature flag for ZStandard stored field compression for indices with index.codec set to best_compression.

@martijnvg martijnvg added test-release Trigger CI checks against release build :StorageEngine/Codec labels Sep 9, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've created a changelog YAML for you. Note that since this PR is labelled release highlight, you need to update the changelog YAML to fill out the extended information sections.

@martijnvg
Copy link
Member Author

Note that release tests fail, because of esql tests that assume a feature flag is active.

@martijnvg martijnvg added the >docs General docs changes label Sep 11, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've updated the changelog YAML for you. Note that since this PR is labelled release highlight, you need to update the changelog YAML to fill out the extended information sections.

@martijnvg martijnvg marked this pull request as ready for review September 11, 2024 07:49
@martijnvg martijnvg requested a review from a team as a code owner September 11, 2024 07:49
@martijnvg martijnvg requested a review from jpountz September 11, 2024 07:50
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-docs (Team:Docs)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've updated the changelog YAML for you. Note that since this PR is labelled release highlight, you need to update the changelog YAML to fill out the extended information sections.

@martijnvg martijnvg removed the >docs General docs changes label Sep 11, 2024
@elasticsearchmachine elasticsearchmachine removed the Team:Docs Meta label for docs team label Sep 11, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @martijnvg, I've updated the changelog YAML for you. Note that since this PR is labelled release highlight, you need to update the changelog YAML to fill out the extended information sections.

Copy link
Contributor

@jpountz jpountz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woohoo!

indexing throughput (~2%).
have shown that `best_compression` gives up to ~28% lower storage usage and
better indexing throughput (up to ~10%) in the most ideal scenario compared
to `default` while affecting get by id latencies up to 10%. The higher get
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only 10% sounds surprising?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, maybe too optimistic. Based on tsdb get by id visualizations, this is a higher (at least 12%). Let me update this as well to be more realistic.

the most ideal scenario compared to `default` while only minimally affecting
indexing throughput (~2%).
have shown that `best_compression` gives up to ~28% lower storage usage and
better indexing throughput (up to ~10%) in the most ideal scenario compared
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine that it can be faster at indexing than default in some cases via more batching, but also slower in other cases e.g. when index sorting is involved as some merging optimizations that copy data directly would no longer be applicable. I would rather rephase to suggest similar indexing rates, occasionally a bit slower or a bit faster depending on other options configured on the index?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 - I will reword this. This is based on comparing elastic/logs track without logsdb with default codec (lz4) codec to elastic/logs track without logsdb with best_compression (zstd)

Before DEFLATE compression was used to compress stored fields in indices with index.codec index setting set to
best_compression, with this change ZStandard is used as compression algorithm to stored fields for indices with
index.codec index setting set to best_compression. Experiments have shown that ZStandard offers lower storage usage
(upto ~12%) and higher indexing throughput (upto 14%) compared to DEFLATE.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also update the note on higher indexing throughput here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I reworded this to: 58e4898

@martijnvg
Copy link
Member Author

run elasticsearch-ci/release-tests

@martijnvg
Copy link
Member Author

Release build failed with: curl: (22) The requested URL returned error: 502, locally the zstd unit tests that are release build aware did pass. (./gradlew -p server test --tests "org.elasticsearch.index.codec*" -Dbuild.snapshot=false -Dtests.jvm.argline="-Dbuild.snapshot=false" -Dlicense.key=x-pack/license-tools/src/test/resources/public.key)

@martijnvg martijnvg merged commit 661efa9 into elastic:main Sep 13, 2024
14 of 16 checks passed
martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Sep 13, 2024
…12665)

ZStandard was added via elastic#103374 a few months ago to snapshot builds of Elasticsearch only and benchmark results have shown that using zstd is a better trade off compared to deflate for when index.codec is set to best_compression.

This change removes the feature flag for ZStandard stored field compression for indices with index.codec set to best_compression.
@elasticsearchmachine
Copy link
Collaborator

💚 Backport successful

Status Branch Result
8.x

elasticsearchmachine pushed a commit that referenced this pull request Sep 13, 2024
…2665) (#112857)

* Remove zstd feature flag for index codec best compression. (#112665)

ZStandard was added via #103374 a few months ago to snapshot builds of Elasticsearch only and benchmark results have shown that using zstd is a better trade off compared to deflate for when index.codec is set to best_compression.

This change removes the feature flag for ZStandard stored field compression for indices with index.codec set to best_compression.

* Update docs/changelog/112857.yaml

---------

Co-authored-by: Elastic Machine <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants