-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Allow opting out of force-merging on a cloned index in ILM's searchable snapshot action #137375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…le snapshot action In elastic#133954, we modified ILM's searchable snapshot action to perform the force-merge on a clone of the index with 0 replicas. This optimization avoids performing the force-merge redundantly on replicas, as the subsequent snapshot operation only looks at primary shards. We've seen some cases where cloning the index resulted in issues; there was a bug in the clone API that caused shards to be initializing permanently under specific circumstances (fixed by elastic#137096), and cloned shards are unable to be assigned if their source lives on a node that is close/past the low watermark disk threshold (will be fixed soon by the Distributed Coordination team). Therefore, we implement an opt-out flag that users can configure in the `searchable_snapshot` action of their ILM policy if they don't want to clone the index with 0 replicas before performing the force-merge. We implement an opt-out instead of an opt-in, as we believe these issues to be rather specific (and soon resolved), and the clone is worth doing by default.
|
Pinging @elastic/es-data-management (Team:Data Management) |
|
Hi @nielsbauman, I've created a changelog YAML for you. |
🔍 Preview links for changed docs |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces an opt-out mechanism for the force-merge-on-clone optimization in ILM's searchable snapshot action. The optimization, which clones an index with 0 replicas before force-merging to avoid redundant work on replicas, can now be disabled via a new force_merge_on_clone field when issues arise with cloning (such as shard assignment problems near disk watermarks).
Key Changes:
- Added a new
force_merge_on_cloneboolean field toSearchableSnapshotActionthat defaults to true (maintaining current behavior) - Modified the step execution logic to conditionally skip clone-related steps when
force_merge_on_cloneis false - Added validation to ensure
force_merge_on_cloneis only set whenforce_merge_indexis true
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| SearchableSnapshotAction.java | Core implementation adding the force_merge_on_clone field with validation and step execution logic |
| SearchableSnapshotActionTests.java | Updated unit tests to cover the new field in serialization, mutation, and step key generation |
| SearchableSnapshotActionIT.java | Added integration test validating behavior when opting out of force-merge-on-clone |
| RestPutLifecycleAction.java | Added capability flag for the new feature |
| 90_searchable_snapshot.yml | Added YAML REST test verifying the new field can be set and retrieved |
| ilm-searchable-snapshot.md | Added documentation for the new force_merge_on_clone parameter |
| TimeseriesLifecycleTypeTests.java, LifecyclePolicyTests.java | Updated test utilities to pass null for the new parameter |
| transport version files | Added new transport version for backward compatibility |
| 137375.yaml | Added changelog entry |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/SearchableSnapshotAction.java
Outdated
Show resolved
Hide resolved
| } | ||
| this.replicateFor = replicateFor; | ||
|
|
||
| if (forceMergeIndex == false && forceMergeOnClone != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't ready through the whole PR yet so I might be missing something, but what if forceMergeOnClone is false? We don't want to throw an exception here do we? Is there any reason for forceMergeOnClone to be a Boolean rather than a boolean? Is there any real value in it being able to have 3 states instead of 2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I was a bit on the fence here. My reason for going with a Boolean was that it better represents what the user actually does (null, false, true). However, we don't currently make a lot of use of that null state, so I don't have super strong feelings on this. If you strongly prefer having a boolean, I'm ok with changing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense with the toXcontent not returning it when null. But is this a bug here that you can't say force_merge_on_clone: false and force_merge_index: false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I forgot to comment on that. I intentionally went with that behavior, as I thought it was cleaner and less confusing if force_merge_on_clone is simply not supported when force_merge_index: false. It wouldn't have any effect if you were able to specify it, but I think it's confusing to allow configuring fields that are ignored. This way, it's clear that the force_merge_on_clone field is not relevant/supported. What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems a bit odd to me, but it sort of makes sense.
| if (replicateFor != null) { | ||
| builder.field(REPLICATE_FOR.getPreferredName(), replicateFor); | ||
| } | ||
| if (forceMergeOnClone != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this answers my question from above -- we don't want it in the xcontent if it has not been set explicitly.
docs/reference/elasticsearch/index-lifecycle-actions/ilm-searchable-snapshot.md
Outdated
Show resolved
Hide resolved
...gin/ilm/qa/rest/src/yamlRestTest/resources/rest-api-spec/test/ilm/90_searchable_snapshot.yml
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…le snapshot action (elastic#137375) In elastic#133954, we modified ILM's searchable snapshot action to perform the force-merge on a clone of the index with 0 replicas. This optimization avoids performing the force-merge redundantly on replicas, as the subsequent snapshot operation only looks at primary shards. We've seen some cases where cloning the index resulted in issues; there was a bug in the clone API that caused shards to be initializing permanently under specific circumstances (fixed by elastic#137096), and cloned shards are unable to be assigned if their source lives on a node that is close/past the low watermark disk threshold (will be fixed soon by the Distributed Coordination team). Therefore, we implement an opt-out flag that users can configure in the `searchable_snapshot` action of their ILM policy if they don't want to clone the index with 0 replicas before performing the force-merge. We implement an opt-out instead of an opt-in, as we believe these issues to be rather specific (and soon resolved), and the clone is worth doing by default. (cherry picked from commit 0ab3240) # Conflicts: # server/src/main/resources/transport/upper_bounds/9.2.csv # server/src/main/resources/transport/upper_bounds/9.3.csv # x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/ilm/action/RestPutLifecycleAction.java
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
…le snapshot action (elastic#137375) In elastic#133954, we modified ILM's searchable snapshot action to perform the force-merge on a clone of the index with 0 replicas. This optimization avoids performing the force-merge redundantly on replicas, as the subsequent snapshot operation only looks at primary shards. We've seen some cases where cloning the index resulted in issues; there was a bug in the clone API that caused shards to be initializing permanently under specific circumstances (fixed by elastic#137096), and cloned shards are unable to be assigned if their source lives on a node that is close/past the low watermark disk threshold (will be fixed soon by the Distributed Coordination team). Therefore, we implement an opt-out flag that users can configure in the `searchable_snapshot` action of their ILM policy if they don't want to clone the index with 0 replicas before performing the force-merge. We implement an opt-out instead of an opt-in, as we believe these issues to be rather specific (and soon resolved), and the clone is worth doing by default. (cherry picked from commit 0ab3240) # Conflicts: # server/src/main/resources/transport/upper_bounds/9.3.csv # x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/ilm/action/RestPutLifecycleAction.java
…archable snapshot action (#137375) (#137463) * Allow opting out of force-merging on a cloned index in ILM's searchable snapshot action (#137375) In #133954, we modified ILM's searchable snapshot action to perform the force-merge on a clone of the index with 0 replicas. This optimization avoids performing the force-merge redundantly on replicas, as the subsequent snapshot operation only looks at primary shards. We've seen some cases where cloning the index resulted in issues; there was a bug in the clone API that caused shards to be initializing permanently under specific circumstances (fixed by #137096), and cloned shards are unable to be assigned if their source lives on a node that is close/past the low watermark disk threshold (will be fixed soon by the Distributed Coordination team). Therefore, we implement an opt-out flag that users can configure in the `searchable_snapshot` action of their ILM policy if they don't want to clone the index with 0 replicas before performing the force-merge. We implement an opt-out instead of an opt-in, as we believe these issues to be rather specific (and soon resolved), and the clone is worth doing by default. (cherry picked from commit 0ab3240) # Conflicts: # server/src/main/resources/transport/upper_bounds/9.3.csv # x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/ilm/action/RestPutLifecycleAction.java * Fix transport version
In #133954, we modified ILM's searchable snapshot action to perform the force-merge on a clone of the index with 0 replicas. This optimization avoids performing the force-merge redundantly on replicas, as the subsequent snapshot operation only looks at primary shards.
We've seen some cases where cloning the index resulted in issues; there was a bug in the clone API that caused shards to be initializing permanently under specific circumstances (fixed by #137096), and cloned shards are unable to be assigned if their source lives on a node that is close/past the low watermark disk threshold (will be fixed soon by the Distributed Coordination team).
Therefore, we implement an opt-out flag that users can configure in the
searchable_snapshotaction of their ILM policy if they don't want to clone the index with 0 replicas before performing the force-merge. We implement an opt-out instead of an opt-in, as we believe these issues to be rather specific (and soon resolved), and the clone is worth doing by default.