ILM: Force merge on zero-replica cloned index before snapshot #133954
Conversation
When performing a searchable snapshot action with force merge enabled, if the source index has one or more replicas, ILM now clones the index with zero replicas and performs the force merge on the clone. The snapshot is then taken from the force-merged clone instead of the source index, ensuring only primary shards are force-merged. The cloned index is deleted after the snapshot is mounted, and all references and step logic have been updated accordingly. Test coverage was added for the new flow, including handling retries and cleanup of failed clones.

Key changes:
- Execution state: Track the force-merged clone index in ILM state and propagate it through the relevant APIs.
- SearchableSnapshotAction: Add conditional steps to clone the index with 0 replicas, force-merge it, and delete the clone as needed.
- Steps: Update the ForceMerge, SegmentCount, Snapshot, and Delete steps to operate on the correct index (source or clone).
- Tests/QA: Add and enhance tests to verify force-merge and snapshot behavior with and without replicas, including retry/cleanup paths and configuration for stable force-merges.

Resolves elastic#75478
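As a rough sketch of the decision and flow described above (illustrative only: the method name below is hypothetical, and the real step wiring lives in `SearchableSnapshotAction`):

```java
// Illustrative sketch, not the actual SearchableSnapshotAction code. The clone-based path is
// only taken when the source index has replicas, since the snapshot only captures primaries.
// Flow, mirroring the PR description (names are assumptions):
//   1. clone the source index with index.number_of_replicas: 0
//   2. wait for the clone to become green, force-merge it, and check the segment count
//   3. generate the snapshot name and snapshot the force-merged clone
//   4. mount the snapshot under the usual restored-/partial- name of the original index
//   5. delete the clone
static boolean needsZeroReplicaClone(IndexMetadata indexMetadata) {
    // With zero replicas we can force-merge the source index directly; otherwise replicas
    // would be merged too, wasting work the snapshot never benefits from.
    return indexMetadata.getNumberOfReplicas() > 0;
}
```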
Pull Request Overview
This PR implements a new ILM behavior for searchable snapshot actions with force merge enabled. When the source index has one or more replicas, ILM now clones the index with zero replicas and performs the force merge on the clone instead of the original index. This optimization avoids unnecessarily force-merging replica shards since snapshots only capture primary shards.
- Force merge optimization by cloning indices with replicas to avoid merging unnecessary replica shards
- New execution state tracking for force-merged clone indices throughout the ILM lifecycle
- Enhanced test coverage including failure scenarios and retry mechanisms
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
TransportExplainLifecycleAction.java | Adds force merge index name to ILM explain response |
SearchableSnapshotActionIT.java | Comprehensive test coverage for new clone-based force merge behavior |
TimeSeriesRestDriver.java | Utility method for moving indices between ILM steps in tests |
build.gradle | Test cluster configuration to prevent shard rebalancing during force merges |
SearchableSnapshotActionTests.java | Unit tests validating clone step configuration and replica settings |
IndexLifecycleExplainResponse*.java | Response model updates to include force merge index tracking |
SegmentCountStep.java | Updated to operate on cloned index when available |
SearchableSnapshotAction.java | Core logic implementing conditional clone steps and cleanup |
MountSnapshotStep.java | Enhanced snapshot mounting logic for cloned indices |
GenerateSnapshotNameStep.java | Updated to use cloned index name for snapshot generation |
ForceMergeStep.java | Modified to target cloned index when present |
DeleteStep.java | Enhanced with configurable target index deletion capability |
CreateSnapshotStep.java | Updated to snapshot the force-merged clone instead of original |
ESRestTestCase.java | Test framework utility for waiting on index deletion |
LifecycleExecutionState.java | Core state tracking for force merge index names |
Pinging @elastic/es-data-management (Team:Data Management)

Hi @nielsbauman, I've created a changelog YAML for you.
```java
        ClusterHealthStatus.GREEN,
        FORCE_MERGE_INDEX_NAME_SUPPLIER
    ),
    cleanupClonedIndexKey
```
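For context, a hedged reconstruction of the step wiring this fragment belongs to (the inner step class, variable names, and step keys are assumptions inferred from this thread, not the verbatim PR code): the wait for the clone to go green is wrapped in a `ClusterStateWaitUntilThresholdStep` whose rewind target is `cleanupClonedIndexKey`, so a clone that never becomes healthy gets deleted and the operation is retried.

```java
// Assumed reconstruction: if the clone does not reach GREEN health before the configured
// threshold elapses, ILM rewinds to cleanupClonedIndexKey, deletes the failed clone, and retries.
ClusterStateWaitUntilThresholdStep waitForCloneGreenStep = new ClusterStateWaitUntilThresholdStep(
    new WaitForIndexColorStep(
        waitForGreenKey,                 // hypothetical step key for this wait step
        forceMergeKey,                   // hypothetical next step: force-merge the clone
        ClusterHealthStatus.GREEN,
        FORCE_MERGE_INDEX_NAME_SUPPLIER  // resolves the clone's name from the execution state
    ),
    cleanupClonedIndexKey                // on timeout, go back and delete the clone
);
```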
Hm, I thought I copied the approach of the `ShrinkAction` here by going back to the cleanup step if the threshold/timeout is passed. But it looks like that's not the case:

elasticsearch/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/ShrinkAction.java
Lines 267 to 270 in 1ff6608

```java
ClusterStateWaitUntilThresholdStep checkShrinkReadyStep = new ClusterStateWaitUntilThresholdStep(
    new CheckShrinkReadyStep(allocationRoutedKey, shrinkKey),
    setSingleNodeKey
);
```
The `ShrinkAction` just goes back to `SetSingleNodeAllocateStep`. I'm inclined to think my current approach is safer, but I'm also a fan of consistency. Anyone else have any thoughts?
The one you linked is the "wait for single node allocation bit", where the new shrunken index hasn't been created yet (so there's nothing to clean up). You use the same behavior to go back to the cleanup later on in the file:
elasticsearch/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/ShrinkAction.java
Lines 289 to 295 in 1ff6608
```java
// wait until the shrunk index is recovered. we again wait until the configured threshold is breached and if the shrunk index has
// not successfully recovered until then, we rewind to the "cleanup-shrink-index" step to delete this unsuccessful shrunk index
// and retry the operation by generating a new shrink index name and attempting to shrink again
ClusterStateWaitUntilThresholdStep allocated = new ClusterStateWaitUntilThresholdStep(
    new ShrunkShardsAllocatedStep(enoughShardsKey, copyMetadataKey),
    cleanupShrinkIndexKey
);
```
Which matches the behavior here, so I believe it is consistent.
Thanks for working on this Niels! I left some comments but they're mostly cosmetic.
docs/changelog/133954.yaml

```yaml
pr: 133954
summary: "ILM: Force merge on zero-replica cloned index before snapshot"
```
Perhaps mention that this is for the searchable snapshot step here?
I changed it to `ILM: Force merge on zero-replica cloned index before snapshotting for searchable snapshots`. Let me know if that matches what you had in mind.
```java
IndexMetadata indexMetadata = project.index(index);
assert indexMetadata != null : "index " + index.getName() + " must exist in the cluster state";
String cloneIndexName = indexMetadata.getLifecycleExecutionState().forceMergeIndexName();
return cloneIndexName != null && project.index(cloneIndexName) != null;
```
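For readers less familiar with the ILM step machinery, a minimal sketch of how a predicate like the one quoted above could be wired up (`BranchingStep` is an existing ILM step class, but the step keys and the exact constructor signature used here are assumptions, not the PR's actual code):

```java
// Sketch only: branch to a delete-clone step when the execution state records a clone that
// still exists in the cluster state; otherwise skip straight to the normal next step.
BranchingStep conditionalDeleteForceMergedIndexStep = new BranchingStep(
    conditionalDeleteForceMergedIndexKey,
    dataStreamCheckBranchingKey,     // predicate is false: no clone to delete
    deleteForceMergedIndexKey,       // predicate is true: delete the force-merged clone (hypothetical key)
    (index, project) -> {
        IndexMetadata indexMetadata = project.index(index);
        assert indexMetadata != null : "index " + index.getName() + " must exist in the cluster state";
        String cloneIndexName = indexMetadata.getLifecycleExecutionState().forceMergeIndexName();
        return cloneIndexName != null && project.index(cloneIndexName) != null;
    }
);
```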
What happens here if a user manually removes the cloned index after it was created, so that `project.index(cloneIndexName)` returns null? Wouldn't we erroneously assume we're on the no-clone path?
If a user manually removes the cloned index after it was created and we then run the `DeleteStep` on the force-merged index, it would fail, right? So that assumption doesn't sound "erroneous" to me. Or am I misinterpreting your comment?
This mostly looks good to me, but I have one concern (EDIT: see the bottom).
I can't comment on the code lines directly (because they're outside the reviewed diff), but in the `SearchableSnapshotAction` steps where we copy over the execution state and the lifecycle name setting:
```java
CopyExecutionStateStep copyMetadataStep = new CopyExecutionStateStep(
    copyMetadataKey,
    copyLifecyclePolicySettingKey,
    (index, executionState) -> getRestoredIndexPrefix(copyMetadataKey) + index,
    keyForReplicateForOrContinue
);
CopySettingsStep copySettingsStep = new CopySettingsStep(
    copyLifecyclePolicySettingKey,
    dataStreamCheckBranchingKey,
    forceMergeIndex ? conditionalDeleteForceMergedIndexKey : dataStreamCheckBranchingKey,
    (index, lifecycleState) -> getRestoredIndexPrefix(copyLifecyclePolicySettingKey) + index,
    LifecycleSettings.LIFECYCLE_NAME
);
```
It appears that we're using `getRestoredIndexPrefix(copyLifecyclePolicySettingKey) + index` for the name of the index into which we should copy the execution state settings. This would normally be fine, because it would be:

* `my-backing-index`, which is snapshotted
* mounted as either `partial-my-backing-index` or `restored-my-backing-index`
* the state copied to the `partial` or `restored` version depending on which prefix was used.

However, in this case, `my-backing`…
RECORD SCRATCH.
It was at this point in my thinking and typing this comment out that I realized that we still control the mounted index name, so even if we snapshot `fm-clone-my-backing-index` we still mount it as `partial-my-backing-index`, NOT `partial-fm-clone-my-backing-index`, which means that the concern above is not valid! However, I've decided to leave my initially-erroneous assessment in for posterity for anyone else that might come looking or have the same concern.
Feel free to ignore the above, as the PR looks good to me, thanks Niels. 😄
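To make the naming concrete, a small illustrative snippet (the `fm-clone-` prefix and index names come from the example above; they are not the actual constants in the PR):

```java
// The mount step derives the mounted index name from the ORIGINAL index name, so the clone
// prefix never leaks into the restored-/partial- name even though the snapshot was taken
// from the force-merged clone.
String sourceIndex  = "my-backing-index";
String cloneIndex   = "fm-clone-" + sourceIndex;                              // force-merged, then snapshotted
String mountedIndex = getRestoredIndexPrefix(copyMetadataKey) + sourceIndex;  // e.g. "partial-my-backing-index"
// The execution state and lifecycle setting are then copied from sourceIndex to mountedIndex,
// as in the CopyExecutionStateStep/CopySettingsStep quoted above.
```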
```java
// With multiple primary shards, the segments are more spread out, so it's even less likely that we'll get more than 1 segment
// in one shard, and some shards might even be empty.
assertThat(preLifecycleBackingIndexSegments, greaterThanOrEqualTo(0));
```
Even if there are multiple primary shards and some shards may be empty, shouldn't the total still be >= 1, since we've indexed at least one document? I'm not sure I understand how having one primary means we have >= 1 segment, but having more than one primary means that we may get 0 segments.
Yeah, I understand your confusion. I had a similar confusion the first time I looked at it, but forgot to mention that somewhere, sorry. The caveat here is that `TimeSeriesRestDriver#getNumberOfPrimarySegments(Client, String)` returns the number of segments for the "first" primary shard, i.e. it just gets shard `0`:

Line 394 in 7af81af

```java
List<Map<String, Object>> shards = (List<Map<String, Object>>) responseEntity.get("0");
```

That means it will essentially return the number of segments of a random primary shard if there are multiple primary shards. There is only one other usage of that method, which runs a test with only one primary shard, so I think we can change the implementation of the method to return the sum of segments across all primary shards. What do you think?
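A rough sketch of what that change might look like, assuming the usual shape of the `_segments` API response (the method body and parameter are illustrative, not the actual `TimeSeriesRestDriver` implementation):

```java
// Sum segment counts over every primary shard copy instead of reading only shard "0".
// `shardsById` is the "shards" map from the _segments response: shard id -> list of shard copies,
// where each copy carries a "routing" object ("primary": true/false) and a "segments" map.
@SuppressWarnings("unchecked")
private static int getNumberOfPrimarySegments(Map<String, Object> shardsById) {
    int total = 0;
    for (Object copies : shardsById.values()) {
        for (Map<String, Object> shardCopy : (List<Map<String, Object>>) copies) {
            Map<String, Object> routing = (Map<String, Object>) shardCopy.get("routing");
            if (Boolean.TRUE.equals(routing.get("primary"))) {
                total += ((Map<String, Object>) shardCopy.get("segments")).size();
            }
        }
    }
    return total;
}
```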
Ahhh okay, the name definitely doesn't make that clear. I'd be in favor of making it return the sum of segments across all the primaries as well.
Fixed in 5806e64. I'm running …
As a follow-up of elastic#133954, this class could use a cleanup: deduplicating code, replacing some `assertBusy`s with `awaitIndexExists`, and more.
…5834)

In this PR we move the force-merge operation from the downsampling request to the ILM action. Our goal is to decouple the downsampling operation from the force-merge operation. With this change, the downsampling request is responsible for ensuring that the downsampled index is refreshed and flushed, but not for force-merging it. We believe that most of the time this is not necessary, and executing the force-merge operation unnecessarily can increase the load on the cluster.

To preserve backwards compatibility, we move the responsibility for executing the existing force merge to the downsample ILM action and make it configurable. By default it will run, but a user can disable it just as they can with a searchable snapshot.

```
"downsample": {
  "fixed_interval": "1h",
  "force_merge_index": false
}
```

**Update**

As a follow-up of this PR, we pose the question: is the force merge in the downsample action intentional and useful? To answer it, we extend time series telemetry. We define the force merge step in the downsample ILM action as useful if it is the only force merge operation before a searchable snapshot. Effectively, by this definition, we argue that the force merge in downsampling is not an operation the user has intentionally requested but only a result of the implementation. We identify the biggest impact of removing it to be on searchable snapshots, but if the searchable snapshot performs its own force merge (and a more performant one, per #133954), then we could skip this operation in the downsample action altogether.

Fixes: #135618