Conversation

nielsbauman
Contributor

When performing a searchable snapshot action with force merge enabled, if the source index has one or more replicas, ILM now clones the index with zero replicas and performs the force merge on the clone. The snapshot is then taken from the force-merged clone instead of the source index, ensuring only primary shards are force-merged. The cloned index is deleted after the snapshot is mounted, and all references and step logic have been updated accordingly. Test coverage was added for the new flow, including handling retries and cleanup of failed clones.

Key changes:

  • Execution state: Track the force-merged clone index in ILM state and propagate through relevant APIs.
  • SearchableSnapshotAction: Add conditional steps to clone the index with 0 replicas, force-merge, and delete the clone as needed.
  • Steps: Update ForceMerge, SegmentCount, Snapshot, and Delete steps to operate on the correct index (source or clone).
  • Tests/QA: Add and enhance tests to verify force-merge and snapshot behavior with and without replicas, including retry/cleanup paths and configuration for stable force-merges.

Resolves #75478
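
For context, a minimal sketch of the searchable snapshot action configuration that exercises this path (the repository name is a placeholder); with `force_merge_index` enabled, which is the default, and one or more replicas on the source index, the force merge now runs on a zero-replica clone before the snapshot is taken:

```
"searchable_snapshot": {
  "snapshot_repository": "my-repository",
  "force_merge_index": true
}
```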

@nielsbauman nielsbauman requested a review from Copilot September 1, 2025 17:34
@elasticsearchmachine elasticsearchmachine added the v9.2.0 and needs:triage (Requires assignment of a team area label) labels Sep 1, 2025
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR implements a new ILM behavior for searchable snapshot actions with force merge enabled. When the source index has one or more replicas, ILM now clones the index with zero replicas and performs the force merge on the clone instead of the original index. This optimization avoids unnecessarily force-merging replica shards since snapshots only capture primary shards.

  • Force merge optimization: indices with replicas are cloned so that replica shards are not force-merged unnecessarily
  • New execution state tracking for force-merged clone indices throughout the ILM lifecycle
  • Enhanced test coverage including failure scenarios and retry mechanisms

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| TransportExplainLifecycleAction.java | Adds force merge index name to ILM explain response |
| SearchableSnapshotActionIT.java | Comprehensive test coverage for new clone-based force merge behavior |
| TimeSeriesRestDriver.java | Utility method for moving indices between ILM steps in tests |
| build.gradle | Test cluster configuration to prevent shard rebalancing during force merges |
| SearchableSnapshotActionTests.java | Unit tests validating clone step configuration and replica settings |
| IndexLifecycleExplainResponse*.java | Response model updates to include force merge index tracking |
| SegmentCountStep.java | Updated to operate on cloned index when available |
| SearchableSnapshotAction.java | Core logic implementing conditional clone steps and cleanup |
| MountSnapshotStep.java | Enhanced snapshot mounting logic for cloned indices |
| GenerateSnapshotNameStep.java | Updated to use cloned index name for snapshot generation |
| ForceMergeStep.java | Modified to target cloned index when present |
| DeleteStep.java | Enhanced with configurable target index deletion capability |
| CreateSnapshotStep.java | Updated to snapshot the force-merged clone instead of original |
| ESRestTestCase.java | Test framework utility for waiting on index deletion |
| LifecycleExecutionState.java | Core state tracking for force merge index names |


Co-authored-by: Copilot <[email protected]>
@nielsbauman nielsbauman added the >enhancement and :Data Management/ILM+SLM (Index and Snapshot lifecycle management) labels and removed the needs:triage (Requires assignment of a team area label) label Sep 1, 2025
@nielsbauman nielsbauman requested a review from dakrone September 1, 2025 17:38
@elasticsearchmachine elasticsearchmachine added the Team:Data Management (Meta label for data/management team) label Sep 1, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@elasticsearchmachine
Collaborator

Hi @nielsbauman, I've created a changelog YAML for you.

ClusterHealthStatus.GREEN,
FORCE_MERGE_INDEX_NAME_SUPPLIER
),
cleanupClonedIndexKey
Contributor Author


Hm, I thought I copied the approach of the ShrinkAction here by going back to the cleanup step if the threshold/timeout is passed. But it looks like that's not the case:

ClusterStateWaitUntilThresholdStep checkShrinkReadyStep = new ClusterStateWaitUntilThresholdStep(
new CheckShrinkReadyStep(allocationRoutedKey, shrinkKey),
setSingleNodeKey
);

The ShrinkAction just goes back to SetSingleNodeAllocateStep. I'm inclined to think my current approach is safer, but I'm also a fan of consistency. Anyone else have any thoughts?

Member


The one you linked is the "wait for single node allocation" bit, where the new shrunken index hasn't been created yet (so there's nothing to clean up). You use the same behavior to go back to the cleanup step later on in the file:

// wait until the shrunk index is recovered. we again wait until the configured threshold is breached and if the shrunk index has
// not successfully recovered until then, we rewind to the "cleanup-shrink-index" step to delete this unsuccessful shrunk index
// and retry the operation by generating a new shrink index name and attempting to shrink again
ClusterStateWaitUntilThresholdStep allocated = new ClusterStateWaitUntilThresholdStep(
new ShrunkShardsAllocatedStep(enoughShardsKey, copyMetadataKey),
cleanupShrinkIndexKey
);

Which matches the behavior here, so I believe it is consistent.

Member

@dakrone dakrone left a comment


Thanks for working on this Niels! I left some comments but they're mostly cosmetic.

@@ -0,0 +1,6 @@
pr: 133954
summary: "ILM: Force merge on zero-replica cloned index before snapshot"
Member


Perhaps mention that this is for the searchable snapshot step here?

Contributor Author


I changed it to `ILM: Force merge on zero-replica cloned index before snapshotting for searchable snapshots`. Let me know if that matches what you had in mind.

IndexMetadata indexMetadata = project.index(index);
assert indexMetadata != null : "index " + index.getName() + " must exist in the cluster state";
String cloneIndexName = indexMetadata.getLifecycleExecutionState().forceMergeIndexName();
return cloneIndexName != null && project.index(cloneIndexName) != null;
Member


What happens here if a user manually removes the cloned index after it was created so that project.index(cloneIndexName) returns null? Wouldn't we erroneously assume we're on the no-clone path?

Contributor Author


If a user manually removes the cloned index after it was created and we then run the DeleteStep on the force-merged index, it would fail, right? So, that assumption doesn't sound "erroneous" to me. Or am I misinterpreting your comment?

@nielsbauman nielsbauman requested a review from dakrone September 4, 2025 12:22
Member

@dakrone dakrone left a comment


This mostly looks good to me, but I have one concern (EDIT: see the bottom).

I can't comment on the code lines directly (because they're outside the review diff), but in the SearchableSnapshotAction steps where we copy over the execution state and the lifecycle name setting:

        CopyExecutionStateStep copyMetadataStep = new CopyExecutionStateStep(
            copyMetadataKey,
            copyLifecyclePolicySettingKey,
            (index, executionState) -> getRestoredIndexPrefix(copyMetadataKey) + index,
            keyForReplicateForOrContinue
        );
        CopySettingsStep copySettingsStep = new CopySettingsStep(
            copyLifecyclePolicySettingKey,
            dataStreamCheckBranchingKey,
            forceMergeIndex ? conditionalDeleteForceMergedIndexKey : dataStreamCheckBranchingKey,
            (index, lifecycleState) -> getRestoredIndexPrefix(copyLifecyclePolicySettingKey) + index,
            LifecycleSettings.LIFECYCLE_NAME
        );

It appears that we're using getRestoredIndexPrefix(copyLifecyclePolicySettingKey) + index for the name of the index into which we should copy the execution state settings. This would normally be fine, because it would be:

* my-backing-index which is snapshotted
* mount as either partial-my-backing-index or restored-my-backing-index
* copy the state to the partial or restored version depending on which prefix was used.

However, in this case, my-backing

RECORD SCRATCH.

It was at this point in my thinking and typing this comment out that I realized that we still control the mounted index name, so even if we snapshot fm-clone-my-backing-index we still mount it as partial-my-backing-index, NOT partial-fm-clone-my-backing-index, which means that the concern above is not valid! However, I've decided to leave my initially erroneous assessment in for posterity, for anyone else who might come looking or have the same concern.

Feel free to ignore the above, as the PR looks good to me, thanks Niels. 😄

Comment on lines +1180 to +1182
// With multiple primary shards, the segments are more spread out, so it's even less likely that we'll get more than 1 segment
// in one shard, and some shards might even be empty.
assertThat(preLifecycleBackingIndexSegments, greaterThanOrEqualTo(0));
Member


Even if there are multiple primary shards and some shards may be empty, shouldn't the total still be >= 1, since we've indexed at least one document? I'm not sure I understand how having one primary means we have >= 1 segment, but having more than one primary means that we may get 0 segments.

Contributor Author

@nielsbauman nielsbauman Sep 4, 2025


Yeah I understand your confusion. I had a similar confusion the first time I looked at it, but forgot to mention that somewhere, sorry. The caveat here is that TimeSeriesRestDriver#getNumberOfPrimarySegments(Client, String) returns the number of segments for the "first" primary shard, i.e. it just gets shard 0:

List<Map<String, Object>> shards = (List<Map<String, Object>>) responseEntity.get("0");

That means that it will essentially return the number of segments of a random primary shard if there are multiple primary shards. There is only one other usage of that method, which runs a test with only one primary shard, so I think we can change the implementation of the method to return the sum of segments across all primary shards. What do you think?

Member


Ahhh okay, the name definitely doesn't make that clear. I'd be in favor of making it return the sum of segments across all the primaries as well.

Contributor Author


Fixed in 5806e64.
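
For illustration only, here is a hypothetical sketch (not the actual change in 5806e64, and the class name is made up) of what summing segment counts across all primary shards could look like, assuming the helper iterates the per-shard entries of the standard `_segments` response, where each shard id maps to a list of shard copies:

```
// Hypothetical helper, not the actual TimeSeriesRestDriver change.
import java.util.List;
import java.util.Map;

final class PrimarySegmentCounter {

    /**
     * Sums the search segment counts of all primary shard copies in the
     * "shards" map of an index's _segments response, skipping replicas.
     */
    @SuppressWarnings("unchecked")
    static int sumPrimarySegments(Map<String, Object> shards) {
        int total = 0;
        for (Object copiesForShard : shards.values()) {
            for (Map<String, Object> copy : (List<Map<String, Object>>) copiesForShard) {
                Map<String, Object> routing = (Map<String, Object>) copy.get("routing");
                if (Boolean.TRUE.equals(routing.get("primary"))) {
                    total += ((Number) copy.get("num_search_segments")).intValue();
                }
            }
        }
        return total;
    }
}
```

With one primary and at least one indexed document, the sum would then be at least 1, while an individual shard can still legitimately report 0 segments.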

@nielsbauman
Contributor Author

I'm running SearchableSnapshotActionIT on repeat on a VM overnight to gain some confidence that the tests are stable, and will go ahead with merging this PR tomorrow.

@nielsbauman nielsbauman enabled auto-merge (squash) September 5, 2025 13:05
@nielsbauman nielsbauman merged commit 85f7fa9 into elastic:main Sep 5, 2025
33 checks passed
@nielsbauman nielsbauman deleted the searchable-snapshot-clone branch September 5, 2025 14:13
nielsbauman added a commit to nielsbauman/elasticsearch that referenced this pull request Sep 8, 2025
As a follow-up of elastic#133954, this class could use a clean up in
deduplicating code, replacing some `assertBusy`s with
`awaitIndexExists`, and more.
nielsbauman added a commit that referenced this pull request Sep 8, 2025
As a follow-up of #133954, this class could use a clean up in
deduplicating code, replacing some `assertBusy`s with
`awaitIndexExists`, and more.
rjernst pushed a commit to rjernst/elasticsearch that referenced this pull request Sep 9, 2025
As a follow-up of elastic#133954, this class could use a clean up in
deduplicating code, replacing some `assertBusy`s with
`awaitIndexExists`, and more.
elasticsearchmachine pushed a commit that referenced this pull request Oct 14, 2025
…5834)

In this PR we move the force-merge operation from the downsampling
request to the ILM action. 

Our goal is to decouple the downsampling operation from the force-merge operation. With this change, the downsampling request is responsible for ensuring that the downsampled index is refreshed and flushed, but not for force-merging it.

We believe that most of the time this is not necessary, and executing
the force-merge operation unnecessarily can increase the load on the
cluster.

To preserve backwards compatibility, we move the responsibility for executing the existing force merge to the downsample ILM action and make it configurable. By default, it will run, but a user can disable it just as they can for a searchable snapshot.

```
"downsample": {
  "fixed_interval": "1h",
  "force_merge_index": false
}
```

**Update**

As a follow-up to this PR, we pose the question: is the force merge in the downsample action intentional and useful?

To answer this question, we extend the time series telemetry. We consider the force merge step in the downsample ILM action useful if it is the only force merge operation before a searchable snapshot.

Effectively, by this definition, we argue that the force merge in downsampling is not an operation the user has intentionally requested, but only a side effect of the implementation. We identify the biggest impact of removing it to be on searchable snapshots, but if the searchable snapshot performs its own (and more performant, #133954) force merge, then we could skip this operation in the downsample action altogether.

Fixes: #135618
Kubik42 pushed a commit to Kubik42/elasticsearch that referenced this pull request Oct 16, 2025
…stic#135834)


Labels

:Data Management/ILM+SLM (Index and Snapshot lifecycle management), >enhancement, Team:Data Management (Meta label for data/management team), v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Can we avoid force-merging all shard copies?

3 participants