Skip to content

Conversation

@phananh1010
Copy link
Owner

In elastic#133954, we modified ILM's searchable snapshot action to perform the force-merge on a clone of the index with 0 replicas. This optimization avoids performing the force-merge redundantly on replicas, as the subsequent snapshot operation only looks at primary shards.

We've seen some cases where cloning the index resulted in issues; there was a bug in the clone API that caused shards to be initializing permanently under specific circumstances (fixed by elastic#137096), and cloned shards are unable to be assigned if their source lives on a node that is close/past the low watermark disk threshold (will be fixed soon by the Distributed Coordination team).

Therefore, we implement an opt-out flag that users can configure in the searchable_snapshot action of their ILM policy if they don't want to clone the index with 0 replicas before performing the force-merge. We implement an opt-out instead of an opt-in, as we believe these issues to be rather specific (and soon resolved), and the clone is worth doing by default.

BASE=fa252fa56c17b47c3b45ac81d4e498895361bdb5
HEAD=77a79ae04273283ac37958d5d2b491236b8d8c88
Branch=main
@phananh1010
Copy link
Owner Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 565 to 575
if (forceMergeIndex) {
steps.add(conditionalSkipCloneStep);
steps.add(readOnlyStep);
steps.add(cleanupClonedIndexStep);
steps.add(generateCloneIndexNameStep);
steps.add(cloneIndexStep);
steps.add(waitForClonedIndexGreenStep);
if (shouldForceMergeOnClone()) {
steps.add(conditionalSkipCloneStep);
steps.add(readOnlyStep);
steps.add(cleanupClonedIndexStep);
steps.add(generateCloneIndexNameStep);
steps.add(cloneIndexStep);
steps.add(waitForClonedIndexGreenStep);
}
steps.add(forceMergeStep);
steps.add(segmentCountStep);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Clear cloned index state when clone is skipped

When force_merge_on_clone is set to false, the clone-related steps are skipped but the lifecycle state’s forceMergeCloneIndexName is left untouched. ForceMergeStep, SegmentCountStep, and GenerateSnapshotNameStep still rely on FORCE_MERGE_CLONE_INDEX_NAME_FALLBACK_SUPPLIER, which prefers the stored clone name. If this field was populated by a prior run (the very scenario users are likely in when opting out), the action will continue to force-merge and snapshot the stale clone or fail with IndexNotFoundException after the clone is deleted, defeating the opt-out. The code should clear forceMergeCloneIndexName or ignore it whenever cloning is disabled so that the original index is used.

Useful? React with 👍 / 👎.

@phananh1010 phananh1010 closed this Nov 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants