Skip to content

Conversation

@phananh1010
Copy link
Owner

In elastic#133954, we modified ILM's searchable snapshot action to perform the force-merge on a clone of the index with 0 replicas. This optimization avoids performing the force-merge redundantly on replicas, as the subsequent snapshot operation only looks at primary shards.

We've seen some cases where cloning the index resulted in issues; there was a bug in the clone API that caused shards to be initializing permanently under specific circumstances (fixed by elastic#137096), and cloned shards are unable to be assigned if their source lives on a node that is close/past the low watermark disk threshold (will be fixed soon by the Distributed Coordination team).

Therefore, we implement an opt-out flag that users can configure in the searchable_snapshot action of their ILM policy if they don't want to clone the index with 0 replicas before performing the force-merge. We implement an opt-out instead of an opt-in, as we believe these issues to be rather specific (and soon resolved), and the clone is worth doing by default.

BASE=fa252fa56c17b47c3b45ac81d4e498895361bdb5
HEAD=77a79ae04273283ac37958d5d2b491236b8d8c88
Branch=main
Copilot AI review requested due to automatic review settings November 7, 2025 17:17
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

ForceMergeStep forceMergeStep = new ForceMergeStep(
forceMergeStepKey,
waitForSegmentCountKey,
client,
1,
FORCE_MERGE_CLONE_INDEX_NAME_FALLBACK_SUPPLIER
);
SegmentCountStep segmentCountStep = new SegmentCountStep(
waitForSegmentCountKey,
generateSnapshotNameKey,
client,
1,
FORCE_MERGE_CLONE_INDEX_NAME_FALLBACK_SUPPLIER
);
GenerateSnapshotNameStep generateSnapshotNameStep = new GenerateSnapshotNameStep(
generateSnapshotNameKey,
cleanSnapshotKey,
snapshotRepository,
FORCE_MERGE_CLONE_INDEX_NAME_FALLBACK_SUPPLIER

P1 Badge Opt‑out still targets stale clone index

When force_merge_on_clone is set to false we skip all clone‑related steps, but the subsequent ForceMergeStep, SegmentCountStep and snapshot generation still resolve their index name via FORCE_MERGE_CLONE_INDEX_NAME_FALLBACK_SUPPLIER. That supplier always prefers lifecycleState.forceMergeCloneIndexName() if it is non‑null. If the action had previously been executed (or partially executed) with cloning enabled, this field remains populated, so opting out will still run the force merge and snapshot steps against the old clone rather than the source index. In clusters where the clone is unassigned or already deleted – exactly the scenario the opt‑out is supposed to mitigate – the action will keep failing. The code should clear the stored clone name or ignore it when forceMergeOnClone is false so that the force merge targets the source index.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds a new optional force_merge_on_clone parameter to the ILM searchable snapshot action, allowing users to opt out of performing the force-merge on a cloned index with 0 replicas and instead perform it directly on the source index.

  • Introduces the force_merge_on_clone boolean parameter (defaults to true to maintain backward compatibility)
  • Adds validation to ensure force_merge_on_clone can only be set when force_merge_index is true
  • Implements the transport version handling for cross-version compatibility

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/SearchableSnapshotAction.java Adds the new forceMergeOnClone field with validation, serialization, and logic to conditionally skip clone steps based on the parameter
x-pack/plugin/ilm/src/main/java/org/elasticsearch/xpack/ilm/action/RestPutLifecycleAction.java Adds the searchable_snapshot_force_merge_on_clone capability to the supported capabilities
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/SearchableSnapshotActionTests.java Updates test logic to validate the new parameter, including mutation and step generation tests
x-pack/plugin/ilm/qa/multi-node/src/javaRestTest/java/org/elasticsearch/xpack/ilm/actions/SearchableSnapshotActionIT.java Adds integration test for the opt-out behavior and updates constructor calls with the new parameter
x-pack/plugin/ilm/qa/rest/src/yamlRestTest/resources/rest-api-spec/test/ilm/90_searchable_snapshot.yml Adds YAML REST test to verify the new parameter can be set and retrieved correctly
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/TimeseriesLifecycleTypeTests.java Updates test constructor call to include the new parameter
x-pack/plugin/core/src/test/java/org/elasticsearch/xpack/core/ilm/LifecyclePolicyTests.java Updates random policy generation to include the new parameter
server/src/main/resources/transport/upper_bounds/9.3.csv Updates transport version upper bound for version 9.3
server/src/main/resources/transport/upper_bounds/9.2.csv Updates transport version upper bound for version 9.2
server/src/main/resources/transport/definitions/referable/ilm_searchable_snapshot_opt_out_clone.csv Defines the transport version mappings for the new feature
docs/reference/elasticsearch/index-lifecycle-actions/ilm-searchable-snapshot.md Documents the new force_merge_on_clone parameter and updates total_shards_per_node description
docs/changelog/137375.yaml Adds changelog entry for the enhancement

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


This force merging occurs in the phase that the index is in **prior** to the `searchable_snapshot` action. For example, if using a `searchable_snapshot` action in the `hot` phase, the force merge will be performed on the hot nodes. If using a `searchable_snapshot` action in the `cold` phase, the force merge will be performed on whatever tier the index is **prior** to the `cold` phase (either `hot` or `warm`).

`force_merge_on_clone` {applies_to}`stack: ga 9.2.1`
Copy link

Copilot AI Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation syntax has extra backticks. Based on the pattern in other files (e.g., ilm-downsample.md line 22), the correct format should be:

`force_merge_on_clone` {applies_to}`stack: ga 9.2.1`

Instead of:

``force_merge_on_clone` {applies_to}`stack: ga 9.2.1``

Remove the extra backtick at the start and end.

Copilot uses AI. Check for mistakes.
@phananh1010 phananh1010 closed this Nov 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants