
Reduce ClusterState retention in retry closures #20858

Open
HarishNarasimhanK wants to merge 15 commits into opensearch-project:main from HarishNarasimhanK:main

Conversation


@HarishNarasimhanK HarishNarasimhanK commented Mar 13, 2026

Description

1. Goal

In OpenSearch, snapshot deletion is a cluster-manager-routed operation. When a delete request is received, the cluster manager creates internal callback objects (listeners) to track the operation and notify the caller once it completes. These listeners inadvertently hold a reference to a large in-memory object called ClusterState, which contains the entire cluster's metadata, routing information, and index definitions.

When a snapshot deletion gets stuck or takes a long time to complete, users or automated systems may retry the delete request multiple times. As listeners accumulate from repeated retries, multiple ClusterState objects get pinned on the heap, causing the cluster manager node's memory usage to grow until it runs out of memory.

This change fixes the issue by ensuring that the listeners only hold the small pieces of information they actually need (a version number and a node identifier) instead of the entire ClusterState object. This allows the large ClusterState objects to be garbage collected immediately, preventing the memory buildup.

2. Current Workflow

This section traces the lifecycle of a snapshot delete request from the REST API to the point where the listener is stored in SnapshotsService.

  1. A client sends DELETE /_snapshot/{repository}/{snapshot}.

  2. The REST layer (RestDeleteSnapshotAction) constructs a DeleteSnapshotRequest and passes it to NodeClient.

  3. NodeClient dispatches the request to TransportDeleteSnapshotAction, which extends TransportClusterManagerNodeAction.

  4. The base class creates an AsyncSingleAction instance to manage the request lifecycle. AsyncSingleAction fetches the current ClusterState and calls doStart(clusterState).

  5. If the local node is the cluster manager, doStart() wraps the original listener using getDelegateForLocalExecute(clusterState). This wrapper contains a lambda for retry logic that references the clusterState parameter. Due to Java lambda capture semantics, the entire ClusterState object is implicitly retained by this lambda for as long as the listener exists.

  6. The wrapped listener is passed into TransportDeleteSnapshotAction.clusterManagerOperation(), which calls snapshotsService.deleteSnapshots(request, listener). The listener still carries the captured ClusterState reference inside its retry lambda.

  7. Inside SnapshotsService, the deletion is submitted as a cluster state update. Once the cluster state is updated to record the deletion, the listener is stored in the snapshotDeletionListeners map (keyed by the deletion UUID) in order to notify the client when the deletion completes.
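The capture behavior described in step 5 can be illustrated with a small stand-alone sketch. The class and field names below are hypothetical stand-ins, not OpenSearch types — the point is only how Java lambda capture pins the whole object versus a pre-extracted primitive:

```java
import java.util.function.LongPredicate;

// Hypothetical stand-in for ClusterState: a large object with one field we need.
class FakeClusterState {
    final long version;
    final byte[] payload = new byte[1_000_000]; // simulates large cluster metadata
    FakeClusterState(long version) { this.version = version; }
}

public class CaptureDemo {
    // Problematic pattern: the lambda references the parameter, so the whole
    // FakeClusterState (payload included) stays reachable while the lambda lives.
    static LongPredicate capturesWholeState(FakeClusterState state) {
        return newVersion -> newVersion > state.version;
    }

    // Fixed pattern: extract the primitive first; the lambda captures only a long,
    // so the FakeClusterState becomes garbage-collectible once the caller drops it.
    static LongPredicate capturesVersionOnly(FakeClusterState state) {
        final long observedVersion = state.version;
        return newVersion -> newVersion > observedVersion;
    }

    public static void main(String[] args) {
        LongPredicate retryCheck = capturesVersionOnly(new FakeClusterState(5));
        System.out.println(retryCheck.test(6)); // newer state observed
        System.out.println(retryCheck.test(5)); // same version, no retry trigger
    }
}
```

Both predicates behave identically; they differ only in what the closure keeps alive on the heap.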

3. Issue with Current Workflow

  • The listener stored in snapshotDeletionListeners sits in the map until the repository-level deletion reaches a terminal state. If the deletion is stuck (due to slow I/O, stuck segment uploads, large repository cleanup, or any other reason), the listener remains in snapshotDeletionListeners indefinitely, and the captured ClusterState cannot be garbage collected.

  • For each subsequent delete request, SnapshotsService adds another listener to snapshotDeletionListeners through the same path. As delete requests accumulate, these listeners pile up, each pinning a ClusterState object on the heap. The cluster manager node's heap usage grows monotonically with each repeated delete, eventually leading to OutOfMemoryError.

4. Requirements

  • Reduce the size of the data retained by retry closures. Instead of capturing the full ClusterState object, closures should only hold the minimal primitives required for retry decisions.

  • Preserve existing retry behavior and backward compatibility.

5. Approach: Extract Primitives Before Closure Creation

The retry closures in TransportClusterManagerNodeAction only need two pieces of information from the ClusterState to make retry decisions: the cluster state version (a long) and the cluster manager node's ephemeral ID (a String). By extracting these values before creating any lambda or anonymous class, the closures capture only these lightweight primitives. The full ClusterState object is no longer referenced by any closure and becomes eligible for garbage collection immediately.
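A rough illustration of this approach, using simplified stand-in types rather than the actual OpenSearch signatures: the predicate builder accepts the two extracted values directly, and the old state-based overload is kept only as a thin bridge.

```java
import java.util.Objects;
import java.util.function.Predicate;

// Hypothetical simplified stand-in for ClusterState.
class MiniState {
    final long version;
    final String clusterManagerEphemeralId; // null when no cluster manager is elected
    MiniState(long version, String ephemeralId) {
        this.version = version;
        this.clusterManagerEphemeralId = ephemeralId;
    }
}

public class PredicateSketch {
    // Primitives-based overload: the returned lambda captures only a long and a String.
    static Predicate<MiniState> build(long observedVersion, String observedEphemeralId) {
        if (observedEphemeralId == null) {
            // Nothing was elected when we looked: retry as soon as any manager appears.
            return newState -> newState.clusterManagerEphemeralId != null;
        }
        // Retry when the state has advanced or the elected manager has changed.
        return newState -> newState.version > observedVersion
            || !Objects.equals(observedEphemeralId, newState.clusterManagerEphemeralId);
    }

    // Convenience bridge: extract the two values, then delegate to the new overload.
    static Predicate<MiniState> build(MiniState observedState) {
        return build(observedState.version, observedState.clusterManagerEphemeralId);
    }

    public static void main(String[] args) {
        Predicate<MiniState> p = build(5, "eph-1");
        System.out.println(p.test(new MiniState(6, "eph-1"))); // version advanced
        System.out.println(p.test(new MiniState(5, "eph-2"))); // manager changed
        System.out.println(p.test(new MiniState(5, "eph-1"))); // nothing changed
    }
}
```

The retry semantics shown here are a simplification of the real ClusterManagerNodeChangePredicate; only the capture pattern is the point.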

Sequence Diagram

[sequence diagram image]

Implementation Steps

  1. In getDelegateForLocalExecute inside AsyncSingleAction, the cluster state version and the cluster manager node are extracted before the lambda is created. The lambda now references only these extracted values, so the full ClusterState is no longer retained.

  2. A new overloaded retryOnMasterChange method is added that accepts the version and cluster manager node directly. It extracts the ephemeral ID from the cluster manager node and passes it to the predicate builder.

  3. The original retryOnMasterChange method that accepts a full ClusterState is kept as a convenience bridge. It extracts the version and cluster manager node and delegates to the new overload.

  4. The retry method signature is updated to accept the version and cluster manager node instead of the full ClusterState. It uses the persistent node ID and version to construct the cluster state observer via a new primitives-based constructor.

  5. A new overloaded build method is added to ClusterManagerNodeChangePredicate that accepts the version and ephemeral ID directly. The existing build method that accepts a full ClusterState is refactored to extract these values and delegate to the new overload.

  6. Two new constructors are added to ClusterStateObserver that accept the cluster manager node ID and version as primitives, instead of requiring a full ClusterState object.

  7. The StoredState inner class inside ClusterStateObserver is refactored to support construction from primitives. The existing constructor that accepts a ClusterState now delegates to the new primitives-based constructor.
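Step 7 can be sketched as follows. This is a simplified model with hypothetical names (the real inner class lives in org.opensearch.cluster.ClusterStateObserver, and its comparison logic differs in detail); it shows only the delegation pattern where the primitives-based constructor becomes primary:

```java
import java.util.Objects;

// Hypothetical simplified stand-in for ClusterState.
class MiniClusterState {
    final long version;
    final String clusterManagerNodeId; // persistent node ID, may be null
    MiniClusterState(long version, String nodeId) {
        this.version = version;
        this.clusterManagerNodeId = nodeId;
    }
}

// Simplified model of the StoredState refactoring: the primitives-based
// constructor is the primary one, and the state-based constructor delegates to it.
public class StoredStateSketch {
    final String clusterManagerNodeId;
    final long version;

    StoredStateSketch(String clusterManagerNodeId, long version) {
        this.clusterManagerNodeId = clusterManagerNodeId;
        this.version = version;
    }

    StoredStateSketch(MiniClusterState state) {
        this(state.clusterManagerNodeId, state.version);
    }

    // True when the new state has not advanced past the stored version,
    // or a different cluster manager is now elected.
    boolean isOlderOrDifferentClusterManager(MiniClusterState newState) {
        return newState.version <= version
            || !Objects.equals(clusterManagerNodeId, newState.clusterManagerNodeId);
    }

    public static void main(String[] args) {
        StoredStateSketch stored = new StoredStateSketch("node-1", 5);
        System.out.println(stored.isOlderOrDifferentClusterManager(new MiniClusterState(6, "node-1"))); // newer, same manager
        System.out.println(stored.isOlderOrDifferentClusterManager(new MiniClusterState(6, "node-2"))); // manager changed
    }
}
```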

6. Validation

The fix was validated by reproducing the memory retention issue on a local cluster and comparing heap dumps before and after the change.

Reproduction Setup

  1. Added a Thread.sleep() call in BlobStoreRepository.doDeleteShardSnapshots() to simulate a long-running deletion that stays stuck.

  2. Created 500 indices with heavy mappings (50+ fields each), multiple aliases per index, and a small number of documents per index to inflate the ClusterState size.

  3. Created one snapshot per index (500 snapshots total) using a filesystem-based snapshot repository.

  4. Spammed delete requests for all snapshots repeatedly, so that listeners accumulate in snapshotDeletionListeners while the deletion is stuck.

  5. Captured heap dumps from the cluster manager node and compared the retained size of listener chains.

Results

Before (without fix)

[heap dump screenshot]

After (with fix)

[heap dump screenshot]

Related Issues

Resolves #15065

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

github-actions bot commented Mar 13, 2026

PR Reviewer Guide 🔍

(Review updated until commit 2864a44)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add primitive overload to ClusterManagerNodeChangePredicate

Relevant files:

  • server/src/main/java/org/opensearch/cluster/ClusterManagerNodeChangePredicate.java
  • server/src/test/java/org/opensearch/cluster/ClusterManagerNodeChangePredicateTests.java

Sub-PR theme: Add primitive constructor to ClusterStateObserver

Relevant files:

  • server/src/main/java/org/opensearch/cluster/ClusterStateObserver.java
  • server/src/test/java/org/opensearch/cluster/ClusterStateObserverTests.java

Sub-PR theme: Reduce ClusterState retention in TransportClusterManagerNodeAction retry closures

Relevant files:

  • server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeAction.java
  • server/src/test/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeActionTests.java

⚡ Recommended focus areas for review

ID Mismatch

The new ClusterStateObserver constructor accepts a clusterManagerNodeId described as a "persistent node ID", but the existing StoredState class stores clusterManagerNodeId which is compared against newState.nodes().getClusterManagerNodeId() (persistent ID). However, in retryOnMasterChange, the ephemeral ID is used for ClusterManagerNodeChangePredicate.build(), while the persistent ID is passed to the ClusterStateObserver constructor. These two IDs serve different comparison purposes and should be carefully validated for consistency — the predicate uses ephemeral ID while the observer uses persistent ID, which could lead to subtle divergence in change detection behavior.

public ClusterStateObserver(
    String clusterManagerNodeId,
    long version,
    ClusterApplierService clusterApplierService,
    @Nullable TimeValue timeout,
    Logger logger,
    ThreadContext contextHolder
) {
    this.clusterApplierService = clusterApplierService;
    this.threadPool = clusterApplierService.threadPool();
    this.lastObservedState = new AtomicReference<>(new StoredState(clusterManagerNodeId, version));
    this.timeOutValue = timeout;
    if (timeOutValue != null) {
        this.startTimeMS = threadPool.relativeTimeInMillis();
    }
    this.logger = logger;
    this.contextHolder = contextHolder;
}
Null Safety

In checkForBlock, blockClusterManagerNode is extracted from localClusterState.nodes().getClusterManagerNode() and passed to retry(). Inside retry(), clusterManagerNode.getId() is called only after a null check, which is correct. However, the ClusterManagerNodeChangePredicate used in the block retry lambda is built inline (not via retryOnMasterChange), so it does not use the pre-extracted values — it still captures newState from the lambda parameter, which is fine. But the persistentNodeId derivation from a potentially null clusterManagerNode should be double-checked to ensure no NPE path exists.

final long blockStateVersion = localClusterState.version();
final DiscoveryNode blockClusterManagerNode = localClusterState.nodes().getClusterManagerNode();
retry(blockStateVersion, blockClusterManagerNode, blockException, newState -> {
Redundant Test

The test testPrimitiveConstructorViaClusterService creates a ClusterService mock and calls getClusterApplierService(), but then directly uses the ClusterApplierService-based constructor rather than any ClusterService-based constructor. The ClusterService mock is unused in the actual observer construction, making this test misleading and not actually testing what its Javadoc claims.

public void testPrimitiveConstructorViaClusterService() {
    final ClusterApplierService clusterApplierService = mock(ClusterApplierService.class);
    final ClusterService clusterService = mock(ClusterService.class);
    when(clusterService.getClusterApplierService()).thenReturn(clusterApplierService);
    final ThreadPool threadPool = mock(ThreadPool.class);
    when(clusterApplierService.threadPool()).thenReturn(threadPool);
    when(threadPool.relativeTimeInMillis()).thenReturn(0L);

    final DiscoveryNode masterNode = new DiscoveryNode("master", buildNewFakeTransportAddress(), Version.CURRENT);
    final ClusterState newerState = ClusterState.builder(new ClusterName("test"))
        .nodes(DiscoveryNodes.builder().add(masterNode).clusterManagerNodeId(masterNode.getId()))
        .version(10)
        .build();
    when(clusterApplierService.state()).thenReturn(newerState);

    // Use the ClusterApplierService-based constructor
    final ClusterStateObserver observer = new ClusterStateObserver(
        masterNode.getId(),
        1L,
        clusterApplierService,
        TimeValue.timeValueSeconds(30),
        logger,
        new ThreadContext(Settings.EMPTY)
    );

    final AtomicReference<ClusterState> receivedState = new AtomicReference<>();
    observer.waitForNextChange(new ClusterStateObserver.Listener() {
        @Override
        public void onNewClusterState(ClusterState state) {
            receivedState.set(state);
        }

        @Override
        public void onClusterServiceClose() {}

        @Override
        public void onTimeout(TimeValue timeout) {}
    });

    // Newer version detected — should accept immediately
    assertNotNull(receivedState.get());
    assertEquals(10L, receivedState.get().version());
}

@github-actions
Contributor

github-actions bot commented Mar 13, 2026

PR Code Suggestions ✨

Latest suggestions up to 2864a44

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Fix node ID type mismatch in retry predicate

The ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId) is called
with the ephemeral ID, but the new ClusterStateObserver constructor stores the
persistent ID (clusterManagerNode.getId()) in StoredState. The
StoredState.isOlderOrDifferentMaster method compares clusterManagerNodeId against
clusterState.nodes().getClusterManagerNodeId() which returns the persistent ID. This
mismatch means the observer and predicate use different ID types, potentially
causing incorrect change detection. Both should use the same ID type consistently.

server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeAction.java [292-295]

 private void retryOnMasterChange(long stateVersion, DiscoveryNode clusterManagerNode, Throwable failure) {
-    final String ephemeralNodeId = clusterManagerNode != null ? clusterManagerNode.getEphemeralId() : null;
-    retry(stateVersion, clusterManagerNode, failure, ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId));
+    final String persistentNodeId = clusterManagerNode != null ? clusterManagerNode.getId() : null;
+    retry(stateVersion, clusterManagerNode, failure, ClusterManagerNodeChangePredicate.build(stateVersion, persistentNodeId));
 }
Suggestion importance[1-10]: 6


Why: This is a valid concern — ClusterManagerNodeChangePredicate.build uses ephemeral IDs while StoredState in ClusterStateObserver uses persistent IDs via getClusterManagerNodeId(). However, the original ClusterManagerNodeChangePredicate.build(ClusterState) also uses getEphemeralId(), so this may be intentional design. The inconsistency could cause subtle bugs with node restarts where ephemeral ID changes but persistent ID stays the same.

Low
Inconsistent node ID types between observer and predicate

The new ClusterStateObserver constructor accepts a clusterManagerNodeId that is the
persistent node ID (from DiscoveryNode.getId()), but StoredState compares it
against clusterState.nodes().getClusterManagerNodeId() which also returns the
persistent ID. However, in retryOnMasterChange, the
ClusterManagerNodeChangePredicate is built using the ephemeral ID
(getEphemeralId()), while StoredState uses the persistent ID. These two comparisons
are inconsistent — the predicate uses ephemeral IDs but the observer uses persistent
IDs. Ensure both use the same type of node identifier to avoid subtle bugs where a
restarted node (same persistent ID, different ephemeral ID) is not detected as a
change.

server/src/main/java/org/opensearch/cluster/ClusterStateObserver.java [130]

+// Either consistently use ephemeral IDs in both StoredState and ClusterManagerNodeChangePredicate,
+// or consistently use persistent IDs in both. For example, if using persistent IDs:
+// In retryOnMasterChange, pass clusterManagerNode.getId() instead of getEphemeralId() to ClusterManagerNodeChangePredicate.build()
 this.lastObservedState = new AtomicReference<>(new StoredState(clusterManagerNodeId, version));
Suggestion importance[1-10]: 4


Why: The suggestion raises a valid concern about ID type consistency, but the StoredState uses persistent IDs while ClusterManagerNodeChangePredicate uses ephemeral IDs — these serve different purposes and are intentionally separate. The improved_code is essentially a comment, not an actual fix, making it less actionable.

Low

Previous suggestions

Suggestions up to commit 2864a44
Category | Suggestion | Impact
General
Avoid retaining DiscoveryNode in retry closure

The retry method now accepts a DiscoveryNode clusterManagerNode parameter, but this
parameter is only used to extract persistentNodeId for the ClusterStateObserver
constructor. Passing the full DiscoveryNode object into the retry closure
reintroduces the very object retention the PR aims to avoid. Consider passing only
the pre-extracted persistentNodeId (String) instead of the full DiscoveryNode to
retry.

server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeAction.java [292-295]

 private void retryOnMasterChange(long stateVersion, DiscoveryNode clusterManagerNode, Throwable failure) {
     final String ephemeralNodeId = clusterManagerNode != null ? clusterManagerNode.getEphemeralId() : null;
-    retry(stateVersion, clusterManagerNode, failure, ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId));
+    final String persistentNodeId = clusterManagerNode != null ? clusterManagerNode.getId() : null;
+    retry(stateVersion, persistentNodeId, failure, ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId));
 }
Suggestion importance[1-10]: 7


Why: This is a valid observation - the PR's goal is to avoid retaining full ClusterState objects, but passing DiscoveryNode into the retry closure still retains a node object. Extracting persistentNodeId as a String before calling retry would be more consistent with the PR's intent. The suggestion is accurate and the improved code correctly shows the change.

Medium
Replace DiscoveryNode parameter with String in retry

The retry method captures DiscoveryNode clusterManagerNode in its closure, which
defeats the purpose of this PR (reducing ClusterState retention). Since only
persistentNodeId (a String) is needed inside retry, the parameter type should be
changed to String persistentNodeId to avoid retaining the DiscoveryNode object in
the closure.

server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeAction.java [297-314]

 private void retry(
     final long stateVersion,
-    final DiscoveryNode clusterManagerNode,
+    final String persistentNodeId,
     final Throwable failure,
     final Predicate<ClusterState> statePredicate
 ) {
     if (observer == null) {
         ...
-        final String persistentNodeId = clusterManagerNode != null ? clusterManagerNode.getId() : null;
         this.observer = new ClusterStateObserver(
             persistentNodeId,
             stateVersion,
             clusterService.getClusterApplierService(),
Suggestion importance[1-10]: 7


Why: This suggestion correctly identifies that the retry method only needs persistentNodeId (a String) rather than the full DiscoveryNode, which would better achieve the PR's goal of reducing object retention in closures. The improved code accurately reflects the needed change to the method signature and body.

Medium
Extract only node ID string in block retry

Similar to the getDelegateForLocalExecute refactoring, the checkForBlock method
extracts a DiscoveryNode and passes it into the retry closure. If retry is updated
to accept a String persistentNodeId, this call site should also extract only the
persistent node ID string to avoid capturing the DiscoveryNode in the lambda.

server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeAction.java [474-476]

 final long blockStateVersion = localClusterState.version();
-final DiscoveryNode blockClusterManagerNode = localClusterState.nodes().getClusterManagerNode();
-retry(blockStateVersion, blockClusterManagerNode, blockException, newState -> {
+final String blockPersistentNodeId = localClusterState.nodes().getClusterManagerNodeId();
+retry(blockStateVersion, blockPersistentNodeId, blockException, newState -> {
Suggestion importance[1-10]: 6


Why: This is a consistent follow-up to suggestion 2 - if retry is changed to accept a String instead of DiscoveryNode, this call site also needs updating. The suggestion is accurate and uses getClusterManagerNodeId() which directly returns the persistent node ID string without creating an intermediate DiscoveryNode reference.

Low
Suggestions up to commit 24c9147
Category | Suggestion | Impact
General
Avoid retaining DiscoveryNode in retry closure

The retry method now accepts a DiscoveryNode clusterManagerNode parameter, but this
parameter is only used to extract persistentNodeId for the ClusterStateObserver
constructor. Passing the full DiscoveryNode object into the retry closure partially
defeats the purpose of this PR (reducing ClusterState retention), since
DiscoveryNode itself can hold references. Consider passing only the two
primitive/string values needed (stateVersion and the persistent node ID string)
instead of the full DiscoveryNode.

server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeAction.java [292-302]

 private void retryOnMasterChange(long stateVersion, DiscoveryNode clusterManagerNode, Throwable failure) {
     final String ephemeralNodeId = clusterManagerNode != null ? clusterManagerNode.getEphemeralId() : null;
-    retry(stateVersion, clusterManagerNode, failure, ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId));
+    final String persistentNodeId = clusterManagerNode != null ? clusterManagerNode.getId() : null;
+    retry(stateVersion, persistentNodeId, failure, ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId));
 }
 
+private void retry(
+    final long stateVersion,
+    final String persistentNodeId,
+    final Throwable failure,
+    final Predicate<ClusterState> statePredicate
+) {
+    if (observer == null) {
+        ...
+        this.observer = new ClusterStateObserver(
+            persistentNodeId,
+            stateVersion,
+            clusterService.getClusterApplierService(),
+            TimeValue.timeValueMillis(remainingTimeoutMS),
+            logger,
+            threadPool.getThreadContext()
+        );
+    }
+    ...
+}
+
Suggestion importance[1-10]: 6


Why: The suggestion is valid — passing the full DiscoveryNode into the retry method partially defeats the memory optimization goal of this PR. Extracting only the needed string persistentNodeId would be more consistent with the PR's intent to avoid retaining large objects in closures. However, DiscoveryNode is a relatively lightweight object compared to ClusterState, so the impact is moderate.

Low
Suggestions up to commit 6a9755d
Category | Suggestion | Impact
General
Avoid retaining DiscoveryNode reference in retry closure

The retry method receives clusterManagerNode (a DiscoveryNode) to extract the
persistent ID for ClusterStateObserver, but ClusterManagerNodeChangePredicate.build
uses the ephemeral ID. Passing the full DiscoveryNode into retry still retains a
reference to the node object in the closure. Since only the persistent ID is needed
inside retry, consider passing persistentNodeId (a String) directly instead of the
full DiscoveryNode to fully achieve the goal of reducing object retention.

server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeAction.java [292-295]

 private void retryOnMasterChange(long stateVersion, DiscoveryNode clusterManagerNode, Throwable failure) {
     final String ephemeralNodeId = clusterManagerNode != null ? clusterManagerNode.getEphemeralId() : null;
-    retry(stateVersion, clusterManagerNode, failure, ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId));
+    final String persistentNodeId = clusterManagerNode != null ? clusterManagerNode.getId() : null;
+    retry(stateVersion, persistentNodeId, failure, ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId));
 }
 
+private void retry(
+    final long stateVersion,
+    final String persistentNodeId,
+    final Throwable failure,
+    final Predicate<ClusterState> statePredicate
+) {
+    if (observer == null) {
+        ...
+        this.observer = new ClusterStateObserver(
+            persistentNodeId,
+            stateVersion,
+            clusterService.getClusterApplierService(),
+            TimeValue.timeValueMillis(remainingTimeoutMS),
+            logger,
+            threadPool.getThreadContext()
+        );
+    }
+    ...
+}
+
Suggestion importance[1-10]: 5


Why: The suggestion is valid — passing a String persistentNodeId instead of the full DiscoveryNode to retry would better achieve the PR's stated goal of avoiding retention of large objects in closures. However, the improvement is minor since DiscoveryNode is a relatively small object compared to ClusterState, and the PR already extracts the key fields before creating closures.

Low
Suggestions up to commit b467299
Category | Suggestion | Impact
General
Avoid retaining DiscoveryNode object in retry closure

The retry method receives the full DiscoveryNode clusterManagerNode object only to
extract persistentNodeId for the ClusterStateObserver. Since the goal of this PR is
to avoid retaining the full ClusterState, passing the full DiscoveryNode into the
retry closure still retains a reference to a DiscoveryNode object. Consider passing
only the pre-extracted persistentNodeId (String) directly to retry instead of the
full DiscoveryNode.

server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeAction.java [292-295]

 private void retryOnMasterChange(long stateVersion, DiscoveryNode clusterManagerNode, Throwable failure) {
     final String ephemeralNodeId = clusterManagerNode != null ? clusterManagerNode.getEphemeralId() : null;
-    retry(stateVersion, clusterManagerNode, failure, ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId));
+    final String persistentNodeId = clusterManagerNode != null ? clusterManagerNode.getId() : null;
+    retry(stateVersion, persistentNodeId, failure, ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId));
 }
 
+private void retry(
+    final long stateVersion,
+    final String persistentNodeId,
+    final Throwable failure,
+    final Predicate<ClusterState> statePredicate
+) {
+    if (observer == null) {
+        ...
+        this.observer = new ClusterStateObserver(
+            persistentNodeId,
+            stateVersion,
+            clusterService,
+            TimeValue.timeValueMillis(remainingTimeoutMS),
+            logger,
+            threadPool.getThreadContext()
+        );
+    }
+    ...
+}
+
Suggestion importance[1-10]: 5


Why: This is a valid improvement aligned with the PR's goal of reducing object retention in closures. Passing only the persistentNodeId string instead of the full DiscoveryNode to retry avoids retaining the DiscoveryNode object, which is a minor but meaningful memory optimization consistent with the PR's intent.

Low
Verify consistent node ID type usage

The new ClusterStateObserver constructor accepts a clusterManagerNodeId that is the
persistent node ID (from DiscoveryNode.getId()), but StoredState compares it
against clusterState.nodes().getClusterManagerNodeId() which also returns the
persistent ID — this is consistent. However, in retryOnMasterChange, the
ephemeralNodeId is passed to ClusterManagerNodeChangePredicate.build while the
persistentNodeId is passed to ClusterStateObserver. These two IDs are different and
serve different comparison purposes, but it's worth verifying that
StoredState.isOlderOrDifferentMaster uses the same ID type as what is stored, to
avoid silent mismatches if the logic ever changes.

server/src/main/java/org/opensearch/cluster/ClusterStateObserver.java [141]

+// Ensure the stored ID matches what StoredState.isOlderOrDifferentMaster compares against
+// (persistent node ID from getClusterManagerNodeId())
 this.lastObservedState = new AtomicReference<>(new StoredState(clusterManagerNodeId, version));
Suggestion importance[1-10]: 2


Why: The suggestion only asks to verify/ensure correctness via a comment, and the existing_code and improved_code are functionally identical (only a comment is added). The concern about ID type consistency is valid but already correctly handled in the PR — persistentNodeId is used for StoredState and ephemeralNodeId for the predicate.

Low
Suggestions up to commit b467299
Category | Suggestion | Impact
Possible issue
Inconsistent node ID types passed to observer vs predicate

The retry method receives clusterManagerNode (a DiscoveryNode) and passes
clusterManagerNode.getId() (persistent ID) to ClusterStateObserver, while
ClusterManagerNodeChangePredicate.build is called with
clusterManagerNode.getEphemeralId(). The observer's StoredState uses persistent ID
for comparison, but the predicate uses ephemeral ID. These two checks are
inconsistent and may cause the observer to behave differently from the predicate
when detecting cluster manager changes.

server/src/main/java/org/opensearch/action/support/clustermanager/TransportClusterManagerNodeAction.java [292-295]

 private void retryOnMasterChange(long stateVersion, DiscoveryNode clusterManagerNode, Throwable failure) {
     final String ephemeralNodeId = clusterManagerNode != null ? clusterManagerNode.getEphemeralId() : null;
-    retry(stateVersion, clusterManagerNode, failure, ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId));
+    retry(stateVersion, ephemeralNodeId, failure, ClusterManagerNodeChangePredicate.build(stateVersion, ephemeralNodeId));
 }
Suggestion importance[1-10]: 7


Why: This correctly identifies a real inconsistency: retry passes clusterManagerNode.getId() (persistent ID) to ClusterStateObserver while ClusterManagerNodeChangePredicate.build uses ephemeralNodeId. The improved_code proposes changing the retry signature to use ephemeralNodeId instead of DiscoveryNode, which would make both mechanisms consistent. However, this would require changing the retry method signature significantly, and the improved_code doesn't fully reflect all the cascading changes needed.

Medium
Inconsistent node ID types between observer and predicate

The new ClusterStateObserver constructor accepts a clusterManagerNodeId that is the
persistent node ID (from DiscoveryNode.getId()), but StoredState compares it
against clusterState.nodes().getClusterManagerNodeId() which also returns the
persistent ID. However, ClusterManagerNodeChangePredicate.build uses the
ephemeral ID (getEphemeralId()). The two mechanisms use different ID types,
which could cause inconsistent change detection. Ensure both StoredState and
ClusterManagerNodeChangePredicate use the same type of node identifier.

server/src/main/java/org/opensearch/cluster/ClusterStateObserver.java [141]

+// Ensure consistent use of persistent node ID in StoredState and ephemeral ID in predicate,
+// or document clearly which ID type is expected here.
 this.lastObservedState = new AtomicReference<>(new StoredState(clusterManagerNodeId, version));
Suggestion importance[1-10]: 6


Why: This is a valid observation: StoredState uses persistent node ID (getId()) while ClusterManagerNodeChangePredicate uses ephemeral ID (getEphemeralId()). However, the 'improved_code' only adds a comment without actually fixing the inconsistency, making it a documentation-only suggestion rather than a real fix.

Impact: Low
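For reference, the comparison both suggestions hinge on reduces to two plain values. The class below is a simplified, hypothetical stand-in for the observer's `StoredState` (it takes primitives rather than a `ClusterState`, and its method signature is invented for illustration), following the persistent-ID convention the current code uses:

```java
import java.util.Objects;

// Hypothetical simplification of ClusterStateObserver.StoredState: retaining
// only a persistent node ID and a version means no full ClusterState is pinned
// on the heap between retries. The real class compares against
// clusterState.nodes().getClusterManagerNodeId(); here both sides are plain values.
final class StoredStateSketch {
    private final String clusterManagerNodeId; // persistent ID, per the current code
    private final long version;

    StoredStateSketch(String clusterManagerNodeId, long version) {
        this.clusterManagerNodeId = clusterManagerNodeId;
        this.version = version;
    }

    // Sketch of StoredState.isOlderOrDifferentMaster: the stored snapshot is
    // stale if the observed state is newer or led by a different cluster manager.
    boolean isOlderOrDifferentMaster(long observedVersion, String observedClusterManagerNodeId) {
        return observedVersion > version
            || !Objects.equals(clusterManagerNodeId, observedClusterManagerNodeId);
    }

    public static void main(String[] args) {
        StoredStateSketch stored = new StoredStateSketch("node-1", 5L);
        System.out.println(stored.isOlderOrDifferentMaster(5L, "node-1")); // false: same manager, same version
        System.out.println(stored.isOlderOrDifferentMaster(6L, "node-1")); // true: newer state
        System.out.println(stored.isOlderOrDifferentMaster(5L, "node-2")); // true: different manager
    }
}
```

If the predicate feeds this comparison an ephemeral ID while the stored value is a persistent ID, the inequality branch fires on every check, which is exactly the inconsistency flagged above.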
Category: General
Test should verify node ID type assumption explicitly

The test constructs the observer with masterNode.getId() (persistent ID) and version
1, while the current state has version 5 with the same master. The test asserts the
listener is NOT added (immediate acceptance). However,
StoredState.isOlderOrDifferentMaster compares clusterManagerNodeId against
clusterState.nodes().getClusterManagerNodeId() (persistent ID), so this should work.
But if the implementation is later changed to use ephemeral IDs, this test would
silently pass for the wrong reason. Consider adding an explicit assertion on the
node ID type being used to make the test's intent clear.

server/src/test/java/org/opensearch/cluster/ClusterStateObserverTests.java [122-129]

-// Construct with persistent node ID and version 1 — newerState has version 5, same master
+// Construct with persistent node ID and version 1 — newerState has version 5, same master (persistent ID match)
+assertEquals(masterNode.getId(), newerState.nodes().getClusterManagerNodeId()); // verify persistent ID is used
 final ClusterStateObserver observer = new ClusterStateObserver(
     masterNode.getId(),
     1L,
     clusterApplierService,
     TimeValue.timeValueSeconds(30),
     logger,
     new ThreadContext(Settings.EMPTY)
 );
Suggestion importance[1-10]: 2


Why: This suggestion adds a defensive assertion to clarify which ID type is used, but it's a minor test improvement that guards against hypothetical future changes. The added assertion doesn't test new behavior and the suggestion is more about documentation/clarity than correctness.

Impact: Low

@github-actions (Contributor)

Persistent review updated to latest commit 64a4b05

@github-actions (Contributor)

❌ Gradle check result for 64a4b05: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions (Contributor)

Persistent review updated to latest commit ee994e5

@github-actions (Contributor)

❌ Gradle check result for ee994e5: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions (Contributor)

Persistent review updated to latest commit 1ceb433

@github-actions (Contributor)

✅ Gradle check result for 1ceb433: SUCCESS


codecov bot commented Mar 13, 2026

Codecov Report

❌ Patch coverage is 96.42857% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 73.17%. Comparing base (15fcc08) to head (e68ebce).

Files with missing lines | Patch % | Lines
...a/org/opensearch/cluster/ClusterStateObserver.java | 93.33% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20858      +/-   ##
============================================
- Coverage     73.26%   73.17%   -0.09%     
+ Complexity    72743    72703      -40     
============================================
  Files          5862     5862              
  Lines        332558   332580      +22     
  Branches      48010    48013       +3     
============================================
- Hits         243643   243360     -283     
- Misses        69343    69714     +371     
+ Partials      19572    19506      -66     


@github-actions (Contributor)

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit b8784a1.

Path | Line | Severity | Description
server/src/test/java/org/opensearch/cluster/ClusterStateObserverTests.java | 346 | low | Trivially-passing assertion: `assertFalse("should not need to add listener", false)` always passes regardless of `listenerAdded.get()`, silently removing the verification that no timeout listener was registered. Should be `assertFalse("should not need to add listener", listenerAdded.get())`. Likely a typo but weakens test coverage for the new primitive constructor path.

The table above displays the top 10 most important findings.

Total: 1 | Critical: 0 | High: 0 | Medium: 0 | Low: 1
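The finding reduces to an assertion whose second argument is the literal `false`. A tiny plain-Java illustration of why that form can never fail while the corrected form does; `assertFalse` here is a hand-rolled stand-in for the JUnit method, and `listenerAdded` mimics the flag named in the finding:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Minimal stand-in for JUnit's assertFalse, illustrating the analyzer finding.
final class AssertionSketch {
    static void assertFalse(String message, boolean condition) {
        if (condition) throw new AssertionError(message);
    }

    public static void main(String[] args) {
        AtomicBoolean listenerAdded = new AtomicBoolean(true); // a listener WAS registered

        // Broken form from the finding: the literal false never trips, so the
        // test passes even though a timeout listener was added.
        assertFalse("should not need to add listener", false);

        // Corrected form: checks the actual flag and fails here, as it should.
        boolean tripped = false;
        try {
            assertFalse("should not need to add listener", listenerAdded.get());
        } catch (AssertionError expected) {
            tripped = true;
        }
        System.out.println("corrected assertion tripped: " + tripped); // corrected assertion tripped: true
    }
}
```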


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@github-actions (Contributor)

Persistent review updated to latest commit b8784a1

@github-actions (Contributor)

Persistent review updated to latest commit 9ed0978

@github-actions (Contributor)

✅ Gradle check result for 9ed0978: SUCCESS

@github-actions (Contributor)

Persistent review updated to latest commit ebe17a4

@github-actions (Contributor)

❕ Gradle check result for ebe17a4: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@github-actions (Contributor)

Persistent review updated to latest commit 4195b48

@github-actions (Contributor)

✅ Gradle check result for e42dc18: SUCCESS

@github-actions (Contributor)

Failed to generate code suggestions for PR

@github-actions (Contributor)

❌ Gradle check result for 4957af3: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions (Contributor)

Failed to generate code suggestions for PR

@github-actions (Contributor)

✅ Gradle check result for 87ed45d: SUCCESS

@github-actions (Contributor)

Failed to generate code suggestions for PR

@github-actions (Contributor)

❌ Gradle check result for 4be892a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions (Contributor)

Failed to generate code suggestions for PR


@github-actions (Contributor)

Failed to generate code suggestions for PR

@github-actions (Contributor)

✅ Gradle check result for 1e72dcf: SUCCESS

@github-actions (Contributor)

Failed to generate code suggestions for PR

@github-actions (Contributor)

✅ Gradle check result for d4b3f62: SUCCESS

@github-actions (Contributor)

Persistent review updated to latest commit 6a9755d

@github-actions (Contributor)

❌ Gradle check result for 6a9755d: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions (Contributor)

Persistent review updated to latest commit 24c9147

@github-actions (Contributor)

✅ Gradle check result for 24c9147: SUCCESS

@github-actions (Contributor)

Persistent review updated to latest commit 2864a44

@github-actions (Contributor)

❌ Gradle check result for 2864a44: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions (Contributor)

Persistent review updated to latest commit 2864a44

@github-actions (Contributor)

❌ Gradle check result for 2864a44: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions (Contributor)

Failed to generate code suggestions for PR

@github-actions (Contributor)

✅ Gradle check result for e68ebce: SUCCESS


Projects

Status: 🏗 In progress

Development

Successfully merging this pull request may close these issues.

[BUG] [Remote Store] [Snapshots] Heavy Heap Usage on Master Node due stuck snapshot deletions for Remote Store clusters

4 participants