
Don't consider retention leases held by replicas for remote store back… #20874

Merged
shwetathareja merged 2 commits into opensearch-project:main from gbbafna:rl-leak-fix on Mar 16, 2026

Conversation

@gbbafna
Contributor

@gbbafna gbbafna commented Mar 15, 2026

…ed indices

Description

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Issue details

  1. The primary was on node 5d4; the replica was on fd2c.
  2. The primary relocated from 5d4 to 7ed8f at 17:42:13,292.
  3. A replica came up on 5d4 at 17:42:22.
  4. The retention lease for 5d4 was never removed on the new primary, and its retaining seq_no never advanced.
"seq_no":{"max_seq_no":15249281,"local_checkpoint":-1,"global_checkpoint":-1},
"retention_leases":{"primary_term":1,"version":344724,"leases":[{"id":"peer_recovery/nodeA","retaining_seq_no":0,"timestamp":1773411457159,"source":"peer recovery"},{"id":"peer_recovery/nodeB","retaining_seq_no":15250120,"timestamp":1773411457159,"source":"peer recovery"}]}

This seq_no=0 retention lease caused the shard size to grow enormously, because deletes were never expunged.
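To see why a single stale lease is enough to balloon the shard, note that the engine must keep every operation at or above the minimum retaining_seq_no across all retention leases. A minimal, self-contained sketch (plain Java, hypothetical method name; not the actual OpenSearch API) of that minimum:

```java
import java.util.List;

public class RetainedSeqNoSketch {
    // Simplified model: the lowest retaining_seq_no across all leases decides
    // which soft-deleted operations must stay on disk.
    static long minRetainedSeqNo(List<Long> retainingSeqNos) {
        return retainingSeqNos.stream().mapToLong(Long::longValue).min().orElse(Long.MAX_VALUE);
    }

    public static void main(String[] args) {
        // The stale lease from the diagnostics above retains from seq_no 0,
        // so all ~15.2M operations (including deletes) are kept.
        System.out.println(minRetainedSeqNo(List.of(0L, 15_250_120L))); // 0
        // With the stale lease gone, only recent history is retained.
        System.out.println(minRetainedSeqNo(List.of(15_250_120L)));     // 15250120
    }
}
```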

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…ed indices

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
@github-actions
Contributor

github-actions bot commented Mar 15, 2026

PR Reviewer Guide 🔍

(Review updated until commit 13b5af9)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Fix retention lease expiration for remote store backed indices

Relevant files:

  • server/src/main/java/org/opensearch/index/seqno/ReplicationTracker.java

Sub-PR theme: Add test for retention lease after primary relocation in remote store

Relevant files:

  • server/src/internalClusterTest/java/org/opensearch/remotestore/RemoteIndexPrimaryRelocationIT.java

⚡ Recommended focus areas for review

Incomplete Fix

For remote store enabled indices, only the primary shard's retention lease ID is kept. This means all replica peer-recovery retention leases are excluded from leaseIdsForCurrentPeers, which causes them to be expired/removed. While this is the intended fix for the described bug, it could be problematic if a replica is in the middle of peer recovery and needs its retention lease to remain valid. Consider whether replicas that are currently recovering should also be included.

final Set<String> leaseIdsForCurrentPeers;
if (indexSettings.isRemoteStoreEnabled()) {
    leaseIdsForCurrentPeers = Collections.singleton(getPeerRecoveryRetentionLeaseId(routingTable.primaryShard().currentNodeId()));
} else {
    leaseIdsForCurrentPeers = routingTable.assignedShards()
        .stream()
        .map(ReplicationTracker::getPeerRecoveryRetentionLeaseId)
        .collect(Collectors.toSet());
}
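The effect of this branch can be exercised in isolation. The sketch below (plain java.util; the helper names are hypothetical stand-ins, not the real ReplicationTracker API) shows how restricting leaseIdsForCurrentPeers to the primary's lease makes replica peer-recovery leases expirable when remote store is enabled:

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class LeaseExpirySketch {
    static String leaseIdFor(String nodeId) { return "peer_recovery/" + nodeId; }

    // Hypothetical stand-in for the expiry pass: any peer-recovery lease whose
    // id is not in leaseIdsForCurrentPeers is treated as expirable.
    static Set<String> expirableLeases(Set<String> currentLeaseIds, boolean remoteStoreEnabled,
                                       String primaryNodeId, Set<String> assignedNodeIds) {
        final Set<String> leaseIdsForCurrentPeers;
        if (remoteStoreEnabled) {
            // remote store: only the primary's lease is kept alive
            leaseIdsForCurrentPeers = Collections.singleton(leaseIdFor(primaryNodeId));
        } else {
            leaseIdsForCurrentPeers = assignedNodeIds.stream()
                .map(LeaseExpirySketch::leaseIdFor)
                .collect(Collectors.toSet());
        }
        Set<String> expirable = new HashSet<>(currentLeaseIds);
        expirable.removeAll(leaseIdsForCurrentPeers);
        return expirable;
    }

    public static void main(String[] args) {
        Set<String> leases = Set.of("peer_recovery/nodeA", "peer_recovery/nodeB");
        // Remote store enabled, primary on nodeB: nodeA's stale lease becomes expirable.
        System.out.println(expirableLeases(leases, true, "nodeB", Set.of("nodeA", "nodeB")));
        // Document replication: leases for all assigned shards are kept.
        System.out.println(expirableLeases(leases, false, "nodeB", Set.of("nodeA", "nodeB")));
    }
}
```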
Missing assertBusy Timeout

The assertBusy call does not specify a custom timeout. The default timeout may be insufficient for the retention lease sync interval of 5 seconds plus propagation time. Consider specifying an explicit timeout (e.g., 30 seconds) to avoid flaky test failures.

assertBusy(() -> {
    final IndicesStatsResponse statsResponse = client().admin().indices().prepareStats("test").get();
    for (ShardStats shardStats : statsResponse.getShards()) {
        if (shardStats.getShardRouting().primary() == false) {
            continue;
        }
        final RetentionLeases retentionLeases = shardStats.getRetentionLeaseStats().retentionLeases();
        final List<RetentionLease> peerRecoveryLeases = retentionLeases.leases()
            .stream()
            .filter(l -> ReplicationTracker.PEER_RECOVERY_RETENTION_LEASE_SOURCE.equals(l.source()))
            .collect(Collectors.toList());
        assertEquals(
            "expected exactly one peer recovery retention lease but got " + peerRecoveryLeases,
            1,
            peerRecoveryLeases.size()
        );

        // the single lease should be for the current primary node
        assertEquals(
            ReplicationTracker.getPeerRecoveryRetentionLeaseId(shardStats.getShardRouting().currentNodeId()),
            peerRecoveryLeases.get(0).id()
        );
    }
});
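For context on the timeout recommendation, here is a minimal, self-contained stand-in for the test framework's assertBusy(Runnable, long, TimeUnit) pattern (this is an illustrative sketch, not the OpenSearchTestCase implementation): the assertion is retried until it passes or an explicit deadline elapses.

```java
import java.util.concurrent.TimeUnit;

public class AssertBusySketch {
    // Retry the assertion until it passes or the timeout elapses,
    // rethrowing the last AssertionError on timeout.
    static void assertBusy(Runnable assertion, long timeout, TimeUnit unit) throws Exception {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        AssertionError last;
        while (true) {
            try {
                assertion.run();
                return;
            } catch (AssertionError e) {
                last = e;
            }
            if (System.nanoTime() >= deadline) throw last;
            Thread.sleep(100); // back off between attempts
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.currentTimeMillis();
        // A condition that becomes true after ~300 ms passes easily
        // within an explicit 30-second budget.
        assertBusy(() -> {
            if (System.currentTimeMillis() - start < 300) throw new AssertionError("not yet");
        }, 30, TimeUnit.SECONDS);
        System.out.println("passed");
    }
}
```

An explicit budget like 30 seconds comfortably covers the 5-second retention lease sync interval plus propagation time.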
Unused Variable

The variable dataNodeB is assigned but never used in the test. The target node is determined dynamically from the cluster state instead. This variable can be removed to avoid confusion.

final String dataNodeB = internalCluster().startDataOnlyNode();

@github-actions
Contributor

github-actions bot commented Mar 15, 2026

PR Code Suggestions ✨

Latest suggestions up to 13b5af9
Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Guard against null primary shard reference

routingTable.primaryShard() can return null if no primary shard is assigned (e.g.,
during recovery or relocation). Calling currentNodeId() on a null shard will throw a
NullPointerException. Add a null check before accessing the primary shard.

server/src/main/java/org/opensearch/index/seqno/ReplicationTracker.java [290]

-leaseIdsForCurrentPeers = Collections.singleton(getPeerRecoveryRetentionLeaseId(routingTable.primaryShard().currentNodeId()));
+ShardRouting primaryShard = routingTable.primaryShard();
+leaseIdsForCurrentPeers = (primaryShard != null && primaryShard.assignedToNode())
+    ? Collections.singleton(getPeerRecoveryRetentionLeaseId(primaryShard.currentNodeId()))
+    : Collections.emptySet();
Suggestion importance[1-10]: 6


Why: The suggestion raises a valid concern about potential NullPointerException if routingTable.primaryShard() returns null. However, this method is called in primaryMode (asserted on line 284), which typically guarantees a primary shard exists, making this a lower-probability edge case rather than a critical bug.

Low
General
Ensure relocation targets the correct node

The filter selects any data node that is not the source node, but dataNodeB was just
started and is the intended target. If there are more than two data nodes, the
target could be an unintended node. Use dataNodeB's name directly to resolve the
DiscoveryNode to ensure the move command targets the correct node.

server/src/internalClusterTest/java/org/opensearch/remotestore/RemoteIndexPrimaryRelocationIT.java [97-99]

-.filter(n -> n.getId().equals(sourceNodeId) == false)
+.filter(n -> n.getName().equals(dataNodeB))
 .findFirst()
 .orElseThrow();
Suggestion importance[1-10]: 5


Why: The suggestion is valid - using dataNodeB directly would be more precise and deterministic. However, since only two data nodes exist at this point in the test (one started before and dataNodeB), the filter n.getId().equals(sourceNodeId) == false effectively selects dataNodeB anyway, making this a minor improvement rather than a critical fix. Note that the improved code uses getName() while dataNodeB is a node name, which should work correctly.

Low

Previous suggestions

Suggestions up to commit 51f4569
Category | Suggestion | Impact
General
Ensure correct target node is selected

The filter selects any data node that is not the source node, but dataNodeB was just
started and is the intended target. If there are more than two data nodes, the test
may pick an unintended node as the relocation target, making the test
non-deterministic. Filter explicitly for dataNodeB's name or ID to ensure the
correct target node is selected.

server/src/internalClusterTest/java/org/opensearch/remotestore/RemoteIndexPrimaryRelocationIT.java [97-99]

-.filter(n -> n.getId().equals(sourceNodeId) == false)
+.filter(n -> n.getName().equals(dataNodeB))
 .findFirst()
 .orElseThrow();
Suggestion importance[1-10]: 6


Why: The suggestion is valid - filtering by dataNodeB name instead of "not the source node" makes the test deterministic and ensures the intended target is selected. However, DiscoveryNode uses getId() not getName(), so the improved code using getName() may not work correctly with the DiscoveryNode API.

Low
Possible issue
Guard against null primary shard reference

routingTable.primaryShard() can return null if there is no primary shard assigned
(e.g., during certain cluster states). Calling currentNodeId() on a null shard would
cause a NullPointerException. Add a null check or use a safe accessor before calling
currentNodeId().

server/src/main/java/org/opensearch/index/seqno/ReplicationTracker.java [290]

-leaseIdsForCurrentPeers = Collections.singleton(getPeerRecoveryRetentionLeaseId(routingTable.primaryShard().currentNodeId()));
+ShardRouting primaryShard = routingTable.primaryShard();
+if (primaryShard == null || primaryShard.currentNodeId() == null) {
+    leaseIdsForCurrentPeers = Collections.emptySet();
+} else {
+    leaseIdsForCurrentPeers = Collections.singleton(getPeerRecoveryRetentionLeaseId(primaryShard.currentNodeId()));
+}
Suggestion importance[1-10]: 5


Why: While the null check is a valid defensive programming concern, in practice getRetentionLeases is only called when primaryMode is asserted (line 284), meaning a primary shard should always be assigned. The suggestion is valid but addresses an edge case that may not occur in practice given the existing assertion.

Low

@github-actions
Contributor

❌ Gradle check result for 51f4569: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

…l the time

Signed-off-by: Gaurav Bafna <gbbafna@amazon.com>
@github-actions
Contributor

Persistent review updated to latest commit 13b5af9

@github-actions
Contributor

✅ Gradle check result for 13b5af9: SUCCESS

@shwetathareja shwetathareja merged commit 9bd4634 into opensearch-project:main Mar 16, 2026
56 of 57 checks passed
@codecov

codecov bot commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.41%. Comparing base (4d6ccf1) to head (13b5af9).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
...org/opensearch/index/seqno/ReplicationTracker.java 66.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20874      +/-   ##
============================================
+ Coverage     73.30%   73.41%   +0.10%     
- Complexity    72280    72348      +68     
============================================
  Files          5796     5797       +1     
  Lines        330263   330304      +41     
  Branches      47663    47674      +11     
============================================
+ Hits         242102   242489     +387     
+ Misses        68754    68420     -334     
+ Partials      19407    19395      -12     

☔ View full report in Codecov by Sentry.