
Fix flaky ShardsLimitAllocationDeciderIT#20856

Merged
andrross merged 1 commit into opensearch-project:main from andrross:shards-tight-capacity-test
Mar 13, 2026

Conversation

@andrross
Member

RemoteShardsBalancer.balance() only rebalances by primary shard count, not total shard count. In tight-capacity scenarios where cluster-wide shard limits leave minimal spare slots, this prevents the balancer from redistributing replicas to free space on other nodes, leaving assignable shards unassigned. I believe this is the cause of the flakiness in ShardsLimitAllocationDeciderIT.

This change introduces a unit test that deliberately targets the tightly packed scenario. The RemoteShardsBalancer variant of the new test will reliably fail if run repeatedly, though there is still some non-determinism. Regardless, it is muted along with the existing integration test. If/when we improve the intelligence of RemoteShardsBalancer then we can unmute these tests. For now the tests exist to document this known limitation.
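To make the failure mode concrete, here is a minimal, dependency-free sketch of the tight-capacity scenario. This is not OpenSearch code: the node names, the `CAP` constant, and the `p*`/`r*` shard naming are all invented for illustration. It models two nodes with a per-node shard cap of 3; node A has a free slot but already holds the replica's primary, and node B is full, so the replica stays unassigned. A balancer that moves only primaries cannot help (the only destination for a primary, B, is full), while moving a replica off B frees the needed slot:

```java
import java.util.*;

/** Toy model of shard placement under a per-node shard limit. */
public class TightCapacitySketch {
    static final int CAP = 3; // hypothetical per-node total-shard limit

    /** Returns a node that can host the replica ("rN"), or null if none can. */
    static String unassignedReplicaFits(Map<String, List<String>> nodes, String replica) {
        String primary = "p" + replica.substring(1); // "r1" -> its primary "p1"
        for (Map.Entry<String, List<String>> e : nodes.entrySet()) {
            List<String> shards = e.getValue();
            // Eligible: has a free slot and does not already hold the primary copy.
            if (shards.size() < CAP && !shards.contains(primary)) {
                return e.getKey();
            }
        }
        return null; // stuck: every eligible node is at its shard limit
    }

    public static void main(String[] args) {
        Map<String, List<String>> nodes = new LinkedHashMap<>();
        nodes.put("A", new ArrayList<>(List.of("p0", "p1")));       // 2/3 slots used
        nodes.put("B", new ArrayList<>(List.of("p2", "r0", "r2"))); // 3/3 slots used
        // r1 is unassigned: A has room but holds p1; B is full.
        System.out.println("r1 fits on: " + unassignedReplicaFits(nodes, "r1")); // null

        // A primary-only balancer is stuck too: moving p0 or p1 to B fails (B is full).
        // A replica-aware balancer can move r2 from B to A, freeing a slot on B:
        nodes.get("B").remove("r2");
        nodes.get("A").add("r2");
        System.out.println("after moving r2: r1 fits on: " + unassignedReplicaFits(nodes, "r1")); // B
    }
}
```

The sketch only illustrates the shape of the deadlock the PR description points at; the real decision logic lives in ShardsLimitAllocationDecider and RemoteShardsBalancer.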

Related Issues

Resolves #19726

Check List

  • Functionality includes testing.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@andrross andrross requested a review from a team as a code owner March 12, 2026 20:22
@github-actions github-actions bot added the >test-failure (Test failure from CI, local build, etc.), autocut, and flaky-test (Random test failure that succeeds on second run) labels Mar 12, 2026
@github-actions
Contributor

github-actions bot commented Mar 12, 2026

PR Reviewer Guide 🔍

(Review updated until commit 6e0a3dc)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Mute flaky integration test for remote/warm scenario

Relevant files:

  • server/src/internalClusterTest/java/org/opensearch/cluster/routing/allocation/decider/ShardsLimitAllocationDeciderIT.java

Sub-PR theme: Add unit tests documenting tight-capacity balancer limitation

Relevant files:

  • server/src/test/java/org/opensearch/cluster/routing/allocation/ShardsBalancerTightCapacityTests.java

⚡ Recommended focus areas for review

Convergence Limit

The allocateToConvergence method caps iterations at 10. If the allocation scenario requires more than 10 rounds to converge, the assertion assertTrue("Expected no shards to be INITIALIZING", ...) will fail with a misleading message rather than indicating a convergence timeout. Consider adding a more descriptive failure message or increasing the iteration cap if needed.

private ClusterState allocateToConvergence(AllocationService service, ClusterState clusterState) {
    clusterState = service.reroute(clusterState, "reroute");
    int iterations = 0;
    while (clusterState.getRoutingNodes().shardsWithState(INITIALIZING).isEmpty() == false && iterations < 10) {
        clusterState = startInitializingShardsAndReroute(service, clusterState);
        iterations++;
    }
    assertTrue("Expected no shards to be INITIALIZING", clusterState.getRoutingNodes().shardsWithState(INITIALIZING).isEmpty());
    return clusterState;
}
Remote Index Type

Using IndexModule.Type.REMOTE_SNAPSHOT as the store type for remote nodes may not accurately represent a typical remote-capable (warm) index scenario. REMOTE_SNAPSHOT is typically for searchable snapshots, not general remote/warm indices. This could cause the test to not correctly exercise the RemoteShardsBalancer path as intended.

if (remote) {
    sb.put(IndexModule.INDEX_STORE_TYPE_SETTING.getKey(), IndexModule.Type.REMOTE_SNAPSHOT.getSettingsKey());
}

@andrross
Member Author

FYI @Gagan6164 @gbbafna

I believe the flakiness here was introduced by #19532 due to the fact that RemoteShardsBalancer is not as sophisticated as LocalShardsBalancer. See my findings here: #19726 (comment)

@github-actions
Contributor

github-actions bot commented Mar 12, 2026

PR Code Suggestions ✨

Latest suggestions up to 6e0a3dc

Explore these optional code suggestions:

Category: General
Improve convergence failure message clarity

If the loop exits due to hitting the iteration limit (10) rather than convergence,
the assertTrue will fail with a misleading message. Consider adding a guard or a
more descriptive failure message that indicates the iteration limit was reached, so
it's clear the balancer failed to converge rather than an unexpected state occurred.

server/src/test/java/org/opensearch/cluster/routing/allocation/ShardsBalancerTightCapacityTests.java [122-126]

 while (clusterState.getRoutingNodes().shardsWithState(INITIALIZING).isEmpty() == false && iterations < 10) {
     clusterState = startInitializingShardsAndReroute(service, clusterState);
     iterations++;
 }
-assertTrue("Expected no shards to be INITIALIZING", clusterState.getRoutingNodes().shardsWithState(INITIALIZING).isEmpty());
+assertTrue(
+    "Expected no shards to be INITIALIZING after " + iterations + " iterations (max 10); balancer may not have converged",
+    clusterState.getRoutingNodes().shardsWithState(INITIALIZING).isEmpty()
+);
Suggestion importance[1-10]: 4


Why: The suggestion improves the assertion failure message to include iteration count, making it clearer when the balancer fails to converge vs. an unexpected state. This is a minor readability/debuggability improvement with limited functional impact.

Impact: Low
Avoid mutable builder reuse risk in index settings

The indexSettings(boolean remote) method returns a Settings.Builder, and calling
.put(indexLimitKey, 1) on it mutates the builder in place. Since indexSettings
creates a new builder each time, this is safe here, but the method signature
returning Settings.Builder instead of Settings is unconventional and could lead to
accidental mutation if reused. Consider returning Settings from indexSettings and
using Settings.builder().put(...).put(indexLimitKey, 1).build() at the call site for
clarity.

server/src/test/java/org/opensearch/cluster/routing/allocation/ShardsBalancerTightCapacityTests.java [80]

-mb.put(IndexMetadata.builder("test1").settings(indexSettings(remote).put(indexLimitKey, 1)).numberOfShards(3).numberOfReplicas(1));
+mb.put(IndexMetadata.builder("test1")
+    .settings(Settings.builder().put(indexSettings(remote)).put(indexLimitKey, 1))
+    .numberOfShards(3).numberOfReplicas(1));
Suggestion importance[1-10]: 2


Why: The concern about mutable builder reuse is largely theoretical since indexSettings creates a new builder each call. The improved_code also changes the method signature indirectly without showing the full refactor, making this suggestion incomplete and of marginal value.

Impact: Low

Previous suggestions

Suggestions up to commit 4e26181
Category: Possible issue
Assert convergence after iteration limit

The iteration limit of 10 may be too low for complex allocation scenarios and could
cause the test to return a partially-converged state without any indication. If the
loop exits due to hitting the limit rather than full convergence, the test may
produce misleading results. Consider adding an assertion or a warning after the loop
to detect this case.

server/src/test/java/org/opensearch/cluster/routing/allocation/ShardsBalancerTightCapacityTests.java [122-125]

 while (clusterState.getRoutingNodes().shardsWithState(INITIALIZING).isEmpty() == false && iterations < 10) {
     clusterState = startInitializingShardsAndReroute(service, clusterState);
     iterations++;
 }
+assertTrue("Allocation did not converge within iteration limit", clusterState.getRoutingNodes().shardsWithState(INITIALIZING).isEmpty());
Suggestion importance[1-10]: 5


Why: Adding an assertion after the loop is a valid improvement to detect non-convergence, but the testTightCapacityConvergenceWithRemoteShards test is explicitly marked @AwaitsFix and expected to not converge, so adding this assertion would break that test. The suggestion is only applicable to the local shards test path.

Impact: Low
Suggestions up to commit e813254
Category: Possible issue
Guard against incomplete allocation convergence

The iteration limit of 10 may be too low for complex allocation scenarios, causing
the test to return a partially converged state without any indication that
convergence was not reached. Consider adding an assertion or a warning after the
loop to detect when the limit is hit, or increase the limit to ensure full
convergence.

server/src/test/java/org/opensearch/cluster/routing/allocation/ShardsBalancerTightCapacityTests.java [129-132]

-while (clusterState.getRoutingNodes().shardsWithState(INITIALIZING).isEmpty() == false && iterations < 10) {
+while (clusterState.getRoutingNodes().shardsWithState(INITIALIZING).isEmpty() == false && iterations < 100) {
     clusterState = startInitializingShardsAndReroute(service, clusterState);
     iterations++;
 }
+assertFalse("Allocation did not converge within iteration limit",
+    clusterState.getRoutingNodes().shardsWithState(INITIALIZING).isEmpty() == false);
Suggestion importance[1-10]: 5


Why: Adding a convergence check after the loop is a valid improvement to detect when the iteration limit is hit, but the current limit of 10 is likely sufficient for the described scenario. The suggestion to increase to 100 and add an assertion improves test robustness, though the impact is moderate.

Impact: Low
Return built Settings instead of builder

The method returns a Settings.Builder instead of a built Settings object, which
means callers must chain .build() or additional .put() calls. However, when used in
IndexMetadata.builder(...).settings(...), if the method signature expects a Settings
object, this could cause a compile error or unintended behavior. Verify that the
settings() method on IndexMetadata.Builder accepts a Settings.Builder, otherwise
return sb.build().

server/src/test/java/org/opensearch/cluster/routing/allocation/ShardsBalancerTightCapacityTests.java [118-124]

-private Settings.Builder indexSettings(boolean remote) {
+private Settings indexSettings(boolean remote) {
     final Settings.Builder sb = settings(Version.CURRENT);
     if (remote) {
         sb.put(IndexModule.INDEX_STORE_TYPE_SETTING.getKey(), IndexModule.Type.REMOTE_SNAPSHOT.getSettingsKey());
     }
-    return sb;
+    return sb.build();
 }
Suggestion importance[1-10]: 4


Why: The suggestion raises a valid concern about returning Settings.Builder vs Settings, but looking at the usage in lines 81-87, the callers chain .put(indexLimitKey, 1) on the returned builder before passing to .settings(), so returning a builder is intentional and necessary. The suggestion would break the existing chaining pattern.

Impact: Low

@github-actions
Contributor

❌ Gradle check result for e813254: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@andrross andrross force-pushed the shards-tight-capacity-test branch from e813254 to 4e26181 on March 12, 2026 21:21
@github-actions
Contributor

Persistent review updated to latest commit 4e26181

Signed-off-by: Andrew Ross <andrross@amazon.com>
@andrross andrross force-pushed the shards-tight-capacity-test branch from 4e26181 to 6e0a3dc on March 12, 2026 21:23
@github-actions
Contributor

Persistent review updated to latest commit 6e0a3dc

@github-actions
Contributor

❌ Gradle check result for 6e0a3dc: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


@github-actions
Contributor

✅ Gradle check result for 6e0a3dc: SUCCESS

@codecov

codecov bot commented Mar 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.27%. Comparing base (4b9021f) to head (6e0a3dc).
⚠️ Report is 12 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20856      +/-   ##
============================================
- Coverage     73.32%   73.27%   -0.06%     
+ Complexity    72267    72230      -37     
============================================
  Files          5795     5795              
  Lines        330056   330057       +1     
  Branches      47643    47643              
============================================
- Hits         242030   241859     -171     
- Misses        68584    68777     +193     
+ Partials      19442    19421      -21     


Member

@cwperks cwperks left a comment


Nice to see a unit test setup for this. This looks like a test we can parameterize in the future to test more scenarios more extensively.

For my own knowledge, I see that we only have the warm node role when remote is set to true. Is that because it only makes sense in that context? Any harm in having the list of node roles be identical for both cases but it's a noop when remote is false?

@andrross andrross merged commit 4d6ccf1 into opensearch-project:main Mar 13, 2026
41 of 45 checks passed
@andrross andrross deleted the shards-tight-capacity-test branch March 13, 2026 20:23
@andrross
Member Author

Any harm in having the list of node roles be identical for both cases but it's a noop when remote is false?

That may be possible. The key difference is that a warm node is configured to use a portion of its disk to swap data from remote, which is a requirement to host the remote shards.

shayush622 pushed a commit to shayush622/OpenSearch that referenced this pull request Mar 16, 2026
Signed-off-by: Andrew Ross <andrross@amazon.com>

Labels

autocut, flaky-test (Random test failure that succeeds on second run), skip-changelog, >test-failure (Test failure from CI, local build, etc.)


Development

Successfully merging this pull request may close these issues.

[AUTOCUT] Gradle Check Flaky Test Report for ShardsLimitAllocationDeciderIT
