
Adding metadata to support shard split and to resolve shard ids based on child shard hash #20859

Merged
andrross merged 1 commit into opensearch-project:main from vikasvb90:online_shard_split
Mar 17, 2026

Conversation

@vikasvb90
Contributor

@vikasvb90 vikasvb90 commented Mar 13, 2026

Description

Introduces metadata infrastructure for online (live) shard splitting in OpenSearch.

ShardRange - Represents a hash range owned by a child shard, mapping a shard ID to a contiguous range of hash values. Supports range containment checks.
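A minimal sketch of what such a range type might look like (the field names and inclusive-bounds semantics here are assumptions for illustration; the actual class in the PR also carries serialization and XContent support):

```java
// Hypothetical simplification of the ShardRange concept described above:
// a shard id paired with an inclusive [start, end] hash range.
class ShardRange {
    private final int shardId;
    private final int start; // inclusive lower bound of the hash range
    private final int end;   // inclusive upper bound of the hash range

    ShardRange(int shardId, int start, int end) {
        this.shardId = shardId;
        this.start = start;
        this.end = end;
    }

    int getShardId() {
        return shardId;
    }

    // Range containment check: does this shard own the given routing hash?
    boolean contains(int hash) {
        return hash >= start && hash <= end;
    }
}
```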

SplitShardsMetadata - Tracks the shard split tree within IndexMetadata. Maintains root-to-child shard mappings, in-progress split state, and active shard IDs. Given a document's routing hash, it resolves the correct child shard by walking the split tree. Ensures shard range consistency.
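The tree walk can be sketched roughly as follows (a hypothetical simplification: the real `getShardIdOfHash()` operates on the PR's internal structures, while this sketch uses a plain map from parent shard id to child `{shardId, start, end}` triples):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of resolving a routing hash by walking a split tree.
// Names and structures are assumptions, not the PR's actual code.
class SplitTreeResolver {
    // parent shard id -> ranges owned by its children (absent key = leaf shard)
    private final Map<Integer, int[][]> children = new HashMap<>();

    void addSplit(int parentShardId, int[][] childRanges) {
        // each child range is {shardId, startHash, endHash} (bounds inclusive)
        children.put(parentShardId, childRanges);
    }

    // Walk from a root shard down to the leaf child whose range contains the hash.
    int resolve(int rootShardId, int hash) {
        int current = rootShardId;
        while (children.containsKey(current)) {
            boolean found = false;
            for (int[] child : children.get(current)) {
                if (hash >= child[1] && hash <= child[2]) {
                    current = child[0];
                    found = true;
                    break;
                }
            }
            if (!found) {
                throw new IllegalStateException("hash not covered: " + hash);
            }
        }
        return current;
    }
}
```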

primaryTermsMap in IndexMetadata - Adds a Map<Integer, Long> alongside the existing long[] array to support non-contiguous shard IDs that arise after splits. The map becomes the source of truth for primary term look-ups.
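An illustration (not the PR's actual code) of the problem the map solves: after a split, live shard ids can be non-contiguous and can exceed the original shard count, so a dense `long[]` indexed by shard id no longer fits.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: an index originally had 2 shards (ids 0 and 1), and
// shard 0 was split into child shards 2 and 3, leaving ids {1, 2, 3} active.
class PrimaryTermsDemo {
    static Map<Integer, Long> afterSplit() {
        Map<Integer, Long> primaryTerms = new HashMap<>();
        primaryTerms.put(1, 5L); // surviving original shard
        primaryTerms.put(2, 1L); // child of split-away shard 0
        primaryTerms.put(3, 1L); // child of split-away shard 0
        // A long[2] indexed by shard id could not hold ids 2 and 3 and would
        // waste slot 0; the map handles holes and ids beyond the root count.
        return primaryTerms;
    }
}
```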

OperationRouting - Updated to resolve document routing to the correct child shard by delegating to SplitShardsMetadata.getShardIdOfHash().

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

This is based on the in-place shard split design.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

github-actions bot commented Mar 13, 2026

PR Reviewer Guide 🔍

(Review updated until commit 8d02fc7)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Add ShardRange data structure with serialization support

Relevant files:

  • server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java
  • server/src/test/java/org/opensearch/cluster/metadata/ShardRangeTests.java

Sub-PR theme: Add SplitShardsMetadata for tracking shard split tree

Relevant files:

  • server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java
  • server/src/test/java/org/opensearch/cluster/metadata/SplitShardsMetadataTests.java

Sub-PR theme: Integrate SplitShardsMetadata and primaryTermsMap into IndexMetadata and OperationRouting

Relevant files:

  • server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java
  • server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java
  • server/src/test/java/org/opensearch/cluster/metadata/IndexMetadataTests.java

⚡ Recommended focus areas for review

Null splitShardsMetadata

In IndexMetadataDiff.apply(), when splitMetadata is null (i.e., when deserializing from a pre-V_3_6_0 node), the builder's splitShardsMetadata is never set. The build() method will then create a default SplitShardsMetadata based on numberOfShards, which may not correctly reflect the actual state being applied. This could silently discard split metadata during rolling upgrades.

if (splitMetadata != null) {
    builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
}
primaryTermsMap size assertion

The assertion assert primaryTermsMap.size() == numberOfShards in the IndexMetadata constructor uses Java assert, which is disabled by default in production. After a shard split, primaryTermsMap may contain entries for child shard IDs that exceed numberOfShards (the root shard count), causing this assertion to fail or be silently bypassed. The relationship between numberOfShards and the map size needs clarification and a proper runtime check.

assert primaryTermsMap.size() == numberOfShards;
this.primaryTermsMap = Collections.unmodifiableMap(primaryTermsMap);
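Since `assert` only fires when the JVM runs with `-ea`, the "proper runtime check" mentioned here could look roughly like this (an illustrative sketch, not the PR's code):

```java
import java.util.Map;

class PrimaryTermsCheck {
    // Explicit runtime check that fails regardless of the -ea JVM flag,
    // unlike a plain Java assert, which is stripped by default in production.
    static void checkPrimaryTermsSize(Map<Integer, Long> primaryTermsMap, int numberOfShards) {
        if (primaryTermsMap.size() != numberOfShards) {
            throw new IllegalStateException(
                "primaryTermsMap size [" + primaryTermsMap.size()
                    + "] does not match numberOfShards [" + numberOfShards + "]");
        }
    }
}
```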
Backward Compat primaryTerms

When writing to pre-V_3_6_0 nodes in writeTo and writeVerifiableTo, buildPrimaryTermsArray() only populates entries where entry.getKey() < numberOfShards. After a shard split, child shard IDs may be >= numberOfShards (the original root count), so their primary terms will be silently dropped when communicating with older nodes. This could cause data inconsistency during rolling upgrades.

private long[] buildPrimaryTermsArray() {
    long[] terms = new long[numberOfShards];
    for (Map.Entry<Integer, Long> entry : primaryTermsMap.entrySet()) {
        if (entry.getKey() < numberOfShards) {
            terms[entry.getKey()] = entry.getValue();
        }
    }
    return terms;
}
Missing splitShardsMetadata in old-version read

In readFrom, when reading from a pre-V_3_6_0 stream, splitShardsMetadata is never set on the builder. The build() method will create a default SplitShardsMetadata with numberOfShards root shards, which may be incorrect if the index had undergone splits before the upgrade. This could cause loss of split tree information during deserialization from older nodes.

} else {
    assert primaryTerms != null;
    Map<Integer, Long> primaryTermsMapFromArray = new HashMap<>();
    for (int shardId = 0; shardId < primaryTerms.length; shardId++) {
        primaryTermsMapFromArray.put(shardId, primaryTerms[shardId]);
    }
    builder.primaryTermsMap(primaryTermsMapFromArray);
}
Stale comment

In test_getActiveShardIterator_splitCompleted, the comment says "Split in progress, should return all 4 shards" but the test is for a completed split returning 5 shards (1, 2, 3, 4, 5). This is misleading and should be corrected.

// Split in progress, should return all 4 shards
assertEquals(expectedActiveShardsForConsecutiveShardSplit, actualActiveShardsForConsecutiveShardSplit);

@github-actions
Contributor

github-actions bot commented Mar 13, 2026

PR Code Suggestions ✨

Latest suggestions up to 8d02fc7

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Guard against null array from missing XContent field

If numberOfRootShards is still -1 when parse() completes (e.g., the field is missing
from the XContent), rootShardsToAllChildren will remain null, causing a
NullPointerException when the object is used. Add a null-check or validation before
constructing the object.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [641-648]

+if (numberOfRootShards < 0) {
+    throw new IllegalArgumentException("Missing required field: " + KEY_NUMBER_OF_ROOT_SHARDS);
+}
+if (rootShardsToAllChildren == null) {
+    rootShardsToAllChildren = new ShardRange[numberOfRootShards][];
+}
 return new SplitShardsMetadata(
     rootShardsToAllChildren,
     tempShardIdToChildShards,
     inProgressSplitShardIds,
     activeShardIds,
     maxShardId
 );
Suggestion importance[1-10]: 6


Why: If numberOfRootShards is -1 or rootShardsToAllChildren is null due to missing XContent fields, a NullPointerException would occur. The suggestion adds a reasonable guard, though in practice the parse method should always encounter these fields in valid serialized data.

Low
Fix test setup that throws before assertion

The builder.startObject() call before the lambda may itself throw an IOException
since it writes to the failing stream, causing the test to fail before reaching
expectThrows. The startObject() call should be inside the expectThrows lambda, or
the failing stream should only start failing after the initial setup writes.

server/src/test/java/org/opensearch/cluster/metadata/ShardRangeTests.java [247-277]

 public void testToXContentIOException() throws IOException {
     ShardRange shardRange = new ShardRange(1, 0, 100);
-    ...
+
+    OutputStream failingStream = new OutputStream() {
+        private boolean started = false;
+
+        @Override
+        public void write(int b) throws IOException {
+            if (started) throw new IOException("Simulated IOException");
+        }
+
+        @Override
+        public void write(byte[] b) throws IOException {
+            if (started) throw new IOException("Simulated IOException");
+        }
+
+        @Override
+        public void write(byte[] b, int off, int len) throws IOException {
+            if (started) throw new IOException("Simulated IOException");
+        }
+    };
+
     XContentBuilder builder = XContentFactory.jsonBuilder(failingStream);
-    builder.startObject(); // Start the JSON object
+    builder.startObject();
 
-    // The actual test - this should throw IOException when trying to write
     expectThrows(IOException.class, () -> {
         shardRange.toXContent(builder, null);
-        builder.endObject(); // End the JSON object
-        builder.close(); // Force flush/close which will trigger the write
+        builder.endObject();
+        builder.close();
     });
 }
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that builder.startObject() writes to the failing stream before expectThrows, potentially causing the test to fail prematurely. The proposed fix using a started flag is a valid approach, though the improved_code doesn't set started = true anywhere, making it incomplete. Still, the underlying issue identified is real and worth addressing.

Low
Guard against null splitShardsMetadata in diff constructor

If after.splitShardsMetadata or before.splitShardsMetadata is null, this line will
throw a NullPointerException. Although build() initializes splitShardsMetadata when
null, it's safer to add null guards here since the diff constructor is called
directly on existing metadata objects.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1641]

-splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+if (after.splitShardsMetadata != null && before.splitShardsMetadata != null) {
+    splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+} else {
+    splitMetadata = null;
+}
Suggestion importance[1-10]: 5


Why: While build() initializes splitShardsMetadata when null, the diff constructor operates on existing IndexMetadata objects that could theoretically have null splitShardsMetadata. However, since build() always initializes it, this is a low-risk scenario in practice.

Low
Prevent shard ID collision when switching from holes to sequential IDs

When holes are exhausted mid-loop, the code switches to incrementing
nextChildShardId from maxShardId, but nextChildShardId starts at maxShardId and is
only updated when holes are used. If holes run out after some iterations,
nextChildShardId may still be at maxShardId and the first increment produces
maxShardId + 1, which could collide with already-assigned child shard IDs from the
hole-filling phase. The nextChildShardId should be initialized to track the highest
assigned ID correctly across both paths.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [365-378]

 int nextChildShardId = maxShardId, childShardId;
 for (int i = 0; i < numberOfChildren; ++i) {
     if (shardIdHoles.isEmpty()) {
         nextChildShardId++;
-        while (inProgressChildShardIds.contains(nextChildShardId)) {
-            assert activeShardIds.contains(nextChildShardId) == false;
+        while (inProgressChildShardIds.contains(nextChildShardId) || activeShardIds.contains(nextChildShardId)) {
             nextChildShardId++;
         }
         childShardId = nextChildShardId;
     } else {
         childShardId = shardIdHoles.iterator().next();
         shardIdHoles.remove(childShardId);
         nextChildShardId = Math.max(nextChildShardId, childShardId);
     }
Suggestion importance[1-10]: 5


Why: The suggestion adds activeShardIds.contains(nextChildShardId) to the skip condition, but the existing code already has assert !activeShardIds.contains(childShardId) after assignment. The collision concern about holes running out mid-loop is partially mitigated by nextChildShardId = Math.max(nextChildShardId, childShardId) tracking the highest hole used, though adding the activeShardIds check is a minor defensive improvement.

Low
Ensure consistent serialization for negative hash values

The shardId is written with writeVInt but start and end are written with writeInt.
Since hash ranges can be negative (as demonstrated in the test
testContainsWithNegativeHash), using writeVInt for start and end would be incorrect
for negative values. However, shardId should also be consistent — verify that
shardId is always non-negative and that start/end are always non-negative before
using writeVInt, or switch all fields to writeInt/readInt for correctness with
negative values.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [35-37]

 public ShardRange(StreamInput in) throws IOException {
     this(in.readVInt(), in.readInt(), in.readInt());
 }
 
+@Override
+public void writeTo(StreamOutput out) throws IOException {
+    out.writeVInt(shardId);
+    out.writeInt(start);
+    out.writeInt(end);
+}
+
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that shardId uses writeVInt/readVInt while start/end use writeInt/readInt. Since shardId is always non-negative, writeVInt is appropriate for it. The start/end fields already use writeInt/readInt which handles negative values correctly. The suggestion's improved_code doesn't actually change anything from the existing code, making it a verification suggestion rather than a fix.

Low
General
Preserve existing split metadata when diff is null

When splitMetadata is null (deserialized from a pre-V_3_6_0 node), the builder's
splitShardsMetadata field is never set, so build() will create a default one based
on numberOfShards. However, part.splitShardsMetadata may contain valid data that
should be preserved. Consider falling back to part.splitShardsMetadata when
splitMetadata is null.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 6


Why: When splitMetadata is null (pre-V_3_6_0 deserialization), the existing part.splitShardsMetadata data is silently discarded and replaced with a default. Falling back to part.splitShardsMetadata preserves existing state correctly during mixed-version cluster upgrades.

Low
Use consistent index reference for ShardId construction

The method accepts a shardIndex parameter of type Index but uses it directly to
construct a ShardId, bypassing the actual index metadata lookup. This could result
in a mismatch if shardIndex doesn't correspond to the index named index. The Index
object used should be derived from the resolved indexMetadata to ensure consistency.

server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java [502-505]

 public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing, Index shardIndex) {
-    int shardId = generateShardId(indexMetadata(clusterState, index), id, routing, true);
-    return new ShardId(shardIndex, shardId);
+    IndexMetadata indexMetadata = indexMetadata(clusterState, index);
+    int shardId = generateShardId(indexMetadata, id, routing, true);
+    return new ShardId(indexMetadata.getIndex(), shardId);
 }
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies a potential inconsistency where shardIndex (passed as parameter) may not match the indexMetadata resolved from index. Using indexMetadata.getIndex() ensures the ShardId is constructed with the correct Index object, avoiding subtle bugs if the caller passes a mismatched shardIndex.

Low

Previous suggestions

Suggestions up to commit 49efa6a
Category | Suggestion | Impact
Possible issue
Preserve existing split metadata when diff is absent

When splitMetadata is null (for pre-V_3_6_0 nodes), the builder's
splitShardsMetadata is never set, so it will be initialized to a default value in
build() based on numberOfShards. However, part.splitShardsMetadata may contain valid
data that should be preserved. The existing split metadata from part should be
carried over when no diff is available.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 7


Why: When splitMetadata is null (pre-V_3_6_0 nodes), the existing splitShardsMetadata from part is not preserved, which could cause data loss. The fix correctly carries over the existing metadata when no diff is available.

Medium
Fix inconsistent integer serialization for shardId field

The shardId is written with writeVInt but start and end are written with writeInt.
Since hash ranges can be negative (as demonstrated in the test
testContainsWithNegativeHash), using writeVInt for start and end would be incorrect.
However, shardId should also consistently use writeInt/readInt unless negative
values are explicitly excluded. More critically, start and end should use readInt to
match writeInt, which is already correct — but verify that shardId cannot be
negative before using readVInt.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [35-37]

 public ShardRange(StreamInput in) throws IOException {
-    this(in.readVInt(), in.readInt(), in.readInt());
+    this(in.readInt(), in.readInt(), in.readInt());
 }
Suggestion importance[1-10]: 7


Why: The shardId is read with readVInt in the constructor but written with writeVInt in writeTo. While shard IDs are typically non-negative, using readVInt/writeVInt is inconsistent with start/end which use readInt/writeInt. This is a valid concern about serialization consistency, though shard IDs in practice are non-negative.

Medium
Use consistent integer serialization method for all fields

shardId is serialized with writeVInt which only supports non-negative values, but
start and end use writeInt. If shardId can ever be -1 (as used in the parse method
as a default/sentinel value), writeVInt will throw or produce incorrect results. Use
writeInt consistently for all three fields to match the readInt calls and support
the full int range.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [54-58]

 @Override
 public void writeTo(StreamOutput out) throws IOException {
-    out.writeVInt(shardId);
+    out.writeInt(shardId);
     out.writeInt(start);
     out.writeInt(end);
 }
Suggestion importance[1-10]: 7

__

Why: The writeTo method uses writeVInt for shardId but writeInt for start and end, creating an inconsistency. Since shardId is read back with readVInt in the constructor, the round-trip is technically consistent, but using writeInt/readInt uniformly would be cleaner and avoid potential issues if shardId ever takes a negative sentinel value like -1.

Medium
Guard against null array from missing parsed field

If numberOfRootShards is -1 (i.e., the KEY_NUMBER_OF_ROOT_SHARDS field was not found
in the parsed content), rootShardsToAllChildren will remain null, causing a
NullPointerException when passed to the constructor. Add a null-check or validation
before constructing the object.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [641-648]

+if (numberOfRootShards == -1 || rootShardsToAllChildren == null) {
+    rootShardsToAllChildren = new ShardRange[Math.max(numberOfRootShards, 0)][];
+}
 return new SplitShardsMetadata(
     rootShardsToAllChildren,
     tempShardIdToChildShards,
     inProgressSplitShardIds,
     activeShardIds,
     maxShardId
 );
Suggestion importance[1-10]: 6


Why: If numberOfRootShards is -1 or rootShardsToAllChildren is null due to missing fields in the parsed content, a NullPointerException would occur. The suggestion adds a valid guard, though in practice well-formed data should always include these fields.

Low
Guard against null pointer in diff computation

If after.splitShardsMetadata or before.splitShardsMetadata is null (e.g., when
constructing a diff involving older-format metadata), this will throw a
NullPointerException. Add null guards before calling diff.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1641]

-splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+if (after.splitShardsMetadata != null && before.splitShardsMetadata != null) {
+    splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+} else {
+    splitMetadata = null;
+}
Suggestion importance[1-10]: 5


Why: While the build() method initializes splitShardsMetadata to a default value when null, there could be edge cases where it remains null. However, the build() method in IndexMetadata.Builder already ensures splitShardsMetadata is never null, making this a lower-priority concern.

Low
General
Validate split-in-progress before proceeding with update

findRootAndShard can throw an IllegalArgumentException if sourceShardId is not
found, but the method is supposed to validate via assertions. More critically, if
sourceShardId is not a root shard and not found in rootShardsToAllChildren, the
exception message says "Shard ID doesn't exist" rather than indicating the split is
not in progress. The findRootAndShard result shardRangeTuple should be validated
before use to avoid a potential NPE if the method is called with an invalid
sourceShardId when assertions are disabled.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [428-431]

 public void updateSplitMetadataForChildShards(int sourceShardId, Set<Integer> newChildShardIds) {
+    if (!inProgressSplitShardIds.contains(sourceShardId)) {
+        throw new IllegalArgumentException("No split in progress for shard [" + sourceShardId + "]");
+    }
     Tuple<Integer, ShardRange> shardRangeTuple = findRootAndShard(sourceShardId, rootShardsToAllChildren);
 
     assert inProgressSplitShardIds.contains(sourceShardId);
Suggestion importance[1-10]: 5

__

Why: The current code uses an assert to check if the split is in progress, which is disabled in production. Moving this to an explicit check with a proper exception improves robustness, though the existing test testUpdateSplitMetadataWithInvalidSourceShardId already expects an IllegalArgumentException from findRootAndShard.

Low
Use actual index metadata instead of caller-provided index

The method accepts shardIndex as an Index parameter but uses it directly in new
ShardId(shardIndex, shardId). This means the returned ShardId uses the provided
shardIndex rather than the actual index from indexMetadata, which could lead to
mismatches if the caller passes an incorrect Index. Consider deriving the index from
indexMetadata for correctness.

server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java [502-505]

 public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing, Index shardIndex) {
-    int shardId = generateShardId(indexMetadata(clusterState, index), id, routing, true);
-    return new ShardId(shardIndex, shardId);
+    IndexMetadata indexMetadata = indexMetadata(clusterState, index);
+    int shardId = generateShardId(indexMetadata, id, routing, true);
+    return new ShardId(indexMetadata.getIndex(), shardId);
 }
Suggestion importance[1-10]: 4


Why: The suggestion raises a valid concern about potential index mismatch, but the method signature accepting shardIndex as a parameter suggests it's intentionally designed to allow the caller to specify the index. Without more context on the caller's intent, this change may not be appropriate.

Low
Suggestions up to commit 9a4e365
Category | Suggestion | Impact
Possible issue
Guard against null array in parsed metadata

If numberOfRootShards is never parsed (remains -1) or rootShardsToAllChildren is
never assigned (remains null), the constructor will receive a null array, causing
NullPointerExceptions later. Add a null/validity check before constructing the
object, initializing rootShardsToAllChildren to an empty array when
numberOfRootShards is 0 or when it was not found in the parsed content.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [641-648]

+if (rootShardsToAllChildren == null) {
+    rootShardsToAllChildren = new ShardRange[Math.max(numberOfRootShards, 0)][];
+}
 return new SplitShardsMetadata(
     rootShardsToAllChildren,
     tempShardIdToChildShards,
     inProgressSplitShardIds,
     activeShardIds,
     maxShardId
 );
Suggestion importance[1-10]: 7

__

Why: If numberOfRootShards is -1 or rootShardsToAllChildren is never assigned during parsing (e.g., missing field), passing null to the constructor would cause NullPointerException later. The fix correctly initializes the array as a fallback, improving robustness.

Medium
Fix inconsistent serialization of shardId field

The shardId is written with writeVInt but start and end are written with writeInt.
Since start and end can be negative (as tested in testContainsWithNegativeHash),
using writeVInt for them would be incorrect, but using writeInt for shardId would
also be inconsistent. Ensure the read methods match the write methods exactly — if
shardId can be negative or large, consider using writeInt/readInt consistently for
all three fields.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [35-37]

 public ShardRange(StreamInput in) throws IOException {
-    this(in.readVInt(), in.readInt(), in.readInt());
+    this(in.readInt(), in.readInt(), in.readInt());
 }
Suggestion importance[1-10]: 7


Why: The shardId is read with readVInt but written with writeVInt, which only handles non-negative values. Since shardId can be -1 (as seen in the parse method's default value), this is a real serialization bug. The fix to use readInt/writeInt consistently is valid and important.

Medium
Use consistent integer serialization for all fields

writeVInt only handles non-negative integers correctly; if shardId could ever be
negative (e.g., the default -1 used in parse), serialization would be corrupted. Use
writeInt for shardId to be consistent with start and end, and to safely handle any
value.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [54-58]

 @Override
 public void writeTo(StreamOutput out) throws IOException {
-    out.writeVInt(shardId);
+    out.writeInt(shardId);
     out.writeInt(start);
     out.writeInt(end);
 }
Suggestion importance[1-10]: 7

__

Why: Using writeVInt for shardId is problematic since shardId can be -1 (the default in parse), and writeVInt doesn't handle negative values correctly. Switching to writeInt for consistency and correctness is a valid fix that addresses a real serialization bug.

Medium
Preserve split metadata when diff is absent

When splitMetadata is null (for pre-V_3_6_0 nodes), the builder's
splitShardsMetadata field is never set, so it will be null and then initialized to a
default in build(). This means the applied metadata loses the original
splitShardsMetadata from part. The original value from part should be preserved when
no diff is available.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 6

__

Why: When splitMetadata is null (pre-V_3_6_0 nodes), the builder's splitShardsMetadata is left unset and will be re-initialized to a default in build(), losing the original value from part. Preserving part.splitShardsMetadata is more correct for backward compatibility.

Low
Null-guard split metadata diff computation

If after.splitShardsMetadata or before.splitShardsMetadata is null (e.g., when
dealing with older metadata that hasn't been upgraded yet), this will throw a
NullPointerException. Add null checks before computing the diff.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1641]

-splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+if (after.splitShardsMetadata != null && before.splitShardsMetadata != null) {
+    splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+} else {
+    splitMetadata = null;
+}
Suggestion importance[1-10]: 5


Why: If either after.splitShardsMetadata or before.splitShardsMetadata is null, calling .diff() will throw a NullPointerException. However, the build() method initializes splitShardsMetadata to a default when null, so this scenario may be unlikely in practice.

Low
General
Replace assertion with explicit validation check

findRootAndShard can throw IllegalArgumentException if the shard is not found, but
the result shardRangeTuple is used without a null check. More critically, if
sourceShardId is not in inProgressSplitShardIds, the assertion will only catch this
in debug mode; in production this will silently corrupt state. The method should
throw an IllegalArgumentException instead of relying on assertions for validation.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [428-431]

 public void updateSplitMetadataForChildShards(int sourceShardId, Set<Integer> newChildShardIds) {
+    if (!inProgressSplitShardIds.contains(sourceShardId)) {
+        throw new IllegalArgumentException("No split in progress for shard [" + sourceShardId + "]");
+    }
     Tuple<Integer, ShardRange> shardRangeTuple = findRootAndShard(sourceShardId, rootShardsToAllChildren);
+    assert newChildShardIds.size() == parentToChildShards.get(sourceShardId).length;
 
-    assert inProgressSplitShardIds.contains(sourceShardId);
-
Suggestion importance[1-10]: 5

__

Why: Using assert for validation of inProgressSplitShardIds.contains(sourceShardId) only works in debug mode; in production this silently passes and can corrupt state. Replacing with an explicit IllegalArgumentException is safer, though the existing tests already cover this via AssertionError.

Low
Ensure ShardId uses consistent index reference

The method accepts a shardIndex parameter of type Index but uses it directly to
construct a ShardId, bypassing the actual index metadata lookup. If shardIndex
doesn't correspond to the index string parameter, the returned ShardId could
reference the wrong index. Consider deriving the Index from the resolved
IndexMetadata to ensure consistency.

server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java [502-505]

-public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing, Index shardIndex) {
-    int shardId = generateShardId(indexMetadata(clusterState, index), id, routing, true);
-    return new ShardId(shardIndex, shardId);
+public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing) {
+    IndexMetadata indexMetadata = indexMetadata(clusterState, index);
+    int shardId = generateShardId(indexMetadata, id, routing, true);
+    return new ShardId(indexMetadata.getIndex(), shardId);
 }
Suggestion importance[1-10]: 4

__

Why: The concern about shardIndex potentially not matching the index string is valid, but the method signature accepting Index shardIndex may be intentional (e.g., for cases where the caller needs to specify a particular index object). The suggestion changes the method signature which could break callers, and the actual use case context is unclear from the diff alone.

Low
Suggestions up to commit 4618229
Category | Suggestion | Impact
Possible issue
Preserve split metadata when diff is absent

When splitMetadata is null (the old-version deserialization path), the apply method
silently skips updating splitShardsMetadata, so the resulting IndexMetadata will
have whatever default the builder produces rather than the value from part. This
means the split metadata from the existing state is lost during diff application for
pre-V_3_6_0 nodes. You should copy it from part when splitMetadata is null.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 7

__

Why: When splitMetadata is null (pre-V_3_6_0 deserialization path), the apply method skips updating splitShardsMetadata, causing the split metadata from the existing state (part) to be lost. The fix correctly preserves it by copying from part, preventing potential data loss during diff application.

Medium
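The preserve-on-null pattern proposed here can be sketched generically. The names below (NullableDiff, Diff, applyDiff) are illustrative placeholders, not the actual IndexMetadataDiff code; the point is only the fallback branch:

```java
// Minimal sketch of the proposed fix: when a diff field was absent on the
// wire (e.g. sent by an older peer), carry the existing value forward instead
// of letting a freshly-built default replace it. All names are hypothetical.
public class NullableDiff {
    public static final class Diff {
        final String newValue;
        public Diff(String newValue) { this.newValue = newValue; }
        String apply(String previous) { return newValue; }
    }

    public static String applyDiff(Diff diff, String current) {
        if (diff != null) {
            return diff.apply(current);
        }
        return current;  // absent diff: preserve existing state, do not reset to a default
    }

    public static void main(String[] args) {
        System.out.println(applyDiff(new Diff("split-v2"), "split-v1")); // split-v2
        System.out.println(applyDiff(null, "split-v1"));                 // split-v1
    }
}
```

The else branch in the suggested patch plays exactly the role of the `return current;` fallback above.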
Guard against null array after XContent parsing

If numberOfRootShards is still -1 when parse() completes (e.g., the field was
missing or appeared after KEY_ROOT_SHARDS_TO_ALL_CHILDREN), rootShardsToAllChildren
will be null, causing a NullPointerException when the constructor or any caller
accesses it. Add a null-check or validation before constructing the object.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [641-648]

+if (rootShardsToAllChildren == null) {
+    rootShardsToAllChildren = new ShardRange[Math.max(numberOfRootShards, 0)][];
+}
 return new SplitShardsMetadata(
     rootShardsToAllChildren,
     tempShardIdToChildShards,
     inProgressSplitShardIds,
     activeShardIds,
     maxShardId
 );
Suggestion importance[1-10]: 6

__

Why: If numberOfRootShards is -1 or rootShardsToAllChildren remains null due to field ordering or missing data, a NullPointerException would occur. The fix is reasonable but the scenario may be unlikely in practice since the parser processes fields sequentially and the format is controlled.

Low
Prevent duplicate shard ID assignment during split

When holes are exhausted mid-loop, the code switches to incrementing
nextChildShardId from maxShardId, but nextChildShardId was only updated via Math.max
while consuming holes. If the last hole ID is less than maxShardId, the next
increment correctly starts from maxShardId + 1; if a hole ID exceeds maxShardId,
however, subsequent non-hole IDs could collide with already-consumed hole IDs.
Additionally, the allConsumedShardIds list does not include the child shard IDs
newly assigned earlier in the same loop, so duplicate IDs can be assigned across
iterations once holes run out.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [365-378]

 int nextChildShardId = maxShardId, childShardId;
 for (int i = 0; i < numberOfChildren; ++i) {
     if (shardIdHoles.isEmpty()) {
-        nextChildShardId++;
-        while (inProgressChildShardIds.contains(nextChildShardId)) {
-            assert activeShardIds.contains(nextChildShardId) == false;
+        do {
             nextChildShardId++;
-        }
+        } while (inProgressChildShardIds.contains(nextChildShardId) || activeShardIds.contains(nextChildShardId));
         childShardId = nextChildShardId;
     } else {
         childShardId = shardIdHoles.iterator().next();
         shardIdHoles.remove(childShardId);
         nextChildShardId = Math.max(nextChildShardId, childShardId);
     }
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about potential duplicate shard ID assignment when holes are exhausted mid-loop, but the existing code already tracks inProgressChildShardIds and uses assertions to catch duplicates. The improved code adds activeShardIds check to the loop condition, which is a minor defensive improvement.

Low
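The hole-reuse allocation under discussion can be sketched as follows. ChildShardIdAllocator and its parameters are hypothetical, simplified from the behavior described above: reuse freed IDs first, then count upward past maxShardId while skipping anything in use, including IDs assigned earlier in the same call.

```java
import java.util.*;

// Hypothetical sketch, not the OpenSearch implementation. "holes" are shard
// IDs freed by cancelled splits; "inUse" covers active and in-progress IDs.
public class ChildShardIdAllocator {
    public static List<Integer> allocate(int numberOfChildren, int maxShardId,
                                         SortedSet<Integer> holes, Set<Integer> inUse) {
        List<Integer> assigned = new ArrayList<>();
        int next = maxShardId;
        for (int i = 0; i < numberOfChildren; i++) {
            int id;
            if (!holes.isEmpty()) {
                id = holes.first();             // reuse the smallest gap first
                holes.remove(id);
                next = Math.max(next, id);      // keep the counter ahead of reused gaps
            } else {
                do {
                    next++;                     // skip anything already taken,
                } while (inUse.contains(next)); // including IDs assigned in this loop
                id = next;
            }
            inUse.add(id);                      // record so later iterations cannot collide
            assigned.add(id);
        }
        return assigned;
    }

    public static void main(String[] args) {
        SortedSet<Integer> holes = new TreeSet<>(Arrays.asList(2, 5));
        Set<Integer> inUse = new HashSet<>(Arrays.asList(0, 1, 3, 4, 6, 7));
        // maxShardId = 7: expect holes 2 and 5 reused first, then fresh IDs 8, 9.
        System.out.println(allocate(4, 7, holes, inUse)); // [2, 5, 8, 9]
    }
}
```

Adding each assigned ID to `inUse` before the next iteration is what closes the duplicate-assignment window described in the suggestion.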
Prevent index mismatch in shard ID construction

The method accepts a shardIndex parameter of type Index but uses it directly to
construct the ShardId, rather than using the index from indexMetadata. This could
lead to a mismatch if shardIndex doesn't correspond to the same index as index
string. Consider deriving the Index from indexMetadata to ensure consistency.

server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java [502-505]

-public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing, Index shardIndex) {
-    int shardId = generateShardId(indexMetadata(clusterState, index), id, routing, true);
-    return new ShardId(shardIndex, shardId);
+public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing) {
+    IndexMetadata indexMetadata = indexMetadata(clusterState, index);
+    int shardId = generateShardId(indexMetadata, id, routing, true);
+    return new ShardId(indexMetadata.getIndex(), shardId);
 }
Suggestion importance[1-10]: 5

__

Why: The shardIndex parameter could potentially differ from the index resolved via indexMetadata, leading to a mismatch in the returned ShardId. Deriving the Index from indexMetadata would be safer, though the caller may intentionally pass a specific shardIndex for the recovering child scenario.

Low
Consistent integer serialization for all fields

The shardId is written with writeVInt while start and end use writeInt. Hash
range bounds can be negative (the test testContainsWithNegativeHash uses -10 to
10), so start and end correctly use writeInt/readInt. writeVInt, however, only
encodes non-negative values, so verify that shard IDs are always non-negative
here; if they are not, shardId should use writeInt/readInt as well.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [35-37]

 public ShardRange(StreamInput in) throws IOException {
-    this(in.readVInt(), in.readInt(), in.readInt());
+    this(in.readInt(), in.readInt(), in.readInt());
 }
 
+@Override
+public void writeTo(StreamOutput out) throws IOException {
+    out.writeInt(shardId);
+    out.writeInt(start);
+    out.writeInt(end);
+}
+
Suggestion importance[1-10]: 4

__

Why: Shard IDs are always non-negative integers, so writeVInt/readVInt is a valid and more space-efficient choice. The suggestion to change to readInt/writeInt for shardId is a style preference rather than a bug fix. The start and end fields already correctly use writeInt/readInt to handle negative values.

Low
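The writeVInt concern can be illustrated with a generic LEB128-style unsigned varint encoder. This is assumed to be analogous to (not copied from) OpenSearch's StreamOutput.writeVInt; the class and method names are illustrative:

```java
// Why varint-style encodings suit only non-negative values: a negative int,
// viewed as unsigned, has its high bits set and always takes the maximum
// 5 bytes, while writeInt-style fixed encoding is a predictable 4 bytes.
public class VarInt {
    public static byte[] encode(int value) {
        byte[] buf = new byte[5];
        int i = 0;
        while ((value & ~0x7F) != 0) {           // more than 7 significant bits remain
            buf[i++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;                        // unsigned shift: negatives become huge
        }
        buf[i++] = (byte) value;
        return java.util.Arrays.copyOf(buf, i);
    }

    public static void main(String[] args) {
        System.out.println(encode(7).length);    // 1 byte: small shard IDs stay compact
        System.out.println(encode(-1).length);   // 5 bytes: the sign bit forces max length
    }
}
```

So writeVInt is a space win for small non-negative shard IDs, but a correctness risk if a sentinel like -1 can ever reach serialization.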
General
Prevent silent data loss in backward-compatible serialization

The primaryTermsMap in IndexMetadataDiff may contain shard IDs beyond the original
numberOfShards (e.g., after a split), but the old-format array is sized by
INDEX_NUMBER_OF_SHARDS_SETTING. Entries with keys ≥ numShards are silently dropped,
which can cause data loss or incorrect primary terms on the receiving pre-V_3_6_0
node. This backward-compatibility path should either reject the serialization or
document that split-extended shards cannot be sent to old nodes.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1712-1721]

 if (out.getVersion().before(Version.V_3_6_0)) {
     int numShards = INDEX_NUMBER_OF_SHARDS_SETTING.get(settings);
+    for (Integer key : primaryTermsMap.keySet()) {
+        if (key >= numShards) {
+            throw new IllegalStateException(
+                "Cannot serialize primaryTermsMap with shard id [" + key + "] to node with version before V_3_6_0"
+            );
+        }
+    }
     long[] primaryTermsArray = new long[numShards];
     for (Map.Entry<Integer, Long> entry : primaryTermsMap.entrySet()) {
-        if (entry.getKey() < numShards) {
-            primaryTermsArray[entry.getKey()] = entry.getValue();
-        }
+        primaryTermsArray[entry.getKey()] = entry.getValue();
     }
     out.writeVLongArray(primaryTermsArray);
 }
Suggestion importance[1-10]: 5

__

Why: The concern about silently dropping shard IDs beyond numShards during backward-compatible serialization is valid, but throwing an exception may break rolling upgrades. The suggestion correctly identifies the issue but the proposed fix (throwing an exception) could be too disruptive; documenting the limitation or logging a warning might be more appropriate.

Low
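The silent truncation being flagged can be sketched as a plain map-to-array flattening. LegacyPrimaryTerms and toLegacyArray are hypothetical names, simplified from the described IndexMetadataDiff.writeTo() behavior, not the actual serialization code:

```java
import java.util.*;

// Sketch of the backward-compatible path: a primary-terms map keyed by shard
// ID is flattened to the legacy fixed-size array, silently dropping any
// split-created shard IDs >= numShards.
public class LegacyPrimaryTerms {
    public static long[] toLegacyArray(Map<Integer, Long> primaryTermsMap, int numShards) {
        long[] terms = new long[numShards];
        for (Map.Entry<Integer, Long> e : primaryTermsMap.entrySet()) {
            if (e.getKey() < numShards) {
                terms[e.getKey()] = e.getValue();  // child shards beyond numShards are lost
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<Integer, Long> map = new HashMap<>();
        map.put(0, 3L);
        map.put(1, 5L);
        map.put(4, 9L);  // child shard from a split; the index had numShards = 2
        System.out.println(Arrays.toString(toLegacyArray(map, 2))); // [3, 5] -- shard 4 dropped
    }
}
```

Whether to throw, log, or accept this loss on the pre-V_3_6_0 wire path is the open question the suggestion raises.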
Add validation after parsing shard range values

The parser only handles VALUE_NUMBER tokens, but unknown field types or unexpected
tokens are silently ignored without any warning or error. Additionally, there is no
validation that the parsed values are valid (e.g., start <= end, non-negative
shardId). Consider adding validation after parsing to catch malformed data early.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [74-82]

 } else if (token == XContentParser.Token.VALUE_NUMBER) {
     if ("shard_id".equals(fieldName)) {
         shardId = parser.intValue();
     } else if ("start".equals(fieldName)) {
         start = parser.intValue();
     } else if ("end".equals(fieldName)) {
         end = parser.intValue();
     }
+} else if (token != XContentParser.Token.VALUE_NULL) {
+    parser.skipChildren();
+}
+// After loop:
+if (shardId != -1 && start > end) {
+    throw new IllegalArgumentException("ShardRange start=" + start + " must be <= end=" + end);
 }
Suggestion importance[1-10]: 3

__

Why: Adding validation for parsed values is good defensive practice, but the proposed code mixes the loop-body change with post-loop validation in a confusing way. The suggestion is a minor robustness improvement rather than a fix for a critical bug.

Low
Suggestions up to commit ec0bf77
Category | Suggestion | Impact
Possible issue
Guard against null array on parse completion

If numberOfRootShards is still -1 when parse() completes (e.g., the field was
missing or appeared after KEY_ROOT_SHARDS_TO_ALL_CHILDREN), rootShardsToAllChildren
will remain null, causing a NullPointerException when the object is used. Add a
null-check or initialize rootShardsToAllChildren to an empty array as a fallback
before constructing the object.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [641-648]

+if (rootShardsToAllChildren == null) {
+    rootShardsToAllChildren = new ShardRange[Math.max(numberOfRootShards, 0)][];
+}
 return new SplitShardsMetadata(
     rootShardsToAllChildren,
     tempShardIdToChildShards,
     inProgressSplitShardIds,
     activeShardIds,
     maxShardId
 );
Suggestion importance[1-10]: 7

__

Why: If rootShardsToAllChildren is null when parse() completes (e.g., missing field or ordering issue), a NullPointerException will occur when the object is used. Adding a null-check before constructing the object is a valid defensive measure.

Medium
Preserve existing split metadata when diff is null

When splitMetadata is null (the pre-V_3_6_0 deserialization path), the applied
IndexMetadata will have no splitShardsMetadata set, so build() will create a
brand-new default one instead of preserving the one from part. This silently
discards the existing split metadata on the part being patched. The existing
metadata from part should be carried over when splitMetadata is null.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 7

__

Why: When splitMetadata is null (pre-V_3_6_0 path), the build() method creates a new default splitShardsMetadata instead of preserving the one from part, silently discarding existing split metadata. Carrying over part.splitShardsMetadata in the else branch is a valid fix.

Medium
Fix potential negative value serialization issue

shardId is serialized with writeVInt but deserialized with readVInt. While writeVInt
only supports non-negative values, if shardId can ever be -1 (as used in the
default/empty parse case), this would cause incorrect serialization. Use
writeInt/readInt for shardId to be safe and consistent with start and end.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [54-58]

 @Override
 public void writeTo(StreamOutput out) throws IOException {
-    out.writeVInt(shardId);
+    out.writeInt(shardId);
     out.writeInt(start);
     out.writeInt(end);
 }
Suggestion importance[1-10]: 6

__

Why: The parse method uses -1 as a default for shardId, and if a ShardRange with shardId=-1 is serialized using writeVInt, it would cause incorrect behavior since writeVInt doesn't support negative values. Using writeInt for shardId would be safer and consistent with start and end.

Low
Null-check before computing split metadata diff

If before.splitShardsMetadata is null (e.g., when upgrading from a pre-V_3_6_0 node
that never set it), calling diff on it will throw a NullPointerException. A
null-check should be added before computing the diff.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1641]

-splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+splitMetadata = (before.splitShardsMetadata != null)
+    ? after.splitShardsMetadata.diff(before.splitShardsMetadata)
+    : after.splitShardsMetadata.diff(new SplitShardsMetadata.Builder(after.numberOfShards).build());
Suggestion importance[1-10]: 5

__

Why: If before.splitShardsMetadata is null during an upgrade scenario, calling diff would throw a NullPointerException. However, the build() method already initializes splitShardsMetadata to a default value when null, so this scenario may be unlikely in practice.

Low
Consistent integer serialization for all fields

The shardId is written with writeVInt but start and end are written with writeInt.
Hash range bounds can be negative (as evidenced by testContainsWithNegativeHash),
so start and end must use writeInt/readInt, as they do. writeVInt only supports
non-negative values, so the current code is correct only if shardId can never be
negative; verify that invariant, or use writeInt for all three fields for
consistency and safety.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [35-37]

 public ShardRange(StreamInput in) throws IOException {
-    this(in.readVInt(), in.readInt(), in.readInt());
+    this(in.readInt(), in.readInt(), in.readInt());
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about consistency between writeVInt/readVInt for shardId vs writeInt/readInt for start and end. Since shardId is typically non-negative, writeVInt is functionally correct, but using writeInt for all fields would be more consistent. The suggestion is valid but has low-to-moderate impact.

Low
General
Recalculate max shard ID after cancel

When a split is cancelled, the temporary child shard IDs that were allocated during
splitShard are removed from parentToChildShards, but maxShardId is never rolled
back. If those IDs were newly allocated (beyond the previous maxShardId), subsequent
splits will skip those IDs rather than reusing them, causing gaps. The findHoles
logic partially compensates, but maxShardId itself should be recalculated or the
cancelled children's IDs should be tracked so they can be reused.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [471-475]

 public void cancelSplit(int sourceShardId) {
     assert inProgressSplitShardIds.contains(sourceShardId);
     inProgressSplitShardIds.remove(sourceShardId);
     parentToChildShards.remove(sourceShardId);
+    // Recalculate maxShardId since cancelled children may have been at the top
+    int newMax = rootShardsToAllChildren.length - 1;
+    for (Integer parent : parentToChildShards.keySet()) {
+        for (ShardRange child : parentToChildShards.get(parent)) {
+            newMax = Math.max(newMax, child.shardId());
+        }
+    }
+    for (Integer active : activeShardIds) {
+        newMax = Math.max(newMax, active);
+    }
+    maxShardId = newMax;
 }
Suggestion importance[1-10]: 5

__

Why: After cancelling a split, maxShardId is not rolled back, causing gaps in shard ID allocation. The findHoles logic partially compensates by reusing gaps, but recalculating maxShardId would be more correct. The suggested fix is logically sound and addresses a real gap in the implementation.

Low
Prevent index mismatch in shard ID construction

The method accepts shardIndex as a parameter but uses it directly to construct the
ShardId, while the index string parameter is used only to look up IndexMetadata.
This could lead to a mismatch if shardIndex doesn't correspond to the same index as
index. Consider deriving the Index from the resolved IndexMetadata to ensure
consistency.

server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java [502-505]

 public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing, Index shardIndex) {
-    int shardId = generateShardId(indexMetadata(clusterState, index), id, routing, true);
-    return new ShardId(shardIndex, shardId);
+    IndexMetadata indexMetadata = indexMetadata(clusterState, index);
+    int shardId = generateShardId(indexMetadata, id, routing, true);
+    return new ShardId(indexMetadata.getIndex(), shardId);
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion identifies a potential inconsistency where shardIndex parameter might not correspond to the index string used for metadata lookup. Using indexMetadata.getIndex() would ensure consistency, though in practice callers likely pass matching values. The improved code is accurate and addresses a real design concern.

Low
Suggestions up to commit 2db9022
Category | Suggestion | Impact
Possible issue
Preserve existing split metadata when diff is absent

When splitMetadata is null (deserialized from a pre-V_3_6_0 node), the builder's
splitShardsMetadata field is never set, so build() will create a default one using
numberOfShards. However, part.splitShardsMetadata may contain valid state that
should be preserved. The diff should always be applied; if there is no new split
metadata, the existing one from part should be carried forward.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 7

__

Why: When splitMetadata is null (pre-V_3_6_0 deserialization), the existing part.splitShardsMetadata state is not preserved in the applied result. The build() method will create a default SplitShardsMetadata instead, potentially losing valid state from the base object.

Medium
Prevent duplicate shard ID assignment during split

When holes are exhausted mid-loop, the code switches to incrementing
nextChildShardId from maxShardId, but nextChildShardId was only updated via Math.max
when consuming holes. If holes were consumed and nextChildShardId is still
maxShardId, the next increment may produce a shard ID that is already active or
in-progress. The while loop skipping in-progress IDs does not also skip active IDs,
which could lead to duplicate shard ID assignment.

[server/src/main/java...

@github-actions
Copy link
Contributor

❌ Gradle check result for a79f554: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

Persistent review updated to latest commit 276231f

@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from 276231f to 6f0508b Compare March 13, 2026 13:14
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 6f0508b.

Path | Line | Severity | Description
server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java | 557 | medium | The core document routing function now delegates final shard selection to SplitShardsMetadata.getShardIdOfHash(), replacing the previously deterministic hash-modulo calculation. If splitShardsMetadata could be externally manipulated (e.g., via cluster state poisoning), an attacker could route documents to arbitrary shards. This is consistent with the feature intent but represents a meaningful expansion of the cluster state attack surface for data misrouting. Warrants review of how splitShardsMetadata is authorized and validated before being applied to cluster state.
server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java | 1157 | low | The safety assertion was changed from `primaryTerms.length == numberOfShards` (array-based, always active) to `primaryTermsMap.size() == numberOfShards` (map-based). Java assertions are disabled by default in production JVMs unless -ea is explicitly passed. The old check on the array length would still apply during the same constructor call since primaryTerms is also set, but if the array path is later removed, this weakens a structural invariant check to one that may be silently skipped in production. Not clearly malicious, but worth noting as a regression in defensive validation.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 1 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from 6f0508b to cc1269d Compare March 15, 2026 10:30
@github-actions
Copy link
Contributor

Persistent review updated to latest commit cc1269d

@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from cc1269d to f642557 Compare March 15, 2026 10:57
@github-actions
Copy link
Contributor

Persistent review updated to latest commit f642557

@vikasvb90
Copy link
Contributor Author

Thanks @andrross for the review! Addressed your comments. Can you please take a look again?

@github-actions
Copy link
Contributor

❌ Gradle check result for f642557: FAILURE


@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from f642557 to dd496ee Compare March 16, 2026 03:00
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit dd496ee.

Path | Line | Severity | Description
server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java | 151 | low | binarySearchShards() can return null when no matching shard range is found, but multiple callers rely solely on 'assert shardRange != null' for safety. With JVM assertions disabled (default in production), this results in a NullPointerException during document routing, which could cause an availability/DoS condition for affected shards. Not malicious, but a latent crash path in production.
server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java | 1709 | low | In IndexMetadataDiff.writeTo(), the backward-compatible serialization for pre-V_3_6_0 nodes silently drops primaryTermsMap entries where shard key >= numShards (child shards created by splits). This means nodes on the old wire protocol receive a truncated primary-terms array with zeros for split child shards, which could cause incorrect primary-term fencing decisions on mixed-version clusters. Appears intentional for compatibility, but the silent data loss is worth review.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


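The binarySearchShards() finding above (a null return guarded only by assert) can be illustrated with a standalone hash-to-shard binary search over contiguous ranges. The Range fields and method names here are assumptions modeled on the described ShardRange/getShardIdOfHash behavior, not the OpenSearch classes:

```java
// Resolve a routing hash to a shard ID by binary search over sorted,
// non-overlapping, inclusive hash ranges. The lookup fails loudly with an
// exception rather than relying on a disabled-by-default assert.
public class ShardResolver {
    public static final class Range {
        public final int shardId, start, end;  // inclusive bounds
        public Range(int shardId, int start, int end) {
            this.shardId = shardId; this.start = start; this.end = end;
        }
    }

    // ranges must be sorted by start and cover disjoint intervals
    public static int shardIdOfHash(Range[] ranges, int hash) {
        int lo = 0, hi = ranges.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (hash < ranges[mid].start) hi = mid - 1;
            else if (hash > ranges[mid].end) lo = mid + 1;
            else return ranges[mid].shardId;
        }
        throw new IllegalStateException("no shard range owns hash " + hash); // explicit, not assert
    }

    public static void main(String[] args) {
        Range[] ranges = {
            new Range(0, Integer.MIN_VALUE, -1),
            new Range(3, 0, 99),
            new Range(4, 100, Integer.MAX_VALUE)
        };
        System.out.println(shardIdOfHash(ranges, 42));   // 3
        System.out.println(shardIdOfHash(ranges, -7));   // 0
    }
}
```

If the ranges partition the full hash space, the exception is unreachable; keeping it anyway turns a metadata-consistency bug into a diagnosable failure instead of a production NullPointerException.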

@github-actions
Copy link
Contributor

Persistent review updated to latest commit dd496ee

@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from dd496ee to 33769e5 Compare March 16, 2026 03:16
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 33769e5

@github-actions
Copy link
Contributor

❌ Gradle check result for 33769e5: FAILURE


@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from 33769e5 to 2db9022 Compare March 16, 2026 05:58
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 2db9022

@github-actions
Copy link
Contributor

❌ Gradle check result for 2db9022: FAILURE


@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from 2db9022 to ec0bf77 Compare March 16, 2026 08:52
@github-actions
Copy link
Contributor

Persistent review updated to latest commit ec0bf77

@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from ec0bf77 to 4618229 Compare March 16, 2026 09:07
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 4618229

@github-actions
Copy link
Contributor

❌ Gradle check result for 4618229: FAILURE


@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from 4618229 to 9a4e365 Compare March 16, 2026 09:15
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 9a4e365

@github-actions
Copy link
Contributor

❌ Gradle check result for 9a4e365: FAILURE


@andrross andrross force-pushed the online_shard_split branch from 9a4e365 to 49efa6a Compare March 16, 2026 23:48
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 49efa6a

@github-actions
Copy link
Contributor

❌ Gradle check result for 49efa6a: FAILURE


… on child shard hash

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
@andrross andrross force-pushed the online_shard_split branch from 49efa6a to 8d02fc7 Compare March 17, 2026 19:40
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 8d02fc7

@github-actions
Copy link
Contributor

✅ Gradle check result for 8d02fc7: SUCCESS

@andrross andrross merged commit ed2e100 into opensearch-project:main Mar 17, 2026
33 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Shard Management Project Board Mar 17, 2026
@codecov
Copy link

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 81.06509% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.31%. Comparing base (28fa177) to head (8d02fc7).
⚠️ Report is 7 commits behind head on main.

Files with missing lines | Patch % | Lines
...ensearch/cluster/metadata/SplitShardsMetadata.java | 85.11% | 26 Missing and 27 partials ⚠️
...org/opensearch/cluster/metadata/IndexMetadata.java | 66.10% | 21 Missing and 19 partials ⚠️
...g/opensearch/cluster/routing/OperationRouting.java | 71.42% | 2 Missing ⚠️
...va/org/opensearch/cluster/metadata/ShardRange.java | 96.15% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20859      +/-   ##
============================================
- Coverage     73.32%   73.31%   -0.02%     
- Complexity    72272    72410     +138     
============================================
  Files          5797     5804       +7     
  Lines        330323   330885     +562     
  Branches      47676    47822     +146     
============================================
+ Hits         242215   242576     +361     
- Misses        68663    68828     +165     
- Partials      19445    19481      +36     

☔ View full report in Codecov by Sentry.

andrross added a commit to andrross/OpenSearch that referenced this pull request Mar 20, 2026
This reverts commit ed2e100.

The change to IndexMetadata has resulted in bwc failures with the remote
cluster state feature.
andrross added a commit to andrross/OpenSearch that referenced this pull request Mar 20, 2026
This reverts commit ed2e100.

The change to IndexMetadata has resulted in bwc failures with the remote
cluster state feature.

Signed-off-by: Andrew Ross <andrross@amazon.com>
