
Adding metadata to support shard split and to resolve shard ids based on child shard hash #20859

Merged
andrross merged 1 commit into opensearch-project:main from vikasvb90:online_shard_split
Mar 17, 2026

Conversation

@vikasvb90
Contributor

@vikasvb90 vikasvb90 commented Mar 13, 2026

Description

Introduces metadata infrastructure for online (live) shard splitting in OpenSearch.

ShardRange - Represents a hash range owned by a child shard, mapping a shard ID to a contiguous range of hash values. Supports range containment checks.
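A minimal sketch of what such a range type might look like (the field names and inclusive-bounds semantics here are assumptions for illustration; the actual class in the PR also carries serialization and XContent support):

```java
// Hypothetical simplification of the ShardRange concept described above:
// a shard id paired with an inclusive [start, end] hash range.
class ShardRange {
    private final int shardId;
    private final int start; // inclusive lower bound of the hash range
    private final int end;   // inclusive upper bound of the hash range

    ShardRange(int shardId, int start, int end) {
        this.shardId = shardId;
        this.start = start;
        this.end = end;
    }

    int getShardId() {
        return shardId;
    }

    // Range containment check: does this shard own the given routing hash?
    boolean contains(int hash) {
        return hash >= start && hash <= end;
    }
}
```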

SplitShardsMetadata - Tracks the shard split tree within IndexMetadata. Maintains root-to-child shard mappings, in-progress split state, and active shard IDs. Given a document's routing hash, it resolves the correct child shard by walking the split tree. Ensures shard range consistency.
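The tree walk can be sketched roughly as follows (a hypothetical simplification: the real `getShardIdOfHash()` operates on the PR's internal structures, while this sketch uses a plain map from parent shard id to child `{shardId, start, end}` triples):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of resolving a routing hash by walking a split tree.
// Names and structures are assumptions, not the PR's actual code.
class SplitTreeResolver {
    // parent shard id -> ranges owned by its children (absent key = leaf shard)
    private final Map<Integer, int[][]> children = new HashMap<>();

    void addSplit(int parentShardId, int[][] childRanges) {
        // each child range is {shardId, startHash, endHash} (bounds inclusive)
        children.put(parentShardId, childRanges);
    }

    // Walk from a root shard down to the leaf child whose range contains the hash.
    int resolve(int rootShardId, int hash) {
        int current = rootShardId;
        while (children.containsKey(current)) {
            boolean found = false;
            for (int[] child : children.get(current)) {
                if (hash >= child[1] && hash <= child[2]) {
                    current = child[0];
                    found = true;
                    break;
                }
            }
            if (!found) {
                throw new IllegalStateException("hash not covered: " + hash);
            }
        }
        return current;
    }
}
```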

primaryTermsMap in IndexMetadata - Adds a Map<Integer, Long> alongside the existing long[] array to support non-contiguous shard IDs that arise after splits. The map becomes the source of truth for primary term look-ups.
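An illustration (not the PR's actual code) of the problem the map solves: after a split, live shard ids can be non-contiguous and can exceed the original shard count, so a dense `long[]` indexed by shard id no longer fits.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: an index originally had 2 shards (ids 0 and 1), and
// shard 0 was split into child shards 2 and 3, leaving ids {1, 2, 3} active.
class PrimaryTermsDemo {
    static Map<Integer, Long> afterSplit() {
        Map<Integer, Long> primaryTerms = new HashMap<>();
        primaryTerms.put(1, 5L); // surviving original shard
        primaryTerms.put(2, 1L); // child of split-away shard 0
        primaryTerms.put(3, 1L); // child of split-away shard 0
        // A long[2] indexed by shard id could not hold ids 2 and 3 and would
        // waste slot 0; the map handles holes and ids beyond the root count.
        return primaryTerms;
    }
}
```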

OperationRouting - Updated to resolve document routing to the correct child shard by delegating to SplitShardsMetadata.getShardIdOfHash().

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

This is based on the in-place shard split design.

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following the Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Contributor

github-actions bot commented Mar 13, 2026

PR Reviewer Guide 🔍

(Review updated until commit 8d02fc7)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
📝 TODO sections

🔀 Multiple PR themes

Sub-PR theme: Add ShardRange data structure with serialization support

Relevant files:

  • server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java
  • server/src/test/java/org/opensearch/cluster/metadata/ShardRangeTests.java

Sub-PR theme: Add SplitShardsMetadata for tracking shard split tree

Relevant files:

  • server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java
  • server/src/test/java/org/opensearch/cluster/metadata/SplitShardsMetadataTests.java

Sub-PR theme: Integrate SplitShardsMetadata and primaryTermsMap into IndexMetadata and OperationRouting

Relevant files:

  • server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java
  • server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java
  • server/src/test/java/org/opensearch/cluster/metadata/IndexMetadataTests.java

⚡ Recommended focus areas for review

Null splitShardsMetadata

In IndexMetadataDiff.apply(), when splitMetadata is null (i.e., when deserializing from a pre-V_3_6_0 node), the builder's splitShardsMetadata is never set. The build() method will then create a default SplitShardsMetadata based on numberOfShards, which may not correctly reflect the actual state being applied. This could silently discard split metadata during rolling upgrades.

if (splitMetadata != null) {
    builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
}
primaryTermsMap size assertion

The assertion assert primaryTermsMap.size() == numberOfShards in the IndexMetadata constructor uses Java assert, which is disabled by default in production. After a shard split, primaryTermsMap may contain entries for child shard IDs that exceed numberOfShards (the root shard count), causing this assertion to fail or be silently bypassed. The relationship between numberOfShards and the map size needs clarification and a proper runtime check.

assert primaryTermsMap.size() == numberOfShards;
this.primaryTermsMap = Collections.unmodifiableMap(primaryTermsMap);
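Since `assert` only fires when the JVM runs with `-ea`, the "proper runtime check" mentioned here could look roughly like this (an illustrative sketch, not the PR's code):

```java
import java.util.Map;

class PrimaryTermsCheck {
    // Explicit runtime check that fails regardless of the -ea JVM flag,
    // unlike a plain Java assert, which is stripped by default in production.
    static void checkPrimaryTermsSize(Map<Integer, Long> primaryTermsMap, int numberOfShards) {
        if (primaryTermsMap.size() != numberOfShards) {
            throw new IllegalStateException(
                "primaryTermsMap size [" + primaryTermsMap.size()
                    + "] does not match numberOfShards [" + numberOfShards + "]");
        }
    }
}
```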
Backward Compat primaryTerms

When writing to pre-V_3_6_0 nodes in writeTo and writeVerifiableTo, buildPrimaryTermsArray() only populates entries where entry.getKey() < numberOfShards. After a shard split, child shard IDs may be >= numberOfShards (the original root count), so their primary terms will be silently dropped when communicating with older nodes. This could cause data inconsistency during rolling upgrades.

private long[] buildPrimaryTermsArray() {
    long[] terms = new long[numberOfShards];
    for (Map.Entry<Integer, Long> entry : primaryTermsMap.entrySet()) {
        if (entry.getKey() < numberOfShards) {
            terms[entry.getKey()] = entry.getValue();
        }
    }
    return terms;
}
Missing splitShardsMetadata in old-version read

In readFrom, when reading from a pre-V_3_6_0 stream, splitShardsMetadata is never set on the builder. The build() method will create a default SplitShardsMetadata with numberOfShards root shards, which may be incorrect if the index had undergone splits before the upgrade. This could cause loss of split tree information during deserialization from older nodes.

} else {
    assert primaryTerms != null;
    Map<Integer, Long> primaryTermsMapFromArray = new HashMap<>();
    for (int shardId = 0; shardId < primaryTerms.length; shardId++) {
        primaryTermsMapFromArray.put(shardId, primaryTerms[shardId]);
    }
    builder.primaryTermsMap(primaryTermsMapFromArray);
}
Stale comment

In test_getActiveShardIterator_splitCompleted, the comment says "Split in progress, should return all 4 shards" but the test is for a completed split returning 5 shards (1, 2, 3, 4, 5). This is misleading and should be corrected.

// Split in progress, should return all 4 shards
assertEquals(expectedActiveShardsForConsecutiveShardSplit, actualActiveShardsForConsecutiveShardSplit);

@github-actions
Contributor

github-actions bot commented Mar 13, 2026

PR Code Suggestions ✨

Latest suggestions up to 8d02fc7

Explore these optional code suggestions:

Category | Suggestion | Impact
Possible issue
Guard against null array from missing XContent field

If numberOfRootShards is still -1 when parse() completes (e.g., the field is missing
from the XContent), rootShardsToAllChildren will remain null, causing a
NullPointerException when the object is used. Add a null-check or validation before
constructing the object.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [641-648]

+if (numberOfRootShards < 0) {
+    throw new IllegalArgumentException("Missing required field: " + KEY_NUMBER_OF_ROOT_SHARDS);
+}
+if (rootShardsToAllChildren == null) {
+    rootShardsToAllChildren = new ShardRange[numberOfRootShards][];
+}
 return new SplitShardsMetadata(
     rootShardsToAllChildren,
     tempShardIdToChildShards,
     inProgressSplitShardIds,
     activeShardIds,
     maxShardId
 );
Suggestion importance[1-10]: 6


Why: If numberOfRootShards is -1 or rootShardsToAllChildren is null due to missing XContent fields, a NullPointerException would occur. The suggestion adds a reasonable guard, though in practice the parse method should always encounter these fields in valid serialized data.

Low
Fix test setup that throws before assertion

The builder.startObject() call before the lambda may itself throw an IOException
since it writes to the failing stream, causing the test to fail before reaching
expectThrows. The startObject() call should be inside the expectThrows lambda, or
the failing stream should only start failing after the initial setup writes.

server/src/test/java/org/opensearch/cluster/metadata/ShardRangeTests.java [247-277]

 public void testToXContentIOException() throws IOException {
     ShardRange shardRange = new ShardRange(1, 0, 100);
-    ...
+
+    OutputStream failingStream = new OutputStream() {
+        private boolean started = false;
+
+        @Override
+        public void write(int b) throws IOException {
+            if (started) throw new IOException("Simulated IOException");
+        }
+
+        @Override
+        public void write(byte[] b) throws IOException {
+            if (started) throw new IOException("Simulated IOException");
+        }
+
+        @Override
+        public void write(byte[] b, int off, int len) throws IOException {
+            if (started) throw new IOException("Simulated IOException");
+        }
+    };
+
     XContentBuilder builder = XContentFactory.jsonBuilder(failingStream);
-    builder.startObject(); // Start the JSON object
+    builder.startObject();
 
-    // The actual test - this should throw IOException when trying to write
     expectThrows(IOException.class, () -> {
         shardRange.toXContent(builder, null);
-        builder.endObject(); // End the JSON object
-        builder.close(); // Force flush/close which will trigger the write
+        builder.endObject();
+        builder.close();
     });
 }
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies that builder.startObject() writes to the failing stream before expectThrows, potentially causing the test to fail prematurely. The proposed fix using a started flag is a valid approach, though the improved_code doesn't set started = true anywhere, making it incomplete. Still, the underlying issue identified is real and worth addressing.

Low
Guard against null splitShardsMetadata in diff constructor

If after.splitShardsMetadata or before.splitShardsMetadata is null, this line will
throw a NullPointerException. Although build() initializes splitShardsMetadata when
null, it's safer to add null guards here since the diff constructor is called
directly on existing metadata objects.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1641]

-splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+if (after.splitShardsMetadata != null && before.splitShardsMetadata != null) {
+    splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+} else {
+    splitMetadata = null;
+}
Suggestion importance[1-10]: 5


Why: While build() initializes splitShardsMetadata when null, the diff constructor operates on existing IndexMetadata objects that could theoretically have null splitShardsMetadata. However, since build() always initializes it, this is a low-risk scenario in practice.

Low
Prevent shard ID collision when switching from holes to sequential IDs

When holes are exhausted mid-loop, the code switches to incrementing
nextChildShardId from maxShardId, but nextChildShardId starts at maxShardId and is
only updated when holes are used. If holes run out after some iterations,
nextChildShardId may still be at maxShardId and the first increment produces
maxShardId + 1, which could collide with already-assigned child shard IDs from the
hole-filling phase. The nextChildShardId should be initialized to track the highest
assigned ID correctly across both paths.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [365-378]

 int nextChildShardId = maxShardId, childShardId;
 for (int i = 0; i < numberOfChildren; ++i) {
     if (shardIdHoles.isEmpty()) {
         nextChildShardId++;
-        while (inProgressChildShardIds.contains(nextChildShardId)) {
-            assert activeShardIds.contains(nextChildShardId) == false;
+        while (inProgressChildShardIds.contains(nextChildShardId) || activeShardIds.contains(nextChildShardId)) {
             nextChildShardId++;
         }
         childShardId = nextChildShardId;
     } else {
         childShardId = shardIdHoles.iterator().next();
         shardIdHoles.remove(childShardId);
         nextChildShardId = Math.max(nextChildShardId, childShardId);
     }
Suggestion importance[1-10]: 5


Why: The suggestion adds activeShardIds.contains(nextChildShardId) to the skip condition, but the existing code already has assert !activeShardIds.contains(childShardId) after assignment. The collision concern about holes running out mid-loop is partially mitigated by nextChildShardId = Math.max(nextChildShardId, childShardId) tracking the highest hole used, though adding the activeShardIds check is a minor defensive improvement.

Low
Ensure consistent serialization for negative hash values

The shardId is written with writeVInt but start and end are written with writeInt.
Since hash ranges can be negative (as demonstrated in the test
testContainsWithNegativeHash), using writeVInt for start and end would be incorrect
for negative values. However, shardId should also be consistent — verify that
shardId is always non-negative and that start/end are always non-negative before
using writeVInt, or switch all fields to writeInt/readInt for correctness with
negative values.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [35-37]

 public ShardRange(StreamInput in) throws IOException {
     this(in.readVInt(), in.readInt(), in.readInt());
 }
 
+@Override
+public void writeTo(StreamOutput out) throws IOException {
+    out.writeVInt(shardId);
+    out.writeInt(start);
+    out.writeInt(end);
+}
+
Suggestion importance[1-10]: 5


Why: The suggestion correctly identifies that shardId uses writeVInt/readVInt while start/end use writeInt/readInt. Since shardId is always non-negative, writeVInt is appropriate for it. The start/end fields already use writeInt/readInt which handles negative values correctly. The suggestion's improved_code doesn't actually change anything from the existing code, making it a verification suggestion rather than a fix.

Low
General
Preserve existing split metadata when diff is null

When splitMetadata is null (deserialized from a pre-V_3_6_0 node), the builder's
splitShardsMetadata field is never set, so build() will create a default one based
on numberOfShards. However, part.splitShardsMetadata may contain valid data that
should be preserved. Consider falling back to part.splitShardsMetadata when
splitMetadata is null.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 6


Why: When splitMetadata is null (pre-V_3_6_0 deserialization), the existing part.splitShardsMetadata data is silently discarded and replaced with a default. Falling back to part.splitShardsMetadata preserves existing state correctly during mixed-version cluster upgrades.

Low
Use consistent index reference for ShardId construction

The method accepts a shardIndex parameter of type Index but uses it directly to
construct a ShardId, bypassing the actual index metadata lookup. This could result
in a mismatch if shardIndex doesn't correspond to the index named index. The Index
object used should be derived from the resolved indexMetadata to ensure consistency.

server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java [502-505]

 public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing, Index shardIndex) {
-    int shardId = generateShardId(indexMetadata(clusterState, index), id, routing, true);
-    return new ShardId(shardIndex, shardId);
+    IndexMetadata indexMetadata = indexMetadata(clusterState, index);
+    int shardId = generateShardId(indexMetadata, id, routing, true);
+    return new ShardId(indexMetadata.getIndex(), shardId);
 }
Suggestion importance[1-10]: 6


Why: The suggestion correctly identifies a potential inconsistency where shardIndex (passed as parameter) may not match the indexMetadata resolved from index. Using indexMetadata.getIndex() ensures the ShardId is constructed with the correct Index object, avoiding subtle bugs if the caller passes a mismatched shardIndex.

Low

Previous suggestions

Suggestions up to commit 49efa6a
Category | Suggestion | Impact
Possible issue
Preserve existing split metadata when diff is absent

When splitMetadata is null (for pre-V_3_6_0 nodes), the builder's
splitShardsMetadata is never set, so it will be initialized to a default value in
build() based on numberOfShards. However, part.splitShardsMetadata may contain valid
data that should be preserved. The existing split metadata from part should be
carried over when no diff is available.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 7


Why: When splitMetadata is null (pre-V_3_6_0 nodes), the existing splitShardsMetadata from part is not preserved, which could cause data loss. The fix correctly carries over the existing metadata when no diff is available.

Medium
Fix inconsistent integer serialization for shardId field

The shardId is written with writeVInt but start and end are written with writeInt.
Since hash ranges can be negative (as demonstrated in the test
testContainsWithNegativeHash), using writeVInt for start and end would be incorrect.
However, shardId should also consistently use writeInt/readInt unless negative
values are explicitly excluded. More critically, start and end should use readInt to
match writeInt, which is already correct — but verify that shardId cannot be
negative before using readVInt.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [35-37]

 public ShardRange(StreamInput in) throws IOException {
-    this(in.readVInt(), in.readInt(), in.readInt());
+    this(in.readInt(), in.readInt(), in.readInt());
 }
Suggestion importance[1-10]: 7


Why: The shardId is read with readVInt in the constructor but written with writeVInt in writeTo. While shard IDs are typically non-negative, using readVInt/writeVInt is inconsistent with start/end which use readInt/writeInt. This is a valid concern about serialization consistency, though shard IDs in practice are non-negative.

Medium
Use consistent integer serialization method for all fields

shardId is serialized with writeVInt which only supports non-negative values, but
start and end use writeInt. If shardId can ever be -1 (as used in the parse method
as a default/sentinel value), writeVInt will throw or produce incorrect results. Use
writeInt consistently for all three fields to match the readInt calls and support
the full int range.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [54-58]

 @Override
 public void writeTo(StreamOutput out) throws IOException {
-    out.writeVInt(shardId);
+    out.writeInt(shardId);
     out.writeInt(start);
     out.writeInt(end);
 }
Suggestion importance[1-10]: 7

__

Why: The writeTo method uses writeVInt for shardId but writeInt for start and end, creating an inconsistency. Since shardId is read back with readVInt in the constructor, the round-trip is technically consistent, but using writeInt/readInt uniformly would be cleaner and avoid potential issues if shardId ever takes a negative sentinel value like -1.

Medium
Guard against null array from missing parsed field

If numberOfRootShards is -1 (i.e., the KEY_NUMBER_OF_ROOT_SHARDS field was not found
in the parsed content), rootShardsToAllChildren will remain null, causing a
NullPointerException when passed to the constructor. Add a null-check or validation
before constructing the object.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [641-648]

+if (numberOfRootShards == -1 || rootShardsToAllChildren == null) {
+    rootShardsToAllChildren = new ShardRange[Math.max(numberOfRootShards, 0)][];
+}
 return new SplitShardsMetadata(
     rootShardsToAllChildren,
     tempShardIdToChildShards,
     inProgressSplitShardIds,
     activeShardIds,
     maxShardId
 );
Suggestion importance[1-10]: 6


Why: If numberOfRootShards is -1 or rootShardsToAllChildren is null due to missing fields in the parsed content, a NullPointerException would occur. The suggestion adds a valid guard, though in practice well-formed data should always include these fields.

Low
Guard against null pointer in diff computation

If after.splitShardsMetadata or before.splitShardsMetadata is null (e.g., when
constructing a diff involving older-format metadata), this will throw a
NullPointerException. Add null guards before calling diff.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1641]

-splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+if (after.splitShardsMetadata != null && before.splitShardsMetadata != null) {
+    splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+} else {
+    splitMetadata = null;
+}
Suggestion importance[1-10]: 5


Why: While the build() method initializes splitShardsMetadata to a default value when null, there could be edge cases where it remains null. However, the build() method in IndexMetadata.Builder already ensures splitShardsMetadata is never null, making this a lower-priority concern.

Low
General
Validate split-in-progress before proceeding with update

findRootAndShard can throw an IllegalArgumentException if sourceShardId is not
found, but the method is supposed to validate via assertions. More critically, if
sourceShardId is not a root shard and not found in rootShardsToAllChildren, the
exception message says "Shard ID doesn't exist" rather than indicating the split is
not in progress. The findRootAndShard result shardRangeTuple should be validated
before use to avoid a potential NPE if the method is called with an invalid
sourceShardId when assertions are disabled.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [428-431]

 public void updateSplitMetadataForChildShards(int sourceShardId, Set<Integer> newChildShardIds) {
+    if (!inProgressSplitShardIds.contains(sourceShardId)) {
+        throw new IllegalArgumentException("No split in progress for shard [" + sourceShardId + "]");
+    }
     Tuple<Integer, ShardRange> shardRangeTuple = findRootAndShard(sourceShardId, rootShardsToAllChildren);
 
     assert inProgressSplitShardIds.contains(sourceShardId);
Suggestion importance[1-10]: 5

__

Why: The current code uses an assert to check if the split is in progress, which is disabled in production. Moving this to an explicit check with a proper exception improves robustness, though the existing test testUpdateSplitMetadataWithInvalidSourceShardId already expects an IllegalArgumentException from findRootAndShard.

Low
Use actual index metadata instead of caller-provided index

The method accepts shardIndex as an Index parameter but uses it directly in new
ShardId(shardIndex, shardId). This means the returned ShardId uses the provided
shardIndex rather than the actual index from indexMetadata, which could lead to
mismatches if the caller passes an incorrect Index. Consider deriving the index from
indexMetadata for correctness.

server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java [502-505]

 public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing, Index shardIndex) {
-    int shardId = generateShardId(indexMetadata(clusterState, index), id, routing, true);
-    return new ShardId(shardIndex, shardId);
+    IndexMetadata indexMetadata = indexMetadata(clusterState, index);
+    int shardId = generateShardId(indexMetadata, id, routing, true);
+    return new ShardId(indexMetadata.getIndex(), shardId);
 }
Suggestion importance[1-10]: 4


Why: The suggestion raises a valid concern about potential index mismatch, but the method signature accepting shardIndex as a parameter suggests it's intentionally designed to allow the caller to specify the index. Without more context on the caller's intent, this change may not be appropriate.

Low
Suggestions up to commit 9a4e365
Category | Suggestion | Impact
Possible issue
Guard against null array in parsed metadata

If numberOfRootShards is never parsed (remains -1) or rootShardsToAllChildren is
never assigned (remains null), the constructor will receive a null array, causing
NullPointerExceptions later. Add a null/validity check before constructing the
object, initializing rootShardsToAllChildren to an empty array when
numberOfRootShards is 0 or when it was not found in the parsed content.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [641-648]

+if (rootShardsToAllChildren == null) {
+    rootShardsToAllChildren = new ShardRange[Math.max(numberOfRootShards, 0)][];
+}
 return new SplitShardsMetadata(
     rootShardsToAllChildren,
     tempShardIdToChildShards,
     inProgressSplitShardIds,
     activeShardIds,
     maxShardId
 );
Suggestion importance[1-10]: 7

__

Why: If numberOfRootShards is -1 or rootShardsToAllChildren is never assigned during parsing (e.g., missing field), passing null to the constructor would cause NullPointerException later. The fix correctly initializes the array as a fallback, improving robustness.

Medium
Fix inconsistent serialization of shardId field

The shardId is written with writeVInt but start and end are written with writeInt.
Since start and end can be negative (as tested in testContainsWithNegativeHash),
using writeVInt for them would be incorrect, but using writeInt for shardId would
also be inconsistent. Ensure the read methods match the write methods exactly — if
shardId can be negative or large, consider using writeInt/readInt consistently for
all three fields.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [35-37]

 public ShardRange(StreamInput in) throws IOException {
-    this(in.readVInt(), in.readInt(), in.readInt());
+    this(in.readInt(), in.readInt(), in.readInt());
 }
Suggestion importance[1-10]: 7


Why: The shardId is read with readVInt but written with writeVInt, which only handles non-negative values. Since shardId can be -1 (as seen in the parse method's default value), this is a real serialization bug. The fix to use readInt/writeInt consistently is valid and important.

Medium
Use consistent integer serialization for all fields

writeVInt only handles non-negative integers correctly; if shardId could ever be
negative (e.g., the default -1 used in parse), serialization would be corrupted. Use
writeInt for shardId to be consistent with start and end, and to safely handle any
value.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [54-58]

 @Override
 public void writeTo(StreamOutput out) throws IOException {
-    out.writeVInt(shardId);
+    out.writeInt(shardId);
     out.writeInt(start);
     out.writeInt(end);
 }
Suggestion importance[1-10]: 7

__

Why: Using writeVInt for shardId is problematic since shardId can be -1 (the default in parse), and writeVInt doesn't handle negative values correctly. Switching to writeInt for consistency and correctness is a valid fix that addresses a real serialization bug.

Medium
Preserve split metadata when diff is absent

When splitMetadata is null (for pre-V_3_6_0 nodes), the builder's
splitShardsMetadata field is never set, so it will be null and then initialized to a
default in build(). This means the applied metadata loses the original
splitShardsMetadata from part. The original value from part should be preserved when
no diff is available.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 6

__

Why: When splitMetadata is null (pre-V_3_6_0 nodes), the builder's splitShardsMetadata is left unset and will be re-initialized to a default in build(), losing the original value from part. Preserving part.splitShardsMetadata is more correct for backward compatibility.

Low
Null-guard split metadata diff computation

If after.splitShardsMetadata or before.splitShardsMetadata is null (e.g., when
dealing with older metadata that hasn't been upgraded yet), this will throw a
NullPointerException. Add null checks before computing the diff.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1641]

-splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+if (after.splitShardsMetadata != null && before.splitShardsMetadata != null) {
+    splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+} else {
+    splitMetadata = null;
+}
Suggestion importance[1-10]: 5


Why: If either after.splitShardsMetadata or before.splitShardsMetadata is null, calling .diff() will throw a NullPointerException. However, the build() method initializes splitShardsMetadata to a default when null, so this scenario may be unlikely in practice.

Low
General
Replace assertion with explicit validation check

findRootAndShard can throw IllegalArgumentException if the shard is not found, but
the result shardRangeTuple is used without a null check. More critically, if
sourceShardId is not in inProgressSplitShardIds, the assertion will only catch this
in debug mode; in production this will silently corrupt state. The method should
throw an IllegalArgumentException instead of relying on assertions for validation.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [428-431]

 public void updateSplitMetadataForChildShards(int sourceShardId, Set<Integer> newChildShardIds) {
+    if (!inProgressSplitShardIds.contains(sourceShardId)) {
+        throw new IllegalArgumentException("No split in progress for shard [" + sourceShardId + "]");
+    }
     Tuple<Integer, ShardRange> shardRangeTuple = findRootAndShard(sourceShardId, rootShardsToAllChildren);
+    assert newChildShardIds.size() == parentToChildShards.get(sourceShardId).length;
 
-    assert inProgressSplitShardIds.contains(sourceShardId);
-
Suggestion importance[1-10]: 5

__

Why: Using assert for validation of inProgressSplitShardIds.contains(sourceShardId) only works in debug mode; in production this silently passes and can corrupt state. Replacing with an explicit IllegalArgumentException is safer, though the existing tests already cover this via AssertionError.

Low
Ensure ShardId uses consistent index reference

The method accepts a shardIndex parameter of type Index but uses it directly to
construct a ShardId, bypassing the actual index metadata lookup. If shardIndex
doesn't correspond to the index string parameter, the returned ShardId could
reference the wrong index. Consider deriving the Index from the resolved
IndexMetadata to ensure consistency.

server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java [502-505]

-public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing, Index shardIndex) {
-    int shardId = generateShardId(indexMetadata(clusterState, index), id, routing, true);
-    return new ShardId(shardIndex, shardId);
+public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing) {
+    IndexMetadata indexMetadata = indexMetadata(clusterState, index);
+    int shardId = generateShardId(indexMetadata, id, routing, true);
+    return new ShardId(indexMetadata.getIndex(), shardId);
 }
Suggestion importance[1-10]: 4

__

Why: The concern about shardIndex potentially not matching the index string is valid, but the method signature accepting Index shardIndex may be intentional (e.g., for cases where the caller needs to specify a particular index object). The suggestion changes the method signature which could break callers, and the actual use case context is unclear from the diff alone.

Low
Suggestions up to commit 4618229
Category | Suggestion | Impact
Possible issue
Preserve split metadata when diff is absent

When splitMetadata is null (the old-version deserialization path), the apply method
silently skips updating splitShardsMetadata, so the resulting IndexMetadata will
have whatever default the builder produces rather than the value from part. This
means the split metadata from the existing state is lost during diff application for
pre-V_3_6_0 nodes. You should copy it from part when splitMetadata is null.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 7

__

Why: When splitMetadata is null (pre-V_3_6_0 deserialization path), the apply method skips updating splitShardsMetadata, causing the split metadata from the existing state (part) to be lost. The fix correctly preserves it by copying from part, preventing potential data loss during diff application.

Medium
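The preserve-on-null pattern proposed here can be sketched generically. The names below (NullableDiff, Diff, applyDiff) are illustrative placeholders, not the actual IndexMetadataDiff code; the point is only the fallback branch:

```java
// Minimal sketch of the proposed fix: when a diff field was absent on the
// wire (e.g. sent by an older peer), carry the existing value forward instead
// of letting a freshly-built default replace it. All names are hypothetical.
public class NullableDiff {
    public static final class Diff {
        final String newValue;
        public Diff(String newValue) { this.newValue = newValue; }
        String apply(String previous) { return newValue; }
    }

    public static String applyDiff(Diff diff, String current) {
        if (diff != null) {
            return diff.apply(current);
        }
        return current;  // absent diff: preserve existing state, do not reset to a default
    }

    public static void main(String[] args) {
        System.out.println(applyDiff(new Diff("split-v2"), "split-v1")); // split-v2
        System.out.println(applyDiff(null, "split-v1"));                 // split-v1
    }
}
```

The else branch in the suggested patch plays exactly the role of the `return current;` fallback above.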
Guard against null array after XContent parsing

If numberOfRootShards is still -1 when parse() completes (e.g., the field was
missing or appeared after KEY_ROOT_SHARDS_TO_ALL_CHILDREN), rootShardsToAllChildren
will be null, causing a NullPointerException when the constructor or any caller
accesses it. Add a null-check or validation before constructing the object.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [641-648]

+if (rootShardsToAllChildren == null) {
+    rootShardsToAllChildren = new ShardRange[Math.max(numberOfRootShards, 0)][];
+}
 return new SplitShardsMetadata(
     rootShardsToAllChildren,
     tempShardIdToChildShards,
     inProgressSplitShardIds,
     activeShardIds,
     maxShardId
 );
Suggestion importance[1-10]: 6

__

Why: If numberOfRootShards is -1 or rootShardsToAllChildren remains null due to field ordering or missing data, a NullPointerException would occur. The fix is reasonable but the scenario may be unlikely in practice since the parser processes fields sequentially and the format is controlled.

Low
Prevent duplicate shard ID assignment during split

When holes are exhausted mid-loop, the code switches to incrementing
nextChildShardId from maxShardId, but nextChildShardId was only updated via Math.max
while consuming holes. If the last hole ID is less than maxShardId, the next
increment correctly starts from maxShardId + 1; if a hole ID exceeds maxShardId,
however, subsequent non-hole IDs could collide with already-consumed hole IDs.
Additionally, the allConsumedShardIds list does not include the child shard IDs
newly assigned earlier in the same loop, so duplicate IDs can be assigned across
iterations once holes run out.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [365-378]

 int nextChildShardId = maxShardId, childShardId;
 for (int i = 0; i < numberOfChildren; ++i) {
     if (shardIdHoles.isEmpty()) {
-        nextChildShardId++;
-        while (inProgressChildShardIds.contains(nextChildShardId)) {
-            assert activeShardIds.contains(nextChildShardId) == false;
+        do {
             nextChildShardId++;
-        }
+        } while (inProgressChildShardIds.contains(nextChildShardId) || activeShardIds.contains(nextChildShardId));
         childShardId = nextChildShardId;
     } else {
         childShardId = shardIdHoles.iterator().next();
         shardIdHoles.remove(childShardId);
         nextChildShardId = Math.max(nextChildShardId, childShardId);
     }
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about potential duplicate shard ID assignment when holes are exhausted mid-loop, but the existing code already tracks inProgressChildShardIds and uses assertions to catch duplicates. The improved code adds activeShardIds check to the loop condition, which is a minor defensive improvement.

Low
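The hole-reuse allocation under discussion can be sketched as follows. ChildShardIdAllocator and its parameters are hypothetical, simplified from the behavior described above: reuse freed IDs first, then count upward past maxShardId while skipping anything in use, including IDs assigned earlier in the same call.

```java
import java.util.*;

// Hypothetical sketch, not the OpenSearch implementation. "holes" are shard
// IDs freed by cancelled splits; "inUse" covers active and in-progress IDs.
public class ChildShardIdAllocator {
    public static List<Integer> allocate(int numberOfChildren, int maxShardId,
                                         SortedSet<Integer> holes, Set<Integer> inUse) {
        List<Integer> assigned = new ArrayList<>();
        int next = maxShardId;
        for (int i = 0; i < numberOfChildren; i++) {
            int id;
            if (!holes.isEmpty()) {
                id = holes.first();             // reuse the smallest gap first
                holes.remove(id);
                next = Math.max(next, id);      // keep the counter ahead of reused gaps
            } else {
                do {
                    next++;                     // skip anything already taken,
                } while (inUse.contains(next)); // including IDs assigned in this loop
                id = next;
            }
            inUse.add(id);                      // record so later iterations cannot collide
            assigned.add(id);
        }
        return assigned;
    }

    public static void main(String[] args) {
        SortedSet<Integer> holes = new TreeSet<>(Arrays.asList(2, 5));
        Set<Integer> inUse = new HashSet<>(Arrays.asList(0, 1, 3, 4, 6, 7));
        // maxShardId = 7: expect holes 2 and 5 reused first, then fresh IDs 8, 9.
        System.out.println(allocate(4, 7, holes, inUse)); // [2, 5, 8, 9]
    }
}
```

Adding each assigned ID to `inUse` before the next iteration is what closes the duplicate-assignment window described in the suggestion.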
Prevent index mismatch in shard ID construction

The method accepts a shardIndex parameter of type Index but uses it directly to
construct the ShardId, rather than using the index from indexMetadata. This could
lead to a mismatch if shardIndex doesn't correspond to the same index as index
string. Consider deriving the Index from indexMetadata to ensure consistency.

server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java [502-505]

-public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing, Index shardIndex) {
-    int shardId = generateShardId(indexMetadata(clusterState, index), id, routing, true);
-    return new ShardId(shardIndex, shardId);
+public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing) {
+    IndexMetadata indexMetadata = indexMetadata(clusterState, index);
+    int shardId = generateShardId(indexMetadata, id, routing, true);
+    return new ShardId(indexMetadata.getIndex(), shardId);
 }
Suggestion importance[1-10]: 5

__

Why: The shardIndex parameter could potentially differ from the index resolved via indexMetadata, leading to a mismatch in the returned ShardId. Deriving the Index from indexMetadata would be safer, though the caller may intentionally pass a specific shardIndex for the recovering child scenario.

Low
Consistent integer serialization for all fields

The shardId is written with writeVInt while start and end use writeInt. Hash
range bounds can be negative (the test testContainsWithNegativeHash uses -10 to
10), so start and end correctly use writeInt/readInt. writeVInt, however, only
encodes non-negative values, so verify that shard IDs are always non-negative
here; if they are not, shardId should use writeInt/readInt as well.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [35-37]

 public ShardRange(StreamInput in) throws IOException {
-    this(in.readVInt(), in.readInt(), in.readInt());
+    this(in.readInt(), in.readInt(), in.readInt());
 }
 
+@Override
+public void writeTo(StreamOutput out) throws IOException {
+    out.writeInt(shardId);
+    out.writeInt(start);
+    out.writeInt(end);
+}
+
Suggestion importance[1-10]: 4

__

Why: Shard IDs are always non-negative integers, so writeVInt/readVInt is a valid and more space-efficient choice. The suggestion to change to readInt/writeInt for shardId is a style preference rather than a bug fix. The start and end fields already correctly use writeInt/readInt to handle negative values.

Low
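The writeVInt concern can be illustrated with a generic LEB128-style unsigned varint encoder. This is assumed to be analogous to (not copied from) OpenSearch's StreamOutput.writeVInt; the class and method names are illustrative:

```java
// Why varint-style encodings suit only non-negative values: a negative int,
// viewed as unsigned, has its high bits set and always takes the maximum
// 5 bytes, while writeInt-style fixed encoding is a predictable 4 bytes.
public class VarInt {
    public static byte[] encode(int value) {
        byte[] buf = new byte[5];
        int i = 0;
        while ((value & ~0x7F) != 0) {           // more than 7 significant bits remain
            buf[i++] = (byte) ((value & 0x7F) | 0x80);
            value >>>= 7;                        // unsigned shift: negatives become huge
        }
        buf[i++] = (byte) value;
        return java.util.Arrays.copyOf(buf, i);
    }

    public static void main(String[] args) {
        System.out.println(encode(7).length);    // 1 byte: small shard IDs stay compact
        System.out.println(encode(-1).length);   // 5 bytes: the sign bit forces max length
    }
}
```

So writeVInt is a space win for small non-negative shard IDs, but a correctness risk if a sentinel like -1 can ever reach serialization.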
General
Prevent silent data loss in backward-compatible serialization

The primaryTermsMap in IndexMetadataDiff may contain shard IDs beyond the original
numberOfShards (e.g., after a split), but the old-format array is sized by
INDEX_NUMBER_OF_SHARDS_SETTING. Entries with keys ≥ numShards are silently dropped,
which can cause data loss or incorrect primary terms on the receiving pre-V_3_6_0
node. This backward-compatibility path should either reject the serialization or
document that split-extended shards cannot be sent to old nodes.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1712-1721]

 if (out.getVersion().before(Version.V_3_6_0)) {
     int numShards = INDEX_NUMBER_OF_SHARDS_SETTING.get(settings);
+    for (Integer key : primaryTermsMap.keySet()) {
+        if (key >= numShards) {
+            throw new IllegalStateException(
+                "Cannot serialize primaryTermsMap with shard id [" + key + "] to node with version before V_3_6_0"
+            );
+        }
+    }
     long[] primaryTermsArray = new long[numShards];
     for (Map.Entry<Integer, Long> entry : primaryTermsMap.entrySet()) {
-        if (entry.getKey() < numShards) {
-            primaryTermsArray[entry.getKey()] = entry.getValue();
-        }
+        primaryTermsArray[entry.getKey()] = entry.getValue();
     }
     out.writeVLongArray(primaryTermsArray);
 }
Suggestion importance[1-10]: 5

__

Why: The concern about silently dropping shard IDs beyond numShards during backward-compatible serialization is valid, but throwing an exception may break rolling upgrades. The suggestion correctly identifies the issue but the proposed fix (throwing an exception) could be too disruptive; documenting the limitation or logging a warning might be more appropriate.

Low
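The silent truncation being flagged can be sketched as a plain map-to-array flattening. LegacyPrimaryTerms and toLegacyArray are hypothetical names, simplified from the described IndexMetadataDiff.writeTo() behavior, not the actual serialization code:

```java
import java.util.*;

// Sketch of the backward-compatible path: a primary-terms map keyed by shard
// ID is flattened to the legacy fixed-size array, silently dropping any
// split-created shard IDs >= numShards.
public class LegacyPrimaryTerms {
    public static long[] toLegacyArray(Map<Integer, Long> primaryTermsMap, int numShards) {
        long[] terms = new long[numShards];
        for (Map.Entry<Integer, Long> e : primaryTermsMap.entrySet()) {
            if (e.getKey() < numShards) {
                terms[e.getKey()] = e.getValue();  // child shards beyond numShards are lost
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        Map<Integer, Long> map = new HashMap<>();
        map.put(0, 3L);
        map.put(1, 5L);
        map.put(4, 9L);  // child shard from a split; the index had numShards = 2
        System.out.println(Arrays.toString(toLegacyArray(map, 2))); // [3, 5] -- shard 4 dropped
    }
}
```

Whether to throw, log, or accept this loss on the pre-V_3_6_0 wire path is the open question the suggestion raises.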
Add validation after parsing shard range values

The parser only handles VALUE_NUMBER tokens, but unknown field types or unexpected
tokens are silently ignored without any warning or error. Additionally, there is no
validation that the parsed values are valid (e.g., start <= end, non-negative
shardId). Consider adding validation after parsing to catch malformed data early.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [74-82]

 } else if (token == XContentParser.Token.VALUE_NUMBER) {
     if ("shard_id".equals(fieldName)) {
         shardId = parser.intValue();
     } else if ("start".equals(fieldName)) {
         start = parser.intValue();
     } else if ("end".equals(fieldName)) {
         end = parser.intValue();
     }
+} else if (token != XContentParser.Token.VALUE_NULL) {
+    parser.skipChildren();
+}
+// After loop:
+if (shardId != -1 && start > end) {
+    throw new IllegalArgumentException("ShardRange start=" + start + " must be <= end=" + end);
 }
Suggestion importance[1-10]: 3

__

Why: Adding validation for parsed values is good defensive practice, but the proposed code mixes the loop-body change with post-loop validation in a confusing way. The suggestion is a minor robustness improvement rather than a fix for a critical bug.

Low
Suggestions up to commit ec0bf77
Category | Suggestion | Impact
Possible issue
Guard against null array on parse completion

If numberOfRootShards is still -1 when parse() completes (e.g., the field was
missing or appeared after KEY_ROOT_SHARDS_TO_ALL_CHILDREN), rootShardsToAllChildren
will remain null, causing a NullPointerException when the object is used. Add a
null-check or initialize rootShardsToAllChildren to an empty array as a fallback
before constructing the object.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [641-648]

+if (rootShardsToAllChildren == null) {
+    rootShardsToAllChildren = new ShardRange[Math.max(numberOfRootShards, 0)][];
+}
 return new SplitShardsMetadata(
     rootShardsToAllChildren,
     tempShardIdToChildShards,
     inProgressSplitShardIds,
     activeShardIds,
     maxShardId
 );
Suggestion importance[1-10]: 7

__

Why: If rootShardsToAllChildren is null when parse() completes (e.g., missing field or ordering issue), a NullPointerException will occur when the object is used. Adding a null-check before constructing the object is a valid defensive measure.

Medium
Preserve existing split metadata when diff is null

When splitMetadata is null (the pre-V_3_6_0 deserialization path), the applied
IndexMetadata will have no splitShardsMetadata set, so build() will create a
brand-new default one instead of preserving the one from part. This silently
discards the existing split metadata on the part being patched. The existing
metadata from part should be carried over when splitMetadata is null.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 7

__

Why: When splitMetadata is null (pre-V_3_6_0 path), the build() method creates a new default splitShardsMetadata instead of preserving the one from part, silently discarding existing split metadata. Carrying over part.splitShardsMetadata in the else branch is a valid fix.

Medium
Fix potential negative value serialization issue

shardId is serialized with writeVInt but deserialized with readVInt. While writeVInt
only supports non-negative values, if shardId can ever be -1 (as used in the
default/empty parse case), this would cause incorrect serialization. Use
writeInt/readInt for shardId to be safe and consistent with start and end.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [54-58]

 @Override
 public void writeTo(StreamOutput out) throws IOException {
-    out.writeVInt(shardId);
+    out.writeInt(shardId);
     out.writeInt(start);
     out.writeInt(end);
 }
Suggestion importance[1-10]: 6

__

Why: The parse method uses -1 as a default for shardId, and if a ShardRange with shardId=-1 is serialized using writeVInt, it would cause incorrect behavior since writeVInt doesn't support negative values. Using writeInt for shardId would be safer and consistent with start and end.

Low
Null-check before computing split metadata diff

If before.splitShardsMetadata is null (e.g., when upgrading from a pre-V_3_6_0 node
that never set it), calling diff on it will throw a NullPointerException. A
null-check should be added before computing the diff.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1641]

-splitMetadata = after.splitShardsMetadata.diff(before.splitShardsMetadata);
+splitMetadata = (before.splitShardsMetadata != null)
+    ? after.splitShardsMetadata.diff(before.splitShardsMetadata)
+    : after.splitShardsMetadata.diff(new SplitShardsMetadata.Builder(after.numberOfShards).build());
Suggestion importance[1-10]: 5

__

Why: If before.splitShardsMetadata is null during an upgrade scenario, calling diff would throw a NullPointerException. However, the build() method already initializes splitShardsMetadata to a default value when null, so this scenario may be unlikely in practice.

Low
Consistent integer serialization for all fields

The shardId is written with writeVInt but start and end are written with writeInt.
Hash range bounds can be negative (as evidenced by testContainsWithNegativeHash),
so start and end must use writeInt/readInt, as they do. writeVInt only supports
non-negative values, so the current code is correct only if shardId can never be
negative; verify that invariant, or use writeInt for all three fields for
consistency and safety.

server/src/main/java/org/opensearch/cluster/metadata/ShardRange.java [35-37]

 public ShardRange(StreamInput in) throws IOException {
-    this(in.readVInt(), in.readInt(), in.readInt());
+    this(in.readInt(), in.readInt(), in.readInt());
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion raises a valid concern about consistency between writeVInt/readVInt for shardId vs writeInt/readInt for start and end. Since shardId is typically non-negative, writeVInt is functionally correct, but using writeInt for all fields would be more consistent. The suggestion is valid but has low-to-moderate impact.

Low
General
Recalculate max shard ID after cancel

When a split is cancelled, the temporary child shard IDs that were allocated during
splitShard are removed from parentToChildShards, but maxShardId is never rolled
back. If those IDs were newly allocated (beyond the previous maxShardId), subsequent
splits will skip those IDs rather than reusing them, causing gaps. The findHoles
logic partially compensates, but maxShardId itself should be recalculated or the
cancelled children's IDs should be tracked so they can be reused.

server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java [471-475]

 public void cancelSplit(int sourceShardId) {
     assert inProgressSplitShardIds.contains(sourceShardId);
     inProgressSplitShardIds.remove(sourceShardId);
     parentToChildShards.remove(sourceShardId);
+    // Recalculate maxShardId since cancelled children may have been at the top
+    int newMax = rootShardsToAllChildren.length - 1;
+    for (Integer parent : parentToChildShards.keySet()) {
+        for (ShardRange child : parentToChildShards.get(parent)) {
+            newMax = Math.max(newMax, child.shardId());
+        }
+    }
+    for (Integer active : activeShardIds) {
+        newMax = Math.max(newMax, active);
+    }
+    maxShardId = newMax;
 }
Suggestion importance[1-10]: 5

__

Why: After cancelling a split, maxShardId is not rolled back, causing gaps in shard ID allocation. The findHoles logic partially compensates by reusing gaps, but recalculating maxShardId would be more correct. The suggested fix is logically sound and addresses a real gap in the implementation.

Low
Prevent index mismatch in shard ID construction

The method accepts shardIndex as a parameter but uses it directly to construct the
ShardId, while the index string parameter is used only to look up IndexMetadata.
This could lead to a mismatch if shardIndex doesn't correspond to the same index as
index. Consider deriving the Index from the resolved IndexMetadata to ensure
consistency.

server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java [502-505]

 public ShardId shardWithRecoveringChild(ClusterState clusterState, String index, String id, String routing, Index shardIndex) {
-    int shardId = generateShardId(indexMetadata(clusterState, index), id, routing, true);
-    return new ShardId(shardIndex, shardId);
+    IndexMetadata indexMetadata = indexMetadata(clusterState, index);
+    int shardId = generateShardId(indexMetadata, id, routing, true);
+    return new ShardId(indexMetadata.getIndex(), shardId);
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion identifies a potential inconsistency where shardIndex parameter might not correspond to the index string used for metadata lookup. Using indexMetadata.getIndex() would ensure consistency, though in practice callers likely pass matching values. The improved code is accurate and addresses a real design concern.

Low
Suggestions up to commit 2db9022
Category | Suggestion | Impact
Possible issue
Preserve existing split metadata when diff is absent

When splitMetadata is null (deserialized from a pre-V_3_6_0 node), the builder's
splitShardsMetadata field is never set, so build() will create a default one using
numberOfShards. However, part.splitShardsMetadata may contain valid state that
should be preserved. The diff should always be applied; if there is no new split
metadata, the existing one from part should be carried forward.

server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java [1760-1762]

 if (splitMetadata != null) {
     builder.splitShardsMetadata(splitMetadata.apply(part.splitShardsMetadata));
+} else {
+    builder.splitShardsMetadata(part.splitShardsMetadata);
 }
Suggestion importance[1-10]: 7

__

Why: When splitMetadata is null (pre-V_3_6_0 deserialization), the existing part.splitShardsMetadata state is not preserved in the applied result. The build() method will create a default SplitShardsMetadata instead, potentially losing valid state from the base object.

Medium
Prevent duplicate shard ID assignment during split

When holes are exhausted mid-loop, the code switches to incrementing
nextChildShardId from maxShardId, but nextChildShardId was only updated via Math.max
when consuming holes. If holes were consumed and nextChildShardId is still
maxShardId, the next increment may produce a shard ID that is already active or
in-progress. The while loop skipping in-progress IDs does not also skip active IDs,
which could lead to duplicate shard ID assignment.

[server/src/main/java...

@github-actions
Copy link
Contributor

❌ Gradle check result for a79f554: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Contributor

Persistent review updated to latest commit 276231f

@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from 276231f to 6f0508b Compare March 13, 2026 13:14
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 6f0508b.

Path | Line | Severity | Description
server/src/main/java/org/opensearch/cluster/routing/OperationRouting.java | 557 | medium | The core document routing function now delegates final shard selection to SplitShardsMetadata.getShardIdOfHash(), replacing the previously deterministic hash-modulo calculation. If splitShardsMetadata could be externally manipulated (e.g., via cluster state poisoning), an attacker could route documents to arbitrary shards. This is consistent with the feature intent but represents a meaningful expansion of the cluster state attack surface for data misrouting. Warrants review of how splitShardsMetadata is authorized and validated before being applied to cluster state.
server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java | 1157 | low | The safety assertion was changed from `primaryTerms.length == numberOfShards` (array-based, always active) to `primaryTermsMap.size() == numberOfShards` (map-based). Java assertions are disabled by default in production JVMs unless -ea is explicitly passed. The old check on the array length would still apply during the same constructor call since primaryTerms is also set, but if the array path is later removed, this weakens a structural invariant check to one that may be silently skipped in production. Not clearly malicious, but worth noting as a regression in defensive validation.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 1 | Low: 1


Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.


⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from 6f0508b to cc1269d Compare March 15, 2026 10:30
@github-actions
Copy link
Contributor

Persistent review updated to latest commit cc1269d

@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from cc1269d to f642557 Compare March 15, 2026 10:57
@github-actions
Copy link
Contributor

Persistent review updated to latest commit f642557

@vikasvb90
Copy link
Contributor Author

Thanks @andrross for the review! Addressed your comments. Can you please take a look again?

@github-actions
Copy link
Contributor

❌ Gradle check result for f642557: FAILURE


@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from f642557 to dd496ee Compare March 16, 2026 03:00
@github-actions
Copy link
Contributor

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit dd496ee.

Path | Line | Severity | Description
server/src/main/java/org/opensearch/cluster/metadata/SplitShardsMetadata.java | 151 | low | binarySearchShards() can return null when no matching shard range is found, but multiple callers rely solely on 'assert shardRange != null' for safety. With JVM assertions disabled (default in production), this results in a NullPointerException during document routing, which could cause an availability/DoS condition for affected shards. Not malicious, but a latent crash path in production.
server/src/main/java/org/opensearch/cluster/metadata/IndexMetadata.java | 1709 | low | In IndexMetadataDiff.writeTo(), the backward-compatible serialization for pre-V_3_6_0 nodes silently drops primaryTermsMap entries where shard key >= numShards (child shards created by splits). This means nodes on the old wire protocol receive a truncated primary-terms array with zeros for split child shards, which could cause incorrect primary-term fencing decisions on mixed-version clusters. Appears intentional for compatibility, but the silent data loss is worth review.

The table above displays the top 10 most important findings.

Total: 2 | Critical: 0 | High: 0 | Medium: 0 | Low: 2


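The binarySearchShards() finding above (a null return guarded only by assert) can be illustrated with a standalone hash-to-shard binary search over contiguous ranges. The Range fields and method names here are assumptions modeled on the described ShardRange/getShardIdOfHash behavior, not the OpenSearch classes:

```java
// Resolve a routing hash to a shard ID by binary search over sorted,
// non-overlapping, inclusive hash ranges. The lookup fails loudly with an
// exception rather than relying on a disabled-by-default assert.
public class ShardResolver {
    public static final class Range {
        public final int shardId, start, end;  // inclusive bounds
        public Range(int shardId, int start, int end) {
            this.shardId = shardId; this.start = start; this.end = end;
        }
    }

    // ranges must be sorted by start and cover disjoint intervals
    public static int shardIdOfHash(Range[] ranges, int hash) {
        int lo = 0, hi = ranges.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (hash < ranges[mid].start) hi = mid - 1;
            else if (hash > ranges[mid].end) lo = mid + 1;
            else return ranges[mid].shardId;
        }
        throw new IllegalStateException("no shard range owns hash " + hash); // explicit, not assert
    }

    public static void main(String[] args) {
        Range[] ranges = {
            new Range(0, Integer.MIN_VALUE, -1),
            new Range(3, 0, 99),
            new Range(4, 100, Integer.MAX_VALUE)
        };
        System.out.println(shardIdOfHash(ranges, 42));   // 3
        System.out.println(shardIdOfHash(ranges, -7));   // 0
    }
}
```

If the ranges partition the full hash space, the exception is unreachable; keeping it anyway turns a metadata-consistency bug into a diagnosable failure instead of a production NullPointerException.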

@github-actions
Copy link
Contributor

Persistent review updated to latest commit dd496ee

@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from dd496ee to 33769e5 Compare March 16, 2026 03:16
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 33769e5

@github-actions
Copy link
Contributor

❌ Gradle check result for 33769e5: FAILURE


@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from 33769e5 to 2db9022 Compare March 16, 2026 05:58
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 2db9022

@github-actions
Copy link
Contributor

❌ Gradle check result for 2db9022: FAILURE


@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from 2db9022 to ec0bf77 Compare March 16, 2026 08:52
@github-actions
Copy link
Contributor

Persistent review updated to latest commit ec0bf77

@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from ec0bf77 to 4618229 Compare March 16, 2026 09:07
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 4618229

@github-actions
Copy link
Contributor

❌ Gradle check result for 4618229: FAILURE


@vikasvb90 vikasvb90 force-pushed the online_shard_split branch from 4618229 to 9a4e365 Compare March 16, 2026 09:15
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 9a4e365

@github-actions
Copy link
Contributor

❌ Gradle check result for 9a4e365: FAILURE


@andrross andrross force-pushed the online_shard_split branch from 9a4e365 to 49efa6a Compare March 16, 2026 23:48
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 49efa6a

@github-actions
Copy link
Contributor

❌ Gradle check result for 49efa6a: FAILURE


… on child shard hash

Signed-off-by: vikasvb90 <vikasvb@amazon.com>
@andrross andrross force-pushed the online_shard_split branch from 49efa6a to 8d02fc7 Compare March 17, 2026 19:40
@github-actions
Copy link
Contributor

Persistent review updated to latest commit 8d02fc7

@github-actions
Copy link
Contributor

✅ Gradle check result for 8d02fc7: SUCCESS

@andrross andrross merged commit ed2e100 into opensearch-project:main Mar 17, 2026
33 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Shard Management Project Board Mar 17, 2026
@codecov
Copy link

codecov bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 81.06509% with 96 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.31%. Comparing base (28fa177) to head (8d02fc7).
⚠️ Report is 7 commits behind head on main.

Files with missing lines | Patch % | Lines
...ensearch/cluster/metadata/SplitShardsMetadata.java | 85.11% | 26 Missing and 27 partials ⚠️
...org/opensearch/cluster/metadata/IndexMetadata.java | 66.10% | 21 Missing and 19 partials ⚠️
...g/opensearch/cluster/routing/OperationRouting.java | 71.42% | 2 Missing ⚠️
...va/org/opensearch/cluster/metadata/ShardRange.java | 96.15% | 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #20859      +/-   ##
============================================
- Coverage     73.32%   73.31%   -0.02%     
- Complexity    72272    72410     +138     
============================================
  Files          5797     5804       +7     
  Lines        330323   330885     +562     
  Branches      47676    47822     +146     
============================================
+ Hits         242215   242576     +361     
- Misses        68663    68828     +165     
- Partials      19445    19481      +36     

☔ View full report in Codecov by Sentry.

andrross added a commit to andrross/OpenSearch that referenced this pull request Mar 20, 2026
This reverts commit ed2e100.

The change to IndexMetadata has resulted in bwc failures with the remote
cluster state feature.
andrross added a commit to andrross/OpenSearch that referenced this pull request Mar 20, 2026
This reverts commit ed2e100.

The change to IndexMetadata has resulted in bwc failures with the remote
cluster state feature.

Signed-off-by: Andrew Ross <andrross@amazon.com>
