Cluster state and recovery constructs for in-place shard split by vikasvb90 · Pull Request #20979 · opensearch-project/OpenSearch

vikasvb90 · 2026-03-24T03:52:15Z

Description

Add cluster state infrastructure for in-place shard split

This PR adds the cluster state update service and supporting POJO changes needed to trigger an in-place shard split.

Changes:

Cluster state update service:

MetadataInPlaceShardSplitService: Submits an acked cluster state update task that updates SplitShardsMetadata on the target index and triggers a reroute. Validates that the index exists, the shard is not already splitting or already split, and virtual shards are not enabled on the index.
InPlaceShardSplitClusterStateUpdateRequest: Request POJO holding index name, shard ID, split-into count, and cause.
ClusterManagerTask: Added IN_PLACE_SHARD_SPLIT task key for cluster manager task throttling.

Routing POJO changes to support shard split lifecycle:

ShardRoutingState: Added SPLITTING state for parent shards undergoing an in-place split.
RecoverySource: Added InPlaceShardSplitRecoverySource type for child shards recovering from a parent shard on the same node.
UnassignedInfo: Added CHILD_SHARD_CREATED reason for child shards pending allocation after a split.
AllocationId: Added splitChildAllocationIds and parentAllocationId fields with version-gated serialization (V_3_6_0), factory methods (newSplit, newTargetSplit, cancelSplit, finishSplit), and updated equals/hashCode/toXContent.
ShardRouting: Added recoveringChildShards and parentShardId fields, new constructor accepting split fields, and query methods (splitting(), isSplitTarget(), getRecoveringChildShards(), getParentShardId()).

The REST API is not exposed in this PR. The routing allocation logic to actually assign child shards to nodes will follow in a subsequent PR.

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2026-03-24T03:53:43Z

PR Reviewer Guide 🔍

(Review updated until commit `8dbd619`)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes Sub-PR theme: Routing POJO changes for in-place shard split lifecycle Relevant files: server/src/main/java/org/opensearch/cluster/routing/AllocationId.java server/src/main/java/org/opensearch/cluster/routing/RecoverySource.java server/src/main/java/org/opensearch/cluster/routing/ShardRoutingState.java server/src/main/java/org/opensearch/cluster/routing/UnassignedInfo.java server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java server/src/test/java/org/opensearch/cluster/routing/AllocationIdSplitTests.java server/src/test/java/org/opensearch/cluster/routing/RecoverySourceSplitTests.java server/src/test/java/org/opensearch/cluster/routing/ShardRoutingStateSplitTests.java Sub-PR theme: Cluster state update service for in-place shard split Relevant files: server/src/main/java/org/opensearch/action/admin/indices/split/InPlaceShardSplitClusterStateUpdateRequest.java server/src/main/java/org/opensearch/action/admin/indices/split/package-info.java server/src/main/java/org/opensearch/cluster/metadata/MetadataInPlaceShardSplitService.java server/src/main/java/org/opensearch/cluster/service/ClusterManagerTask.java server/src/test/java/org/opensearch/cluster/metadata/MetadataInPlaceShardSplitServiceTests.java
⚡ Recommended focus areas for review Broken Assert The original assert `expectedShardSize >= 0 \|\| state != INITIALIZING \|\| state != RELOCATING` was always true (a value can't simultaneously be both states), making it a no-op guard. The new version `expectedShardSize >= 0 \|\| state != INITIALIZING && state != RELOCATING && state != SPLITTING` uses operator precedence where `&&` binds tighter than `\|\|`, so it reads as `expectedShardSize >= 0 \|\| (state != INITIALIZING && state != RELOCATING && state != SPLITTING)`. This means the assert still passes whenever `expectedShardSize < 0` but `state` is not one of those three — it never actually enforces that `expectedShardSize >= 0` when the state IS one of those three. The intended guard (reject negative shard size for INITIALIZING/RELOCATING/SPLITTING) is not enforced. assert expectedShardSize >= 0 \|\| state != ShardRoutingState.INITIALIZING && state != ShardRoutingState.RELOCATING && state != ShardRoutingState.SPLITTING : expectedShardSize + " state: " + state; Mutable Array Exposure `getRecoveringChildShards()` returns the raw `ShardRouting[]` array directly. Callers can mutate the array contents, breaking the immutability contract of `ShardRouting`. It should return a defensive copy or an unmodifiable view (e.g., `Arrays.copyOf` or expose as `List<ShardRouting>`). public ShardRouting[] getRecoveringChildShards() { return recoveringChildShards; } Serialization Version Gap The serialization of `splitChildAllocationIds` and `parentAllocationId` is gated on `Version.V_3_6_0`, but the `BytesStreamOutput` used in tests (`AllocationIdSplitTests`) has no version set, so it defaults to the current version. In a mixed-version cluster where a node on an older version receives an `AllocationId` written by a newer node, the reader will skip these fields and set them to `null`, silently losing split state. This is expected for BWC, but the comment in `ShardRouting`'s deserialization constructor says these fields are "transient - populated by RoutingNodes constructor, not serialized on the wire" — yet `AllocationId` does serialize them. This inconsistency could cause confusion and subtle bugs if the RoutingNodes reconstruction path doesn't account for the version-gated absence of these fields. if (in.getVersion().onOrAfter(Version.V_3_6_0)) { splitChildAllocationIds = in.readOptionalStringList(); parentAllocationId = in.readOptionalString(); } else { splitChildAllocationIds = null; parentAllocationId = null; } Missing Input Validation `applyShardSplitRequest` does not validate that `request.getSplitInto()` is greater than 1 (or at least greater than 0) before calling `splitMetadataBuilder.splitShard(shardId, request.getSplitInto())`. The test `testApplyShardSplitRequestThrowsForSplitIntoZero` expects an `ArithmeticException` to be thrown downstream, but relying on an arithmetic exception from a downstream call for input validation is fragile. A value of 1 (split into 1) would likely silently succeed but produce a meaningless operation. Explicit validation with a clear `IllegalArgumentException` should be added. int shardId = request.getShardId(); SplitShardsMetadata splitShardsMetadata = curIndexMetadata.getSplitShardsMetadata(); if (splitShardsMetadata.getInProgressSplitShardIds().contains(shardId)) { throw new IllegalArgumentException("Splitting of shard [" + shardId + "] is already in progress"); } if (splitShardsMetadata.isSplitParent(shardId)) { throw new IllegalArgumentException("Shard [" + shardId + "] has already been split."); } RoutingTable.Builder routingTableBuilder = RoutingTable.builder(currentState.routingTable()); Metadata.Builder metadataBuilder = Metadata.builder(currentState.metadata()); IndexMetadata.Builder indexMetadataBuilder = IndexMetadata.builder(curIndexMetadata); SplitShardsMetadata.Builder splitMetadataBuilder = new SplitShardsMetadata.Builder(splitShardsMetadata); splitMetadataBuilder.splitShard(shardId, request.getSplitInto());

github-actions · 2026-03-24T03:54:23Z

PR Code Suggestions ✨

Latest suggestions up to 8dbd619

Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Fix recovery source deserialization for SPLITTING state The `SPLITTING` state also requires a recovery source (it uses `InPlaceShardSplitRecoverySource`), but the deserialization logic only reads `recoverySource` for `UNASSIGNED` or `INITIALIZING` states. This means a shard in `SPLITTING` state will have a null recovery source after deserialization, which is inconsistent with the constructor assertion that recovery source is only available on unassigned or initializing shards. The `SPLITTING` state should be included in the condition for reading the recovery source, or the assertion and write logic should be updated to handle it consistently. server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java [435-439] -if (state == ShardRoutingState.UNASSIGNED \|\| state == ShardRoutingState.INITIALIZING) { +if (state == ShardRoutingState.UNASSIGNED \|\| state == ShardRoutingState.INITIALIZING \|\| state == ShardRoutingState.SPLITTING) { recoverySource = RecoverySource.readFrom(in); } else { recoverySource = null; } Suggestion importance[1-10]: 7 __ Why: The `SPLITTING` state likely requires a recovery source (`InPlaceShardSplitRecoverySource`), but the deserialization only reads it for `UNASSIGNED` or `INITIALIZING`. However, looking at the constructor assertion at line 153-154, it only checks for `UNASSIGNED` or `INITIALIZING` having a recovery source, suggesting `SPLITTING` may intentionally not use one in the current design. This is a valid concern but depends on the intended design.	Medium
	Fix recovery source serialization for SPLITTING state The `writeToThin` serialization does not write the recovery source for the `SPLITTING` state, which is inconsistent with the read path (once fixed). If a shard in `SPLITTING` state has a recovery source, it will not be serialized, causing a mismatch on deserialization. The `SPLITTING` state should be included here as well. server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java [474-476] -if (state == ShardRoutingState.UNASSIGNED \|\| state == ShardRoutingState.INITIALIZING) { +if (state == ShardRoutingState.UNASSIGNED \|\| state == ShardRoutingState.INITIALIZING \|\| state == ShardRoutingState.SPLITTING) { recoverySource.writeTo(out); } Suggestion importance[1-10]: 7 __ Why: The `writeToThin` method doesn't serialize the recovery source for `SPLITTING` state, which would be inconsistent if `SPLITTING` shards have a recovery source. This is a valid serialization consistency concern that pairs with suggestion 1.	Medium
	Fix assertion to include SPLITTING state with recovery source The assertion does not account for the new `SPLITTING` state, which also has a recovery source (`InPlaceShardSplitRecoverySource`). This will cause assertion failures when creating a shard in `SPLITTING` state with a recovery source. The `SPLITTING` state should be included in the condition. server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java [153-154] -assert (state == ShardRoutingState.UNASSIGNED \|\| state == ShardRoutingState.INITIALIZING) == (recoverySource != null) - : "recovery source only available on unassigned or initializing shard but was " + state; +assert (state == ShardRoutingState.UNASSIGNED \|\| state == ShardRoutingState.INITIALIZING \|\| state == ShardRoutingState.SPLITTING) == (recoverySource != null) + : "recovery source only available on unassigned, initializing, or splitting shard but was " + state; Suggestion importance[1-10]: 6 __ Why: The assertion at line 153-154 doesn't account for `SPLITTING` state potentially having a recovery source. If `SPLITTING` shards use `InPlaceShardSplitRecoverySource`, this assertion would fail. This is a valid concern that could cause assertion errors at runtime.	Low
General	Prevent mutation of internal array via defensive copy Returning a mutable array directly exposes the internal state of `ShardRouting`, allowing callers to modify the array contents. Since `ShardRouting` is a core routing data structure, this could lead to subtle bugs. Consider returning a defensive copy or using an unmodifiable list instead. server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java [329-331] public ShardRouting[] getRecoveringChildShards() { - return recoveringChildShards; + return recoveringChildShards == null ? null : recoveringChildShards.clone(); } Suggestion importance[1-10]: 4 __ Why: Returning the raw `recoveringChildShards` array exposes internal state to mutation. However, the comment in the code indicates these are transient fields populated by `RoutingNodes` constructor, so the risk may be limited in practice.	Low

Previous suggestions

Suggestions up to commit 8450d14

Category	Suggestion	Impact
Possible issue	Verify enum ordinal and switch case alignment The `readFrom` method uses `Type.values()[in.readByte()]` to deserialize the recovery type, which relies on enum ordinal order. Adding `IN_PLACE_SHARD_SPLIT` at the end is correct for new nodes, but `REMOTE_STORE` was previously the last entry (ordinal 6). If any serialized data exists with `REMOTE_STORE` at ordinal 6, this is fine, but the `readFrom` switch uses the enum ordinal directly — verify that `IN_PLACE_SHARD_SPLIT` (ordinal 6) does not collide with `REMOTE_STORE` (also ordinal 6 before this change). The enum now has `REMOTE_STORE` at ordinal 6 and `IN_PLACE_SHARD_SPLIT` at ordinal 7, but the switch in `readFrom` handles `IN_PLACE_SHARD_SPLIT` before `REMOTE_STORE`, which could cause a deserialization mismatch for older serialized `REMOTE_STORE` bytes. server/src/main/java/org/opensearch/cluster/routing/RecoverySource.java [122-130] public enum Type { EMPTY_STORE, EXISTING_STORE, PEER, SNAPSHOT, LOCAL_SHARDS, REMOTE_STORE, IN_PLACE_SHARD_SPLIT } +// Ensure readFrom switch case order matches enum ordinals: +// case 5: LOCAL_SHARDS, case 6: REMOTE_STORE, case 7: IN_PLACE_SHARD_SPLIT Suggestion importance[1-10]: 7 __ Why: The `readFrom` method uses `Type.values()[in.readByte()]` for deserialization, so ordinal order matters critically. The PR adds `IN_PLACE_SHARD_SPLIT` after `REMOTE_STORE`, and the switch in `readFrom` handles `IN_PLACE_SHARD_SPLIT` before `REMOTE_STORE` — this could cause a real deserialization mismatch. However, the `improved_code` only adds a comment rather than fixing the switch case order, limiting the score.	Medium
General	Add shard ID bounds validation There is no validation that `shardId` is within the valid range `[0, numberOfShards)`. A request with an out-of-range shard ID will propagate into `SplitShardsMetadata` operations and may cause unexpected behavior or an obscure error. Add an explicit bounds check before proceeding. server/src/main/java/org/opensearch/cluster/metadata/MetadataInPlaceShardSplitService.java [100-108] if (curIndexMetadata.getNumberOfVirtualShards() != -1) { throw new IllegalArgumentException( "In-place shard split is not supported on index [" + request.getIndex() + "] with virtual shards enabled" ); } int shardId = request.getShardId(); +if (shardId < 0 \|\| shardId >= curIndexMetadata.getNumberOfShards()) { + throw new IllegalArgumentException( + "Shard [" + shardId + "] is out of range for index [" + request.getIndex() + "] with [" + + curIndexMetadata.getNumberOfShards() + "] shards" + ); +} SplitShardsMetadata splitShardsMetadata = curIndexMetadata.getSplitShardsMetadata(); Suggestion importance[1-10]: 6 __ Why: There is no bounds check on `shardId` before it's used in `SplitShardsMetadata` operations, which could lead to obscure errors. The test `testApplyShardSplitRequestThrowsForInvalidShardId` expects an `IllegalArgumentException` for shard ID 99, but without this check the exception may not be thrown or may come from a different place with a less clear message.	Low
	Ensure deserialized list is immutable The `splitChildAllocationIds` read from the stream is a mutable list, but the constructor wraps it in `Collections.unmodifiableList`. However, when reading from the stream directly in this constructor (bypassing the private constructor), the list is assigned directly without making it unmodifiable. This inconsistency could allow mutation of the internal list. The deserialization path should also wrap the list. server/src/main/java/org/opensearch/cluster/routing/AllocationId.java [119-129] AllocationId(StreamInput in) throws IOException { this.id = in.readString(); this.relocationId = in.readOptionalString(); if (in.getVersion().onOrAfter(Version.V_3_6_0)) { - splitChildAllocationIds = in.readOptionalStringList(); + List<String> childIds = in.readOptionalStringList(); + splitChildAllocationIds = childIds == null ? null : Collections.unmodifiableList(childIds); parentAllocationId = in.readOptionalString(); } else { splitChildAllocationIds = null; parentAllocationId = null; } } Suggestion importance[1-10]: 6 __ Why: The stream deserialization constructor assigns `splitChildAllocationIds` directly without wrapping it in `Collections.unmodifiableList`, while the private constructor does wrap it. This inconsistency could allow mutation of the internal list when deserialized, breaking the immutability contract of the class.	Low
	Clarify assertion for split recovery source The assertion allows `InPlaceShardSplitRecoverySource` for replica shards, but the recovery source is only set on `UNASSIGNED` or `INITIALIZING` shards. Child shards created by an in-place split should be primary shards (or at least the assertion message should be updated to reflect the new valid case). Verify that child shards from a split are always primary; if not, the assertion logic may incorrectly permit non-primary, non-searchOnly replicas to use this recovery source in other contexts. server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java [155-159] assert recoverySource == null \|\| recoverySource == PeerRecoverySource.INSTANCE \|\| primary \|\| searchOnly - \|\| recoverySource == RecoverySource.InPlaceShardSplitRecoverySource.INSTANCE : "replica shards always recover from primary"; + \|\| recoverySource == RecoverySource.InPlaceShardSplitRecoverySource.INSTANCE : "replica shards always recover from primary (except in-place split child shards)"; Suggestion importance[1-10]: 2 __ Why: The suggestion only changes the assertion message string, not the logic. While the clarification is marginally useful, it's a trivial documentation-only change with minimal impact on correctness or functionality.	Low

Suggestions up to commit 8600fa4

Category	Suggestion	Impact
Possible issue	Fix incorrect logical operator in assertion The assertion logic is incorrect. Using `\|\|` between `state != X` conditions means the assertion is always true (a state cannot simultaneously be INITIALIZING, RELOCATING, and SPLITTING). The original code used `&&` for the negated conditions, which is equivalent to asserting the state is not any of those values when `expectedShardSize` `< 0`. This should use `&&` to properly enforce the constraint. server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java [149-152] assert expectedShardSize >= 0 \|\| state != ShardRoutingState.INITIALIZING - \|\| state != ShardRoutingState.RELOCATING - \|\| state != ShardRoutingState.SPLITTING : expectedShardSize + " state: " + state; + && state != ShardRoutingState.RELOCATING + && state != ShardRoutingState.SPLITTING : expectedShardSize + " state: " + state; Suggestion importance[1-10]: 8 __ Why: The assertion uses `\|\|` between `state != X` conditions, making it always true (tautology). The correct logic requires `&&` to properly enforce that `expectedShardSize >= 0` when the state is INITIALIZING, RELOCATING, or SPLITTING. This is a real logical bug that would make the assertion meaningless.	Medium
General	Add shard ID bounds validation before split There is no validation that `shardId` is within the valid range `[0, numberOfShards)`. A request with an out-of-range shard ID (e.g., 99 for a 3-shard index) would silently proceed and corrupt the split metadata. Add a bounds check before proceeding with the split logic. server/src/main/java/org/opensearch/cluster/metadata/MetadataInPlaceShardSplitService.java [100-111] if (curIndexMetadata.getNumberOfVirtualShards() != -1) { throw new IllegalArgumentException( "In-place shard split is not supported on index [" + request.getIndex() + "] with virtual shards enabled" ); } int shardId = request.getShardId(); +if (shardId < 0 \|\| shardId >= curIndexMetadata.getNumberOfShards()) { + throw new IllegalArgumentException( + "Shard [" + shardId + "] is out of range for index [" + request.getIndex() + "] with [" + + curIndexMetadata.getNumberOfShards() + "] shards" + ); +} + SplitShardsMetadata splitShardsMetadata = curIndexMetadata.getSplitShardsMetadata(); if (splitShardsMetadata.getInProgressSplitShardIds().contains(shardId)) { throw new IllegalArgumentException("Splitting of shard [" + shardId + "] is already in progress"); } Suggestion importance[1-10]: 6 __ Why: Missing bounds validation for `shardId` is a real gap — an out-of-range shard ID could silently corrupt split metadata. The test `testApplyShardSplitRequestThrowsForInvalidShardId` expects an `IllegalArgumentException` for shard ID 99, but without this check the behavior is undefined. Adding this validation improves correctness and matches test expectations.	Low
	Validate positive child shard count in split factory There is no validation that `numberOfChildShards` is greater than zero. Calling `newSplit` with zero or a negative value would create an `AllocationId` with an empty or meaningless child list, which could cause silent failures downstream. Add a precondition check. server/src/main/java/org/opensearch/cluster/routing/AllocationId.java [200-207] public static AllocationId newSplit(AllocationId allocationId, int numberOfChildShards) { assert allocationId.getSplitChildAllocationIds() == null && allocationId.getParentAllocationId() == null; + if (numberOfChildShards <= 0) { + throw new IllegalArgumentException("numberOfChildShards must be > 0, got: " + numberOfChildShards); + } List<String> childIds = new ArrayList<>(); for (int c = 0; c < numberOfChildShards; c++) { childIds.add(UUIDs.randomBase64UUID()); } return new AllocationId(allocationId.getId(), null, childIds, null); } Suggestion importance[1-10]: 5 __ Why: Calling `newSplit` with zero or negative `numberOfChildShards` would silently create an invalid `AllocationId` with an empty child list. Adding an explicit precondition check prevents silent failures and improves robustness of the split operation.	Low
	Document or handle transient split fields in deserialization When deserializing `ShardRouting` from a `StreamInput`, `recoveringChildShards` and `parentShardId` are always set to `null`, ignoring any serialized values. If these fields need to be persisted across cluster state serialization, they should be read from the stream (with version gating similar to `AllocationId`). If they are intentionally transient, this should be documented clearly. server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java [452-453] +// These fields are intentionally transient and not persisted in the stream. recoveringChildShards = null; parentShardId = null; Suggestion importance[1-10]: 2 __ Why: The suggestion only asks to add a comment clarifying that `recoveringChildShards` and `parentShardId` are intentionally transient. The `improved_code` is essentially the same as `existing_code` with just a comment added, which is a low-impact documentation-only change.	Low

github-actions · 2026-03-24T05:54:32Z

❌ Gradle check result for 8600fa4: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-03-24T09:00:05Z

Persistent review updated to latest commit 8450d14

github-actions · 2026-03-24T09:45:06Z

Persistent review updated to latest commit 8dbd619

github-actions · 2026-03-24T10:15:53Z

❌ Gradle check result for 8dbd619: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-03-25T10:48:29Z

Failed to generate code suggestions for PR

github-actions · 2026-03-25T12:50:35Z

❌ Gradle check result for 7f2da9b: ABORTED

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

github-actions · 2026-03-27T02:43:27Z

Failed to generate code suggestions for PR

github-actions · 2026-03-27T03:17:43Z

❌ Gradle check result for d78724a: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

shwetathareja · 2026-03-27T03:21:09Z

server/src/main/java/org/opensearch/cluster/metadata/MetadataInPlaceShardSplitService.java

+     * Applies a shard split request to the given cluster state. Updates the split metadata
+     * in the index metadata and triggers a reroute so that child shards get allocated.
+     */
+    static ClusterState applyShardSplitRequest(


we should have separate check to not allow split if cluster is in mixed mode during upgrades to have a guardrail

Do you mean for the version lower than the split supported version? In other version upgrades we can support this. And for a version upgrade from a non-supported split version to a supported split version, it is implicitly handled because the REST action as well as the admin client (invoked from a plugin) won't be available on lower version cluster manager.

If you want I can still put an additional check here.

Actually I am thinking that we should prevent splitting a shard in all cases where it is now supposed to move off the node except shard rebalancing case.
And if split already started then we should cancel it and let the shard relocate in cases above.
What say?

yes makes sense to me.

I have added multiple checks now to meet shard split criteria now in this class. Cancellation of split will happen later as part of a new cluster state update so that will come in next PR. I will also need to add cancellation reason in _cat/recovery API to inform user why split was cancelled.

shwetathareja · 2026-03-27T03:27:34Z

server/src/main/java/org/opensearch/cluster/routing/AllocationId.java

+    /**
+     * Creates a new allocation id for a child shard that is the result of a split.
+     */
+    public static AllocationId newTargetSplit(AllocationId allocationId, String childAllocId) {


the above method generates a transient value and when do we call this method? Not clear on the flow here.

Ya there are actually missing pieces to avoid making this PR huge. But your concern is very valid. So linking the source code from my personal repo here. For this comment,

newSplit is called on parent shard to generate N random UUIDs and store them in splitChildAllocationIds on the parent's allocation object.

newTargetSplit is called on each child shard. Takes one of the generated UUIDs, and creates a link to its parent by storing parentAllocationId.

This is same as relocation flow where in newRelocation source shard generates random UUID for target shard where target shard's id becomes this generated UUID. And this is how both are linked together.

For split, these generations happen in createRecoveringChildShards of RoutingNodes

shwetathareja · 2026-03-27T03:47:55Z

server/src/main/java/org/opensearch/cluster/routing/ShardRouting.java

+    /**
+     * Returns the recovering child shards of this splitting shard, or null if not splitting.
+     */
+    public ShardRouting[] getRecoveringChildShards() {


getRecoveringChildShards() returns the raw ShardRouting[] array directly. Callers can mutate the array contents, breaking the immutability contract of ShardRouting. It should return a defensive copy or an unmodifiable view (e.g., Arrays.copyOf or expose as List).

can you look into this

github-actions · 2026-03-28T14:05:23Z

Failed to generate code suggestions for PR

github-actions · 2026-03-28T14:11:18Z

Failed to generate code suggestions for PR

github-actions · 2026-03-28T14:17:42Z

Failed to generate code suggestions for PR

github-actions · 2026-03-28T14:18:56Z

Failed to generate code suggestions for PR

github-actions · 2026-03-28T14:24:00Z

❌ Gradle check result for b1ffae2: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: vikasvb90 <vikasvb@amazon.com>

github-actions · 2026-03-28T14:53:56Z

Failed to generate code suggestions for PR

vikasvb90 requested a review from a team as a code owner March 24, 2026 03:52

vikasvb90 self-assigned this Mar 24, 2026

vikasvb90 added the skip-changelog label Mar 24, 2026

vikasvb90 force-pushed the online_shard_split branch from 8600fa4 to 8450d14 Compare March 24, 2026 08:57

vikasvb90 force-pushed the online_shard_split branch from 8450d14 to 8dbd619 Compare March 24, 2026 09:42

opensearch-ci-bot mentioned this pull request Mar 25, 2026

[AUTOCUT] Gradle Check Flaky Test Report for IndexServiceTests #14407

Open

This was referenced Mar 25, 2026

[AUTOCUT] Gradle Check Flaky Test Report for ReindexBasicTests #15826

Open

[AUTOCUT] Gradle Check Flaky Test Report for RethrottleTests #17937

Open

[AUTOCUT] Gradle Check Flaky Test Report for SearchSlowLogTests #20653

Open

shwetathareja reviewed Mar 27, 2026

View reviewed changes

vikasvb90 force-pushed the online_shard_split branch from d78724a to 7b2c720 Compare March 28, 2026 14:04

vikasvb90 force-pushed the online_shard_split branch 2 times, most recently from 7b2c720 to b1ffae2 Compare March 28, 2026 14:17

Cluster state and recovery constructs for in-place shard split

f3dfee0

Signed-off-by: vikasvb90 <vikasvb@amazon.com>

vikasvb90 force-pushed the online_shard_split branch from b1ffae2 to f3dfee0 Compare March 28, 2026 14:52

Conversation

vikasvb90 commented Mar 24, 2026

Description

Related Issues

Check List

Uh oh!

github-actions bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 8dbd619)

Uh oh!

github-actions bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

github-actions bot commented Mar 24, 2026

Uh oh!

github-actions bot commented Mar 24, 2026

Uh oh!

github-actions bot commented Mar 24, 2026

Uh oh!

github-actions bot commented Mar 24, 2026

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 25, 2026

Uh oh!

github-actions bot commented Mar 27, 2026

Uh oh!

github-actions bot commented Mar 27, 2026

Uh oh!

shwetathareja Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

vikasvb90 Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

vikasvb90 Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

shwetathareja Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

vikasvb90 Mar 28, 2026

Choose a reason for hiding this comment

Uh oh!

shwetathareja Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

vikasvb90 Mar 27, 2026 • edited by shwetathareja Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shwetathareja Mar 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 28, 2026

Uh oh!

github-actions bot commented Mar 28, 2026

Uh oh!

github-actions bot commented Mar 28, 2026

Uh oh!

github-actions bot commented Mar 28, 2026

Uh oh!

github-actions bot commented Mar 28, 2026

Uh oh!

github-actions bot commented Mar 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions bot commented Mar 24, 2026 •

edited

Loading

(Review updated until commit `8dbd619`)

github-actions bot commented Mar 24, 2026 •

edited

Loading

vikasvb90 Mar 27, 2026 •

edited by shwetathareja

Loading