Add new recovery source for reshard split target shards #129159

lkts · 2025-06-09T18:18:40Z

This PR introduces a new recovery source for shards that are targets of a reshard split. This recovery source contains the metadata needed for the recovery of split target shards. It also serves as an abstraction layer, so the recovery code path does not need to reason about resharding-specific data such as IndexReshardingMetadata.
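To make the abstraction concrete, here is a minimal sketch of what such a recovery source might look like. It assumes simplified stand-ins: `ShardId` is a record, `RecoverySource` carries only a type, and the `RESHARD_SPLIT` type and `ReshardSplitRecoverySource` class names are illustrative rather than the exact code in this PR. The point is that the source exposes only the source shard id, so recovery code never needs `IndexReshardingMetadata`.

```java
import java.util.Objects;

// Hypothetical stand-in for org.elasticsearch.index.shard.ShardId.
record ShardId(String index, int id) {}

// Simplified stand-in for RecoverySource; only a subset of types is modeled.
abstract class RecoverySource {
    enum Type { EMPTY_STORE, PEER, RESHARD_SPLIT }

    abstract Type getType();
}

// Recovery source for a shard created by a resharding split (not the _split API).
// It exposes only what recovery needs: the id of the shard to recover from.
final class ReshardSplitRecoverySource extends RecoverySource {
    private final ShardId sourceShardId;

    ReshardSplitRecoverySource(ShardId sourceShardId) {
        this.sourceShardId = Objects.requireNonNull(sourceShardId);
    }

    @Override
    Type getType() {
        return Type.RESHARD_SPLIT;
    }

    ShardId getSourceShardId() {
        return sourceShardId;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ReshardSplitRecoverySource other && sourceShardId.equals(other.sourceShardId);
    }

    @Override
    public int hashCode() {
        return sourceShardId.hashCode();
    }

    @Override
    public String toString() {
        return "reshard split recovery from " + sourceShardId;
    }
}
```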

These changes are tested as part of the resharding tests in the linked PR.

lkts · 2025-06-09T20:18:06Z

server/src/main/java/org/elasticsearch/cluster/routing/RecoverySource.java

    }

    public static RecoverySource readFrom(StreamInput in) throws IOException {
+        // TODO is transport version check needed?


AFAIK master nodes are upgraded last, meaning that data nodes will already be able to read the new value. Let me know if I am wrong.

I don't really understand how this would work in general. Couldn't you have a cluster where all the indexing nodes are also master-eligible?

I do think we intend to fail resharding actions on a cluster until all nodes support it though. I thought we had a ticket for that but I don't see it now. I've stubbed in ES-12048 for this.

At any rate I don't think we have a problem on read, since by definition we know about the new source, and the other node either also does or simply won't send this case.
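A rough sketch of why the read side needs no version check, using `java.io` streams in place of Elasticsearch's `StreamInput`/`StreamOutput`. The one-byte type ordinal wire format, the record types and `SourceSerde` are assumptions for illustration. The reader decodes any type it knows; whether an older node could ever receive the new type is decided entirely by the writer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical simplified model; Elasticsearch uses StreamInput/StreamOutput instead.
record ShardId(String index, int id) {}

interface RecoverySource {}
record EmptyStoreRecoverySource() implements RecoverySource {}
record PeerRecoverySource() implements RecoverySource {}
record ReshardSplitRecoverySource(ShardId sourceShardId) implements RecoverySource {}

// Assumed wire format: one byte holding the type ordinal, then a type-specific payload.
enum RecoveryType { EMPTY_STORE, PEER, RESHARD_SPLIT }

final class SourceSerde {
    private SourceSerde() {}

    static void writeTo(RecoverySource source, DataOutputStream out) throws IOException {
        if (source instanceof ReshardSplitRecoverySource split) {
            out.writeByte(RecoveryType.RESHARD_SPLIT.ordinal());
            out.writeUTF(split.sourceShardId().index());
            out.writeInt(split.sourceShardId().id());
        } else if (source instanceof PeerRecoverySource) {
            out.writeByte(RecoveryType.PEER.ordinal());
        } else if (source instanceof EmptyStoreRecoverySource) {
            out.writeByte(RecoveryType.EMPTY_STORE.ordinal());
        } else {
            throw new IllegalArgumentException("unknown recovery source: " + source);
        }
    }

    // No version check: this reader knows every type it can decode, and a sender that
    // predates RESHARD_SPLIT never writes it. Compatibility is the writer's concern.
    static RecoverySource readFrom(DataInputStream in) throws IOException {
        int ordinal = in.readByte();
        RecoveryType[] types = RecoveryType.values();
        if (ordinal < 0 || ordinal >= types.length) {
            throw new IOException("unknown recovery source type ordinal: " + ordinal);
        }
        return switch (types[ordinal]) {
            case EMPTY_STORE -> new EmptyStoreRecoverySource();
            case PEER -> new PeerRecoverySource();
            case RESHARD_SPLIT -> new ReshardSplitRecoverySource(new ShardId(in.readUTF(), in.readInt()));
        };
    }

    // Convenience helper: serialize and immediately deserialize.
    static RecoverySource roundTrip(RecoverySource source) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            writeTo(source, out);
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return readFrom(in);
        }
    }
}
```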

Thank you, I agree. If this logic existed, it should be on the writer, but I think that's addressed by the check when enabling resharding that you mentioned.

We should not rely on the master being upgraded last, but we can rely on the feature not being available. I suppose we have no rolling upgrade tests yet with the feature enabled, hence not having the version check should work just fine. So no change needed, just wanted to ensure we would not rely on the master being upgraded last.

Yes, the consensus (from my perspective) is that there will be a feature flag that enables resharding as a whole.

Not a feature flag, but a guard that all nodes in the cluster have a transport version that speaks resharding. This covers the case where a cluster is upgraded from a pre-resharding version to a post-resharding version after a resharding feature flag has been set.
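The guard described in this thread can be sketched as a check that every node meets a minimum transport version, regardless of which nodes are master-eligible. `Node`, `ReshardingGuard` and `RESHARDING_TRANSPORT_VERSION` (including its integer value) are hypothetical; real transport versions are richer than plain ints.

```java
import java.util.List;

// Hypothetical stand-in: a node and the transport version it speaks, modeled as an int.
record Node(String name, boolean masterEligible, int transportVersion) {}

final class ReshardingGuard {
    // Hypothetical value: the first transport version that understands resharding.
    static final int RESHARDING_TRANSPORT_VERSION = 100;

    private ReshardingGuard() {}

    // True only when every node, master-eligible or not, speaks resharding,
    // so the check does not depend on the order in which nodes are upgraded.
    static boolean reshardingSupported(List<Node> nodes) {
        return nodes.stream().allMatch(node -> node.transportVersion() >= RESHARDING_TRANSPORT_VERSION);
    }

    // Fails a resharding request up front instead of letting a new node send
    // a recovery source that an older node cannot decode.
    static void ensureReshardingSupported(List<Node> nodes) {
        if (reshardingSupported(nodes) == false) {
            throw new IllegalStateException("resharding requires all nodes to be on a version that supports it");
        }
    }
}
```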

lkts · 2025-06-09T21:22:51Z

server/src/main/java/org/elasticsearch/cluster/routing/RecoverySource.java

+     * Recovery of a shard that is created as a result of a resharding split.
+     * Not to be confused with _split API.
+     */
+    public static class ReshardSplitTargetRecoverySource extends RecoverySource {


I may be worrying too much about naming, but I don't want to call it Split because of the _split API, and I want Target in there to disambiguate between source and target. So I came up with this.

I like ReshardSplit. I'm not sure what Target is disambiguating though. I think recovery is implicitly in the context of the target? We don't have two different peer recovery sources for instance.

I was thinking there could be some custom logic for the reshard source but indeed peer recovery does not have that.

elasticsearchmachine · 2025-06-09T21:26:09Z

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

bcully

LGTM, just a naming question really.

bcully · 2025-06-10T15:43:36Z

server/src/main/java/org/elasticsearch/cluster/routing/RecoverySource.java

    }

    public static RecoverySource readFrom(StreamInput in) throws IOException {
+        // TODO is transport version check needed?


I don't really understand how this would work in general. Couldn't you have a cluster where all the indexing nodes are also master-eligible?

I do think we intend to fail resharding actions on a cluster until all nodes support it though. I thought we had a ticket for that but I don't see it now. I've stubbed in ES-12048 for this.

At any rate I don't think we have a problem on read, since by definition we know about the new source, and the other node either also does or simply won't send this case.

bcully · 2025-06-10T15:59:55Z

server/src/main/java/org/elasticsearch/cluster/routing/RecoverySource.java

+     * Recovery of a shard that is created as a result of a resharding split.
+     * Not to be confused with _split API.
+     */
+    public static class ReshardSplitTargetRecoverySource extends RecoverySource {


I like ReshardSplit. I'm not sure what Target is disambiguating though. I think recovery is implicitly in the context of the target? We don't have two different peer recovery sources for instance.

bcully · 2025-06-10T21:43:08Z

server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java

                    logger.trace("ignoring initializing shard {} - no source node can be found.", shardId);
                    return;
                }
+            } else if (shardRouting.recoverySource().getType() == Type.RESHARD_SPLIT_TARGET) {


Not a request for change (this isn't your API), but it would be nice if the ShardRouting API didn't make us get the type and then cast in two steps. I could imagine something like shardRouting.recoverySource().asReshardSplit() that does the check and returns the cast object if it passes, or null otherwise.

I'm just griping though, it's probably not worth doing now.

I could use instanceof on shardRouting.recoverySource() instead. What do you think?

Yeah, that seems nicer.
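A sketch contrasting the two-step type check and cast with a pattern-matching `instanceof` (Java 16+), plus the reviewer's hypothetical `asReshardSplit()` helper. The class hierarchy is a simplified stand-in, not the real `RecoverySource` API.

```java
// Hypothetical stand-ins, not the real Elasticsearch RecoverySource API.
record ShardId(String index, int id) {}

abstract class RecoverySource {
    enum Type { PEER, RESHARD_SPLIT }

    abstract Type getType();

    // The reviewer's hypothetical helper: check and cast in one call, or null.
    ReshardSplitRecoverySource asReshardSplit() {
        return this instanceof ReshardSplitRecoverySource split ? split : null;
    }
}

final class PeerRecoverySource extends RecoverySource {
    @Override
    Type getType() {
        return Type.PEER;
    }
}

final class ReshardSplitRecoverySource extends RecoverySource {
    private final ShardId sourceShardId;

    ReshardSplitRecoverySource(ShardId sourceShardId) {
        this.sourceShardId = sourceShardId;
    }

    @Override
    Type getType() {
        return Type.RESHARD_SPLIT;
    }

    ShardId getSourceShardId() {
        return sourceShardId;
    }
}

final class SourceShardLookup {
    private SourceShardLookup() {}

    // Two steps: compare the type, then cast.
    static ShardId viaTypeAndCast(RecoverySource source) {
        if (source.getType() == RecoverySource.Type.RESHARD_SPLIT) {
            return ((ReshardSplitRecoverySource) source).getSourceShardId();
        }
        return null;
    }

    // One step: pattern-matching instanceof checks and binds the cast variable.
    static ShardId viaInstanceof(RecoverySource source) {
        if (source instanceof ReshardSplitRecoverySource split) {
            return split.getSourceShardId();
        }
        return null;
    }
}
```

The `instanceof` form cannot drift out of sync the way a type enum and a cast target can, since the check and the cast are the same expression.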

henningandersen

LGTM.

henningandersen · 2025-06-12T19:58:57Z

server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java

+                ShardId sourceShardId = reshardSplitRecoverySource.getSourceShardId();
+                sourceNode = findSourceNodeForReshardSplitRecovery(state.routingTable(project.id()), state.nodes(), sourceShardId);
+                if (sourceNode == null) {
+                    logger.trace("ignoring initializing reshard target shard {} - no source node can be found.", shardId);


Can we assert false here? I think this situation should be invalid: the shardRouting is taken from the state further out, and we should be able to assume that. I think we could add the same above, but I would not want to complicate the work here by figuring that out.

I want to make sure I understand: are you saying that if the routing was updated for the target shard, then the same cluster state should also contain routing for the source shard?

I've added asserts inside findSourceNodeForReshardSplitRecovery.

It is mostly that having a target shard recovery without an active source shard is not sound. If we fail the source shard we should probably also fail the target shard. At least that is how peer recovery works, if the primary fails, we fail the replicas too in the same cluster state update.

I am also not fond of the way we just ignore recovery here. If we accept this case we need a test case to verify that we resume it later.

Hence the asserts to clarify the situation. We may refine later.

henningandersen · 2025-06-12T20:00:34Z

server/src/main/java/org/elasticsearch/indices/cluster/IndicesClusterStateService.java

+        ShardRouting sourceShardRouting = routingTable.shardRoutingTable(sourceShardId).primaryShard();
+
+        if (sourceShardRouting.active() == false) {
+            logger.trace("can't find reshard split source node because source shard {} is not active.", sourceShardRouting);


We should assert false here and below as well.

If it fails, we can leave it out for now, but then we should add a Jira ticket to ensure we have these properties in the routing table.
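Putting the review feedback together, a simplified sketch of `findSourceNodeForReshardSplitRecovery` might assert that the source primary is active and assigned to a known node, while still returning `null` when assertions are disabled so the caller can log and skip. The routing types are hypothetical, map-based stand-ins for `RoutingTable`, `ShardRouting` and `DiscoveryNodes`, and the signature is not the one in the PR.

```java
import java.util.Map;

// Hypothetical, map-based stand-ins for ShardId, ShardRouting and DiscoveryNode.
record ShardId(String index, int id) {}
record PrimaryRouting(ShardId shardId, boolean active, String currentNodeId) {}
record Node(String id) {}

final class ReshardSourceLookup {
    private ReshardSourceLookup() {}

    // An initializing split target without an active source primary should be impossible,
    // so the invalid cases trip an assertion under -ea. With assertions disabled they
    // fall back to null, and the caller logs at trace level and skips the shard.
    static Node findSourceNodeForReshardSplitRecovery(
        Map<ShardId, PrimaryRouting> primaries,
        Map<String, Node> nodes,
        ShardId sourceShardId
    ) {
        PrimaryRouting source = primaries.get(sourceShardId);
        if (source == null || source.active() == false) {
            assert false : "reshard split source shard " + sourceShardId + " is not active";
            return null;
        }
        Node node = nodes.get(source.currentNodeId());
        assert node != null : "node [" + source.currentNodeId() + "] holding source shard " + sourceShardId + " is unknown";
        return node;
    }
}
```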

henningandersen · 2025-06-12T20:12:30Z

server/src/main/java/org/elasticsearch/cluster/routing/RecoverySource.java

    }

    public static RecoverySource readFrom(StreamInput in) throws IOException {
+        // TODO is transport version check needed?


We should not rely on the master being upgraded last, but we can rely on the feature not being available. I suppose we have no rolling upgrade tests yet with the feature enabled, hence not having the version check should work just fine. So no change needed, just wanted to ensure we would not rely on the master being upgraded last.

Add new recovery source for reshard split target shards

078e6c6

elasticsearchmachine added the v9.1.0 label Jun 9, 2025

lkts added 2 commits June 9, 2025 11:22

Merge branch 'main' into split_recovery_source

2c520c8

iter

b928faf

lkts commented Jun 9, 2025

elasticsearchmachine added the serverless-linked label Jun 9, 2025

lkts requested review from bcully and henningandersen June 9, 2025 21:19

lkts marked this pull request as ready for review June 9, 2025 21:19

elasticsearchmachine added the needs:triage label Jun 9, 2025

naming

27f905c

lkts commented Jun 9, 2025

lkts added >non-issue and :Distributed Indexing/Distributed labels Jun 9, 2025

elasticsearchmachine added Team:Distributed Indexing and removed needs:triage labels Jun 9, 2025

[CI] Auto commit changes from spotless

42dd895

bcully approved these changes Jun 10, 2025

lkts added 2 commits June 11, 2025 10:21

Address feedback

7e18c51

Merge branch 'main' into split_recovery_source

58112aa

henningandersen approved these changes Jun 12, 2025

lkts added 3 commits June 12, 2025 13:42

Add asserts

de01dfd

Merge branch 'main' into split_recovery_source

691ec62

Merge branch 'main' into split_recovery_source

4af45d9

lkts merged commit b24bb35 into elastic:main Jun 13, 2025
18 checks passed

lkts deleted the split_recovery_source branch June 13, 2025 19:27

lkts mentioned this pull request Jun 16, 2025

Handle recovery of shards created during resharding split #129091

Closed

4 participants
