-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add new recovery source for reshard split target shards #129159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
078e6c6
2c520c8
b928faf
27f905c
42dd895
7e18c51
58112aa
de01dfd
691ec62
4af45d9
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -30,6 +30,7 @@ | |
| import org.elasticsearch.cluster.node.DiscoveryNode; | ||
| import org.elasticsearch.cluster.node.DiscoveryNodes; | ||
| import org.elasticsearch.cluster.routing.IndexShardRoutingTable; | ||
| import org.elasticsearch.cluster.routing.RecoverySource; | ||
| import org.elasticsearch.cluster.routing.RecoverySource.Type; | ||
| import org.elasticsearch.cluster.routing.RoutingNode; | ||
| import org.elasticsearch.cluster.routing.RoutingTable; | ||
|
|
@@ -701,6 +702,13 @@ private void createShard(ShardRouting shardRouting, ClusterState state) { | |
| logger.trace("ignoring initializing shard {} - no source node can be found.", shardId); | ||
| return; | ||
| } | ||
| } else if (shardRouting.recoverySource() instanceof RecoverySource.ReshardSplitRecoverySource reshardSplitRecoverySource) { | ||
| ShardId sourceShardId = reshardSplitRecoverySource.getSourceShardId(); | ||
| sourceNode = findSourceNodeForReshardSplitRecovery(state.routingTable(project.id()), state.nodes(), sourceShardId); | ||
| if (sourceNode == null) { | ||
| logger.trace("ignoring initializing reshard target shard {} - no source node can be found.", shardId); | ||
| return; | ||
| } | ||
| } else { | ||
| sourceNode = null; | ||
| } | ||
|
|
@@ -988,6 +996,31 @@ private static DiscoveryNode findSourceNodeForPeerRecovery(RoutingTable routingT | |
| return sourceNode; | ||
| } | ||
|
|
||
| private static DiscoveryNode findSourceNodeForReshardSplitRecovery( | ||
| RoutingTable routingTable, | ||
| DiscoveryNodes nodes, | ||
| ShardId sourceShardId | ||
| ) { | ||
| ShardRouting sourceShardRouting = routingTable.shardRoutingTable(sourceShardId).primaryShard(); | ||
|
|
||
| if (sourceShardRouting.active() == false) { | ||
| assert false : sourceShardRouting; | ||
| logger.trace("can't find reshard split source node because source shard {} is not active.", sourceShardRouting); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should assert false here and below as well.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If it fails we can leave it out for now, but then we should add a jira to ensure we have these properties in the routing table. |
||
| return null; | ||
| } | ||
|
|
||
| DiscoveryNode sourceNode = nodes.get(sourceShardRouting.currentNodeId()); | ||
| if (sourceNode == null) { | ||
| assert false : "Source node for reshard does not exist: " + sourceShardRouting.currentNodeId(); | ||
| logger.trace( | ||
| "can't find reshard split source node because source shard {} is assigned to an unknown node.", | ||
| sourceShardRouting | ||
| ); | ||
| return null; | ||
| } | ||
| return sourceNode; | ||
| } | ||
|
|
||
| private record PendingShardCreation(String clusterStateUUID, long startTimeMillis) {} | ||
|
|
||
| private class RecoveryListener implements PeerRecoveryTargetService.RecoveryListener { | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assert false here? I think this situation should be invalid, the
shardRoutingis taken fromstatefurther out - and we should assume that. I think we could add the same above, but would not want to complicate the work here with figuring that out.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want to make sure i understand - are you saying that if routing was updated for the target shard then the same cluster state should contain routing for the source shard as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added asserts inside
findSourceNodeForReshardSplitRecovery.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is mostly that having a target shard recovery without an active source shard is not sound. If we fail the source shard we should probably also fail the target shard. At least that is how peer recovery works, if the primary fails, we fail the replicas too in the same cluster state update.
I am also not fond of the way we just ignore recovery here. If we accept this case we need a test case to verify that we resume it later.
Hence the asserts to clarify the situation. We may refine later.