Add new recovery source for reshard split target shards #129159

Conversation
```java
    }

    public static RecoverySource readFrom(StreamInput in) throws IOException {
        // TODO is transport version check needed?
```
AFAIK master nodes are upgraded last, meaning that data nodes will already be able to read the new value. Let me know if I am wrong.
I don't really understand how this would work in general. Couldn't you have a cluster where all the indexing nodes are also master-eligible?
I do think we intend to fail resharding actions on a cluster until all nodes support it though. I thought we had a ticket for that but I don't see it now. I've stubbed in ES-12048 for this.
At any rate I don't think we have a problem on read, since by definition we know about the new source, and the other node either also does or simply won't send this case.
Thank you, I agree. If this logic existed, it should have been on the writer, but I think that's addressed by the check when enabling resharding that you mentioned.
We should not rely on the master being upgraded last, but we can rely on the feature not being available. I suppose we have no rolling upgrade tests yet with the feature enabled, so not having the version check should work just fine. No change needed; I just wanted to ensure we would not rely on the master being upgraded last.
Yes, the consensus (from my perspective) is that there will be a feature flag that enables resharding as a whole.
Not a feature flag, but a guard on all nodes in the cluster having a transport version that speaks resharding. This covers the case where a cluster is being upgraded from pre-resharding to post-resharding after a resharding feature flag has been set.
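The conclusion of this thread can be sketched in miniature. This is a hypothetical stand-in, not the actual Elasticsearch serialization code: the reader needs no version check because any node executing `readFrom` by definition knows the new value, while the writer is where a guard belongs (modeled here as an "all nodes support resharding" flag).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-ins for RecoverySource wire handling; not the real
// Elasticsearch classes. The point from the thread: the reader always
// understands every id it can dispatch on, so no transport-version check is
// needed on read. Only the writer must be gated.
public class RecoverySourceWire {
    enum Type { PEER, RESHARD_SPLIT_TARGET }

    static void writeTo(Type type, boolean clusterSupportsResharding, DataOutputStream out) throws IOException {
        // Guard on the writer: never send the new value into a cluster that
        // may contain nodes that cannot read it.
        if (type == Type.RESHARD_SPLIT_TARGET && clusterSupportsResharding == false) {
            throw new IllegalStateException("resharding not supported by all nodes");
        }
        out.writeByte(type.ordinal());
    }

    static Type readFrom(DataInputStream in) throws IOException {
        int id = in.readByte();
        // No version check here: if this code runs, it knows RESHARD_SPLIT_TARGET.
        return Type.values()[id];
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeTo(Type.RESHARD_SPLIT_TARGET, true, new DataOutputStream(bytes));
        Type roundTripped = readFrom(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        System.out.println(roundTripped);  // RESHARD_SPLIT_TARGET
    }
}
```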
```java
     * Recovery of a shard that is created as a result of a resharding split.
     * Not to be confused with _split API.
     */
    public static class ReshardSplitTargetRecoverySource extends RecoverySource {
```
I may be worrying too much about naming, but I don't want to call it Split because of the _split API, and I want Target in there to disambiguate between source and target. So I came up with this.
I like ReshardSplit. I'm not sure what Target is disambiguating though. I think recovery is implicitly in the context of the target? We don't have two different peer recovery sources for instance.
I was thinking there could be some custom logic for the reshard source, but indeed peer recovery does not have that.
Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)
bcully left a comment
LGTM, just a naming question really.
```java
            logger.trace("ignoring initializing shard {} - no source node can be found.", shardId);
            return;
        }
    } else if (shardRouting.recoverySource().getType() == Type.RESHARD_SPLIT_TARGET) {
```
Not a request for change, since this isn't your API, but it would be nice if the ShardRouting API didn't make us get the type and then cast in two steps. I could imagine something like shardRouting.recoverySource().asReshardSplit() that did the check and returned the cast object if it passed, or null otherwise. I'm just griping though; it's probably not worth doing now.
I could use instanceof on shardRouting.recoverySource() instead. What do you think?
Yeah, that seems nicer.
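The two styles being contrasted can be sketched side by side. These are hypothetical stand-in classes whose names only mirror the discussion, not the actual Elasticsearch API:

```java
// Hypothetical stand-ins to contrast the two call styles from the thread;
// not the real Elasticsearch types.
public class RecoveryDispatch {
    abstract static class RecoverySource {}
    static class PeerRecoverySource extends RecoverySource {}
    static class ReshardSplitTargetRecoverySource extends RecoverySource {
        final int sourceShardId;
        ReshardSplitTargetRecoverySource(int sourceShardId) { this.sourceShardId = sourceShardId; }
    }

    // Style 1: check the type, then cast in a second step.
    static int viaTypeAndCast(RecoverySource source) {
        if (source instanceof ReshardSplitTargetRecoverySource) {
            ReshardSplitTargetRecoverySource reshard = (ReshardSplitTargetRecoverySource) source;
            return reshard.sourceShardId;
        }
        return -1;
    }

    // Style 2: instanceof pattern matching (Java 16+) binds the cast
    // variable in one step, which is what the thread settles on.
    static int viaPatternMatch(RecoverySource source) {
        if (source instanceof ReshardSplitTargetRecoverySource reshard) {
            return reshard.sourceShardId;
        }
        return -1;
    }

    public static void main(String[] args) {
        RecoverySource source = new ReshardSplitTargetRecoverySource(3);
        System.out.println(viaTypeAndCast(source));   // 3
        System.out.println(viaPatternMatch(source));  // 3
        System.out.println(viaPatternMatch(new PeerRecoverySource()));  // -1
    }
}
```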
henningandersen left a comment
LGTM.
```java
        ShardId sourceShardId = reshardSplitRecoverySource.getSourceShardId();
        sourceNode = findSourceNodeForReshardSplitRecovery(state.routingTable(project.id()), state.nodes(), sourceShardId);
        if (sourceNode == null) {
            logger.trace("ignoring initializing reshard target shard {} - no source node can be found.", shardId);
```
Can we assert false here? I think this situation should be invalid; the shardRouting is taken from state further out, and we should assume that. I think we could add the same above, but I would not want to complicate the work here with figuring that out.
Want to make sure I understand: are you saying that if routing was updated for the target shard, then the same cluster state should contain routing for the source shard as well?
I've added asserts inside findSourceNodeForReshardSplitRecovery.
It is mostly that having a target shard recovery without an active source shard is not sound. If we fail the source shard, we should probably also fail the target shard; at least that is how peer recovery works, where if the primary fails, we fail the replicas too in the same cluster state update.
I am also not fond of the way we just ignore recovery here. If we accept this case, we need a test case to verify that we resume it later.
Hence the asserts, to clarify the situation. We may refine later.
```java
        ShardRouting sourceShardRouting = routingTable.shardRoutingTable(sourceShardId).primaryShard();

        if (sourceShardRouting.active() == false) {
            logger.trace("can't find reshard split source node because source shard {} is not active.", sourceShardRouting);
```
We should assert false here and below as well.
If it fails, we can leave it out for now, but then we should add a Jira to ensure we have these properties in the routing table.
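The invariant agreed on in this thread can be sketched as follows. This is a hypothetical, simplified stand-in for the source-node lookup, not the actual Elasticsearch code: a reshard split target that is initializing must have an active source primary in the same cluster state, so the lookup asserts rather than silently skipping recovery.

```java
import java.util.Map;

// Hypothetical sketch of the invariant discussed above: a target shard
// recovery without an active source primary is not a sound cluster state,
// so the lookup asserts (assertions fire when run with -ea) instead of
// quietly returning. Not the real Elasticsearch implementation.
public class ReshardSourceLookup {
    record ShardRouting(String nodeId, boolean active) {}

    static String findSourceNodeForReshardSplitRecovery(Map<Integer, ShardRouting> primaryByShardId, int sourceShardId) {
        ShardRouting source = primaryByShardId.get(sourceShardId);
        // Mirrors the asserts added in the PR.
        assert source != null : "reshard split target has no source shard " + sourceShardId;
        assert source.active() : "reshard split source shard " + sourceShardId + " is not active";
        return source == null || source.active() == false ? null : source.nodeId();
    }

    public static void main(String[] args) {
        Map<Integer, ShardRouting> routing = Map.of(0, new ShardRouting("node-1", true));
        System.out.println(findSourceNodeForReshardSplitRecovery(routing, 0));  // node-1
    }
}
```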
This PR introduces a new recovery source for shards that are targets of a reshard split. This recovery source contains the metadata necessary for the recovery of split target shards to work. It also serves as an abstraction layer, helping the recovery code path avoid reasoning about data that is very specific to resharding, such as IndexReshardingMetadata. These changes are tested as part of the resharding tests in the linked PR.
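The abstraction the description outlines can be illustrated with a minimal sketch. These are hypothetical stand-in types, not the actual Elasticsearch classes: the recovery source object carries only the metadata the recovery path needs (here, just the source shard id), so recovery code never has to touch resharding-specific state.

```java
// Hypothetical sketch of the abstraction described above; stand-in types,
// not the real Elasticsearch RecoverySource hierarchy. The recovery path
// consumes only what the recovery source exposes, keeping details like
// IndexReshardingMetadata out of the recovery code.
public class ReshardRecoverySourceSketch {
    abstract static class RecoverySource {
        abstract String type();
    }

    static final class ReshardSplitTargetRecoverySource extends RecoverySource {
        private final int sourceShardId;

        ReshardSplitTargetRecoverySource(int sourceShardId) {
            this.sourceShardId = sourceShardId;
        }

        // The one piece of metadata the split target recovery needs.
        int getSourceShardId() {
            return sourceShardId;
        }

        @Override
        String type() {
            return "RESHARD_SPLIT_TARGET";
        }
    }

    public static void main(String[] args) {
        ReshardSplitTargetRecoverySource source = new ReshardSplitTargetRecoverySource(2);
        System.out.println(source.type() + " from shard " + source.getSourceShardId());
    }
}
```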