Waiting for desired allocation before starting shard snapshot #130300
Conversation
If the allocation of a shard is undesired, its snapshot now waits for the desired allocation.
Hi @ywangd, I've created a changelog YAML for you.
I think this won't do what we want. The problematic situation is where a snapshot starts and then the cluster scales up, but the ongoing snapshot prevents shards from moving to the scaled-up nodes. In that case, at the time the snapshot starts all the shards are in their desired locations so they'll go straight through to INIT.
Instead we need to bound the number of shards in state INIT (on each node) so that, should a scale-up occur, most of the shards are free to move already.
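The idea above (bounding the number of shards in state INIT per node, so a scale-up leaves most shards free to move) could be sketched roughly as follows. This is a hypothetical illustration only, not the actual Elasticsearch implementation; the class name, the per-node setting, and the counter structure are all assumptions for the sake of the example:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: cap how many shard snapshots may be in INIT on each
// node at once. Shards over the cap stay queued and remain free to relocate.
class PerNodeInitLimiter {
    private final int maxInitShardsPerNode;              // assumed tunable limit
    private final Map<String, Integer> initCounts = new HashMap<>();

    PerNodeInitLimiter(int maxInitShardsPerNode) {
        this.maxInitShardsPerNode = maxInitShardsPerNode;
    }

    /** Returns true if a shard snapshot on the given node may move to INIT now. */
    boolean tryStart(String nodeId) {
        int current = initCounts.getOrDefault(nodeId, 0);
        if (current >= maxInitShardsPerNode) {
            return false;  // stay queued; the shard can still relocate
        }
        initCounts.put(nodeId, current + 1);
        return true;
    }

    /** Called when a shard snapshot on the given node completes. */
    void complete(String nodeId) {
        initCounts.merge(nodeId, -1, Integer::sum);
    }
}
```

With a cap of, say, 2, only two shard snapshots per node run at a time; the rest of the node's shards can move to newly scaled-up nodes while waiting their turn.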
Yeah, it is intentional that this PR does not fix that problem. It tries to fix the other problem, where continuous shard snapshots leave no quiet time for the shard to relocate. That is, it is an alternative to limiting max concurrent snapshots. It helps to delay a shard snapshot when the cluster is unbalanced and the shard is on an undesired allocation. So the shard snapshot waits for the relocation to happen first, instead of locking the shard down regardless, as we do now. I think this case is kinda orthogonal, because with bounded …
That's true, but we have still seen cases where even a single snapshot takes several hours to complete. The later snapshots aren't really the problem here; it's the first one that we need to address. Indeed, the reason this first snapshot takes so long appears to be that the cluster scales down after a spike in indexing, leaving it with severely restricted snapshot capacity. We need to find a way to allow the cluster to scale back up again in that case, even if only one snapshot is running. If we limited the number of shards in state …
My thinking is that an individual shard snapshot may not take the whole time to complete. As shard snapshots complete, the shards become free to move with this change, whereas today they would be locked down again by a second snapshot. That said, a single shard snapshot running for more than 30 min is indeed an issue this PR does not fix. In other words, it does not fix the initial slowness, but it can prevent it from deteriorating.
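For reference, the gating behaviour this PR describes (a shard snapshot waits while the shard sits on an undesired allocation, and only moves to INIT once the shard is where the desired-balance allocator wants it) might look conceptually like this. The names here are illustrative assumptions, not the real Elasticsearch API:

```java
import java.util.Set;

// Hypothetical sketch: decide a shard snapshot's next state from whether the
// shard's current node is among its desired nodes.
class DesiredAllocationGate {
    enum ShardSnapshotState { WAITING_FOR_DESIRED_ALLOCATION, INIT }

    private final Set<String> desiredNodesForShard;  // from the desired-balance allocator (assumed)

    DesiredAllocationGate(Set<String> desiredNodesForShard) {
        this.desiredNodesForShard = desiredNodesForShard;
    }

    /** INIT if the shard is already on a desired node, otherwise keep waiting. */
    ShardSnapshotState nextState(String currentNodeId) {
        return desiredNodesForShard.contains(currentNodeId)
            ? ShardSnapshotState.INIT
            : ShardSnapshotState.WAITING_FOR_DESIRED_ALLOCATION;
    }
}
```

The key design point under discussion: this check happens when the snapshot starts, so if all shards are already on desired nodes at that moment, they go straight to INIT, which is why it does not address the scale-up-during-snapshot case on its own.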