
Conversation

@ywangd ywangd commented Jun 30, 2025

If the allocation of a shard is undesired, its snapshot now waits for the desired allocation.
@ywangd ywangd added >enhancement :Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs labels Jun 30, 2025
@ywangd ywangd requested a review from DaveCTurner June 30, 2025 01:51
@elasticsearchmachine (Collaborator)

Hi @ywangd, I've created a changelog YAML for you.

@DaveCTurner DaveCTurner (Contributor) left a comment


I think this won't do what we want. The problematic situation is where a snapshot starts and then the cluster scales up, but the ongoing snapshot prevents shards from moving to the scaled-up nodes. In that case, at the time the snapshot starts all the shards are in their desired locations so they'll go straight through to INIT.

Instead we need to bound the number of shards in state INIT (on each node) so that, should a scale-up occur, most of the shards are free to move already.
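The alternative suggested here, bounding the number of INIT shards per node, could look roughly like the following. This is a minimal Python sketch, not Elasticsearch code; the cap `MAX_INIT_PER_NODE`, the function name, and the `QUEUED` state are all illustrative assumptions.

```python
MAX_INIT_PER_NODE = 2  # hypothetical per-node cap on concurrent shard snapshots

def assign_snapshot_states(shards_by_node):
    """Mark at most MAX_INIT_PER_NODE shards per node as INIT; the rest
    stay QUEUED and therefore remain free to relocate if the cluster
    scales up while the snapshot is running."""
    states = {}
    for node, shards in shards_by_node.items():
        for i, shard in enumerate(shards):
            states[shard] = "INIT" if i < MAX_INIT_PER_NODE else "QUEUED"
    return states

states = assign_snapshot_states({"node-1": ["s0", "s1", "s2", "s3"]})
# s0 and s1 go to INIT; s2 and s3 stay QUEUED and may still move
```

The point of the cap is that even mid-snapshot, most shards on each node are never pinned, so a scale-up can rebalance them immediately.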


ywangd commented Jun 30, 2025

> The problematic situation is where a snapshot starts and then the cluster scales up, but the ongoing snapshot prevents shards from moving to the scaled-up nodes. In that case, at the time the snapshot starts all the shards are in their desired locations so they'll go straight through to INIT.

Yeah, it is intentional that this PR does not fix that problem. It tries to fix the other problem, where continuous shard snapshots leave no quiet time for the shard to relocate. That is, it is an alternative to limiting the max number of concurrent snapshots. It helps to delay a shard snapshot when the cluster is unbalanced and the shard is on an undesired allocation, so the shard snapshot waits for the relocation to happen first instead of locking the shard down regardless, as we do today.

I think this case is somewhat orthogonal, because even with bounded INIT shards it is still theoretically possible for continuous snapshots to lock those in the INIT state for too long. I feel this could be an improvement over the current situation, though it does not fix it entirely.
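The PR's approach as described above can be sketched as a simple gate: a shard snapshot enters INIT only once the shard sits on its desired node. This is a hypothetical Python illustration, not the actual Elasticsearch implementation; the `WAITING` state name and function signature are assumptions.

```python
def shard_snapshot_state(current_node, desired_node):
    """Start the shard snapshot (INIT) only when the shard is already on
    its desired node; otherwise wait, so the relocation can happen first
    instead of the snapshot pinning the shard to an undesired node."""
    return "INIT" if current_node == desired_node else "WAITING"

# A shard on an undesired node waits; one on its desired node starts.
assert shard_snapshot_state("node-1", "node-2") == "WAITING"
assert shard_snapshot_state("node-2", "node-2") == "INIT"
```

As the review notes, this gate only helps when the allocation is already undesired at snapshot time; if the cluster scales up after the snapshot starts, every shard has already passed the check and entered INIT.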


DaveCTurner commented Jun 30, 2025

That's true, but still we have seen cases where even a single snapshot takes several hours to complete. The later snapshots aren't really the problem here; it's the first one that we need to address. Indeed, the reason this first snapshot takes so long appears to be the cluster scaling down after a spike in indexing, leaving it with severely restricted snapshot capacity. We need to find a way to allow the cluster to scale back up again in that case, even if only one snapshot is running.

If we limited the number of shards in state INIT then we wouldn't need this extra complexity.


ywangd commented Jun 30, 2025

> a single snapshot takes several hours to complete

My thinking is that an individual shard snapshot may not take all that time to complete. As shard snapshots complete, they become free to move with this change, whereas today they would be locked down again by a second snapshot. It does mean the single shard snapshot ran for more than 30 minutes, which is an issue this PR does not fix. In other words, it does not fix the initial slowness, but it can prevent it from deteriorating.

@ywangd ywangd closed this Jun 30, 2025

Labels

:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >enhancement v9.2.0


3 participants