
@ghostdogpr
Collaborator

When persisting to Redis (or other storage) fails, the Shard Manager still accepts new pods and keeps rebalancing. The problem is that if the Shard Manager is restarted at that point, it will read a stale state from the database and treat the old assignments as current, which can result in the same entity being alive in two different places.

This PR makes persistence synchronous (by removing `forkDaemon`), so that the register, rebalance and start operations wait until persistence has completed, retrying on failure, and potentially die if persistence keeps failing.
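To illustrate the behavioural difference, here is a minimal, language-neutral sketch written in Python rather than Shardcake's actual Scala/ZIO code. Everything in it is hypothetical: `FlakyStorage` stands in for Redis, and `persist_with_retry`, `rebalance`, the attempt count and the backoff values are illustrative names and numbers, not the project's API. Roughly speaking, the old behaviour resembles running the write as a detached background fiber (`forkDaemon`), whose failure the caller never sees, while the new behaviour resembles awaiting the write inline with a retry schedule, so the operation itself fails when storage stays unavailable.

```python
import time
from typing import Callable, Dict


class PersistenceError(Exception):
    """Hypothetical error raised when storage keeps failing after all retries."""


class FlakyStorage:
    """Hypothetical stand-in for Redis: the first `failures` writes fail."""

    def __init__(self, failures: int) -> None:
        self.failures = failures
        self.saved: Dict[int, str] = {}  # last durably persisted assignments
        self.attempts = 0

    def save_assignments(self, assignments: Dict[int, str]) -> None:
        self.attempts += 1
        if self.attempts <= self.failures:
            raise ConnectionError("storage unavailable")
        self.saved = dict(assignments)


def persist_with_retry(save: Callable[[], None], max_attempts: int = 3,
                       base_delay: float = 0.01) -> None:
    """Block until `save` succeeds, retrying with exponential backoff.

    The caller does not proceed until the write has succeeded; if every
    attempt fails, a PersistenceError propagates to the caller.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            save()
            return
        except ConnectionError as exc:
            if attempt == max_attempts:
                raise PersistenceError(f"gave up after {attempt} attempts") from exc
            time.sleep(base_delay * 2 ** (attempt - 1))


def rebalance(assignments: Dict[int, str], storage: FlakyStorage,
              shard: int, pod: str) -> Dict[int, str]:
    """Hypothetical rebalance step: move `shard` to `pod`, then persist synchronously."""
    updated = {**assignments, shard: pod}
    persist_with_retry(lambda: storage.save_assignments(updated))
    return updated
```

With `FlakyStorage(failures=2)`, `rebalance({1: "pod-a"}, storage, shard=1, pod="pod-b")` succeeds on the third attempt and the stored state matches the returned one. With more failures than allowed attempts, it raises `PersistenceError` and nothing is stored, so the caller cannot carry on as if the new assignment had been saved.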

ghostdogpr merged commit de6eb79 into series/2.x on Sep 19, 2025
5 checks passed
ghostdogpr deleted the persist branch on September 19, 2025 at 01:54

5 participants