Handle scheduling failures more gracefully (#4486)
Currently, if a scheduling round fails, we keep re-running scheduling
rounds until one succeeds
- Often it never succeeds, because the state of the system can't change
much while no reconciliations are being run
Now we will continue to run reconciliation rounds between scheduling
rounds, even if the scheduling round fails
- We do this by setting `previousSchedulingRoundEnd` even if the
scheduling round failed
This allows the system to continue working even if we have a bug
- It also allows operators to cancel the jobs causing the bug, or the
state to evolve, so future scheduling rounds may not exhibit the bug
Signed-off-by: JamesMurkin <jamesmurkin@hotmail.com>