Commit 2823adf
Make sure CHECKPOINT is executed after promote (patroni#3368)
It was possible that `Rewind._checkpoint_task` wasn't reset on demote if CHECKPOINT hadn't finished yet, which resulted in a stale `result` being used when the next promote was triggered.
It is not easy to reproduce, but the steps are as follows:
1. failover to node1.
2. while CHECKPOINT after promote is still running, switchover to node2.
3. next failover/switchover to node1 results in missing CHECKPOINT.
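The sequence above can be illustrated with a toy model. This is a minimal sketch, not Patroni's code: `ToyTask`, `ToyRewind`, and all of their methods are hypothetical names, and the demote logic only models the described bug (an unfinished task is not cleared).

```python
from typing import Optional


class ToyTask:
    """Hypothetical stand-in for a background CHECKPOINT task."""

    def __init__(self) -> None:
        self.result: Optional[bool] = None  # None means "still running"


class ToyRewind:
    """Hypothetical model: the task is cleared on demote only if it has finished."""

    def __init__(self) -> None:
        self.checkpoint_task: Optional[ToyTask] = None
        self.checkpoints_started = 0

    def ensure_checkpoint_after_promote(self) -> None:
        # Start a CHECKPOINT only when no task is remembered.
        if self.checkpoint_task is None:
            self.checkpoint_task = ToyTask()
            self.checkpoints_started += 1

    def on_demote_buggy(self) -> None:
        # Modelled bug: an unfinished task survives the demote.
        if self.checkpoint_task is not None and self.checkpoint_task.result is not None:
            self.checkpoint_task = None


rewind = ToyRewind()
rewind.ensure_checkpoint_after_promote()  # step 1: failover to node1
rewind.on_demote_buggy()                  # step 2: switchover while CHECKPOINT is running
rewind.checkpoint_task.result = True      # the old CHECKPOINT finishes later
rewind.ensure_checkpoint_after_promote()  # step 3: node1 is promoted again
print(rewind.checkpoints_started)         # prints 1
```

Only one CHECKPOINT is ever started in this model: the second promote sees the leftover task with its stale `result` and skips the CHECKPOINT.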
To solve the problem we take the following measures:
1. call `Rewind.reset_state()` before promote.
2. reset `Rewind._checkpoint_task` from trigger_check_diverged_lsn().
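The two measures can be sketched on a toy class. The method names `reset_state` and `trigger_check_diverged_lsn` come from the commit message; their bodies here, along with `ToyRewind`, `promote`, and `ensure_checkpoint_after_promote`, are assumptions made for illustration and do not reproduce Patroni's implementation.

```python
from typing import Optional


class ToyRewind:
    """Hypothetical model of the rewind state after the fix."""

    def __init__(self) -> None:
        self.checkpoint_task: Optional[object] = None
        self.checkpoints_started = 0

    def reset_state(self) -> None:
        # Assumed behaviour: forget any remembered CHECKPOINT task.
        self.checkpoint_task = None

    def trigger_check_diverged_lsn(self) -> None:
        # Measure 2: drop the task unconditionally, finished or not.
        self.checkpoint_task = None

    def ensure_checkpoint_after_promote(self) -> None:
        if self.checkpoint_task is None:
            self.checkpoint_task = object()
            self.checkpoints_started += 1


def promote(rewind: ToyRewind) -> None:
    rewind.reset_state()  # measure 1: start every promote from a clean state
    rewind.ensure_checkpoint_after_promote()


rewind = ToyRewind()
promote(rewind)                    # first promote starts a CHECKPOINT
promote(rewind)                    # a later promote no longer reuses the old task
print(rewind.checkpoints_started)  # prints 2

rewind.trigger_check_diverged_lsn()
print(rewind.checkpoint_task)      # prints None
```

Either measure alone clears the stale task in this model; applying both covers the promote path and the demote path independently.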
Besides that, we didn't check that the CHECKPOINT during step 2 of the reproduction actually finished successfully. Had that check been implemented correctly, the chances of hitting the problem would have been much smaller. However, there would still have been a race condition if a switchover was triggered right after the CHECKPOINT task finished.
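The difference between "a task exists" and "the task finished successfully" can be shown with a small sketch. `ToyTask`, `loose_check`, and `strict_check` are hypothetical; the loose variant is only an assumption about what an insufficient check might look like, since the commit message says no more than that success was not verified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTask:
    """Hypothetical task: None = still running, False = failed, True = succeeded."""
    result: Optional[bool] = None


def loose_check(task: Optional[ToyTask]) -> bool:
    # Treats any remembered task as a completed CHECKPOINT.
    return task is not None


def strict_check(task: Optional[ToyTask]) -> bool:
    # Counts the CHECKPOINT as done only if it finished successfully.
    return task is not None and task.result is True


for label, task in [("running", ToyTask()),
                    ("failed", ToyTask(False)),
                    ("succeeded", ToyTask(True))]:
    print(label, loose_check(task), strict_check(task))
# prints:
# running True False
# failed True False
# succeeded True True
```

Even the strict check still accepts a task that succeeded during an earlier promotion, which corresponds to the remaining race described above.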
Close patroni#3367
1 parent 30915f3, commit 2823adf
2 files changed, +12 -3 lines changed (diff content not preserved)

| File | Added (new line numbers) | Removed (original line numbers) |
|---|---|---|
| First file | 1124 | none |
| Second file | 76-77, 311-312, 314-320 | 310-312 |