Commit 8b2e727
fix: resolve deadlock when maxSurge>0 rolling update on single-replica LWS
When a LeaderWorkerSet has replicas=1 and maxSurge=1, triggering a
rolling update caused the controller to immediately emit a "deleting
surge replica" event and return replicas=1 (no surge ever created),
leaving the update permanently stuck with the StatefulSet at
partition=1, replicas=1.
Root cause: in Case 2 of rollingUpdateParameters (a new rolling update
is detected) the code called wantReplicas(lwsReplicas), so lwsReplicas
was treated as the unready count. With replicas=1 and maxSurge=1 the
condition inside wantReplicas evaluated as:
    unreadyReplicas(1) <= maxSurge(1) → true
which jumped straight into the "release surge" branch and returned
replicas=1. No surge replica was ever created, so the StatefulSet
partition could never advance.
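The failing path can be modelled in a few lines of Go. This is a simplified sketch reconstructed from the description above, not the controller's actual code: the name wantReplicasSketch, its parameters, and the value returned by the release branch are assumptions made for illustration.

```go
package main

import "fmt"

// wantReplicasSketch is a hypothetical, simplified stand-in for the
// wantReplicas helper described in this commit message.
func wantReplicasSketch(unreadyReplicas, lwsReplicas, maxSurge int32) int32 {
	if unreadyReplicas <= maxSurge {
		// "Release surge" branch. Assumption: modelled as falling back to
		// the configured replica count, which matches the replicas=1
		// outcome described above. The real branch may compute more.
		return lwsReplicas
	}
	// Otherwise stay at the surged size.
	return lwsReplicas + maxSurge
}

func main() {
	var lwsReplicas, maxSurge int32 = 1, 1
	// Old Case 2 behaviour: lwsReplicas was passed as the unready count.
	got := wantReplicasSketch(lwsReplicas, lwsReplicas, maxSurge)
	fmt.Printf("old Case 2 target replicas: %d\n", got)
}
```

With replicas=1 and maxSurge=1 the sketch takes the release branch on the very first reconcile, so the target never rises above 1.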
Fix: Case 2 now returns burstReplicas directly instead of going through
wantReplicas. At the moment a new update is detected all existing
replicas are still running the old template (none are unready due to
the update yet), so the correct action is to expand to
lwsReplicas+maxSurge first. wantReplicas is only meaningful once
stsReplicas==burstReplicas and the surge pods are being replaced.
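As a rough illustration of the fixed behaviour, the sketch below returns the surged size directly when a new update is detected. The function name newUpdateTargetSketch is hypothetical, and the only thing taken from the message is that burstReplicas equals lwsReplicas+maxSurge.

```go
package main

import "fmt"

// newUpdateTargetSketch is a hypothetical model of the fixed Case 2:
// when a new rolling update is detected, expand to the burst size first
// instead of consulting the surge-release logic.
func newUpdateTargetSketch(lwsReplicas, maxSurge int32) int32 {
	burstReplicas := lwsReplicas + maxSurge
	return burstReplicas
}

func main() {
	// The single-replica scenario from this commit: replicas=1, maxSurge=1.
	fmt.Printf("fixed Case 2 target replicas: %d\n", newUpdateTargetSketch(1, 1))
}
```

For replicas=1 and maxSurge=1 this yields 2, so a surge replica exists before any old group is replaced.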
A new integration test "rolling update with maxSurge=1 and single
replica creates surge before rolling" directly exercises the regression:
it verifies that the leader StatefulSet expands to replicas=2
immediately after the update is triggered, then converges back to
replicas=1 once all groups are ready.
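The two checkpoints the test is described as asserting can be written down as a small table. This is only a sketch of the expected values, not the integration test itself; the phase type and expectedLeaderReplicas are hypothetical names.

```go
package main

import "fmt"

// phase is a hypothetical label for the two checkpoints named in the
// test description.
type phase int

const (
	updateTriggered phase = iota // new template detected, nothing rolled yet
	allGroupsReady               // every group is updated and ready
)

// expectedLeaderReplicas returns the leader StatefulSet size expected at
// each checkpoint, per the description above.
func expectedLeaderReplicas(p phase, lwsReplicas, maxSurge int32) int32 {
	if p == updateTriggered {
		return lwsReplicas + maxSurge // surge is created first
	}
	return lwsReplicas // surge is released once the rollout is done
}

func main() {
	for _, p := range []phase{updateTriggered, allGroupsReady} {
		fmt.Println(expectedLeaderReplicas(p, 1, 1))
	}
}
```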
Fixes: #688
Signed-off-by: veast <veast@users.noreply.github.com>
1 parent a3dc446 commit 8b2e727
2 files changed (+91, -6):
- pkg/controllers
- test/integration/controllers
[Diff bodies were not captured. In the pkg/controllers file, two hunks within original lines 288 to 311 remove 6 lines and add 15. In the test/integration/controllers file, 76 lines are added as new lines 2300 to 2375.]