Commit 1b6be7b
Fix OpenStackClient pod relocation during node failures
These changes ensure OpenStackClient pods are automatically rescheduled
when nodes fail, instead of requiring manual intervention to delete
stuck pods. The 120-second tolerations provide faster failover than the
300-second (5 minute) default, while the stuck pod detection handles
edge cases where normal eviction fails.
- Add tolerations for faster pod eviction (120s vs the 300s default)
* Handle node.kubernetes.io/not-ready taints
* Handle node.kubernetes.io/unreachable taints
- Force delete stuck pods with grace period 0
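The two tolerations can be sketched as follows, assuming the commit mirrors the shape of the default tolerations Kubernetes adds for these taints (operator `Exists`, effect `NoExecute`) and only lowers `tolerationSeconds` to 120. The `toleration` struct is a local stand-in for `k8s.io/api/core/v1.Toleration` so the sketch compiles without the Kubernetes modules, and `clientPodTolerations` is a hypothetical name.

```go
package main

// toleration mirrors the subset of corev1.Toleration fields used here.
// It is a local stand-in; the operator would use the real corev1 type.
type toleration struct {
	Key               string
	Operator          string
	Effect            string
	TolerationSeconds *int64
}

// clientPodTolerations returns NoExecute tolerations for the not-ready and
// unreachable node taints, so the pod is evicted `seconds` after either
// taint is applied instead of after the 300s default.
func clientPodTolerations(seconds int64) []toleration {
	keys := []string{
		"node.kubernetes.io/not-ready",
		"node.kubernetes.io/unreachable",
	}
	out := make([]toleration, 0, len(keys))
	for _, k := range keys {
		s := seconds // fresh variable per entry so pointers are not shared
		out = append(out, toleration{
			Key:               k,
			Operator:          "Exists",
			Effect:            "NoExecute",
			TolerationSeconds: &s,
		})
	}
	return out
}
```

The per-entry copy `s` gives each toleration its own `TolerationSeconds` pointer, so a later change to one entry cannot silently alter the other.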
Note:
- going lower than 120s could be too aggressive and result in pod
eviction during, e.g., a network issue or a kubelet restart
- in a follow-up, the same tolerations should be added to the operator
controller manager deployments, since the
openstack-operator-controller-manager is the one handling the
openstackclient pod.
Jira: OSPRH-18450
Signed-off-by: Martin Schuppert <[email protected]>
Parent: ef35339
Files changed: controllers/client, pkg/openstackclient
2 files changed, 26 insertions(+), 0 deletions(-)