keps/sig-node/4603-tune-crashloopbackoff/README.md
69 additions & 3 deletions
@@ -218,7 +218,7 @@ approach to revisiting the CrashLoopBackoff behaviors for common use cases:
 2. allowing Pods to opt-in to an even faster backoff curve
 
 For each of these changes, the exact values are subject to modification in the
-alpha period in order to empirically derive derived defaults intended to
+alpha period in order to empirically derive defaults intended to
 maintain node stability.
 
 ## Motivation
@@ -436,9 +436,75 @@ This might be a good place to talk about core concepts and how they relate.
 -->
 #### On Success
 
-The original version of this proposal included a change specific to Pods transitioning through the "Succeeded" phase. On further discussion, this was determined to be both too risky and a non-goal for Kubernetes architecturally, and moved into the Alternatives section. The risk for bad actors is described in the Alternatives section and is somewhat obvious. The larger point of it being a non-goal within the design framework of Kubernetes as a whole is less transparent and discussed here.
+The original version of this proposal included a change specific to Pods
+transitioning through the "Succeeded" phase to have flat rate restarts. On
+further discussion, this was determined to be both too risky and a non-goal for
+Kubernetes architecturally, and moved into the Alternatives section. The risk
+for bad actors overloading the kubelet is described in the Alternatives section
+and is somewhat obvious. The larger point of it being a non-goal within the
+design framework of Kubernetes as a whole is less transparent and discussed
+here.
+
+After discussion with early Kubernetes contributors and members of SIG-Node,
+it's become more clear to the author that the prevailing Kubernetes assumption
+is that on its own, the Pod API best models long-running containers that
+rarely or never exit themselves with "Success"; features like autoscaling,
+rolling updates, and enhanced workload types like StatefulSets assume this,
+while other workload types like those implemented with the Job and CronJob API
+better model workloads that do exit themselves, running until Success or at
+predictable intervals. In line with this assumption, Pods that run "for a while"
+(longer than 10 minutes) are the ones that are "rewarded" with a reset backoff
+counter -- not Pods that exit with Success. Ultimately, non-Job Pods are not
+intended to exit Successfully in any meaningful way to the infrastructure, and
+quick rerun behavior of any application code is considered to be an application
+level concern instead.
+
+Therefore, even though it is widely desired by commenters on
+[Kubernetes#57291](https://github.com/kubernetes/kubernetes/issues/57291), this
+KEP is not pursuing a different backoff curve for Pods exiting with Success any
+longer.
+
+For Pods that are today intended to rerun after Success, it is instead suggested
+to
+
+1. exec the application logic with an init script or shell that reruns it
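
Suggestion 1 above could be realized with a small wrapper used as the container entrypoint. The following is a minimal sketch in Python (a shell loop would serve equally well) and is not part of the KEP: the function name `rerun_on_success` and its `max_runs` and `pause_seconds` parameters are hypothetical. It reruns the command for as long as it exits 0 and stops on the first non-zero exit, so that genuine failures still terminate the container and go through the normal CrashLoopBackoff handling.

```python
import subprocess
import time


def rerun_on_success(cmd, max_runs=None, pause_seconds=0.0):
    """Run cmd repeatedly while it exits 0.

    Returns (number_of_runs, last_exit_code).
    max_runs and pause_seconds are hypothetical knobs for this sketch;
    max_runs=None means "rerun indefinitely".
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        if runs > 0:
            # Optional pause between successful runs, chosen by the application.
            time.sleep(pause_seconds)
        exit_code = subprocess.run(cmd).returncode
        runs += 1
        if exit_code != 0:
            # Stop on failure so the wrapper (and thus the container) can exit
            # with the same code and be restarted by the kubelet as usual.
            return runs, exit_code
    return runs, 0
```

In a container, the entrypoint would call something like `rerun_on_success(["/app/worker"])` (the path is a placeholder) and exit with the returned code. This keeps the quick-rerun behavior inside the application, which is where the text above places that concern.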