@@ -363,11 +363,10 @@ often or more intelligently to reduce the amount of time a workload has to wait
 to try again after a failure. In the extreme cases, users want to be able to
 configure (by container, node, or exit code) the backoff to close to 0 seconds.
 This KEP considers it out of scope to implement fully user-customizable
-behavior, and too risky to node stability to allow legitimately crashing
-workloads to have a backoff of 0, but it is in scope to provide users a way to
-opt workloads in to a faster restart curve that is not as drastic as what is
-intended for `Success` states, nor as beholden to the status quo as the new
-default front loaded decay with interval modification.
+behavior, and too risky to node stability, without full and complete
+benchmarking, to allow legitimately crashing workloads to have a backoff of 0,
+but it is in scope for the first alpha to provide users a way to opt workloads
+in to an even faster restart behavior.

 Pods and restartable init (aka sidecar) containers will be able to set a new
 OneOf value, `restartPolicy: Rapid`, to opt in to an exponential backoff decay
@@ -465,7 +464,7 @@ How will UX be reviewed, and by whom?
 Consider including folks who also work outside the SIG or subproject.
 -->

-## Design Details
+## Design Details

 <!--
 This section should contain enough information that the specifics of your
@@ -475,15 +474,38 @@ proposal will be implemented, this is the place to discuss them.
 -->

 ### Front loaded decay curve methodology
-Why change the initial value of the backoff curve instead of its rate, or why not change the decay function entirely to other well known equations (like functions resulting in curves that are lienar, parabolic, sinusoidal, etc)?
-
-Exponential decay, particularly at a rate of 2x, is commonly used for software retry backoff as it has the nice properties of starting restarts at a low value, but penalizing repeated crashes harshly, to protect primarily against unrecoverable failures. In contrast, we can interpret linear curves as penalizing every failure the same, or parabolic and sinusoidal curves as giving our software a "second chance" and forgiving later failures more. For a default restart decay curve, where the cause of the restart cannot be known, 2x exponential decay still models the desired properties more, as the biggest risk is unrecoverable failures causing "runaway" containers to overload kubelet.
-
-To determine the effect in abstract of changing the initial value on current behavior, we modeled the change in the starting value of the decay from 10s to 1s, 250ms, or even 25ms. For today's decay rate, the first restart is within the first 10s, the second within the first 30s, the third within the first 70s. Using those same time windows to compare alternate initial values, for example changing the initial rate to 1s, we would instead have 3 restarts in the first time window, 1 restart within the time window, and two more restarts within the third time window. As seen below, this type of change gives us more restarts earlier, but even at 250ms or 25ms initial values, each approach a similar rate of restarts after the third time window.
-
-
-
-Among these modeled initial values, we would get between 3-7 excess restarts per backoff lifetime, mostly within the first three time windows matching today's restart behavior.
+Why change the initial value of the backoff curve instead of its rate, or why
+not change the decay function entirely to other well-known equations (like
+functions resulting in curves that are linear, parabolic, sinusoidal, etc.)?
+
+Exponential decay, particularly at a rate of 2x, is commonly used for software
+retry backoff as it has the nice properties of starting restarts at a low value,
+but penalizing repeated crashes harshly, to protect primarily against
+unrecoverable failures. In contrast, we can interpret linear curves as
+penalizing every failure the same, or parabolic and sinusoidal curves as giving
+our software a "second chance" and forgiving later failures more. For a default
+restart decay curve, where the cause of the restart cannot be known, 2x
+exponential decay still best models the desired properties, as the biggest risk
+is unrecoverable failures causing "runaway" containers to overload kubelet.
+
+To determine the effect in abstract of changing the initial value on current
+behavior, we modeled the change in the starting value of the decay from 10s to
+1s, 250ms, or even 25ms. For today's decay rate, the first restart is within the
+first 10s, the second within the first 30s, the third within the first 70s.
+Using those same time windows to compare alternate initial values, for example
+changing the initial value to 1s, we would instead have 3 restarts in the first
+time window, 1 restart within the second time window, and two more restarts
+within the third time window. As seen below, this type of change gives us more
+restarts earlier, but even at 250ms or 25ms initial values, each approaches a
+similar rate of restarts after the third time window.
+
+
+
+Among these modeled initial values, we would get between 3-7 excess restarts per
+backoff lifetime, mostly within the first three time windows matching today's
+restart behavior.

 #### New OneOf for `restartPolicy` -- `Rapid`
 `restartPolicy` is an immutable field in podSpec and containerSpec. If set in podSpec, each container in the Pod inherits the Pod's restart policy of either `Never` (default), `OnFailure`, or `Always`; for a Job, the only valid options are `Never` and `OnFailure`. In containerSpec, it is valid ONLY on init containers and ONLY as `Always`, to configure a sidecar container that runs continuously alongside the regular containers in the Pod.
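
The window arithmetic in the "Front loaded decay curve methodology" paragraphs of this hunk can be checked with a small simulation. The following is a minimal sketch in Python, assuming an uncapped 2x backoff in which the n-th restart happens once the n-th delay has elapsed. The function names `restart_times_ms` and `restarts_per_window` are hypothetical and are not part of the KEP or of kubelet.

```python
def restart_times_ms(initial_ms, factor=2, horizon_ms=70_000):
    """Cumulative restart times in ms for an uncapped exponential backoff.

    Assumption: the first restart happens after `initial_ms`, and each
    following delay is `factor` times the previous one.
    """
    times, elapsed, delay = [], 0, initial_ms
    while elapsed + delay <= horizon_ms:
        elapsed += delay
        times.append(elapsed)
        delay *= factor
    return times


def restarts_per_window(initial_ms, window_ends_ms=(10_000, 30_000, 70_000)):
    """Count restarts falling in each window (start, end].

    The default window edges are today's 10s / 30s / 70s restart times.
    """
    times = restart_times_ms(initial_ms, horizon_ms=window_ends_ms[-1])
    counts, start = [], 0
    for end in window_ends_ms:
        counts.append(sum(1 for t in times if start < t <= end))
        start = end
    return counts


if __name__ == "__main__":
    # Compare today's 10s initial value with the modeled alternatives.
    for initial_ms in (10_000, 1_000, 250, 25):
        print(f"{initial_ms} ms initial value:", restarts_per_window(initial_ms))
```

Under these assumptions, an initial value of 1s yields 3, 1, and 2 restarts in the 10s, 30s, and 70s windows, matching the counts quoted in the text. The sketch ignores any maximum backoff cap and any reset of the backoff after a successful run, so it should not be read as kubelet's actual behavior.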
@@ -823,15 +845,9 @@ well as the [existing list] of feature gates.