Commit fbc6896

committed
PRR questionnaire
Signed-off-by: Laura Lorenz <[email protected]>
1 parent 7de94ab commit fbc6896

2 files changed

+107
-27
lines changed

keps/sig-node/4603-tune-crashloopbackoff/README.md

Lines changed: 100 additions & 24 deletions
Original file line numberDiff line numberDiff line change
@@ -363,11 +363,10 @@ often or more intelligently to reduce the amount of time a workload has to wait
363363
to try again after a failure. In the extreme cases, users want to be able to
364364
configure (by container, node, or exit code) the backoff to close to 0 seconds.
365365
This KEP considers it out of scope to implement fully user-customizable
366-
behavior, and too risky to node stability to allow legitimately crashing
367-
workloads to have a backoff of 0, but it is in scope to provide users a way to
368-
opt workloads in to a faster restart curve that is not as drastic as what is
369-
intended for `Success` states, nor as beholden to the status quo as the new
370-
default front loaded decay with interval modification.
366+
behavior, and too risky without full and complete benchmarking to node stability
367+
to allow legitimately crashing workloads to have a backoff of 0, but it is in
368+
scope for the first alpha to provide users a way to opt workloads in to an even
369+
faster restart behavior.
371370

372371
Pods and restartable init (aka sidecar) containers will be able to set a new
373372
OneOf value, `restartPolicy: Rapid`, to opt in to an exponential backoff decay
@@ -465,7 +464,7 @@ How will UX be reviewed, and by whom?
465464
Consider including folks who also work outside the SIG or subproject.
466465
-->
467466

468-
## Design Details
467+
## Design Details
469468

470469
<!--
471470
This section should contain enough information that the specifics of your
@@ -475,15 +474,38 @@ proposal will be implemented, this is the place to discuss them.
475474
-->
476475

477476
### Front loaded decay curve methodology
478-
Why change the initial value of the backoff curve instead of its rate, or why not change the decay function entirely to other well known equations (like functions resulting in curves that are linear, parabolic, sinusoidal, etc)?
479-
480-
Exponential decay, particularly at a rate of 2x, is commonly used for software retry backoff as it has the nice properties of starting restarts at a low value, but penalizing repeated crashes harshly, to protect primarily against unrecoverable failures. In contrast, we can interpret linear curves as penalizing every failure the same, or parabolic and sinusoidal curves as giving our software a "second chance" and forgiving later failures more. For a default restart decay curve, where the cause of the restart cannot be known, 2x exponential decay still models the desired properties more, as the biggest risk is unrecoverable failures causing "runaway" containers to overload kubelet.
481-
482-
To determine the effect in abstract of changing the initial value on current behavior, we modeled the change in the starting value of the decay from 10s to 1s, 250ms, or even 25ms. For today's decay rate, the first restart is within the first 10s, the second within the first 30s, the third within the first 70s. Using those same time windows to compare alternate initial values, for example changing the initial value to 1s, we would instead have 3 restarts in the first time window, 1 restart within the second time window, and two more restarts within the third time window. As seen below, this type of change gives us more restarts earlier, but even at 250ms or 25ms initial values, each approaches a similar rate of restarts after the third time window.
483-
484-
![A graph showing different exponential backoff decays for initial values of 10s, 1s, 250ms and 25ms](initialvaluesandnumberofrestarts.png "Changes to decay with different initial values")
485-
486-
Among these modeled initial values, we would get between 3-7 excess restarts per backoff lifetime, mostly within the first three time windows matching today's restart behavior.
477+
Why change the initial value of the backoff curve instead of its rate, or why
478+
not change the decay function entirely to other well known equations (like
479+
functions resulting in curves that are linear, parabolic, sinusoidal, etc)?
480+
481+
Exponential decay, particularly at a rate of 2x, is commonly used for software
482+
retry backoff as it has the nice properties of starting restarts at a low value,
483+
but penalizing repeated crashes harshly, to protect primarily against
484+
unrecoverable failures. In contrast, we can interpret linear curves as
485+
penalizing every failure the same, or parabolic and sinusoidal curves as giving
486+
our software a "second chance" and forgiving later failures more. For a default
487+
restart decay curve, where the cause of the restart cannot be known, 2x
488+
exponential decay still models the desired properties more, as the biggest risk
489+
is unrecoverable failures causing "runaway" containers to overload kubelet.
490+
491+
To determine the effect in abstract of changing the initial value on current
492+
behavior, we modeled the change in the starting value of the decay from 10s to
493+
1s, 250ms, or even 25ms. For today's decay rate, the first restart is within the
494+
first 10s, the second within the first 30s, the third within the first 70s.
495+
Using those same time windows to compare alternate initial values, for example
496+
changing the initial value to 1s, we would instead have 3 restarts in the first
497+
time window, 1 restart within the second time window, and two more restarts within the
498+
third time window. As seen below, this type of change gives us more restarts
499+
earlier, but even at 250ms or 25ms initial values, each approaches a similar rate
500+
of restarts after the third time window.
501+
502+
![A graph showing different exponential backoff decays for initial values of
503+
10s, 1s, 250ms and 25ms](initialvaluesandnumberofrestarts.png "Changes to decay
504+
with different initial values")
505+
506+
Among these modeled initial values, we would get between 3-7 excess restarts per
507+
backoff lifetime, mostly within the first three time windows matching today's
508+
restart behavior.
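
The window comparison above can be reproduced with a small model. This is only an illustrative sketch, not the KEP's actual modeling code: it assumes the container crashes immediately after every start (zero runtime), uses the 2x decay discussed above, and takes a 300s maximum delay as an assumption. The function names are hypothetical.

```python
def restart_times_ms(initial_ms, horizon_ms, factor=2, cap_ms=300_000):
    """Return cumulative times (in ms) at which restarts happen.

    Assumes the container exits immediately after each start, so the
    time between restarts is exactly the backoff delay. cap_ms is an
    assumed maximum delay; it is never reached within the 70s horizon.
    """
    times = []
    delay = initial_ms
    elapsed = 0
    while elapsed + delay <= horizon_ms:
        elapsed += delay
        times.append(elapsed)
        delay = min(delay * factor, cap_ms)
    return times


def restarts_per_window(initial_ms, window_ends_ms=(10_000, 30_000, 70_000)):
    """Count restarts in (0, 10s], (10s, 30s], and (30s, 70s]."""
    times = restart_times_ms(initial_ms, window_ends_ms[-1])
    counts = []
    start = 0
    for end in window_ends_ms:
        counts.append(sum(1 for t in times if start < t <= end))
        start = end
    return counts


if __name__ == "__main__":
    # Compare today's 10s initial value with the modeled alternatives.
    for initial in (10_000, 1_000, 250, 25):
        print(f"{initial}ms initial value:", restarts_per_window(initial))
```

Under these assumptions, the 10s initial value yields one restart per window, and the 1s initial value yields 3, 1, and 2 restarts across the three windows, matching the description above.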
487509

488510
#### New OneOf for `restartPolicy` -- `Rapid`
489511
`restartPolicy` is an immutable field in podSpec and containerSpec. If set in podSpec, each container in the Pod inherits the Pod's restart policy of either `Never` (default), `OnFailure`, or `Always`; for a Job, the only valid options are `Never` and `OnFailure`. In containerSpec, it is valid ONLY on init containers and ONLY as `Always`, to configure a sidecar container that runs continuously alongside the regular containers in the Pod.
@@ -823,15 +845,9 @@ well as the [existing list] of feature gates.
823845
[existing list]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
824846
-->
825847

826-
- [ ] Feature gate (also fill in values in `kep.yaml`)
827-
- Feature gate name:
828-
- Components depending on the feature gate:
829-
- [ ] Other
830-
- Describe the mechanism:
831-
- Will enabling / disabling the feature require downtime of the control
832-
plane?
833-
- Will enabling / disabling the feature require downtime or reprovisioning
834-
of a node?
848+
- [x] Feature gate (also fill in values in `kep.yaml`)
849+
- Feature gate name: `ReduceDefaultCrashLoopBackoffDecay` and `EnableRapidCrashLoopBackoffDecay`
850+
- Components depending on the feature gate: `kube-apiserver`, `kubelet`
835851

836852
###### Does enabling the feature change any default behavior?
837853

@@ -840,6 +856,15 @@ Any change of default behavior may be surprising to users or break existing
840856
automations, so be extremely careful here.
841857
-->
842858

859+
Yes, `ReduceDefaultCrashLoopBackoffDecay` changes the default backoff curve for
860+
exiting Pods and sidecar containers when `restartPolicy` is either `OnFailure`
861+
or `Always`.
862+
863+
Since we currently only have anecdotal benchmarking, the alpha will implement
864+
the most conservative modeled initial value, 1s, resulting in 3 excess restarts
865+
per backoff lifetime. (See [this section](#front-loaded-decay-curve-methodology)
866+
for the source.)
867+
843868
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
844869

845870
<!--
@@ -853,8 +878,28 @@ feature.
853878
NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
854879
-->
855880

881+
Yes, disable is supported.
882+
883+
For `ReduceDefaultCrashLoopBackoffDecay`, if this is disabled, once kubelet is
884+
restarted it will initialize the default backoff to the prior initial value of
885+
10s, and all restart delays thereafter will be calculated against this equation.
886+
887+
For `EnableRapidCrashLoopBackoffDecay`, if this is disabled, once kube-apiserver
888+
is restarted it will serve `restartPolicy` fields set to `Rapid` as `Always`.
889+
856890
###### What happens if we reenable the feature if it was previously rolled back?
857891

892+
Both features can also be reenabled.
893+
894+
For `ReduceDefaultCrashLoopBackoffDecay`, if this is reenabled, once kubelet is
895+
restarted it will initialize the default backoff again to the new initial value
896+
of 1s, and all restart delays thereafter will be calculated against this
897+
equation.
898+
899+
For `EnableRapidCrashLoopBackoffDecay`, if this is reenabled, once kube-apiserver
900+
is restarted it will serve `restartPolicy` fields set to `Rapid` again as
901+
`Rapid`, which kubelet will be able to interpret.
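
The disable and reenable behavior described in the two answers above can be summarized as a toy model. This is a hedged sketch only: the function names are hypothetical and do not correspond to real kube-apiserver or kubelet code, and each change is assumed to take effect only after the relevant component restarts.

```python
def served_restart_policy(stored_policy: str, rapid_gate_enabled: bool) -> str:
    """Hypothetical model of what kube-apiserver serves for restartPolicy.

    With EnableRapidCrashLoopBackoffDecay disabled, a stored `Rapid`
    value is served as `Always`; other values pass through unchanged.
    """
    if stored_policy == "Rapid" and not rapid_gate_enabled:
        return "Always"
    return stored_policy


def initial_backoff_seconds(reduce_gate_enabled: bool) -> int:
    """Hypothetical model of the initial backoff kubelet uses after restart.

    ReduceDefaultCrashLoopBackoffDecay enabled -> 1s (alpha value),
    disabled -> 10s (prior value).
    """
    return 1 if reduce_gate_enabled else 10


if __name__ == "__main__":
    # Roll back, then reenable, with a workload that set `Rapid`.
    for enabled in (True, False, True):
        print(enabled, served_restart_policy("Rapid", enabled),
              initial_backoff_seconds(enabled))
```

The same sequence (on, off, on again) is what the enablement and disablement tests would need to exercise against real components.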
902+
858903
###### Are there any tests for feature enablement/disablement?
859904

860905
<!--
@@ -870,6 +915,15 @@ You can take a look at one potential example of such test in:
870915
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
871916
-->
872917

918+
Yes, this requires tests for
919+
* switching `ReduceDefaultCrashLoopBackoffDecay` on or off
920+
* switching `EnableRapidCrashLoopBackoffDecay` off when there are workloads with
921+
`restartPolicy: Rapid` set
922+
* switching `EnableRapidCrashLoopBackoffDecay` off when there are no workloads
923+
with `restartPolicy: Rapid` set
924+
* switching `EnableRapidCrashLoopBackoffDecay` off, and then back on again, when
925+
there are workloads with `restartPolicy: Rapid` set
926+
873927
### Rollout, Upgrade and Rollback Planning
874928

875929
<!--
@@ -1032,6 +1086,8 @@ Focusing mostly on:
10321086
heartbeats, leader election, etc.)
10331087
-->
10341088

1089+
It will not result in NEW API calls.
1090+
10351091
###### Will enabling / using this feature result in introducing new API types?
10361092

10371093
<!--
@@ -1041,6 +1097,8 @@ Describe them, providing:
10411097
- Supported number of objects per namespace (for namespace-scoped objects)
10421098
-->
10431099

1100+
No, this KEP will not result in any new API types.
1101+
10441102
###### Will enabling / using this feature result in any new calls to the cloud provider?
10451103

10461104
<!--
@@ -1049,6 +1107,8 @@ Describe them, providing:
10491107
- Estimated increase:
10501108
-->
10511109

1110+
No, this KEP will not result in any new calls to the cloud provider.
1111+
10521112
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
10531113

10541114
<!--
@@ -1058,6 +1118,8 @@ Describe them, providing:
10581118
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
10591119
-->
10601120

1121+
No, this KEP will not result in increasing size or count of the existing API objects.
1122+
10611123
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
10621124

10631125
<!--
@@ -1069,6 +1131,12 @@ Think about adding additional work or introducing new steps in between
10691131
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
10701132
-->
10711133

1134+
Maybe! As containers will be restarting more, this may affect "Startup latency
1135+
of schedulable stateless pods" and "Startup latency of schedulable stateful pods".
1136+
This is directly the type of SLI impact that a) the split between the default
1137+
behavior change and the `Rapid` opt in is trying to mitigate, and b) one of the
1138+
targets of the benchmarking period during alpha.
1139+
10721140
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
10731141

10741142
<!--
@@ -1081,6 +1149,11 @@ This through this both in small and large cases, again with respect to the
10811149
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
10821150
-->
10831151

1152+
Yes! We expect more CPU usage of kubelet as it processes more restarts. During
1153+
the alpha benchmarking period, we will be quantifying that amount in fully and
1154+
partially saturated nodes with both the new default backoff curve and the
1155+
`Rapid` backoff curve.
1156+
10841157
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
10851158

10861159
<!--
@@ -1093,6 +1166,9 @@ Are there any tests that were run/should be run to understand performance charac
10931166
and validate the declared limits?
10941167
-->
10951168

1169+
It's possible, and is why during this alpha period we must benchmark fully
1170+
saturated nodes with the most aggressive restart characteristics.
1171+
10961172
### Troubleshooting
10971173

10981174
<!--

keps/sig-node/4603-tune-crashloopbackoff/kep.yaml

Lines changed: 7 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -38,10 +38,14 @@ milestone:
3838
# The following PRR answers are required at alpha release
3939
# List the feature gate name and the components for which it must be enabled
4040
feature-gates:
41-
- name: TuneCrashloopBackoff
41+
- name: ReduceDefaultCrashLoopBackoffDecay
4242
components:
43-
# - kube-apiserver
44-
# - kube-controller-manager
43+
- kube-apiserver
44+
- kubelet
45+
- name: EnableRapidCrashLoopBackoffDecay
46+
components:
47+
- kube-apiserver
48+
- kubelet
4549
disable-supported: true
4650

4751
# The following PRR answers are required at beta release
