Commit 385894f

Move some things around so proposal is easier to read
Signed-off-by: Laura Lorenz <[email protected]>
1 parent a443965 commit 385894f

2 files changed

+75
-51
lines changed

keps/sig-node/4603-tune-crashloopbackoff/README.md

Lines changed: 75 additions & 51 deletions
@@ -6,23 +6,23 @@ To get started with this template:
 - [x] **Pick a hosting SIG.**
 Make sure that the problem space is something the SIG is interested in taking
 up. KEPs should not be checked in without a sponsoring SIG.
-- [ ] **Create an issue in kubernetes/enhancements**
+- [x] **Create an issue in kubernetes/enhancements**
 When filing an enhancement tracking issue, please make sure to complete all
 fields in that template. One of the fields asks for a link to the KEP. You
 can leave that blank until this KEP is filed, and then go back to the
 enhancement and add the link.
-- [ ] **Make a copy of this template directory.**
+- [x] **Make a copy of this template directory.**
 Copy this template into the owning SIG's directory and name it
 `NNNN-short-descriptive-title`, where `NNNN` is the issue number (with no
 leading-zero padding) assigned to your enhancement above.
-- [ ] **Fill out as much of the kep.yaml file as you can.**
+- [x] **Fill out as much of the kep.yaml file as you can.**
 At minimum, you should fill in the "Title", "Authors", "Owning-sig",
 "Status", and date-related fields.
-- [ ] **Fill out this file as best you can.**
+- [x] **Fill out this file as best you can.**
 At minimum, you should fill in the "Summary" and "Motivation" sections.
 These should be easy if you've preflighted the idea of the KEP with the
 appropriate SIG(s).
-- [ ] **Create a PR for this KEP.**
+- [x] **Create a PR for this KEP.**
 Assign it to people in the SIG who are sponsoring this process.
 - [ ] **Merge early and iterate.**
 Avoid getting hung up on specific details and instead aim to get the goals of
@@ -151,14 +151,14 @@ checklist items _must_ be updated for the enhancement to be released.
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [ ] (R) KEP approvers have approved the KEP status as `implementable`
 - [ ] (R) Design details are appropriately documented
 - [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
 - [ ] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
-- [ ] (R) Graduation criteria is in place
+- [x] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
@@ -348,55 +348,22 @@ these changes.
 
 #### Existing backoff curve change: front loaded decay
 
-As mentioned above, today the standard backoff curve is an exponential decay
-starting at 10s and capping at 5 minutes, resulting in a composite of the
-standard hockey-stick exponential decay graph followed by a linear rise until
-the heat death of the universe as depicted below:
-
-![A graph showing the backoff decay for a Kubernetes pod in
-CrashLoopBackoff](./crashloopbackoff-succeedingcontainer.png "CrashLoopBackoff
-decay")
-
-Remember that the backoff counter is reset if containers run longer than 10
-minutes, so in the worst case where a container always exits after 9:59:59, this
-means in the first 30 minute period, the container will restart twice. In a more
-easily digestible example used in models below, for a fast exiting container
-crashing every 10 seconds, in the first 30 minutes the container will restart
-about 10 times, with the first four restarts in the first 5 minutes.
-
 This KEP proposes changing the existing backoff curve to load more restarts
 earlier by changing the initial value of the exponential backoff. A number of
 alternate initial values are modelled below, until the 5 minute cap would be
 reached. This proposal suggests we start with a new initial value of 1s, and
 analyze its impact on infrastructure during alpha.
 
-!["A graph showing the decay curves for different initial values"](differentinitialvalues.png
-"Alternate CrashLoopBackoff initial values")
+![](todayvs1sbackoff.png)
 
 
 #### API opt in for max cap decay curve (`restartPolicy: Rapid`)
 
-For some users in
-[Kubernetes#57291](https://github.com/kubernetes/kubernetes/issues/57291), any
-delay over 1 minute at any point is just too slow, even if it is legitimately
-crashing. A common refrain is that for independently recoverable errors,
-especially system infrastructure events or recovered external dependencies, or
-for absolutely nonnegotiably critical sidecar pods, users would rather poll more
-often or more intelligently to reduce the amount of time a workload has to wait
-to try again after a failure. In the extreme cases, users want to be able to
-configure (by container, node, or exit code) the backoff to close to 0 seconds.
-This KEP considers it out of scope to implement fully user-customizable
-behavior, and too risky without full and complete benchmarking to node stability
-to allow legitimately crashing workloads to have a backoff of 0, but it is in
-scope for the first alpha to provide users a way to opt workloads in to a even
-faster restart behavior.
-
 Pods and restartable init (aka sidecar) containers will be able to set a new
 OneOf value, `restartPolicy: Rapid`, to opt in to an exponential backoff decay
-that starts at a low initial value and maximizes to a cap of 1 minute. The
-detailed methodology for determining the implementable starting value, and
-benchmarking it during and after alpha, is enclosed in Design Details, but will
-start at 250ms.
+that starts at a lower initial value and maximizes to a lower cap. This proposal
+suggests we start with a new initial value of 250ms and cap of 1 minute, and
+analyze its impact on infrastructure during alpha.
 
 !["A graph showing today's decay curve against a curve with an initial value of
 250ms and a cap of 1 minute for a workload failing every 10 s"](todayvsrapid.png
@@ -497,6 +464,22 @@ proposal will be implemented, this is the place to discuss them.
 -->
 
 ### Front loaded decay curve methodology
+As mentioned above, today the standard backoff curve is an exponential decay
+starting at 10s and capping at 5 minutes, resulting in a composite of the
+standard hockey-stick exponential decay graph followed by a linear rise until
+the heat death of the universe as depicted below:
+
+![A graph showing the backoff decay for a Kubernetes pod in
+CrashLoopBackoff](./crashloopbackoff-succeedingcontainer.png "CrashLoopBackoff
+decay")
+
+Remember that the backoff counter is reset if containers run longer than 10
+minutes, so in the worst case where a container always exits after 9:59:59, this
+means in the first 30 minute period, the container will restart twice. In a more
+easily digestible example used in models below, for a fast exiting container
+crashing every 10 seconds, in the first 30 minutes the container will restart
+about 10 times, with the first four restarts in the first 5 minutes.
+
 Why change the initial value of the backoff curve instead of its rate, or why
 not change the decay function entirely to other well known equations (like
 functions resulting in curves that are lienar, parabolic, sinusoidal, etc)?
@@ -513,7 +496,12 @@ is unrecoverable failures causing "runaway" containers to overload kubelet.
 
 To determine the effect in abstract of changing the initial value on current
 behavior, we modeled the change in the starting value of the decay from 10s to
-1s, 250ms, or even 25ms. For today's decay rate, the first restart is within the
+1s, 250ms, or even 25ms.
+
+!["A graph showing the decay curves for different initial values"](differentinitialvalues.png
+"Alternate CrashLoopBackoff initial values")
+
+For today's decay rate, the first restart is within the
 first 10s, the second within the first 30s, the third within the first 70s.
 Using those same time windows to compare alternate initial values, for example
 changing the initial rate to 1s, we would instead have 3 restarts in the first
@@ -523,19 +511,55 @@ earlier, but even at 250ms or 25ms initial values, each approach a similar rate
 of restarts after the third time window.
 
 ![A graph showing different exponential backoff decays for initial values of
-10s, 1s, 250ms and 25ms](initialvaluesandnumberofrestarts.png' "Changes to decay
+10s, 1s, 250ms and 25ms](initialvaluesandnumberofrestarts.png "Changes to decay
 with different initial values")
 
 Among these modeled initial values, we would get between 3-7 excess restarts per
 backoff lifetime, mostly within the first three time windows matching today's
 restart behavior.
 
-#### New OneOf for `restartPolicy` -- `Rapid`
-`restartPolicy` is an immutable field in podSpec and containerSpec. If set in podSpec, each container in the Pod inherits the Pod's restart policy of either `Never` (default), `OnFailure`, or `Always`; for a Job, the only valid options are `Never` and `OnFailure`. In containerSpec, it is valid ONLY on init containers and ONLY as `Always`, to configure a sidecar container that runs continuously alongside the regular containers in the Pod.
+#### Rapid curve methodology
+
+For some users in
+[Kubernetes#57291](https://github.com/kubernetes/kubernetes/issues/57291), any
+delay over 1 minute at any point is just too slow, even if it is legitimately
+crashing. A common refrain is that for independently recoverable errors,
+especially system infrastructure events or recovered external dependencies, or
+for absolutely nonnegotiably critical sidecar pods, users would rather poll more
+often or more intelligently to reduce the amount of time a workload has to wait
+to try again after a failure. In the extreme cases, users want to be able to
+configure (by container, node, or exit code) the backoff to close to 0 seconds.
+This KEP considers it out of scope to implement fully user-customizable
+behavior, and too risky without full and complete benchmarking to node stability
+to allow legitimately crashing workloads to have a backoff of 0, but it is in
+scope for the first alpha to provide users a way to opt workloads in to a even
+faster restart behavior.
 
-This KEP will support a new value for this field, `Rapid`, which on feature flag disablement will be interpreted as `Always`. If `restartPolicy: Rapid` is set or inherited for a container, that container will follow the new Rapid backoff curve.
+The finalization of the initial and max cap after benchmarking. As a
+conservative first estimate in line with maximums discussed on
+[Kubernetes#57291](https://github.com/kubernetes/kubernetes/issues/57291), the
+initial curve is selected at initial=250ms / cap=1 minute, but during
+benchmarking this will be modelled against kubelet capacity, potentially
+targeting something closer to an initial value near 0s, and a cap of 10-30s.
 
-Due to configuring this as another option to this field, this would make Rapid backoff possible for restartable init (aka sidecar) containers, Pods, Deployments, StatefulSets, ReplicaSets, DaemonSets, but NOT pure init containers, Jobs or CronJobs.
+
+#### New OneOf for `restartPolicy` -- `Rapid`
+`restartPolicy` is an immutable field in podSpec and containerSpec. If set in
+podSpec, each container in the Pod inherits the Pod's restart policy of either
+`Never` (default), `OnFailure`, or `Always`; for a Job, the only valid options
+are `Never` and `OnFailure`. In containerSpec, it is valid ONLY on init
+containers and ONLY as `Always`, to configure a sidecar container that runs
+continuously alongside the regular containers in the Pod.
+
+This KEP will support a new value for this field, `Rapid`, which on feature flag
+disablement will be interpreted as `Always`. If `restartPolicy: Rapid` is set or
+inherited for a container, that container will follow the new Rapid backoff
+curve.
+
+Due to configuring this as another option to this field, this would make Rapid
+backoff possible for restartable init (aka sidecar) containers, Pods,
+Deployments, StatefulSets, ReplicaSets, DaemonSets, but NOT pure init
+containers, Jobs or CronJobs.
 
 ### Kubelet overhead analysis
 
@@ -555,7 +579,7 @@ does during pod restarts.
 * Logs information about all those container operations (utilizing disk IO and
 “spamming” logs)
 
-#### Observability
+### Observability
 
 Again, let it be known that by definition this KEP will cause pods to restart
 faster and more often than the current status quo and such a change is desired.
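
The restart windows quoted in the methodology text of this diff (first restart within 10s, second within 30s, third within 70s for today's 10s initial value, compared against a 1s initial value) can be reproduced with a small model. The sketch below is not kubelet code. It assumes the delay doubles after every restart, is clamped at the cap, and that the container crashes instantly, so only backoff delays contribute to elapsed time. The function names are hypothetical and chosen for illustration.

```python
from itertools import accumulate


def backoff_delays(initial_s, cap_s, count):
    """Return the first `count` backoff delays in seconds.

    Assumption: each delay is double the previous one, clamped at cap_s.
    """
    if initial_s <= 0 or cap_s <= 0:
        raise ValueError("initial_s and cap_s must be positive")
    delays = []
    delay = initial_s
    for _ in range(count):
        delays.append(min(delay, cap_s))
        delay = min(delay * 2, cap_s)
    return delays


def restart_times(initial_s, cap_s, count):
    """Cumulative time of each restart, assuming the container exits instantly."""
    return list(accumulate(backoff_delays(initial_s, cap_s, count)))


def restarts_within(initial_s, cap_s, window_s):
    """Count restarts that happen at or before window_s seconds."""
    if initial_s <= 0 or cap_s <= 0:
        raise ValueError("initial_s and cap_s must be positive")
    restarts = 0
    elapsed = 0.0
    delay = min(initial_s, cap_s)
    while elapsed + delay <= window_s:
        elapsed += delay
        restarts += 1
        delay = min(delay * 2, cap_s)
    return restarts


if __name__ == "__main__":
    # Today's curve: initial 10s, cap 5 minutes.
    print(restart_times(10, 300, 3))  # [10, 30, 70]
    # Compare today's curve with a 1s initial value over the same windows.
    for window in (10, 30, 70):
        print(window, restarts_within(10, 300, window), restarts_within(1, 300, window))
```

Under these assumptions, a 1s initial value gives 3, 4 and 6 restarts in the 10s, 30s and 70s windows, against 1, 2 and 3 for today's 10s initial value. The Rapid proposal can be explored the same way with `restarts_within(0.25, 60, window)`. A container that runs for some time before crashing would restart less often than this model suggests.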