Commit 005428d

Add why not for the Rapid case
Signed-off-by: Laura Lorenz <[email protected]>
1 parent 5dfcba1 commit 005428d

keps/sig-node/4603-tune-crashloopbackoff/README.md

Lines changed: 39 additions & 6 deletions
@@ -1154,16 +1154,52 @@ to revisiting the CrashLoopBackoff behaviors for common use cases:
 
 For step (2), the method to allow the Pods to opt-in was by a new enum value,
 `Rapid`, for a Pod's `RestartPolicy`. In this case, Pods and restartable init
-(aka sidecar) containers will be able to set a new OneOf value, `restartPolicy:
+(aka sidecar) containers would be able to set a new OneOf value, `restartPolicy:
 Rapid`, to opt in to an exponential backoff decay that starts at a lower initial
-value and maximizes to a lower cap. This proposal suggests we start with a new
+value and maximizes to a lower cap. This proposal suggested we start with a new
 initial value of 250ms and cap of 1 minute, and analyze its impact on
 infrastructure during alpha.
 
 !["A graph showing today's decay curve against a curve with an initial value of
 250ms and a cap of 1 minute for a workload failing every 10 s"](todayvsrapid.png
 "rapid vs todays' CrashLoopBackoff")
 
+**Why not?**: There was still a general community consensus that even though
+this was opt-in, giving the power to reduce the backoff curve to users in
+control of the pod manifest -- who as a persona are not necessarily users with
+cluster-wide or at least node-wide visibility into load and scheduling -- was
+too risky to global node stability.
+
+In addition, overriding an existing Pod spec
+enum value, while convenient, required detailed management of the version skew
+period, at minimum across 3 kubelet versions per the [API policy for new enum values in existing fields](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#new-enum-value-in-existing-field). In practice
+this meant the API server and kubelets across all nodes must be coordinated.
+
+Firstly, `Rapid` must be a valid option to the `restartPolicy` in the API server
+(which would only be possible if/when the API server was updated), and secondly,
+the `Rapid` value must be interpretable by all kubelets on every node.
+Unfortunately, it is not possible for the API server to be aware of what version
+each kubelet is on, so it cannot serve `Rapid` as `Always` preferentially to
+each kubelet depending on its version. Instead, each kubelet must be able to
+handle this value properly, both at n-3 kubelet version and -- more easily -- at
+its contemporary kubelet version. For updated kubelet versions, each kubelet
+would be able to detect if it has the feature gate on, and if so, interpret
+`Rapid` to use the new rapid backoff curve; and if the feature gate is off,
+interpret it instead as `Always`. But at earlier kubelet versions, `Rapid` must
+be ignored in favor of `Always`. Unfortunately for this KEP, the default value
+for `restartPolicy` is Never, though even more unfortunately, it looks like
+different code paths use a different default value (thank you
+[@tallclair](https://github.com/tallclair)!!;
+[1](https://github.com/kubernetes/kubernetes/blob/a7ca13ea29ba5b3c91fd293cdbaec8fb5b30cee2/pkg/kubelet/container/helpers.go#L105)
+defaults to `Always`,
+[2](https://github.com/kubernetes/kubernetes/blob/a7ca13ea29ba5b3c91fd293cdbaec8fb5b30cee2/pkg/kubelet/kubelet_pods.go#L1713)
+defaults to `OnFailure`,
+[3](https://github.com/kubernetes/kubernetes/blob/a7ca13ea29ba5b3c91fd293cdbaec8fb5b30cee2/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L838-L859)
+defaults to `Always`, and
+[4](https://github.com/kubernetes/kubernetes/blob/a7ca13ea29ba5b3c91fd293cdbaec8fb5b30cee2/pkg/kubelet/status/status_manager.go#L554-L573)
+defaults to `Never`), so if kubelet drops unexpected enum values for
+`restartPolicy`, a Pod with `Rapid` will be misconfigured by an old kubelet.
+
 ### Flat-rate restarts for `Succeeded` Pods
 
 We start from the assumption that the "Succeeded" phase of a Pod in Kubernetes
@@ -1262,10 +1298,7 @@ to
 
 The author is aware that these solutions still do not address use cases where
 users have taken advantage of the "cleaner" state "guarantees" of a restarted
-pod to alleviate security or privacy concerns between sequenced Pod runs. In
-these cases, during alpha, it is recommended to take advantage of the
-`restartPolicy: Rapid` option, with expectations that on further infrastructure
-analysis this behavior may become even faster.
+pod to alleviate security or privacy concerns between sequenced Pod runs.
 
 This decision here does not disallow the possibility that this is solved in
 other ways, for example:
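
The rejected `Rapid` proposal in the first hunk of this diff can be illustrated with a small sketch. It is not kubelet code, and both function names are hypothetical. `backoff_schedule` assumes a plain doubling backoff clamped at a cap, using the 250ms initial value and 1 minute cap from the text; the 10 s initial value and 5 minute cap used for today's curve come from the Kubernetes documentation rather than from this diff. `effective_restart_policy` models only the behaviour the added "Why not?" text says every kubelet would have needed, namely treating `Rapid` as `Always` unless the kubelet both recognises the value and has the feature gate on.

```python
def backoff_schedule(initial_s: float, cap_s: float, restarts: int) -> list[float]:
    """Delay in seconds before each of the first `restarts` restarts.

    Assumes the delay doubles after every restart and never exceeds `cap_s`.
    """
    delays = []
    delay = initial_s
    for _ in range(restarts):
        delays.append(min(delay, cap_s))
        delay = min(delay * 2, cap_s)
    return delays


def effective_restart_policy(policy: str, kubelet_knows_rapid: bool,
                             feature_gate_on: bool) -> str:
    """Hypothetical model of the interpretation the KEP text asks of a kubelet."""
    if policy != "Rapid":
        return policy
    if kubelet_knows_rapid and feature_gate_on:
        return "Rapid"
    # Gate off, or a kubelet that predates the value: fall back to Always.
    return "Always"


if __name__ == "__main__":
    print("rapid:", backoff_schedule(0.25, 60.0, 10))
    print("today:", backoff_schedule(10.0, 300.0, 7))
    print(effective_restart_policy("Rapid", kubelet_knows_rapid=False,
                                   feature_gate_on=False))
```

The second function shows the intended outcome only. The diff's point is that an already-released kubelet cannot be retrofitted with this fallback, and that existing code paths default differently, which is why the option was dropped.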
