Commit 9d3daed

Version Skew update
Signed-off-by: lauralorenz <[email protected]>
1 parent 764e174 commit 9d3daed

1 file changed: 38 additions, 0 deletions

keps/sig-node/4603-tune-crashloopbackoff/README.md

Lines changed: 38 additions & 0 deletions
@@ -1066,6 +1066,44 @@ enhancement:
CRI or CNI may require updating that component before the kubelet.
-->

For the default backoff curve, no coordination is needed between the control
plane and the nodes; all behavior changes are local to the kubelet component and
its startup configuration.

For the `Rapid` case, the API server and the kubelets on all nodes must be
coordinated:

First, `Rapid` must be a valid value for `restartPolicy` in the API server,
which is only possible once the API server is updated to 1.31.

Second, the `Rapid` value must be interpretable by the kubelet on every node.
Unfortunately, the API server cannot know which version each kubelet is
running, so it cannot selectively serve `Rapid` as `Always` to older kubelets.
Instead, every kubelet must handle this value properly, both at the n-3 kubelet
version and, more easily, at the contemporary 1.31+ kubelet version. A 1.31+
kubelet can detect whether its feature gate is on: if so, it interprets `Rapid`
as a request for the new rapid backoff curve; if not, it interprets `Rapid` as
`Always`. Earlier kubelet versions, however, must ignore `Rapid` in favor of
`Always`. Unfortunately for this KEP, the default value for `restartPolicy` is
`Never`, so if an old kubelet drops unexpected enum values for `restartPolicy`,
it will misconfigure a Pod with `Rapid` to never restart.
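
The interpretation rules above can be sketched in Go. This is a minimal illustration, not kubelet code: the `RestartPolicy` type, the constant names, and both helper functions are hypothetical and exist only to make the decision table concrete. `resolveOnNewKubelet` models a 1.31+ kubelet with the feature gate on or off; `resolveOnOldKubelet` models the fallback that an older kubelet would need so that an unrecognized value becomes `Always` rather than being dropped.

```go
package main

import "fmt"

// RestartPolicy is a stand-in for the string-typed enum in the Pod spec.
type RestartPolicy string

const (
	Always    RestartPolicy = "Always"
	OnFailure RestartPolicy = "OnFailure"
	Never     RestartPolicy = "Never"
	// Rapid is the new value proposed by this KEP.
	Rapid RestartPolicy = "Rapid"
)

// resolveOnNewKubelet (hypothetical) models a 1.31+ kubelet: Rapid is honored
// only when the feature gate is enabled, otherwise it is treated as Always.
func resolveOnNewKubelet(p RestartPolicy, gateEnabled bool) RestartPolicy {
	if p == Rapid && !gateEnabled {
		return Always
	}
	return p
}

// resolveOnOldKubelet (hypothetical) models the fallback an older kubelet
// needs: any value it does not recognize is treated as Always instead of
// being dropped.
func resolveOnOldKubelet(p RestartPolicy) RestartPolicy {
	switch p {
	case Always, OnFailure, Never:
		return p
	default:
		return Always
	}
}

func main() {
	fmt.Println("1.31+ kubelet, gate on: ", resolveOnNewKubelet(Rapid, true))
	fmt.Println("1.31+ kubelet, gate off:", resolveOnNewKubelet(Rapid, false))
	fmt.Println("older kubelet, fallback:", resolveOnOldKubelet(Rapid))
}
```

The hazard described in the text corresponds to an older kubelet that lacks the `default` branch and ends up with `Never` for a value it does not recognize.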

There are two options to deal with this version skew issue:

1. Follow the restrictions for a
   [new enum value in an existing field](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#new-enum-value-in-existing-field),
   implement the fallback to `Always` in the kubelet, and wait 3 releases before
   the API server is allowed to serve the new enum value, to honor the n-3
   kubelet restriction, or
2. Introduce the `Rapid` policy as a different field instead, for example
   `backoffCurve: Rapid` or, even more transiently, `alphaBackoffCurve: Rapid`.
   Even though the Pod API is already at v1, adding fields to GA APIs is
   [allowed](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#adding-a-field)
   without changing the API version. As this is an alpha field, it can be
   deprecated later.
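
The waiting period in option 1 is simple arithmetic, sketched below in Go. The function names are hypothetical, and the sketch assumes the `Always` fallback first ships in kubelet 1.31; the KEP text does not fix that release for the fallback, so treat the minor versions as illustrative only.

```go
package main

import "fmt"

// maxKubeletSkew is the n-3 restriction referenced in the text: a kubelet may
// be up to three minor versions older than the API server.
const maxKubeletSkew = 3

// oldestSupportedKubelet (hypothetical) returns the oldest kubelet minor
// version that may talk to an API server at the given minor version.
func oldestSupportedKubelet(apiServerMinor int) int {
	return apiServerMinor - maxKubeletSkew
}

// safeToServeNewEnum (hypothetical) reports whether every supported kubelet
// already contains the fallback, i.e. whether the API server may start
// serving the new enum value.
func safeToServeNewEnum(apiServerMinor, fallbackKubeletMinor int) bool {
	return oldestSupportedKubelet(apiServerMinor) >= fallbackKubeletMinor
}

func main() {
	// Assumption for illustration: the fallback lands in kubelet 1.31.
	const fallbackKubeletMinor = 31
	for minor := 31; minor <= 35; minor++ {
		fmt.Printf("API server 1.%d may serve Rapid: %v\n",
			minor, safeToServeNewEnum(minor, fallbackKubeletMinor))
	}
}
```

Under that assumption, the first API server release that could serve the new value is three minor versions after the one that introduced the kubelet fallback, which is the cost that option 2 avoids.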

## Production Readiness Review Questionnaire

<!--
