@@ -1066,6 +1066,44 @@ enhancement:
CRI or CNI may require updating that component before the kubelet.
-->
+ For the default backoff curve, no coordination is needed between the control
+ plane and the nodes; all behavior changes are local to the kubelet component and
+ its startup configuration.
+
+ For the `Rapid` case, the API server and the kubelets on all nodes must be
+ coordinated:
+
+ First, `Rapid` must be a valid value for `restartPolicy` in the API server,
+ which is only possible once the API server is updated to 1.31.
+
+ Second, the `Rapid` value must be interpretable by the kubelet on every node.
+ Unfortunately, the API server cannot know which version each kubelet is
+ running, so it cannot selectively serve `Rapid` as `Always` to older kubelets.
+ Instead, each kubelet must handle this value properly, both at the n-3 kubelet
+ version and, more easily, at the contemporary 1.31+ kubelet version. A 1.31+
+ kubelet can detect whether it has the feature gate on: if so, it interprets
+ `Rapid` as a request for the new rapid backoff curve; if the feature gate is
+ off, it interprets `Rapid` as `Always`. Earlier kubelet versions must also treat
+ `Rapid` as `Always`. Unfortunately for this KEP, the default value for
+ `restartPolicy` is `Never`, so if a kubelet drops unexpected enum values for
+ `restartPolicy`, an old kubelet will misconfigure a Pod with `Rapid` to never
+ restart.
+
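The interpretation rules in the paragraph above can be sketched in Go. This is a minimal illustration, not kubelet code: the function names `naiveOldKubelet`, `patchedOldKubelet`, and `newKubelet` are hypothetical, `Rapid` is the value proposed by this KEP rather than a released API value, and modeling the gate-on case as `Always` plus a "rapid curve" flag is an assumption made for the example. The drop-to-`Never` behavior follows the failure mode as described in the text.

```go
package main

import "fmt"

// RestartPolicy mirrors the string enum used by the Pod restartPolicy field.
type RestartPolicy string

const (
	Always    RestartPolicy = "Always"
	OnFailure RestartPolicy = "OnFailure"
	Never     RestartPolicy = "Never"
	Rapid     RestartPolicy = "Rapid" // proposed in this KEP, not a released value
)

// naiveOldKubelet models (hypothetically) an older kubelet that drops enum
// values it does not recognize and lands on Never, the failure mode the text
// warns about.
func naiveOldKubelet(p RestartPolicy) RestartPolicy {
	switch p {
	case Always, OnFailure, Never:
		return p
	default:
		return Never
	}
}

// patchedOldKubelet models (hypothetically) an older kubelet that carries the
// fallback: Rapid is treated as Always, everything else is unchanged.
func patchedOldKubelet(p RestartPolicy) RestartPolicy {
	if p == Rapid {
		return Always
	}
	return naiveOldKubelet(p)
}

// newKubelet models (hypothetically) a 1.31+ kubelet. Rapid always restarts;
// the rapid backoff curve is used only when the feature gate is enabled.
func newKubelet(p RestartPolicy, gateEnabled bool) (policy RestartPolicy, rapidCurve bool) {
	if p != Rapid {
		return p, false
	}
	return Always, gateEnabled
}

func main() {
	fmt.Println("naive old kubelet:  ", naiveOldKubelet(Rapid))
	fmt.Println("patched old kubelet:", patchedOldKubelet(Rapid))

	policy, rapid := newKubelet(Rapid, true)
	fmt.Println("1.31+, gate on:     ", policy, "rapid curve:", rapid)

	policy, rapid = newKubelet(Rapid, false)
	fmt.Println("1.31+, gate off:    ", policy, "rapid curve:", rapid)
}
```

The gap between `naiveOldKubelet` and `patchedOldKubelet` is the version skew problem: until every supported kubelet carries the fallback, serving `Rapid` risks Pods that never restart.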
+ There are two options to deal with this version skew issue:
+
+ 1. Follow the
+    [restrictions on adding a new enum value to an existing field](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#new-enum-value-in-existing-field),
+    implement the fallback to `Always` in the kubelet, and wait 3 releases before
+    the API server is allowed to serve the new enum value, to honor the n-3
+    kubelet restriction; or
+ 2. Introduce the `Rapid` policy as a different field instead, for example
+    `backoffCurve: Rapid` or, even more transiently, `alphaBackoffCurve: Rapid`.
+    Even though the Pod API is already at v1, adding fields to GA APIs is
+    [allowed](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api_changes.md#adding-a-field)
+    without changing the API version. As this is an alpha field, it can be
+    deprecated later.
+
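To illustrate why option 2 sidesteps the enum problem, here is a small Go sketch under stated assumptions. The struct types `oldPodSpec` and `newPodSpec` are hypothetical stand-ins for the Pod spec as seen by an older and a newer kubelet, `backoffCurve` is the example field name from the list above (not an existing API field), and JSON is used only for demonstration; the real kubelet decoding path is not shown. The sketch relies on the fact that Go's `encoding/json` ignores unknown fields by default.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// oldPodSpec is a hypothetical view of the spec for a kubelet that predates
// the new field.
type oldPodSpec struct {
	RestartPolicy string `json:"restartPolicy"`
}

// newPodSpec is a hypothetical view of the spec for a kubelet that knows the
// example backoffCurve field.
type newPodSpec struct {
	RestartPolicy string `json:"restartPolicy"`
	BackoffCurve  string `json:"backoffCurve,omitempty"`
}

// manifest keeps restartPolicy at an existing value and carries the new
// behavior in a separate field.
const manifest = `{"restartPolicy":"Always","backoffCurve":"Rapid"}`

// decodeOld decodes the manifest as an older kubelet would; the unknown
// backoffCurve field is silently ignored by json.Unmarshal.
func decodeOld(data []byte) (oldPodSpec, error) {
	var s oldPodSpec
	err := json.Unmarshal(data, &s)
	return s, err
}

// decodeNew decodes the manifest as a newer kubelet would.
func decodeNew(data []byte) (newPodSpec, error) {
	var s newPodSpec
	err := json.Unmarshal(data, &s)
	return s, err
}

func main() {
	oldSpec, err := decodeOld([]byte(manifest))
	if err != nil {
		panic(err)
	}
	newSpec, err := decodeNew([]byte(manifest))
	if err != nil {
		panic(err)
	}
	fmt.Printf("old kubelet sees: %+v\n", oldSpec)
	fmt.Printf("new kubelet sees: %+v\n", newSpec)
}
```

In this sketch the older decoder still sees `restartPolicy: Always`, a value it already understands, so no fallback logic is needed on old nodes.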
## Production Readiness Review Questionnaire
<!--