
Commit 9eacadb

Merge branch 'kep-4603-tune-crashloopbackoff-132-copy' into kep-4603-tune-crashloopbackoff-132
2 parents d25595c + 4b3835f commit 9eacadb

1 file changed, 36 additions, 6 deletions
  • keps/sig-node/4603-tune-crashloopbackoff


keps/sig-node/4603-tune-crashloopbackoff/README.md

```diff
@@ -404,6 +404,15 @@ CrashLoopBackOffBehavior of today vs the proposed minimum for per node
 configuration](./restarts-vs-elapsed-minimum-per-node.png "Per node minimum backoff
 curve allowed")
 
+While the complete details are saved for [Design Details](#per-node-config),
+it's expedient to see the exact config proposed here:
+
+```
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+crashloopbackoff:
+  max: 4
+```
 
 ### Refactor and flat rate to 10 minutes for the backoff counter reset threshold
 
```
```diff
@@ -718,17 +727,37 @@ based config and 2) configuration following the API specification of the
 `kubelet.config.k8s.io/v1beta1 KubeletConfiguration` Kind, which is passed to
 kubelet as a config file or, beta as of Kubernetes 1.30, a config directory
 ([ref](https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/)).
+
 Since this is a per-node configuration that likely will be set on a subset of
 nodes, or potentially even differently per node, it's important that it can be
-manipulated per node. By default `KubeletConfiguration` is intended to be shared
+manipulated per node. Expected use cases of this type of heterogeneity in
+configuration include:
+
+* Dedicated node pool for workloads that are expected to rapidly restart
+* Config aligned with node labels/pod affinity labels for workloads that are
+  expected to rapidly restart
+* Config adjusted for machine size
+
+By default `KubeletConfiguration` is intended to be shared
 between nodes, but the beta feature for drop-in configuration files in a
 colocated config directory circumvents this. In addition, `KubeletConfiguration`
 drops fields unrecognized by the current kubelet's schema, making it a good
 choice to circumvent compatibility issues with n-3 kubelets. While there is an
 argument that this could be better manipulated with a command-line flag, so
-lifecycle tooling that configures nodes can expose it more transparently, the
-advantages to backwards compatibility outweigh this consideration for the alpha
-period and will be revisted before beta.
+lifecycle tooling that configures nodes can expose it more transparently, moving
+such settings out of flags was already an accepted design change when
+`KubeletConfiguration` was introduced in the first place. In any case, the
+advantages of backward and forward compatibility far outweigh this consideration
+for the alpha period, and the choice can be revisited before beta.
+
+The proposed configuration explicitly looks like this:
+
+```
+apiVersion: kubelet.config.k8s.io/v1beta1
+kind: KubeletConfiguration
+crashloopbackoff:
+  max: 4
+```
 
 ### Refactor of recovery threshold
 
```
```diff
@@ -1147,6 +1176,7 @@ feature gates set as per the [Conflict Resolution](#conflict-resolution) policy
 
 - Gather feedback from developers and surveys
 - High confidence in the specific numbers/decay rate
+  - Including revisiting the 300s maximum for node-specific config
 - Benchmark restart load methodology and analysis published and discussed with
   SIG-Node
 - Discuss PLEG polling loops and their effect on specific decay rates
```
```diff
@@ -1500,15 +1530,15 @@ which is a new field in the `KubeletConfiguration` Kind. Based on manual tests
 by the author, adding an unknown field to `KubeletConfiguration` is safe and the
 unknown config field is dropped before addition to the
 `kube-system/kubelet-config` object which is its final destination (for example,
-in the case of n-3 kubelets facing a configuration introduced by this KEP). This
+in the case of n-3 kubelets facing a configuration introduced by this KEP). Ultimately this behavior follows from how a given Kind's `fieldValidation` strategy is configured in API machinery ([ref](https://github.com/kubernetes/kubernetes/blob/release-1.31/staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/types.go#L584)): in 1.31+ it defaults to `Warn`, it applies only to API objects, and, it turns out, it is not explicitly set to `Strict` for `KubeletConfiguration` objects, so they ultimately bypass this validation ([ref](https://github.com/kubernetes/kubectl/issues/1663#issuecomment-2392453716)). This
 is not currently tested as far as I can tell in the tests for
 `KubeletConfiguration` (neither in the most likely location,
 [validation_test](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/apis/config/validation/validation_test.go),
 nor in other tests in the [config
 package](https://github.com/kubernetes/kubernetes/tree/005f184ab631e52195ed6d129969ff3914d51c98/pkg/kubelet/apis/config))
 and discussions with other contributors indicate that while little in core
 Kubernetes does strict parsing, it's not well tested. At minimum, as part of this
-implementation a test covering this for `KubeletConfgiuration` objects will be
+implementation, a test covering this for `KubeletConfiguration` objects will be
 included in the `config.validation_test` package.
 
 ### Rollout, Upgrade and Rollback Planning
```
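The difference between dropping an unknown field and rejecting it can be illustrated with Go's standard `encoding/json` package, which ignores unknown fields by default and rejects them once `Decoder.DisallowUnknownFields` is called. This is only an analogy for the lenient vs. strict behavior discussed above, not the kubelet's actual config loader (which goes through API machinery serializers); `olderKubeletConfig` is a hypothetical struct standing in for the schema of an n-3 kubelet that predates `crashloopbackoff`.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// olderKubeletConfig is a hypothetical schema for an older kubelet that
// does not know about the crashloopbackoff field.
type olderKubeletConfig struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
}

// decode parses raw into the older schema. When strict is false, unknown
// fields are silently dropped (encoding/json's default); when strict is
// true, any unknown field makes decoding fail.
func decode(raw []byte, strict bool) (olderKubeletConfig, error) {
	var cfg olderKubeletConfig
	dec := json.NewDecoder(bytes.NewReader(raw))
	if strict {
		dec.DisallowUnknownFields()
	}
	err := dec.Decode(&cfg)
	return cfg, err
}

func main() {
	// A config written for a newer kubelet, including the field from this KEP.
	raw := []byte(`{"apiVersion": "kubelet.config.k8s.io/v1beta1",
		"kind": "KubeletConfiguration",
		"crashloopbackoff": {"max": 4}}`)

	cfg, err := decode(raw, false)
	fmt.Printf("lenient: kind=%s err=%v\n", cfg.Kind, err)

	_, err = decode(raw, true)
	fmt.Printf("strict: err=%v\n", err)
}
```

The lenient path is the property the section relies on for n-3 kubelets; a test against the real `KubeletConfiguration` types would pin it down in the same way, asserting that the known fields survive and the unknown one is dropped without error.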
