@@ -177,19 +177,18 @@ will take precedence over cgroupDriver setting from the kubelet config (or
 `--cgroup-driver` command line flag). If the runtime does not provide
 information about the cgroup driver, then kubelet will fall back to using its
 own configuration (`cgroupDriver` from kubeletConfig or the `--cgroup-driver`
-flag). For beta, the kubeletConfig field and `--cgroup-driver` flag will be
-marked as deprecated. Usage of the deprecated setting will produce a log
-message, e.g.:
+flag). In beta, resorting to the fallback behavior will produce a log message like:
 
 ```
 cgroupDriver option has been deprecated and will be dropped in a future release. Please upgrade to a CRI implementation that supports cgroup-driver detection.
 ```
 
 The `--cgroup-driver` flag and the cgroupDriver configuration option will be
-deprecated and have no effect when support for the feature is graduated to GA.
-The kubelet refuses to start if the CRI runtime does not support the feature.
-The configurations flags will ultimately be removed as per the
-[Kubernetes deprecation policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-flag-or-cli).
+deprecated when support for the feature is graduated to GA.
+The configuration flags (and the related fallback behavior) will be removed in
+a later release as per the [Kubernetes deprecation policy][deprecation-policy].
+At that point the kubelet refuses to start if the CRI runtime does not support
+the feature.
 
 Kubelet startup is modified so that connection to the CRI server (container
 runtime) is established and RuntimeConfig is queried before initializing the
@@ -199,6 +198,8 @@ succeed, an error (error response or timeout) is regarded as a failed
 initialization of the runtime service and kubelet will exit with an error
 message and an error code.
 
+[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-flag-or-cli
+
 ### Test Plan
 
 [x] I/we understand the owners of the involved components may require updates to
@@ -242,20 +243,21 @@ No new e2e tests for kubelet are planned.
 #### GA
 
 - No bugs reported in the previous cycle.
-- Drop fallback to old behavior. CRI implementations expected to have support.
 - Deprecate kubelet cgroupDriver configuration option and `--cgroup-driver` flag.
 - Remove feature gate
 - All issues and gaps identified as feedback during beta are resolved
 
 ### Upgrade / Downgrade Strategy
 
-In alpha and beta, the fallback behavior specified in alpha will prevent the majority of regressions, as Kubelet will choose a cgroup driver,
-same as it used to before this KEP, even when the feature gate is on.
+The fallback behavior will prevent the majority of regressions, as Kubelet will
+choose a cgroup driver, same as it used to before this KEP, even when the
+feature gate is on.
+
 The feature gate is another layer of protection, requiring admins to specifically opt-into this behavior.
 
 ### Version Skew Strategy
 
-In alpha and beta, if either kubelet or the container runtime running on the node does not support
+If either kubelet or the container runtime running on the node does not support
 the new field in the CRI API, they just resort to the existing behavior of
 respecting their individual cgroup-driver setting. That is, if the node has a
 container runtime that does not support this field the kubelet will use its
@@ -266,7 +268,10 @@ ignored by kubelet and it will resort to its own configuration settings. Note:
 this does present a configuration skew risk, but that risk is the same as
 currently exists today.
 
-In GA, the fallback behavior will be removed, and the kubelet will rely on the
+The fallback behavior will be removed along with the `--cgroup-driver` flag and
+cgroupDriver option in a few releases after GA, as per the
+[Kubernetes deprecation policy][deprecation-policy].
+At this point the kubelet relies on the
 container runtime to implement the feature. In practice, this means the cluster
 must use at least containerd v2.0 or cri-o v1.28 as a prerequisite for
 upgrading.
@@ -285,15 +290,15 @@ upgrading.
 
 Yes.
 
-In alpha and beta, when the runtime is updated to a version that supports this, kubelet
+When the runtime is updated to a version that supports this, kubelet
 will ignore the cgroupDriver config option/flag. However, this change in
 behavior should not cause any breakages (on the contrary, it should fix
 scenarios where the kubelet `--cgroup-driver` setting is incorrectly
 configured). With old versions of the container runtimes (that don't support
 the new field in the CRI API) the default behavior is not changed.
 
-In GA, the fallback behavior is removed and the kubelet requires the CRI
-runtime to implement the feature (see
+When the `--cgroup-driver` setting is removed, the fallback behavior is dropped
+and the kubelet requires the CRI runtime to implement the feature (see
 [Version Skew Strategy](#version-skew-strategy)).
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
@@ -334,14 +339,12 @@ testing of the feature gate (in addition to the unit tests) is performed.
 
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
-Yes.
-
-In alpha and beta, the CgroupDriver field of the Kubelet configuration (and the
+Yes, the CgroupDriver field of the Kubelet configuration (and the
 corresponding `--cgroup-driver` flag) will be marked as deprecated.
 
-In GA, the CgroupDriver configuration option and the `--cgroup-driver` flag are
-deprecated, will have no effect, and will be removed in a future release as per
-the Kubernetes deprecation policy.
+After GA, the CgroupDriver configuration option and the `--cgroup-driver` flag
+will be removed in a future release as per the
+[Kubernetes deprecation policy][deprecation-policy].
 
 ### Monitoring Requirements
 
@@ -354,14 +357,16 @@ info`).
 
 ###### How can someone using this feature know that it is working for their instance?
 
-No metrics will expose this.
-
-In alpha and beta, examining kubelet logs whould inform the
+No metrics will expose this. Examining kubelet logs would show
 that the cgroup driver setting instructed by the runtime is being used.
 
-In GA, the kubelet refuses to start if the feature is not working. The can be
-observed in the system logs and the node being in NotReady state or node not
-being registered in cluster bootstrap.
+After GA, the CgroupDriver configuration option and the `--cgroup-driver` flag
+will be removed in a future release, in accordance with the
+[Kubernetes deprecation policy][deprecation-policy]. At that point, the kubelet
+will refuse to start if the required feature is not functioning correctly. This
+failure can be observed in system logs, with the node either entering a
+NotReady state or failing to register during cluster bootstrap. The behavior
+will be similar to other critical CRI server errors.
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
@@ -379,12 +384,12 @@ N/A.
 
 ###### Does this feature depend on any specific services running in the cluster?
 
-A CRI (server) implementation of the correct version.
-
-In alpha and beta, the feature will fallback if the CRI implementation doesn’t
-support the feature.
+A CRI (server) implementation of the correct version. However, the feature will
+fall back if the CRI implementation doesn’t support the feature.
 
-In GA, a sufficiently recent version of the CRI runtime is a hard requirement.
+After GA, the fallback behavior will be removed in a future release, as per the
+[Kubernetes deprecation policy][deprecation-policy]. At this point, a
+sufficiently recent version of the CRI runtime is a hard requirement.
 
 ### Scalability
 
@@ -430,10 +435,12 @@ container runtime, only.
 
 ###### What are other known failure modes?
 
-In alpha and beta, same that exists today: Kubelet and the CRI server (container runtime) not
+Same that exists today: Kubelet and the CRI server (container runtime) not
 agreeing on the CgroupDriver while one of them doesn’t support the feature.
 
-In GA, the kubelet requires the CRI runtime to implement the feature and will
+After GA, the fallback behavior will be removed in a future release, as per the
+[Kubernetes deprecation policy][deprecation-policy]. At this point,
+the kubelet requires the CRI runtime to implement the feature and will
 refuse to start if it is not supported. As a result, the minimum required
 versions for containerd is v2.0 and for cri-o is v1.28.
 
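
The cgroup driver selection described in this change can be modeled roughly as follows. This is an illustrative Python sketch only, not kubelet code (the kubelet itself is written in Go). The names `select_cgroup_driver` and `RuntimeUnsupportedError` and the `fallback_removed` parameter are hypothetical and exist only for this example; the log text is the message quoted in the KEP.

```python
import logging
from typing import Optional

logger = logging.getLogger("kubelet-sketch")

# Log message quoted in the KEP text for the fallback path.
FALLBACK_MESSAGE = (
    "cgroupDriver option has been deprecated and will be dropped in a future "
    "release. Please upgrade to a CRI implementation that supports "
    "cgroup-driver detection."
)


class RuntimeUnsupportedError(RuntimeError):
    """Hypothetical error: the runtime did not report a cgroup driver."""


def select_cgroup_driver(
    runtime_driver: Optional[str],
    configured_driver: str,
    fallback_removed: bool = False,
) -> str:
    """Pick the cgroup driver the way the KEP text describes.

    runtime_driver: driver reported by the CRI runtime, or None if the
        runtime does not provide that information.
    configured_driver: value of cgroupDriver / --cgroup-driver.
    fallback_removed: models the later release in which the flag and the
        fallback behavior have been removed (an assumption of this sketch).
    """
    if runtime_driver is not None:
        # The runtime's answer takes precedence over kubelet configuration.
        return runtime_driver
    if fallback_removed:
        # After removal, the kubelet refuses to start.
        raise RuntimeUnsupportedError(
            "CRI runtime does not support cgroup driver detection"
        )
    # Fallback: log the deprecation message and use kubelet's own setting.
    logger.warning(FALLBACK_MESSAGE)
    return configured_driver
```

With a runtime that reports `systemd`, the configured value is ignored. With a runtime that reports nothing, the configured value is used and the message is logged, until the fallback is removed, at which point the call raises instead.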