Commit 5c040c3

KEP-4033: update GA graduation
Propose to retain fallback behavior in GA, but mark the --cgroup-driver flag and cgroupDriver config option as deprecated and eventually drop them according to the Kubernetes deprecation policy.
1 parent a0ac8f1 commit 5c040c3

File tree

1 file changed

+41
-34
lines changed
  • keps/sig-node/4033-group-driver-detection-over-cri

keps/sig-node/4033-group-driver-detection-over-cri/README.md

Lines changed: 41 additions & 34 deletions
Original file line number | Diff line number | Diff line change
@@ -177,19 +177,18 @@ will take precedence over cgroupDriver setting from the kubelet config (or
177177
`--cgroup-driver` command line flag). If the runtime does not provide
178178
information about the cgroup driver, then kubelet will fall back to using its
179179
own configuration (`cgroupDriver` from kubeletConfig or the `--cgroup-driver`
180-
flag). For beta, the kubeletConfig field and `--cgroup-driver` flag will be
181-
marked as deprecated. Usage of the deprecated setting will produce a log
182-
message, e.g.:
180+
flag). In beta, resorting to the fallback behavior will produce a log message like:
183181

184182
```
185183
cgroupDriver option has been deprecated and will be dropped in a future release. Please upgrade to a CRI implementation that supports cgroup-driver detection.
186184
```
187185

188186
The `--cgroup-driver` flag and the cgroupDriver configuration option will be
189-
deprecated and have no effect when support for the feature is graduated to GA.
190-
The kubelet refuses to start if the CRI runtime does not support the feature.
191-
The configurations flags will ultimately be removed as per the
192-
[Kubernetes deprecation policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-flag-or-cli).
187+
deprecated when support for the feature is graduated to GA.
188+
The configuration flags (and the related fallback behavior) will be removed in
189+
a later release as per the [Kubernetes deprecation policy][deprecation-policy].
190+
At that point the kubelet will refuse to start if the CRI runtime does not support
191+
the feature.
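The precedence rules described above can be sketched as follows. This is a minimal illustration in Python, not the kubelet's actual code: the function name `resolve_cgroup_driver`, the `fallback_removed` switch, and the exception type are hypothetical, and the log text is copied from the example message quoted in this section.

```python
import logging
from typing import Optional

logger = logging.getLogger("kubelet-sketch")

# Message text taken from the example in this KEP section.
DEPRECATION_MESSAGE = (
    "cgroupDriver option has been deprecated and will be dropped in a future "
    "release. Please upgrade to a CRI implementation that supports "
    "cgroup-driver detection."
)


class RuntimeUnsupportedError(RuntimeError):
    """Hypothetical error: the runtime reports no driver and no fallback exists."""


def resolve_cgroup_driver(
    runtime_driver: Optional[str],
    configured_driver: str,
    fallback_removed: bool = False,
) -> str:
    """Pick the cgroup driver following the precedence described in the text.

    runtime_driver    -- value reported by the CRI runtime, or None if the
                         runtime does not provide it
    configured_driver -- cgroupDriver from the kubelet config or --cgroup-driver
    fallback_removed  -- models the later release in which the flag, the
                         option, and the fallback are gone (assumption)
    """
    if runtime_driver:
        # The runtime-provided value takes precedence over kubelet settings.
        return runtime_driver
    if fallback_removed:
        # After removal, a runtime without support is a startup failure.
        raise RuntimeUnsupportedError(
            "CRI runtime does not support cgroup driver detection"
        )
    # Fallback path: warn and use the kubelet's own configuration.
    logger.warning(DEPRECATION_MESSAGE)
    return configured_driver
```

For example, `resolve_cgroup_driver("systemd", "cgroupfs")` returns the runtime's value, while `resolve_cgroup_driver(None, "cgroupfs")` logs the warning and returns the configured value.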
193192

194193
Kubelet startup is modified so that connection to the CRI server (container
195194
runtime) is established and RuntimeConfig is queried before initializing the
@@ -199,6 +198,8 @@ succeed, an error (error response or timeout) is regarded as a failed
199198
initialization of the runtime service and kubelet will exit with an error
200199
message and an error code.
201200

201+
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-flag-or-cli
202+
202203
### Test Plan
203204

204205
[x] I/we understand the owners of the involved components may require updates to
@@ -242,20 +243,21 @@ No new e2e tests for kubelet are planned.
242243
#### GA
243244

244245
- No bugs reported in the previous cycle.
245-
- Drop fallback to old behavior. CRI implementations expected to have support.
246246
- Deprecate kubelet cgroupDriver configuration option and `--cgroup-driver` flag.
247247
- Remove feature gate
248248
- All issues and gaps identified as feedback during beta are resolved
249249

250250
### Upgrade / Downgrade Strategy
251251

252-
In alpha and beta, the fallback behavior specified in alpha will prevent the majority of regressions, as Kubelet will choose a cgroup driver,
253-
same as it used to before this KEP, even when the feature gate is on.
252+
The fallback behavior will prevent the majority of regressions, as Kubelet will
253+
choose a cgroup driver, same as it used to before this KEP, even when the
254+
feature gate is on.
255+
254256
The feature gate is another layer of protection, requiring admins to specifically opt-into this behavior.
255257

256258
### Version Skew Strategy
257259

258-
In alpha and beta, if either kubelet or the container runtime running on the node does not support
260+
If either kubelet or the container runtime running on the node does not support
259261
the new field in the CRI API, they just resort to the existing behavior of
260262
respecting their individual cgroup-driver setting. That is, if the node has a
261263
container runtime that does not support this field the kubelet will use its
@@ -266,7 +268,10 @@ ignored by kubelet and it will resort to its own configuration settings. Note:
266268
this does present a configuration skew risk, but that risk is the same as
267269
currently exists today.
268270

269-
In GA, the fallback behavior will be removed, and the kubelet will rely on the
271+
The fallback behavior will be removed along with the `--cgroup-driver` flag and
272+
cgroupDriver option a few releases after GA, as per the
273+
[Kubernetes deprecation policy][deprecation-policy].
274+
At that point the kubelet will rely on the
270275
container runtime to implement the feature. In practice, this means the cluster
271276
must use at least containerd v2.0 or cri-o v1.28 as a prerequisite for
272277
upgrading.
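As an illustration of the upgrade prerequisite, a pre-upgrade check could compare the node's runtime version against the minimums named above (containerd v2.0, cri-o v1.28). The sketch below is hypothetical: the helper names and the idea of passing the runtime name and version as strings are assumptions, and unknown runtimes are conservatively treated as unsupported.

```python
import re
from typing import Tuple

# Minimum versions as stated in this section of the KEP.
MINIMUM_VERSIONS = {"containerd": (2, 0), "cri-o": (1, 28)}


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v2.0.1' or '1.28.0'."""
    match = re.match(r"v?(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_driver_detection(runtime_name: str, version: str) -> bool:
    """Return True if the runtime meets the minimum version listed above."""
    minimum = MINIMUM_VERSIONS.get(runtime_name.lower())
    if minimum is None:
        # Unknown runtime: treated as unsupported in this sketch.
        return False
    return parse_major_minor(version) >= minimum
```

A cluster operator might run such a check per node before upgrading to a release in which the fallback no longer exists.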
@@ -285,15 +290,15 @@ upgrading.
285290

286291
Yes.
287292

288-
In alpha and beta, when the runtime is updated to a version that supports this, kubelet
293+
When the runtime is updated to a version that supports this, kubelet
289294
will ignore the cgroupDriver config option/flag. However, this change in
290295
behavior should not cause any breakages (on the contrary, it should fix
291296
scenarios where the kubelet `--cgroup-driver` setting is incorrectly
292297
configured). With old versions of the container runtimes (that don't support
293298
the new field in the CRI API) the default behavior is not changed.
294299

295-
In GA, the fallback behavior is removed and the kubelet requires the CRI
296-
runtime to implement the feature (see
300+
When the `--cgroup-driver` setting is removed, the fallback behavior is dropped
301+
and the kubelet requires the CRI runtime to implement the feature (see
297302
[Version Skew Strategy](#version-skew-strategy)).
298303

299304
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
@@ -334,14 +339,12 @@ testing of the feature gate (in addition to the unit tests) is performed.
334339

335340
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
336341

337-
Yes.
338-
339-
In alpha and beta, the CgroupDriver field of the Kubelet configuration (and the
342+
Yes, the CgroupDriver field of the Kubelet configuration (and the
340343
corresponding `--cgroup-driver` flag) will be marked as deprecated.
341344

342-
In GA, the CgroupDriver configuration option and the `--cgroup-driver` flag are
343-
deprecated, will have no effect, and will be removed in a future release as per
344-
the Kubernetes deprecation policy.
345+
After GA, the CgroupDriver configuration option and the `--cgroup-driver` flag
346+
will be removed in a future release as per the
347+
[Kubernetes deprecation policy][deprecation-policy].
345348

346349
### Monitoring Requirements
347350

@@ -354,14 +357,16 @@ info`).
354357

355358
###### How can someone using this feature know that it is working for their instance?
356359

357-
No metrics will expose this.
358-
359-
In alpha and beta, examining kubelet logs should confirm
360+
No metrics will expose this. Examining kubelet logs should confirm
360361
that the cgroup driver setting instructed by the runtime is being used.
361362

362-
In GA, the kubelet refuses to start if the feature is not working. This can be
363-
observed in the system logs and the node being in NotReady state or node not
364-
being registered in cluster bootstrap.
363+
After GA, the CgroupDriver configuration option and the `--cgroup-driver` flag
364+
will be removed in a future release, in accordance with the
365+
[Kubernetes deprecation policy][deprecation-policy]. At that point, the kubelet
366+
will refuse to start if the required feature is not functioning correctly. This
367+
failure can be observed in system logs, with the node either entering a
368+
NotReady state or failing to register during cluster bootstrap. The behavior
369+
will be similar to other critical CRI server errors.
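Since no metric exposes this, one way to check a node is to scan its kubelet log for the fallback warning. The following is a rough sketch only: the marker string is taken from the example log message quoted in this KEP and may not match the exact wording a given kubelet release emits, and the function name is hypothetical.

```python
from typing import Iterable

# Prefix of the example fallback message quoted in this KEP (assumption: the
# real kubelet log line contains this text).
FALLBACK_MARKER = "cgroupDriver option has been deprecated"


def used_fallback(log_lines: Iterable[str]) -> bool:
    """Return True if any log line carries the fallback deprecation warning."""
    return any(FALLBACK_MARKER in line for line in log_lines)
```

If `used_fallback` returns `False` for a kubelet log that covers startup, the kubelet most likely took the driver from the runtime; if it returns `True`, the node is still relying on the deprecated setting.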
365370

366371
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
367372

@@ -379,12 +384,12 @@ N/A.
379384

380385
###### Does this feature depend on any specific services running in the cluster?
381386

382-
A CRI (server) implementation of the correct version.
383-
384-
In alpha and beta, the feature will fallback if the CRI implementation doesn’t
385-
support the feature.
387+
A CRI (server) implementation of the correct version. However, the feature will
388+
fall back if the CRI implementation doesn’t support the feature.
386389

387-
In GA, a sufficiently recent version of the CRI runtime is a hard requirement.
390+
After GA, the fallback behavior will be removed in a future release, as per the
391+
[Kubernetes deprecation policy][deprecation-policy]. At this point, a
392+
sufficiently recent version of the CRI runtime is a hard requirement.
388393

389394
### Scalability
390395

@@ -430,10 +435,12 @@ container runtime, only.
430435

431436
###### What are other known failure modes?
432437

433-
In alpha and beta, same that exists today: Kubelet and the CRI server (container runtime) not
438+
The same failure mode that exists today: Kubelet and the CRI server (container runtime) not
434439
agreeing on the CgroupDriver while one of them doesn’t support the feature.
435440

436-
In GA, the kubelet requires the CRI runtime to implement the feature and will
441+
After GA, the fallback behavior will be removed in a future release, as per the
442+
[Kubernetes deprecation policy][deprecation-policy]. At this point,
443+
the kubelet requires the CRI runtime to implement the feature and will
437444
refuse to start if it is not supported. As a result, the minimum required
438445
versions are containerd v2.0 and cri-o v1.28.
439446
