Commit b960476

marquiz authored and haircommander committed
KEP-3044: address review feedback

- changed CRI API to have a new RPC for querying for runtime config (instead of re-using runtime Status)
- update kep.yaml: add reviewers and approvers
- added feature gate (enabled by default)
- changed target maturity to beta
- add back the deprecation warning about cgroupDriver kubelet config setting
- small updates to PRR
1 parent 41633a8 commit b960476

3 files changed: +60 -36 lines changed

keps/prod-readiness/sig-node/4033.yaml

Lines changed: 1 addition & 1 deletion
@@ -2,5 +2,5 @@
 # "prod-readiness-approvers" group
 # of http://git.k8s.io/enhancements/OWNERS_ALIASES
 kep-number: 4033
-alpha:
+beta:
   approver: "@johnbelamaric"

keps/sig-node/4033-group-driver-detection-over-cri/README.md

Lines changed: 50 additions & 28 deletions
@@ -97,6 +97,8 @@ tags, and then generate with `hack/update-toc.sh`.
 - [Integration tests](#integration-tests)
 - [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
+- [Beta](#beta)
+- [GA](#ga)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
 - [Version Skew Strategy](#version-skew-strategy)
 - [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -223,39 +225,36 @@ adoption.
 
 ### CRI API
 
-Extend the CRI protocol to inform the kubelet which cgroup driver should be
-used.
+Extend the CRI runtime API to inform the kubelet which cgroup driver should be
+used. A new RuntimeConfig rpc is added to query the information.
 
 ```diff
-message RuntimeStatus {
-  // List of current observed runtime conditions.
-  repeated RuntimeCondition conditions = 1;
-+  // Configuration settings of the runtime. This field contains global
-+  // runtime configuration options that are not specific to runtime handlers.
-+  RuntimeConfiguration configuration = 2;
+// Runtime service defines the public APIs for remote container runtimes
+service RuntimeService {
+  ...
++  // RuntimeConfig returns configuration information of the runtime.
++  rpc RuntimeConfig(RuntimeConfigRequest) returns (RuntimeConfigResponse) {}
 }
 
-+message RuntimeConfiguration {
-+  // Configuration information for Linux-based runtimes
++message RuntimeConfigRequest {}
+
++message RuntimeConfigResponse {
++  // Configuration information for Linux-based runtimes. This field contains global
++  // runtime configuration options that are not specific to runtime handlers.
 +  LinuxRuntimeConfiguration linux = 1;
 +}
 
 +message LinuxRuntimeConfiguration {
 +  // Cgroup driver to use
 +  CgroupDriver cgroup_driver = 1;
 +}
-+
+
 +enum CgroupDriver {
 +  CGROUPFS = 0;
 +  SYSTEMD = 1;
 +}
 ```
 
-The existing RuntimeStatus message (of the existing Status API endpoint) is
-used as this is being frequently queried by the kubelet, and is a place where
-the runtime tells the Kubelet about its state. The runtime will decide which
-CgroupDriver to choose based on existing methods: its own configuration.
-
 ### Kubelet
 
 Kubelet will be modified to support the new field.
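
A minimal Go sketch of how a CRI client might turn the proposed `RuntimeConfigResponse` into the driver names the kubelet configuration already uses (`cgroupfs`, `systemd`). The struct and constant names are hand-written stand-ins for the generated protobuf types, and `kubeletDriverName` is a hypothetical helper; none of them is taken from the kubelet source.

```go
package main

import "fmt"

// CgroupDriver mirrors the enum proposed above. The numeric values follow the
// proposal text and are an assumption, not a released CRI API.
type CgroupDriver int32

const (
	CgroupDriverCgroupfs CgroupDriver = 0
	CgroupDriverSystemd  CgroupDriver = 1
)

// LinuxRuntimeConfiguration and RuntimeConfigResponse are illustrative
// stand-ins for the messages in the proposal.
type LinuxRuntimeConfiguration struct {
	CgroupDriver CgroupDriver
}

type RuntimeConfigResponse struct {
	Linux *LinuxRuntimeConfiguration
}

// kubeletDriverName (hypothetical) returns the kubelet-style driver name.
// The boolean is false when the response carries no usable Linux
// configuration, so the caller can fall back to its own setting.
func kubeletDriverName(resp *RuntimeConfigResponse) (string, bool) {
	if resp == nil || resp.Linux == nil {
		return "", false
	}
	switch resp.Linux.CgroupDriver {
	case CgroupDriverCgroupfs:
		return "cgroupfs", true
	case CgroupDriverSystemd:
		return "systemd", true
	default:
		// Unknown enum value, e.g. from a newer runtime.
		return "", false
	}
}

func main() {
	resp := &RuntimeConfigResponse{
		Linux: &LinuxRuntimeConfiguration{CgroupDriver: CgroupDriverSystemd},
	}
	name, ok := kubeletDriverName(resp)
	fmt.Println(name, ok)
}
```

Returning a boolean alongside the name keeps "runtime gave no answer" distinct from a valid driver, which is the case the fallback to kubelet configuration depends on.
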
@@ -265,10 +264,17 @@ will take precedence over cgroupDriver setting from the kubelet config (or
 `--cgroup-driver` command line flag). If the runtime does not provide
 information about the cgroup driver, then kubelet will fall back to using its
 own configuration (`cgroupDriver` from kubeletConfig or the `--cgroup-driver`
-flag).
+flag). Further, the kubeletConfig field and `--cgroup-driver` flag will be
+marked as deprecated, to be dropped when support for the feature is adopted by
+CRI-O and containerd. Usage of the deprecated setting will produce a log
+message, e.g.:
+
+```
+cgroupDriver option has been deprecated and will be dropped in a future release. Please upgrade to a CRI implementation that supports cgroup-driver detection.
+```
 
 Kubelet startup is modified so that connection to the CRI server (container
-runtime) is established and RuntimeStatus is queried before initializing the
+runtime) is established and RuntimeConfig is queried before initializing the
 kubelet internal container-manager which is responsible for kubelet-side cgroup
 management.

@@ -416,8 +422,17 @@ in back-to-back releases.
 - Deprecate the flag
 -->
 
-All CRI implementations support the new cgroupDriver field, and the Kubelet
-drops support for its own CgroupManager field/flag.
+The feature targets beta directly, with the feature gate enabled by
+default.
+
+#### Beta
+
+- [ ] Feature implemented, with the feature gate enabled by default.
+
+#### GA
+
+- [ ] Released versions of CRI-O and containerd runtime implementations support the feature.
+- [ ] No bugs reported in the previous cycle.
 
 ### Upgrade / Downgrade Strategy
 
@@ -453,7 +468,7 @@ the new field in the CRI API, they just resort to the existing behavior of
 respecting their individual cgroup-driver setting. That is, if the node has a
 container runtime that does not support this field the kubelet will use its
 cgroupDriver setting from kubeletConfig (or `--cgroup-driver` commandline
-flag). This is also the case if the kubelet does not support the new field:
+flag). This is also the case if the kubelet does not support the new field:
 the information about cgroup driver advertised by the runtime will be just
 ignored by kubelet and it will resort to its own configuration settings. Note:
 this does present a configuration skew risk, but that risk is the same as
@@ -501,8 +516,9 @@ well as the [existing list] of feature gates.
 [existing list]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
 -->
 
-No feature gate required–the fields are all SIG Node internal and have simple
-fallbacks.
+- [X] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name: KubeletCgroupDriverFromCRI
+  - Components depending on the feature gate: kubelet
 
 ###### Does enabling the feature change any default behavior?
 
@@ -513,7 +529,10 @@ automations, so be extremely careful here.
 
 Yes. If/when the runtime is updated to a version that supports this, kubelet
 will ignore the cgroupDriver config option/flag. However, this change in
-behavior should be largely invisible/irrelevant to the user.
+behavior should not cause any breakages (on the contrary, it should fix
+scenarios where the kubelet `--cgroup-driver` setting is incorrectly
+configured). With old versions of the container runtimes (that don't support
+the new field in the CRI API) the default behavior is not changed.
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
@@ -528,11 +547,13 @@ feature.
 NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
 -->
 
-No, though the scope is so small roll back should not be required.
+Yes, through the feature gate.
 
 ###### What happens if we reenable the feature if it was previously rolled back?
 
-N/A.
+Kubelet starts to use the cgroup driver instructed by the runtime, potentially
+fixing a broken/misbehaving node if the kubelet cgroupDriver option (or
+`--cgroup-driver` flag) was incorrectly set.
 
 ###### Are there any tests for feature enablement/disablement?
 
@@ -549,7 +570,7 @@ You can take a look at one potential example of such test in:
 https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
 -->
 
-N/A.
+TBD.
 
 ### Rollout, Upgrade and Rollback Planning
 
@@ -637,7 +658,8 @@ and operation of this feature.
 Recall that end users cannot usually observe component logs or access metrics.
 -->
 
-No metrics likely will expose this.
+No metrics likely will expose this. Examining kubelet logs would indicate
+that the cgroup driver setting instructed by the runtime is being used.
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 

keps/sig-node/4033-group-driver-detection-over-cri/kep.yaml

Lines changed: 9 additions & 7 deletions
@@ -9,15 +9,15 @@ participating-sigs:
 status: implementable
 creation-date: 2023-05-25
 reviewers:
-  - TBD
+  - "@mrunalp"
 approvers:
-  - TBD
+  - "@sig-node-leads"
 
 see-also: []
 replaces: []
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
@@ -26,14 +26,16 @@ latest-milestone: "v1.28"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
-  alpha: "v1.28"
-  beta: "v1.xx"
+  beta: "v1.28"
   stable: "v1.yy"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
-feature-gates: []
-disable-supported: false
+feature-gates:
+  - name: KubeletCgroupDriverFromCRI
+    components:
+      - kubelet
+disable-supported: true
 
 # The following PRR answers are required at beta release
 metrics: []
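
As a rough Go sketch of what `disable-supported: true` implies for the `KubeletCgroupDriverFromCRI` gate: with the gate on (the beta default), a driver reported by the runtime is used; with the gate explicitly off, the kubelet's configured driver is used regardless. The `overrides` map and both functions are hypothetical stand-ins and do not reflect the kubelet's actual feature-gate plumbing.

```go
package main

import "fmt"

// gateName is the feature gate declared in kep.yaml.
const gateName = "KubeletCgroupDriverFromCRI"

// gateEnabled (hypothetical) reports whether the gate is on. overrides stands
// in for explicitly configured feature gates; without an override the gate is
// treated as enabled, matching "beta, enabled by default".
func gateEnabled(overrides map[string]bool) bool {
	if v, ok := overrides[gateName]; ok {
		return v
	}
	return true
}

// effectiveDriver (hypothetical) picks the runtime-reported driver only when
// the gate is enabled and the runtime actually reported one.
func effectiveDriver(overrides map[string]bool, runtimeDriver, configured string) string {
	if gateEnabled(overrides) && runtimeDriver != "" {
		return runtimeDriver
	}
	return configured
}

func main() {
	// Gate left at its default: the runtime's driver is used.
	fmt.Println(effectiveDriver(nil, "systemd", "cgroupfs"))
	// Gate disabled (roll back): the configured driver is used.
	fmt.Println(effectiveDriver(map[string]bool{gateName: false}, "systemd", "cgroupfs"))
}
```
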
