
Commit 2388578

Merge pull request kubernetes#5128 from nokia/4540-strict-cpu-reservation-beta
KEP-4540: Move to beta
2 parents 4d017a4 + 3b79cc4 commit 2388578


3 files changed: +87 -45 lines

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 kep-number: 4540
 alpha:
   approver: "@soltysh"
-
+beta:
+  approver: "@soltysh"

keps/sig-node/4540-strict-cpu-reservation/README.md

Lines changed: 79 additions & 38 deletions
@@ -42,20 +42,20 @@
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
 - [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-  - [ ] e2e Tests for all Beta API Operations (endpoints)
-  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+  - [x] e2e Tests for all Beta API Operations (endpoints)
+  - [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+  - [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free
 - [ ] (R) Graduation criteria is in place
   - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Production readiness review completed
+- [x] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
 - [x] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 [kubernetes.io]: https://kubernetes.io/
 [kubernetes/enhancements]: https://git.k8s.io/enhancements
@@ -73,15 +73,14 @@ With this KEP, a new `CPUManager` policy option `strict-cpu-reservation` is intr
 
 The static policy is used to reduce latency or improve performance. If you want to move system daemons or interrupt processing to dedicated cores, the obvious way is use the `reservedSystemCPUs` option. But in current implementation this isolation is implemented only for guaranteed pods with integer CPU requests not for burstable and best-effort pods (and guaranteed pods with fractional CPU requests).
 Admission is only comparing the cpu requests against the allocatable cpus. Since the cpu limit are higher than the request, it allows burstable and best-effort pods to use up the capacity of `reservedSystemCPUs` and cause host OS services to starve in real life deployments.
-Custom CPU allocation policies deployed as NRI plugins (e.g. Balloons) can separate infrastructure and workload into different CPU pools but they require extra software, additional tuning and reduced CPU pool size could affect performance of multi-threaded processes.
 
 ### Goals
 * Align scheduler and node view for Node Allocatable (total - reserved).
 * Ensure `reservedSystemCPUs` is only used by system daemons or interrupt processing not by workloads.
 * Ensure no breaking changes for the `static` policy of `CPUManager`.
 
 ### Non-Goals
-* Change scheduler interface to sub-partition `cpu` resource (as described in the archived Risk Mitigation Option 1).
+* Change interface between node and scheduler.
 
 ## Proposal
 
@@ -109,10 +108,6 @@ With the following Kubelet configuration:
 ```yaml
 kind: KubeletConfiguration
 apiVersion: kubelet.config.k8s.io/v1beta1
-featureGates:
-  ...
-  CPUManagerPolicyOptions: true
-  CPUManagerPolicyAlphaOptions: true
 cpuManagerPolicy: static
 cpuManagerPolicyOptions:
   strict-cpu-reservation: "true"
@@ -123,7 +118,7 @@ reservedSystemCPUs: "0,32,1,33,16,48"
 When `strict-cpu-reservation` is disabled:
 ```console
 # cat /var/lib/kubelet/cpu_manager_state
-{"policyName":"static","defaultCpuSet":"0-79","checksum":1241370203}
+{"policyName":"static","defaultCpuSet":"0-63","checksum":1058907510}
 ```
 
 When `strict-cpu-reservation` is enabled:
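The two `defaultCpuSet` values in this example follow from simple set arithmetic on the node's CPUs. The Go sketch below is illustrative and is not kubelet code. It assumes three things:

- a 64-CPU node (`0-63`, matching the updated example);
- the `reservedSystemCPUs` value `0,32,1,33,16,48` from the configuration above;
- no exclusive CPU allocations.

`parseCPUList`, `formatCPUList` and `sharedPool` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseCPUList expands a Linux cpuset list such as "0-63" or
// "0,32,1,33,16,48" into a set of CPU IDs.
func parseCPUList(s string) (map[int]bool, error) {
	set := make(map[int]bool)
	if strings.TrimSpace(s) == "" {
		return set, nil
	}
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		lo, hi := part, part
		if i := strings.IndexByte(part, '-'); i >= 0 {
			lo, hi = part[:i], part[i+1:]
		}
		start, err := strconv.Atoi(lo)
		if err != nil {
			return nil, err
		}
		end, err := strconv.Atoi(hi)
		if err != nil {
			return nil, err
		}
		if end < start {
			return nil, fmt.Errorf("invalid CPU range %q", part)
		}
		for c := start; c <= end; c++ {
			set[c] = true
		}
	}
	return set, nil
}

// formatCPUList renders a CPU set in the compact range form used by
// defaultCpuSet, for example "2-15,17-31".
func formatCPUList(set map[int]bool) string {
	ids := make([]int, 0, len(set))
	for c := range set {
		ids = append(ids, c)
	}
	sort.Ints(ids)
	var parts []string
	for i := 0; i < len(ids); {
		j := i
		// Extend the run while CPU IDs stay consecutive.
		for j+1 < len(ids) && ids[j+1] == ids[j]+1 {
			j++
		}
		if i == j {
			parts = append(parts, strconv.Itoa(ids[i]))
		} else {
			parts = append(parts, fmt.Sprintf("%d-%d", ids[i], ids[j]))
		}
		i = j + 1
	}
	return strings.Join(parts, ",")
}

// sharedPool returns the CPUs expected in defaultCpuSet when nothing is
// allocated exclusively: every online CPU, minus reservedSystemCPUs only
// when strict-cpu-reservation is enabled.
func sharedPool(online, reserved map[int]bool, strict bool) map[int]bool {
	pool := make(map[int]bool, len(online))
	for c := range online {
		if strict && reserved[c] {
			continue
		}
		pool[c] = true
	}
	return pool
}

func main() {
	online, err := parseCPUList("0-63")
	if err != nil {
		panic(err)
	}
	reserved, err := parseCPUList("0,32,1,33,16,48")
	if err != nil {
		panic(err)
	}
	for _, strict := range []bool{false, true} {
		pool := sharedPool(online, reserved, strict)
		fmt.Printf("strict-cpu-reservation=%v: defaultCpuSet=%q (%d CPUs)\n",
			strict, formatCPUList(pool), len(pool))
	}
}
```

Under these assumptions the sketch prints the following:

- Option disabled: `0-63` (64 CPUs).
- Option enabled: `2-15,17-31,34-47,49-63` (58 CPUs, i.e. total - reserved). This is the same set recorded in the upgrade test's `cpu_manager_state` later in this diff.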
@@ -134,7 +129,7 @@ When `strict-cpu-reservation` is enabled:
 
 ### Risks and Mitigations
 
-The feature is isolated to a specific policy option `strict-cpu-reservation` under `cpuManagerPolicyOptions` and is protected by feature gate `CPUManagerPolicyAlphaOptions` or `CPUManagerPolicyBetaOptions` before the feature graduates to `Stable` i.e. enabled by default.
+The feature is isolated to a specific policy option `strict-cpu-reservation` under `cpuManagerPolicyOptions` and is protected by feature gate `CPUManagerPolicyBetaOptions` before the feature graduates to `Stable` i.e. always enabled.
 
 Concern for feature impact on best-effort workloads, the workloads that do not have resource requests, is brought up.
 
@@ -144,11 +139,11 @@ The concern is, when the feature graduates to `Stable`, it will be enabled by de
 
 However, this is exactly the feature intent, best-effort workloads have no KPI requirement, they are meant to consume whatever CPU resources left on the node including starving from time to time. Best-effort workloads are not scheduled to run on the `reservedSystemCPUs` so they shall not be run on the `reservedSystemCPUs` to destablize the whole node.
 
-Nevertheless, risk mitigation has been discussed in details (see archived options below) and we agree to start with the following node metrics of cpu pool sizes in Alpha stage to assess the actual impact in real deployment before revisiting if we need risk mitigation.
+Nevertheless, risk mitigation has been discussed in details (see archived options below) and we agree to start with the following node metrics of cpu pool sizes in Alpha and Beta stages to assess the actual impact in real deployment. The plan is to move the current implementation to Stable stage if no field issue is observed for one year.
 
 https://github.com/kubernetes/kubernetes/pull/127506
-- `cpu\_manager\_shared\_pool\_size\_millicores`: report shared pool size, in millicores (e.g. 13500m), expected to be non-zone otherwise best-effort pods will starve
-- `cpu\_manager\_exclusive\_cpu\_allocation\_count`: report exclusively allocated cores, counting full cores (e.g. 16)
+- `cpu_manager_shared_pool_size_millicores`: report shared pool size, in millicores (e.g. 13500m), expected to be non-zone otherwise best-effort pods will starve
+- `cpu_manager_exclusive_cpu_allocation_count`: report exclusively allocated cores, counting full cores (e.g. 16)
 
 
 #### Archived Risk Mitigation (Option 1)
@@ -184,7 +179,6 @@ kind: KubeletConfiguration
 apiVersion: kubelet.config.k8s.io/v1beta1
 featureGates:
   ...
-  CPUManagerPolicyOptions: true
   CPUManagerPolicyAlphaOptions: true
 cpuManagerPolicy: static
 cpuManagerPolicyOptions:
@@ -298,7 +292,7 @@ No new integration tests for kubelet are planned.
 - CPU Manager works with `strict-cpu-reservation` policy option
 
 - Basic functionality
-1. Enable `CPUManagerPolicyAlphaOptions` feature gate and `strict-cpu-reservation` policy option.
+1. Enable `strict-cpu-reservation` policy option.
 2. Create a simple pod of Burstable QoS type.
 3. Verify the pod is not using the reserved CPU cores.
 4. Delete the pod.
@@ -313,8 +307,9 @@ No new integration tests for kubelet are planned.
 
 #### Beta
 
-- [ ] Gather feedback from consumers of the new policy option.
-- [ ] Verify no major bugs reported in the previous cycle.
+- [X] Gather feedback from consumers of the new policy option.
+- [X] Verify no major bugs reported in the previous cycle.
+- [X] Ensure proper e2e tests are in place.
 
 #### GA
 
@@ -333,33 +328,32 @@ No changes needed.
 
 ### Feature Enablement and Rollback
 
-The `/var/lib/kubelet/cpu\_manager\_state` needs to be removed when enabling or disabling the feature.
+The `/var/lib/kubelet/cpu_manager_state` needs to be removed when enabling or disabling the feature.
 
 ###### How can this feature be enabled / disabled in a live cluster?
 
 - [X] Feature gate (also fill in values in `kep.yaml`)
-  - Feature gate name: `CPUManagerPolicyAlphaOptions`
+  - Feature gate name: `CPUManagerPolicyBetaOptions`
   - Components depending on the feature gate: `kubelet`
 - [X] Change the kubelet configuration to set a `CPUManager` policy of `static` and a `CPUManager` policy option of `strict-cpu-reservation`
   - Will enabling / disabling the feature require downtime of the control plane? No
-  - Will enabling / disabling the feature require downtime or reprovisioning of a node? No -- removing `/var/lib/kubelet/cpu\_manager\_state` and restarting kubelet are enough.
+  - Will enabling / disabling the feature require downtime or reprovisioning of a node? No -- removing `/var/lib/kubelet/cpu_manager_state` and restarting kubelet are enough.
 
 
 ###### Does enabling the feature change any default behavior?
 
 Yes. Reserved CPU cores will be strictly used for system daemons and interrupt processing no longer available for workloads.
 
 The feature is only enabled when all following conditions are met:
-1. The `CPUManagerPolicyAlphaOptions` feature gate must be enabled
-2. The `static` `CPUManager` policy must be selected
-3. The new `strict-cpu-reservation` policy option must be selected
-4. The `reservedSystemCPUs` is not empty
+1. The `static` `CPUManager` policy is selected
+2. The `CPUManagerPolicyBetaOptions` feature gate is enabled and the `strict-cpu-reservation` policy option is selected
+3. The `reservedSystemCPUs` is not empty
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
-Yes, the feature can be disabled by either:
-1. Disable feature gate `CPUManagerPolicyAlphaOptions` or remove `strict-cpu-reservation` from the list of `CPUManager` policy options
-2. Remove `/var/lib/kubelet/cpu\_manager\_state` and restart kubelet
+Yes, the feature can be disabled by:
+1. Disable feature gate `CPUManagerPolicyBetaOptions` or remove `strict-cpu-reservation` from the list of `CPUManager` policy options
+2. Remove `/var/lib/kubelet/cpu_manager_state` and restart kubelet
 
 ###### What happens if we reenable the feature if it was previously rolled back?
 
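The three enablement conditions listed in this hunk can be read as a single predicate. The Go sketch below makes these assumptions:

- It uses a hypothetical, trimmed-down `kubeletSettings` struct, not the real `KubeletConfiguration` type.
- `CPUManagerPolicyBetaOptions` defaults to true, as stated for v1.33 in this KEP's upgrade notes.
- The option value is parsed as a boolean.

It models only the listed conditions, not how the kubelet reacts to an invalid combination of gate and option.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// kubeletSettings is a hypothetical subset of the kubelet configuration.
type kubeletSettings struct {
	CPUManagerPolicy        string
	CPUManagerPolicyOptions map[string]string
	FeatureGates            map[string]bool
	ReservedSystemCPUs      string
}

// gate returns an explicitly configured feature-gate value, or def if unset.
func gate(gates map[string]bool, name string, def bool) bool {
	if v, ok := gates[name]; ok {
		return v
	}
	return def
}

// strictReservationActive returns true only when all listed conditions hold.
func strictReservationActive(s kubeletSettings) bool {
	// 1. The static CPUManager policy is selected.
	if s.CPUManagerPolicy != "static" {
		return false
	}
	// 2. The beta-options gate (assumed default: true) is on and the option is set to true.
	if !gate(s.FeatureGates, "CPUManagerPolicyBetaOptions", true) {
		return false
	}
	enabled, err := strconv.ParseBool(s.CPUManagerPolicyOptions["strict-cpu-reservation"])
	if err != nil || !enabled {
		return false
	}
	// 3. reservedSystemCPUs is not empty.
	return strings.TrimSpace(s.ReservedSystemCPUs) != ""
}

func main() {
	cfg := kubeletSettings{
		CPUManagerPolicy:        "static",
		CPUManagerPolicyOptions: map[string]string{"strict-cpu-reservation": "true"},
		ReservedSystemCPUs:      "0,32,1,33,16,48",
	}
	fmt.Println("gate at default:", strictReservationActive(cfg))

	cfg.FeatureGates = map[string]bool{"CPUManagerPolicyBetaOptions": false}
	fmt.Println("gate disabled:", strictReservationActive(cfg))
}
```

The gate is left at its default in the first call, which mirrors the v1.33 case where no explicit feature-gate entry is needed.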
@@ -381,7 +375,7 @@ If the feature rollout fails, burstable and best-efforts continue to run on the
 If the feature rollback fails, burstable and best-efforts continue not to run on the reserved CPU cores.
 In either case, existing workload will not be affected.
 
-When enabling or disabling the feature, make sure `/var/lib/kubelet/cpu\_manager\_state` is removed before restarting kubelet otherwise kubelet restart could fail.
+When enabling or disabling the feature, make sure `/var/lib/kubelet/cpu_manager_state` is removed before restarting kubelet otherwise kubelet restart could fail.
 
 <!--
 Try to be as paranoid as possible - e.g., what if some components will restart
@@ -410,8 +404,54 @@ Describe manual testing that was done and the outcomes.
 Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
+If you have this feature enabled in v1.32 under `CPUManagerPolicyAlphaOptions` (default to false) you will continue to have the feature enabled in v1.33 under `CPUManagerPolicyBetaOptions` (default to true) automatically i.e. no extra action is needed.
+To enable or disable this feature in v1.33, follow the feature activation and de-activation procedures described above.
+
+Manual upgrade->downgrade->upgrade testing from v1.32 to v1.33 is as follows:
+
+With the following Kubelet configuration and `cpu_manager_state` v1.32:
+
+```yaml
+kind: KubeletConfiguration
+apiVersion: kubelet.config.k8s.io/v1beta1
+featureGates:
+  CPUManagerPolicyAlphaOptions: true
+  ...
+cpuManagerPolicy: static
+cpuManagerPolicyOptions:
+  strict-cpu-reservation: "true"
+reservedSystemCPUs: "0,32,1,33,16,48"
+...
+```
 
-We manually test it in our internal environment and it works.
+```console
+# cat /var/lib/kubelet/cpu_manager_state
+{"policyName":"static","defaultCpuSet":"2-15,17-31,34-47,49-63","checksum":4141502832}
+```
+
+The same Kubelet `cpu_manager_state` will be seen after upgrading to v1.33:
+```console
+# cat /var/lib/kubelet/cpu_manager_state
+{"policyName":"static","defaultCpuSet":"2-15,17-31,34-47,49-63","checksum":4141502832}
+```
+
+You are recommended to remove the `CPUManagerPolicyAlphaOptions` feature gate after upgrading to v1.33 for operational integrity, but it is not mandatory.
+
+If you want to disable the feature in v1.33, you can either disable the `CPUManagerPolicyBetaOptions` feature gate, or remove the `strict-cpu-reservation` policy option. Remember to remove the `/var/lib/kubelet/cpu_manager_state` file before restarting kubelet.
+
+The following `cpu_manager_state` will be seen after the feature is disabled:
+```console
+# cat /var/lib/kubelet/cpu_manager_state
+{"policyName":"static","defaultCpuSet":"0-63","checksum":1058907510}
+```
+
+If you want to enable the feature in v1.33, you need to make sure the `CPUManagerPolicyBetaOptions` feature gate is not disabled and add the `strict-cpu-reservation` policy option. Remember to remove the `/var/lib/kubelet/cpu_manager_state` file before restarting kubelet.
+
+The following `cpu_manager_state` will be seen after the feature is enabled:
+```console
+# cat /var/lib/kubelet/cpu_manager_state
+{"policyName":"static","defaultCpuSet":"2-15,17-31,34-47,49-63","checksum":4141502832}
+```
 
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
@@ -425,7 +465,7 @@ No.
 
 ###### How can an operator determine if the feature is in use by workloads?
 
-Inspect the `defaultCpuSet` in `/var/lib/kubelet/cpu\_manager\_state`:
+Inspect the `defaultCpuSet` in `/var/lib/kubelet/cpu_manager_state`:
 - When the feature is disabled, the reserved CPU cores are included in the `defaultCpuSet`.
 - When the feature is enabled, the reserved CPU cores are not included in the `defaultCpuSet`.
 
@@ -447,9 +487,9 @@ This feature allows users to protect infrastructure services from bursty workloa
 
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
-https://github.com/kubernetes/kubernetes/pull/127506:
-- `cpu\_manager\_shared\_pool\_size\_millicores`: report shared pool size, in millicores (e.g. 13500m), expected to be non-zone otherwise best-effort pods will starve
-- `cpu\_manager\_exclusive\_cpu\_allocation\_count`: report exclusively allocated cores, counting full cores (e.g. 16)
+Monitor the following kubelet counters:
+- `cpu_manager_shared_pool_size_millicores`: report shared pool size, in millicores (e.g. 13500m), expected to be non-zone otherwise best-effort pods will starve
+- `cpu_manager_exclusive_cpu_allocation_count`: report exclusively allocated cores, counting full cores (e.g. 16)
 
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
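The shared-pool SLI amounts to alerting when the shared pool is missing or drops to zero. This minimal Go sketch scans the kubelet's Prometheus text-format `/metrics` output. It makes these assumptions:

- The metrics are exposed with a `kubelet_` prefix, as `kubelet_cpu_manager_shared_pool_size_millicores` and `kubelet_cpu_manager_exclusive_cpu_allocation_count`.
- Label values contain no spaces.
- The sample values are invented for illustration.

A production setup would more likely express this as a Prometheus alerting rule.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// metricValue returns the value of the first sample named name in a
// Prometheus text-format exposition. Labels are ignored; label values are
// assumed not to contain spaces.
func metricValue(exposition, name string) (float64, bool) {
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and HELP/TYPE comments
		}
		fields := strings.Fields(line)
		if len(fields) < 2 {
			continue
		}
		metric := fields[0]
		if i := strings.IndexByte(metric, '{'); i >= 0 {
			metric = metric[:i]
		}
		if metric != name {
			continue
		}
		v, err := strconv.ParseFloat(fields[1], 64)
		return v, err == nil
	}
	return 0, false
}

// checkSharedPool returns an error when the shared pool metric is missing or
// zero, the condition under which best-effort pods would starve.
func checkSharedPool(exposition string) error {
	const name = "kubelet_cpu_manager_shared_pool_size_millicores"
	v, ok := metricValue(exposition, name)
	if !ok {
		return fmt.Errorf("%s not found", name)
	}
	if v <= 0 {
		return fmt.Errorf("%s is %v: shared pool is empty", name, v)
	}
	return nil
}

func main() {
	// Hypothetical scrape of a 64-CPU node with 6 reserved CPUs and no exclusive allocations.
	sample := `# TYPE kubelet_cpu_manager_shared_pool_size_millicores gauge
kubelet_cpu_manager_shared_pool_size_millicores 58000
# TYPE kubelet_cpu_manager_exclusive_cpu_allocation_count gauge
kubelet_cpu_manager_exclusive_cpu_allocation_count 0
`
	exclusive, _ := metricValue(sample, "kubelet_cpu_manager_exclusive_cpu_allocation_count")
	fmt.Println("exclusive CPUs:", exclusive, "shared pool check:", checkSharedPool(sample))
}
```

The exclusive-allocation count is read only for context. The health decision rests on the shared pool size, matching the "expected to be non-zero" note on that metric.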
@@ -520,6 +560,7 @@ You can safely disable the feature.
 
 - 2024-03-08: Initial KEP created
 - 2024-10-07: KEP gets LGTM and Approval
+- 2025-02-03: KEP updated with Beta criteria
 
 
 ## Drawbacks

keps/sig-node/4540-strict-cpu-reservation/kep.yaml

Lines changed: 6 additions & 6 deletions
@@ -8,20 +8,19 @@ status: implementable
 creation-date: 2024-03-06
 reviewers:
   - "@ffromani"
-  - "@klueska"
   - "@swatisehgal"
 approvers:
-  - "@sig-node-tech-leads"
+  - "@ffromani"
 see-also: []
 replaces: []
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.32"
+latest-milestone: "v1.33"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
@@ -32,9 +31,10 @@ milestone:
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
 feature-gates:
-  - name: "CPUManagerPolicyAlphaOptions"
-    components:
+  - name: "CPUManagerPolicyBetaOptions"
+    components:
       - kubelet
+
 disable-supported: true
 
 # The following PRR answers are required at beta release
