keps/sig-node/4540-strict-cpu-reservation/README.md
24 additions & 19 deletions
@@ -45,7 +45,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
 - [x] e2e Tests for all Beta API Operations (endpoints)
 - [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free
@@ -129,7 +129,7 @@ When `strict-cpu-reservation` is enabled:
 
 ### Risks and Mitigations
 
-The feature is isolated to a specific policy option `strict-cpu-reservation` under `cpuManagerPolicyOptions` and is protected by feature gate `CPUManagerPolicyBetaOptions` before the feature graduates to `Stable` i.e. always enabled.
+The feature is isolated to a specific policy option `strict-cpu-reservation` under `cpuManagerPolicyOptions`.
 
 Concern for feature impact on best-effort workloads, the workloads that do not have resource requests, is brought up.
 
@@ -288,15 +288,23 @@ No new integration tests for kubelet are planned.
 
 ##### e2e tests
 
-- These cases will be added in the existing e2e tests:
-- CPU Manager works with `strict-cpu-reservation` policy option
+The e2e tests are implemented in <https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/cpu_manager_test.go>, marked with Ginkgo "strict-cpu-reservation" label.
 
-- Basic functionality
-1. Enable `strict-cpu-reservation` policy option.
-2. Create a simple pod of Burstable QoS type.
-3. Verify the pod is not using the reserved CPU cores.
-4. Delete the pod.
+Feature functionality tests:
+- running with strict CPU reservation: should let the container access all the online CPUs without a reserved CPUs set
+- running with strict CPU reservation: should let the container access all the online CPUs minus the reserved CPUs set when enabled
+- running with strict CPU reservation: should let the container access all the online non-exclusively-allocated CPUs minus the reserved CPUs set when enabled
 
+CPU Manager options compatibility tests:
+- SMT Alignment and strict CPU reservation: should reject workload asking non-SMT-multiple of cpus
+- SMT Alignment and strict CPU reservation: should admit workload asking SMT-multiple of cpus
+- Strict CPU Reservation and Uncore Cache Alignment: should assign CPUs aligned to uncore caches with prefer-align-cpus-by-uncore-cache and avoid reserved cpus
+
+Testgrid:
+- [kubelet-serial-gce-e2e-cpu-manager](https://testgrid.k8s.io/sig-node-kubelet#kubelet-serial-gce-e2e-cpu-manager): Green
+- [kubelet-gce-e2e-arm64-ubuntu-serial](https://testgrid.k8s.io/sig-node-kubelet#kubelet-gce-e2e-arm64-ubuntu-serial): Green
+- [pull-e2e-serial-ec2](https://testgrid.k8s.io/sig-node-containerd#pull-e2e-serial-ec2): Green
+- [node-kubelet-containerd-resource-managers](https://testgrid.k8s.io/sig-node-containerd#node-kubelet-containerd-resource-managers): Green
 
 ### Graduation Criteria
 
@@ -313,8 +321,8 @@ No new integration tests for kubelet are planned.
 
 #### GA
 
-- [ ] Allow time for feedback (1 year).
-- [ ] Make sure all risks have been addressed.
+- [X] Allow time for feedback (two releases).
+- [X] Make sure all risks have been addressed.
 
 ### Upgrade / Downgrade Strategy
 
@@ -332,9 +340,6 @@ The `/var/lib/kubelet/cpu_manager_state` needs to be removed when enabling or di
 
 ###### How can this feature be enabled / disabled in a live cluster?
 
-- [X] Feature gate (also fill in values in `kep.yaml`)
-- Components depending on the feature gate: `kubelet`
 - [X] Change the kubelet configuration to set a `CPUManager` policy of `static` and a `CPUManager` policy option of `strict-cpu-reservation`
 - Will enabling / disabling the feature require downtime of the control plane? No
 - Will enabling / disabling the feature require downtime or reprovisioning of a node? No -- removing `/var/lib/kubelet/cpu_manager_state` and restarting kubelet are enough.
@@ -346,13 +351,13 @@ Yes. Reserved CPU cores will be strictly used for system daemons and interrupt p
 
 The feature is only enabled when all following conditions are met:
 1. The `static` `CPUManager` policy is selected
-2. The `CPUManagerPolicyBetaOptions` feature gate is enabled and the `strict-cpu-reservation` policy option is selected
+2. The `strict-cpu-reservation` policy option is selected
 3. The `reservedSystemCPUs` is not empty
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
-Yes, the feature can be disabled by:
-1. Disable feature gate `CPUManagerPolicyBetaOptions` or remove `strict-cpu-reservation` from the list of `CPUManager` policy options
+Yes, the feature can be disabled by the following steps:
+1. Remove `strict-cpu-reservation` from the list of `CPUManager` policy options
 2. Remove `/var/lib/kubelet/cpu_manager_state` and restart kubelet
 
 ###### What happens if we reenable the feature if it was previously rolled back?
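
After this change, the hunk above lists three enablement conditions: the `static` policy, the `strict-cpu-reservation` policy option, and a non-empty `reservedSystemCPUs`. The Go sketch below expresses that check as a single predicate. The struct `kubeletCPUConfig` and the function `strictReservationActive` are hypothetical simplifications for illustration, and representing the option as the string value `"true"` is an assumption; none of this is the kubelet's actual code.

```go
package main

import "fmt"

// kubeletCPUConfig is a hypothetical, simplified stand-in for the kubelet
// configuration fields named in the KEP text. It is not the real
// KubeletConfiguration type.
type kubeletCPUConfig struct {
	CPUManagerPolicy        string
	CPUManagerPolicyOptions map[string]string
	ReservedSystemCPUs      string
}

// strictReservationActive reports whether all three conditions from the
// KEP hold. Assumption: the option is enabled when its value is "true".
func strictReservationActive(c kubeletCPUConfig) bool {
	return c.CPUManagerPolicy == "static" &&
		c.CPUManagerPolicyOptions["strict-cpu-reservation"] == "true" &&
		c.ReservedSystemCPUs != ""
}

func main() {
	cfg := kubeletCPUConfig{
		CPUManagerPolicy:        "static",
		CPUManagerPolicyOptions: map[string]string{"strict-cpu-reservation": "true"},
		ReservedSystemCPUs:      "0-1", // assumed example value
	}
	fmt.Println(strictReservationActive(cfg)) // true

	// With no reserved CPUs the feature has nothing to protect.
	cfg.ReservedSystemCPUs = ""
	fmt.Println(strictReservationActive(cfg)) // false
}
```

The feature gate no longer appears in the predicate, which mirrors the removal of `CPUManagerPolicyBetaOptions` from condition 2 in this diff.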
@@ -361,7 +366,7 @@ The feature will be enabled regardless it is enabled for the first time or not.
 
 ###### Are there any tests for feature enablement/disablement?
 
-- A specific e2e test will demonstrate that the default behaviour is preserved when the feature gate is disabled, or when the feature is not used (2 separate tests)
+- A specific e2e test will demonstrate that the default behaviour is preserved when the feature is not used (2 separate tests)
 
 ### Rollout, Upgrade and Rollback Planning
 
@@ -561,7 +566,7 @@ You can safely disable the feature.