Commit 0528ad0

Merge pull request #5390 from wongchar/align-by-uncore-beta
KEP-4800: Promote prefer-align-cpus-by-uncorecache CPUManager feature to beta
2 parents cc7df39 + effb5cb commit 0528ad0

3 files changed, 35 additions and 24 deletions

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
11
kep-number: 4800
22
alpha:
33
approver: "@soltysh"
4+
beta:
5+
approver: "@soltysh"

keps/sig-node/4800-cpumanager-split-uncorecache/README.md

Lines changed: 30 additions & 21 deletions
@@ -20,6 +20,7 @@
2020
- [e2e tests](#e2e-tests)
2121
- [Graduation Criteria](#graduation-criteria)
2222
- [Alpha](#alpha)
23+
- [Beta](#beta)
2324
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
2425
- [Version Skew Strategy](#version-skew-strategy)
2526
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -261,7 +262,7 @@ The `prefer-align-cpus-by-uncorecache` feature will be enabled and tested indivi
261262
- `full-pcpus-only`
262263
- Topology Manager NUMA Affinity
263264

264-
The following CPU Topologies are representative of various uncore cache architectures and will be added to policy_test.go and represented in the unit testing.
265+
The following CPU Topologies are representative of various uncore cache architectures and will be added to [policy_test.go](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/cm/cpumanager/policy_test.go) and represented in the unit testing.
265266

266267
- 1P AMD EPYC 7702P 64C (smt-on/off) NPS=1, 16 uncore cache instances/socket
267268
- 2P AMD EPYC 7303 32C (smt-on/off) NPS=1, 4 uncore cache instances/socket
@@ -278,19 +279,25 @@ N/A. This feature requires a e2e test for testing.
278279

279280
##### e2e tests
280281

281-
- For e2e testing, checks will be added to determine if the node has a split uncore cache topology. If node does not meet the requirement to have multiple uncore caches, the added tests will be skipped.
282-
- e2e testing should cover the deployment of a pod that is following uncore cache alignment. CPU assignment can be determined by podresources API and programatically cross-referenced to syfs topology information to determine proper uncore cache alignment.
283-
- For e2e testing, guaranteed pods will be deployed with various CPU size requirements on our own baremetal instances across different vendor architectures and confirming the CPU assignments to uncore cache core groupings. This feature is intended for baremetal only and not cloud instances.
284-
- Update CI to test GCP instances of different architectures utilizing uncore cache alignment feature.
285-
282+
- [should update alignment counters when pod successfully run taking less than uncore cache group](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/cpu_manager_metrics_test.go):[SIG-node](https://testgrid.k8s.io/sig-node):[SIG-node-kubelet](https://testgrid.k8s.io/sig-node-kubelet)
283+
- [should update alignment counters when pod successfully run taking a full uncore cache group](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/cpu_manager_metrics_test.go):[SIG-node](https://testgrid.k8s.io/sig-node):[SIG-node-kubelet](https://testgrid.k8s.io/sig-node-kubelet)
284+
- [should not update alignment counters when pod successfully run taking more than a uncore cache group](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/cpu_manager_metrics_test.go):[SIG-node](https://testgrid.k8s.io/sig-node):[SIG-node-kubelet](https://testgrid.k8s.io/sig-node-kubelet)
286285

287286
### Graduation Criteria
288287

289288
#### Alpha
290289

291290
- Feature implemented behind a feature gate flag option
292-
- E2E Tests will be skipped until nodes with uncore cache can be provisioned within CI hardware. Work is ongoing to add required systems (https://github.com/kubernetes/k8s.io/issues/7339). E2E testing will be required to graduate to beta.
293-
- Providing a metric to verify uncore cache alignment will be required to graduate to beta.
291+
- Add unit test coverage
292+
- Added metrics to cover observability needs
293+
- Added e2e tests for metrics
294+
295+
#### Beta
296+
297+
- Address bug fixes: ability to schedule odd-integer CPUs for uncore cache alignment
298+
- Add test cases to ensure functional compatibility with existing CPUManager options
299+
- Add test cases to ensure and report incompatibility with existing CPUManager options that are not supported with prefer-align-cpus-by-uncore-cache
300+
- Add E2E test coverage for feature
294301

295302
### Upgrade / Downgrade Strategy
296303

@@ -330,13 +337,12 @@ you need any help or guidance.
330337

331338
To enable this feature requires enabling the feature gates for static policy in the Kubelet configuration file for the CPUManager feature gate and add the policy option for uncore cache alignment
332339

333-
334340
###### How can this feature be enabled / disabled in a live cluster?
335341

336342
For `CPUManager` it is a requirement going from `none` to `static` policy cannot be done dynamically because of the `cpu_manager_state file`. The node needs to be drained and the policy checkpoint file (`cpu_manager_state`) need to be removed before restarting Kubelet. This feature specifically relies on the `static` policy being enabled.
337343

338344
- [x] Feature gate (also fill in values in `kep.yaml`)
339-
- Feature gate name: `CPUManagerAlphaPolicyOptions`
345+
- Feature gate name: `CPUManagerBetaPolicyOptions`
340346
- Components depending on the feature gate: `kubelet`
341347
- [x] Other
342348
- Describe the mechanism: Change the `kubelet` configuration to set a `CPUManager` policy of static then setting the policy option of `prefer-align-cpus-by-uncorecache`
@@ -360,10 +366,9 @@ Feature will be enabled. Proper drain of node and restart of kubelet required. F
360366

361367
###### Are there any tests for feature enablement/disablement?
362368

363-
Option is not enabled dynamically. To enable/disable option, cpu_manager_state must be removed and kubelet must be restarted.
364-
Unit tests will be implemented to test if the feature is enabled/disabled.
365-
E2e node serial suite can be use to test the enablement/disablement of the feature since it allows the kubelet to be restarted.
366-
369+
E2E test will demonstrate default behavior is preserved when `CPUManagerPolicyOptions` feature gate is disabled.
370+
Metric created to check uncore cache alignment after cpuset is determined and utilized in E2E tests with feature enabled.
371+
See [cpu_manager_metrics_test.go](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/cpu_manager_metrics_test.go)
367372

368373
### Rollout, Upgrade and Rollback Planning
369374

@@ -373,12 +378,13 @@ This section must be completed when targeting beta to a release.
373378

374379
###### How can a rollout or rollback fail? Can it impact already running workloads?
375380

376-
Kubelet restarts are not expected to impact existing CPU assignments to already running workloads
377-
381+
This feature is a best-effort alignment of CPUs to uncore caches that requires a kubelet restart that must not affect running workloads. No changes needed to cpu_manager_state file.
382+
A rollout may fail based upon existing workloads that create fragmented uncore caches on the node, potentially resulting in CPUset distribution across multiple caches based upon the CPU quantity requirements and the best-effort policy.
383+
Metrics below can help the user track alignment, but a rollback will not help because the feature is not a strict alignment to uncore caches, but a best-effort to reduce shared uncore caches.
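
To illustrate why fragmentation leads to a cpuset spread over several caches under a best-effort policy, here is a minimal Go sketch. It is not the kubelet's allocation code: the function name `placeBestEffort`, the "smallest cache that fits" preference, and the "largest first" spill order are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// placeBestEffort is a hypothetical helper, not kubelet code.
// free maps an uncore cache ID to its count of unallocated CPUs.
// It returns how many CPUs to take from each cache and whether the
// request fit inside a single cache. alloc is nil if the node lacks capacity.
func placeBestEffort(free map[int]int, want int) (alloc map[int]int, aligned bool) {
	ids := make([]int, 0, len(free))
	total := 0
	for id, n := range free {
		ids = append(ids, id)
		total += n
	}
	if want <= 0 || total < want {
		return nil, false
	}
	// Sort caches by free CPUs ascending, breaking ties by cache ID.
	sort.Slice(ids, func(a, b int) bool {
		if free[ids[a]] != free[ids[b]] {
			return free[ids[a]] < free[ids[b]]
		}
		return ids[a] < ids[b]
	})
	// Aligned case: the smallest cache that holds the whole request.
	for _, id := range ids {
		if free[id] >= want {
			return map[int]int{id: want}, true
		}
	}
	// Best-effort fallback: no single cache fits, so spill across caches,
	// largest first, instead of rejecting the request.
	alloc = map[int]int{}
	for i := len(ids) - 1; i >= 0 && want > 0; i-- {
		take := free[ids[i]]
		if take > want {
			take = want
		}
		if take == 0 {
			continue
		}
		alloc[ids[i]] = take
		want -= take
	}
	return alloc, false
}

func main() {
	// Hypothetical fragmented node: existing workloads left 2, 3 and 2 free CPUs.
	alloc, aligned := placeBestEffort(map[int]int{0: 2, 1: 3, 2: 2}, 4)
	fmt.Println(alloc, aligned)
}
```

In this sketch a 4-CPU request on the fragmented node is still satisfied, but across two caches. That is the situation where alignment metrics stay flat and disabling the option would not produce a better placement.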
378384

379385
###### What specific metrics should inform a rollback?
380386

381-
Increased pod startup time/latency
387+
`kubelet_container_aligned_compute_resources_count` and `container_aligned_compute_resources_failure_count` metric can be tracked to measure if there are issues in the cpuset allocation that can determine if a rollback is necessary.
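
One way to turn these two counters into a single signal is to compute a failure ratio from a scrape of the kubelet metrics endpoint. The Go sketch below parses Prometheus text exposition output with the standard library only. The helper names, the label values in the sample, the `kubelet_` prefix on the failure counter, and the ratio itself are assumptions for illustration, not something the KEP defines.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumMetric adds up every sample of the exactly named metric in
// Prometheus text exposition output, across all label combinations.
func sumMetric(exposition, name string) float64 {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and HELP/TYPE comments
		}
		// The metric name ends at the label block or at whitespace.
		end := strings.IndexAny(line, "{ \t")
		if end < 0 || line[:end] != name {
			continue
		}
		rest := line[end:]
		if rest[0] == '{' {
			i := strings.LastIndex(rest, "}")
			if i < 0 {
				continue // malformed label block
			}
			rest = rest[i+1:]
		}
		fields := strings.Fields(rest) // value, then optional timestamp
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			total += v
		}
	}
	return total
}

// failureRatio is a hypothetical indicator: failed / (aligned + failed).
func failureRatio(aligned, failed float64) float64 {
	if aligned+failed == 0 {
		return 0
	}
	return failed / (aligned + failed)
}

func main() {
	// Sample scrape; the label names and values are illustrative assumptions.
	scrape := `
# TYPE kubelet_container_aligned_compute_resources_count counter
kubelet_container_aligned_compute_resources_count{boundary="uncore_cache",scope="container"} 9
kubelet_container_aligned_compute_resources_failure_count{boundary="uncore_cache",scope="container"} 1
`
	aligned := sumMetric(scrape, "kubelet_container_aligned_compute_resources_count")
	failed := sumMetric(scrape, "kubelet_container_aligned_compute_resources_failure_count")
	fmt.Printf("aligned=%v failed=%v ratio=%.2f\n", aligned, failed, failureRatio(aligned, failed))
}
```

Because the option is best-effort, a nonzero ratio reflects fragmentation rather than a malfunction, so any threshold applied to it is a local policy choice.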
382388

383389
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
384390

@@ -397,7 +403,7 @@ Reference CPUID info in podresources API to be able to verify assignment.
397403
###### How can an operator determine if the feature is in use by workloads?
398404

399405
Reference podresources API to determine CPU assignment and CacheID assignment per container.
400-
Use proposed 'container_aligned_compute_resources_count' metric which reports the count of containers getting aligned compute resources. See PR#127155 (https://github.com/kubernetes/kubernetes/pull/127155).
406+
Use 'container_aligned_compute_resources_count' metric which reports the count of containers getting aligned compute resources. See [kubelet/metrics/metrics.go](https://github.com/kubernetes/kubernetes/blob/8f1f17a04f62ab64ebe4f0b9d7f5f799bf56a0d9/pkg/kubelet/metrics/metrics.go#L135).
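
As a complement to the metric, the CPU assignment reported by the podresources API can be checked against the node's cache topology. The Go sketch below covers only the comparison step, under stated assumptions: the CPU-to-cache map is built by hand here (on Linux it could be derived from `/sys/devices/system/cpu/cpuN/cache/index3/id`), the container's CPU list is passed in as a string, and the helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList expands a Linux-style CPU list such as "0-3,8,10-11"
// into individual CPU IDs.
func parseCPUList(s string) ([]int, error) {
	var cpus []int
	s = strings.TrimSpace(s)
	if s == "" {
		return cpus, nil
	}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil {
			return nil, fmt.Errorf("bad CPU %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(strings.TrimSpace(hi)); err != nil {
				return nil, fmt.Errorf("bad CPU range %q: %w", part, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("descending CPU range %q", part)
		}
		for c := start; c <= end; c++ {
			cpus = append(cpus, c)
		}
	}
	return cpus, nil
}

// uncoreCachesUsed counts the distinct uncore caches touched by the
// assigned CPUs. A result of 1 means the cpuset is fully aligned.
func uncoreCachesUsed(assigned []int, cacheOf map[int]int) (int, error) {
	seen := map[int]struct{}{}
	for _, cpu := range assigned {
		id, ok := cacheOf[cpu]
		if !ok {
			return 0, fmt.Errorf("no cache ID known for CPU %d", cpu)
		}
		seen[id] = struct{}{}
	}
	return len(seen), nil
}

func main() {
	// Hypothetical topology: 16 CPUs, 4 CPUs per uncore cache.
	cacheOf := map[int]int{}
	for cpu := 0; cpu < 16; cpu++ {
		cacheOf[cpu] = cpu / 4
	}
	for _, list := range []string{"4-7", "2-5"} {
		cpus, err := parseCPUList(list)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		n, err := uncoreCachesUsed(cpus, cacheOf)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("cpus %s span %d uncore cache(s)\n", list, n)
	}
}
```

A container spanning more than one cache is not necessarily a fault: the option is best-effort, and a request larger than one cache group will always span several.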
401407

402408
###### How can someone using this feature know that it is working for their instance?
403409

@@ -409,16 +415,17 @@ Reference podresources API to determine CPU assignment.
409415

410416
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
411417

412-
Measure the time to deploy pods under default settings and compare to the time to deploy pods with align-by-uncorecache enabled. Time difference should be negligible.
418+
In default Kubernetes installation, 99th percentile per cluster-day <= X
419+
This feature is best-effort and will not cause failed admission, but can introduce admission delay.
413420

414421
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
415422

416423
- Metrics
417-
- `topology_manager_admission_duration_ms`: Which measures the the duration of the admission process performed by Topology Manager.
424+
- `topology_manager_admission_duration_ms` can be used to determine pod admission time
418425

419426
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
420427

421-
Utilized proposed 'container_aligned_compute_resources_count' in PR#127155 to be extended for uncore cache alignment count.
428+
No.
422429

423430
<!--
424431
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
@@ -526,6 +533,8 @@ For each of them, fill in the following information by copying the below templat
526533

527534
- The outlined sections were filled out was created 2024-08-27.
528535

536+
- 2025-06-09: Submitted PR to promote feature to beta
537+
529538
## Drawbacks
530539

531540
N/A

keps/sig-node/4800-cpumanager-split-uncorecache/kep.yaml

Lines changed: 3 additions & 3 deletions
@@ -22,12 +22,12 @@ see-also:
2222
replaces: []
2323

2424
# The target maturity stage in the current dev cycle for this KEP.
25-
stage: alpha
25+
stage: beta
2626

2727
# The most recent milestone for which work toward delivery of this KEP has been
2828
# done. This can be the current (upcoming) milestone, if it is being actively
2929
# worked on.
30-
latest-milestone: "v1.33"
30+
latest-milestone: "v1.34"
3131

3232
# The milestone at which this feature was, or is targeted to be, at each stage.
3333
milestone:
@@ -38,7 +38,7 @@ milestone:
3838
# The following PRR answers are required at alpha release
3939
# List the feature gate name and the components for which it must be enabled
4040
feature-gates:
41-
- name: "CPUManagerPolicyAlphaOptions"
41+
- name: "CPUManagerPolicyBetaOptions"
4242
components:
4343
- kubelet
4444
disable-supported: true
