Commit 9072dfe

Address review feedback
1 parent 9818873 commit 9072dfe

File tree: 1 file changed (+14 -11 lines changed)
  • keps/sig-node/4176-cpumanager-spread-cpus-preferred-policy

keps/sig-node/4176-cpumanager-spread-cpus-preferred-policy/README.md

Lines changed: 14 additions & 11 deletions
@@ -335,8 +335,6 @@ extending the production code to implement this enhancement.

 - `k8s.io/kubernetes/pkg/kubelet/cm/cpumanager`: `20231005` - `86.3%`

-The newly added code is a CPU allocation policy option. We can follow how other options are tested and add enough unit tests.
-
 ##### Integration tests

 <!--
@@ -368,7 +366,15 @@ https://storage.googleapis.com/k8s-triage/index.html
 We expect no non-infra related flakes in the last month as a GA graduation criteria.
 -->

-No new e2e tests for kubelet are planned.
+These cases will be added in the existing `e2e_node` tests:
+- CPU Manager works with `spread-physical-cpus-preferred` static policy option
+
+- Basic functionality
+1. Enable `CPUManagerPolicyAlphaOptions` and configure the CPUManager policy option `spread-physical-cpus-preferred`.
+2. Verify the machine has more than one physical core.
+3. Create a simple pod with a container that requires 2 CPUs.
+4. Verify that the container's CPUs are allocated across different physical cores.
+5. Delete the pod.

 ### Graduation Criteria
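The verification in step 4 of the e2e case above could be sketched roughly as follows. This is only an illustration, not the actual `e2e_node` test code: the `coreOf` topology map, `distinctCores`, and `isSpread` are hypothetical names, and the sibling layout (0,4), (1,5), (2,6), (3,7) is an assumed example machine.

```go
package main

import (
	"fmt"
	"sort"
)

// coreOf maps a logical CPU ID to its physical core ID.
// Hypothetical topology for illustration: 4 cores with 2 hyperthreads each,
// siblings numbered (0,4), (1,5), (2,6), (3,7).
var coreOf = map[int]int{0: 0, 4: 0, 1: 1, 5: 1, 2: 2, 6: 2, 3: 3, 7: 3}

// distinctCores returns the sorted set of physical cores used by cpus.
// It returns an error if a CPU is missing from the topology map.
func distinctCores(cpus []int, topo map[int]int) ([]int, error) {
	seen := map[int]bool{}
	for _, cpu := range cpus {
		core, ok := topo[cpu]
		if !ok {
			return nil, fmt.Errorf("cpu %d not found in topology", cpu)
		}
		seen[core] = true
	}
	cores := make([]int, 0, len(seen))
	for c := range seen {
		cores = append(cores, c)
	}
	sort.Ints(cores)
	return cores, nil
}

// isSpread reports whether every allocated CPU sits on its own physical core.
func isSpread(cpus []int, topo map[int]int) bool {
	cores, err := distinctCores(cpus, topo)
	return err == nil && len(cores) == len(cpus)
}

func main() {
	fmt.Println(isSpread([]int{0, 1}, coreOf)) // CPUs on cores 0 and 1
	fmt.Println(isSpread([]int{0, 4}, coreOf)) // both CPUs on core 0
}
```

For a container requesting 2 CPUs, an allocation such as `{0, 1}` would pass this check, while `{0, 4}` (two hyperthreads of one core) would not.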

@@ -591,7 +597,7 @@ Recall that end users cannot usually observe component logs or access metrics.
 - Condition name:
 - Other field:
 - [x] Other (treat as last resort)
-- Details: Provide the logical CPU allocation distribution across physical cores and also the CPU cache metrics from the ecosystem.
+- Details: Inspect the kubelet configuration of the nodes: check the feature gate and the usage of the new option.

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
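One way an operator might carry out the configuration check described in the new "Details" line is sketched below. It assumes the kubelet configuration has already been obtained as a JSON document using the `KubeletConfiguration` field names `featureGates`, `cpuManagerPolicy`, and `cpuManagerPolicyOptions`; the helper `optionInUse` is hypothetical, and the assumption that the option is enabled with the string value `"true"` follows how existing static policy options are set.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kubeletConfig holds only the fields inspected in this sketch.
type kubeletConfig struct {
	FeatureGates            map[string]bool   `json:"featureGates"`
	CPUManagerPolicy        string            `json:"cpuManagerPolicy"`
	CPUManagerPolicyOptions map[string]string `json:"cpuManagerPolicyOptions"`
}

const (
	alphaGate  = "CPUManagerPolicyAlphaOptions"   // feature gate named in the KEP text
	optionName = "spread-physical-cpus-preferred" // option name as written in the KEP text
)

// optionInUse reports whether the feature gate is enabled, the policy is
// static, and the option is set to "true" (assumed value format).
func optionInUse(raw []byte) (bool, error) {
	var cfg kubeletConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	return cfg.FeatureGates[alphaGate] &&
		cfg.CPUManagerPolicy == "static" &&
		cfg.CPUManagerPolicyOptions[optionName] == "true", nil
}

func main() {
	// Illustrative configuration document, not taken from a real node.
	sample := []byte(`{
		"featureGates": {"CPUManagerPolicyAlphaOptions": true},
		"cpuManagerPolicy": "static",
		"cpuManagerPolicyOptions": {"spread-physical-cpus-preferred": "true"}
	}`)
	inUse, err := optionInUse(sample)
	fmt.Println(inUse, err)
}
```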

@@ -774,9 +780,12 @@ details). For now, we leave it here.
 N/A

 ###### What are other known failure modes?
+
 The failure modes are similar to those of existing options. This option changes how the CPU manager allocates CPUs.
 It is compatible when the user switches between options; however, when a pod gets rescheduled, it follows the current static option instead of the previous one.

+Currently, in the alpha version, we treat this option as incompatible with other options. Users should stick to this option. Compatibility issues will be resolved in a future version.
+
 When the user switches to a non-static mode, `/var/lib/kubelet/cpu_manager_state` must be deleted. This is a known compatibility issue.

 ###### What steps should be taken if SLOs are not being met to determine the problem?
@@ -796,13 +805,7 @@ Major milestones might include:

 ## Drawbacks

-Let's talk about the limitations of the current policies.
-
-1. In a cluster with sparse workloads, we try to leverage as much CPU cache as we can. `full-pcpus-only` will always allocate full physical cores, and it introduces cache competition between vCPUs.
-
-2. `distribute-cpus-across-numa` will evenly distribute CPUs across NUMA nodes. In some cases, we want the application to be allocated in a single NUMA node if possible, which gives better performance.
-
-Existing solutions cannot address all the special needs of high-performance applications; that's why a new option is needed.
+This allocation strategy tries to keep a workload from occupying entire physical cores, so it is not suitable for all workloads. For example, a workload that is CPU intensive and not sensitive to CPU cache should not use this policy option; otherwise, the application may suffer a performance regression.

 ## Alternatives
