We expect no non-infra-related flakes in the last month as a GA graduation criterion.
-->
-
No new e2e tests for kubelet are planned.
+
These cases will be added to the existing `e2e_node` tests:
+
- CPU Manager works with `spread-physical-cpus-preferred` static policy option
+
+
- Basic functionality
+
1. Enable `CPUManagerPolicyAlphaOptions` and set the CPU Manager policy option to `spread-physical-cpus-preferred`.
+
2. Verify that the machine has more than one physical core.
+
3. Create a simple pod with a container that requires 2 CPUs.
+
4. Verify that the container's CPUs are allocated across different physical cores.
+
5. Delete the pod.
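
The verification step in the test case above (checking that the container's CPUs land on different physical cores) could be sketched roughly as follows. This is only an illustration, not the actual `e2e_node` test code: the `coreOf` type, the `physicalCores` and `isSpread` helpers, and the sample topology are all hypothetical, and a real test would read the topology from the node rather than hard-code it.

```go
package main

import (
	"fmt"
	"sort"
)

// coreOf maps a logical CPU ID to its physical core ID.
// Hypothetical type: a real test would derive this from the node's CPU topology.
type coreOf map[int]int

// physicalCores returns the sorted set of distinct physical cores backing cpus.
func physicalCores(topo coreOf, cpus []int) ([]int, error) {
	seen := map[int]bool{}
	for _, cpu := range cpus {
		core, ok := topo[cpu]
		if !ok {
			return nil, fmt.Errorf("cpu %d not found in topology", cpu)
		}
		seen[core] = true
	}
	cores := make([]int, 0, len(seen))
	for c := range seen {
		cores = append(cores, c)
	}
	sort.Ints(cores)
	return cores, nil
}

// isSpread reports whether every allocated CPU sits on its own physical core.
func isSpread(topo coreOf, cpus []int) bool {
	cores, err := physicalCores(topo, cpus)
	return err == nil && len(cores) == len(cpus)
}

func main() {
	// Assumed 2-core, 4-thread machine: CPUs 0 and 2 share core 0, CPUs 1 and 3 share core 1.
	topo := coreOf{0: 0, 2: 0, 1: 1, 3: 1}

	fmt.Println(isSpread(topo, []int{0, 1})) // one CPU per physical core
	fmt.Println(isSpread(topo, []int{0, 2})) // both CPUs on the same physical core
}
```

With a 2-CPU container, the check passes only when the two logical CPUs map to two distinct physical cores, which is what step 2 (more than one physical core) makes possible.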
### Graduation Criteria
@@ -591,7 +597,7 @@ Recall that end users cannot usually observe component logs or access metrics.
- Condition name:
- Other field:
- [x] Other (treat as last resort)
-
- Details: Provide logical cpu allocation distribution across physical cores and also the cpu cache metrics from ecosystem.
+
- Details: Inspect the kubelet configuration of the nodes: check that the `CPUManagerPolicyAlphaOptions` feature gate is enabled and that the `spread-physical-cpus-preferred` option is in use.
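
As a rough sketch of the inspection described above, an operator could parse a node's kubelet configuration and check the relevant fields. The example assumes the configuration is available as JSON using the `KubeletConfiguration` field names `featureGates`, `cpuManagerPolicy`, and `cpuManagerPolicyOptions`; the `kubeletConfig` struct and the `usesSpreadOption` helper are hypothetical names introduced only for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// kubeletConfig holds only the fields relevant to this check.
// Hypothetical struct: it mirrors a small subset of the kubelet configuration.
type kubeletConfig struct {
	FeatureGates            map[string]bool   `json:"featureGates"`
	CPUManagerPolicy        string            `json:"cpuManagerPolicy"`
	CPUManagerPolicyOptions map[string]string `json:"cpuManagerPolicyOptions"`
}

// usesSpreadOption reports whether the feature gate is enabled, the static
// policy is selected, and the new policy option is set to "true".
func usesSpreadOption(raw []byte) (bool, error) {
	var cfg kubeletConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return false, err
	}
	return cfg.FeatureGates["CPUManagerPolicyAlphaOptions"] &&
		cfg.CPUManagerPolicy == "static" &&
		cfg.CPUManagerPolicyOptions["spread-physical-cpus-preferred"] == "true", nil
}

func main() {
	// Sample configuration written for this example, not taken from a real node.
	raw := []byte(`{
		"featureGates": {"CPUManagerPolicyAlphaOptions": true},
		"cpuManagerPolicy": "static",
		"cpuManagerPolicyOptions": {"spread-physical-cpus-preferred": "true"}
	}`)

	inUse, err := usesSpreadOption(raw)
	if err != nil {
		fmt.Println("could not parse kubelet configuration:", err)
		return
	}
	fmt.Println("option in use:", inUse)
}
```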
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
@@ -774,9 +780,12 @@ details). For now, we leave it here.
N/A
###### What are other known failure modes?
+
The failure modes are similar to those of the existing options. This option changes how the CPU Manager allocates CPUs.
Switching between options is compatible; however, when a pod is rescheduled, it follows the currently configured static policy option instead of the previous one.
+
Currently, in the alpha version, this option is considered incompatible with other options, so users should use it on its own. The compatibility issue will be resolved in a future version.
+
When the user switches to a non-static policy, `/var/lib/kubelet/cpu_manager_state` must be deleted. This is a known compatibility issue.
###### What steps should be taken if SLOs are not being met to determine the problem?
@@ -796,13 +805,7 @@ Major milestones might include:
## Drawbacks
-
Let's talk about the limitations of the current policies.
-
-
1. In a cluster with sparse workloads, we try to leverage as much CPU cache as we can. `full-pcpus-only` always allocates full physical cores, which introduces cache competition between vCPUs.
-
-
2. `distribute-cpus-across-numa` evenly distributes CPUs across NUMA nodes. In some cases, we want the application to be allocated on a single NUMA node if possible, which gives better performance.
-
-
Existing solutions cannot address all the special needs of high-performance applications, which is why a new option is needed.
+
This allocation strategy tries to avoid a workload taking an entire physical core, so it is not suitable for all workloads. For example, a workload that is CPU intensive and not sensitive to CPU cache should not use this policy; otherwise, the application may suffer a performance regression.