
Commit 8d095f4

node: KEP-2902: Address review comments

Signed-off-by: Swati Sehgal <[email protected]>
1 parent 0dea904

File tree: 1 file changed, +37 −3

keps/sig-node/2902-cpumanager-distribute-cpus-policy-option/README.md (37 additions, 3 deletions)
```diff
@@ -9,6 +9,7 @@
 - [Proposal](#proposal)
   - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
+  - [Compatibility with <code>full-pcpus-only</code> policy options](#compatibility-with-full-pcpus-only-policy-options)
   - [Test Plan](#test-plan)
     - [Prerequisite testing updates](#prerequisite-testing-updates)
     - [Unit tests](#unit-tests)
```
```diff
@@ -119,6 +120,12 @@ If none of the above conditions can be met, resort back to a best-effort fit of
 
 NOTE: The striping operation after all CPUs have been evenly distributed will be performed such that the overall distribution of CPUs across those NUMA nodes remains as balanced as possible.
 
+### Compatibility with `full-pcpus-only` policy options
+
+| Compatibility | alpha | beta | GA |
+| --- | --- | --- | --- |
+| full-pcpus-only | x | x | x |
+
 ### Test Plan
 
 We will extend both the unit test suite and the E2E test suite to cover the new policy option described in this KEP.
```
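The even-distribution-plus-striping behavior described above can be sketched in a few lines. This is an illustrative simplification, not the kubelet's actual `distribute-cpus-across-numa` implementation; the helper name `distributeAcrossNUMA` is hypothetical, and it assumes every NUMA node has enough free CPUs:

```go
package main

import "fmt"

// distributeAcrossNUMA is a hypothetical sketch of the policy described in
// this KEP: give each NUMA node an equal share of the requested CPUs, then
// stripe any remainder one CPU at a time so the overall distribution stays
// as balanced as possible.
func distributeAcrossNUMA(numNUMANodes, requested int) map[int]int {
	alloc := make(map[int]int, numNUMANodes)
	base := requested / numNUMANodes
	remainder := requested % numNUMANodes
	for node := 0; node < numNUMANodes; node++ {
		alloc[node] = base
		if node < remainder { // stripe the leftover CPUs across the first nodes
			alloc[node]++
		}
	}
	return alloc
}

func main() {
	// 10 CPUs over 4 NUMA nodes: two nodes get 3 CPUs, two get 2.
	fmt.Println(distributeAcrossNUMA(4, 10)) // map[0:3 1:3 2:2 3:2]
}
```

Note that no node ends up more than one CPU apart from any other, which is the invariant the striping step is meant to preserve.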
```diff
@@ -135,7 +142,7 @@ to implement this enhancement.
 
 ##### Integration tests
 
-Not Applicable as Kubelet features don't have integration tests.
+Not applicable, as Kubelet features don't have integration tests; we use a mix of `e2e_node` and `e2e` tests instead.
 
 ##### e2e tests
 
```
```diff
@@ -144,6 +151,7 @@ Currently no e2e tests are present for this particular policy option. E2E tests
 The plan is to add e2e tests to cover the basic flows for cases below:
 1. `distribute-cpus-across-numa` option is enabled: The test will ensure that the allocated CPUs are distributed across NUMA nodes according to the policy.
 1. `distribute-cpus-across-numa` option is disabled: The test will verify that the allocated CPUs are packed according to the default behavior.
+1. Test how this option interacts with the `full-pcpus-only` policy option (with it both enabled and disabled).
 
 ### Graduation Criteria
 
```
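The kind of check the planned e2e tests could make is sketched below. This is not the actual test code; `isBalancedAcrossNUMA` is a hypothetical helper, and it assumes the test already knows the CPU-to-NUMA-node topology and the CPUs allocated to the container:

```go
package main

import "fmt"

// isBalancedAcrossNUMA is a hypothetical sketch of an e2e-style assertion:
// given the CPUs allocated to a container and a CPU-to-NUMA-node mapping,
// check that the per-node counts differ by at most one.
func isBalancedAcrossNUMA(allocated []int, cpuToNode map[int]int) bool {
	// Count allocated CPUs per NUMA node, including nodes that received none.
	counts := map[int]int{}
	for _, node := range cpuToNode {
		counts[node] = 0
	}
	for _, cpu := range allocated {
		counts[cpuToNode[cpu]]++
	}
	min, max := -1, 0
	for _, c := range counts {
		if min == -1 || c < min {
			min = c
		}
		if c > max {
			max = c
		}
	}
	return max-min <= 1
}

func main() {
	// Assumed topology: CPUs 0-3 on NUMA node 0, CPUs 4-7 on NUMA node 1.
	cpuToNode := map[int]int{0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 1}
	fmt.Println(isBalancedAcrossNUMA([]int{0, 1, 4, 5}, cpuToNode)) // distributed: true
	fmt.Println(isBalancedAcrossNUMA([]int{0, 1, 2, 3}, cpuToNode)) // packed: false
}
```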
```diff
@@ -254,7 +262,30 @@ To verify the list of CPUs allocated to the container, one can either:
 - `exec` into the container and run `taskset -cp 1` (assuming this command is available in the container).
 - Call the `GetCPUs()` method of the `CPUsProvider` interface in the `kubelet`'s [podresources API](https://pkg.go.dev/k8s.io/kubernetes/pkg/kubelet/apis/podresources#CPUsProvider).
 
-Also, we can check `cpu_manager_numa_allocation_spread` metric.
+Also, we can check the `cpu_manager_numa_allocation_spread` metric. We plan to add a metric to track how CPUs are distributed across NUMA zones,
+with labels/buckets representing the NUMA nodes (numa_node=0, numa_node=1, ..., numa_node=N).
+
+With packed allocation (the default, option off), the distribution should mostly be in numa_node=0, with a small tail into numa_node=1 (and possibly higher)
+in cases of severe fragmentation. Users can compare this spread metric with the `container_aligned_compute_resources_count` metric to determine
+whether they are getting aligned packed allocation or just packed allocation due to implementation details.
+
+For example, if a node has 2 NUMA nodes and a pod requests 8 CPUs (with no other pods requesting exclusive CPUs on the node), the metric would look like this:
+
+    cpu_manager_numa_allocation_spread{numa_node="0"} = 8
+    cpu_manager_numa_allocation_spread{numa_node="1"} = 0
+
+
+When the option is enabled, we would expect a more even distribution of CPUs across NUMA nodes, with no sharp peaks as seen with packed allocation.
+Users can also check the `container_aligned_compute_resources_count` metric to assess resource alignment and system behavior.
+
+In this case, the metric would show:
+
+    cpu_manager_numa_allocation_spread{numa_node="0"} = 4
+    cpu_manager_numa_allocation_spread{numa_node="1"} = 4
+
+
+Note: This example is simplified to clearly highlight the difference between the two cases. Existing pods may slightly skew the counts, but the general
+trend of peaks and troughs will still provide a good indication of CPU distribution across NUMA nodes, allowing users to determine whether the policy option
+is enabled or not.
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
```
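The packed-vs-distributed contrast in the example above can be reproduced with a small sketch. The real `cpu_manager_numa_allocation_spread` values are produced by the kubelet's metrics instrumentation; `spreadFor` is a hypothetical helper that only models the two shapes of the example:

```go
package main

import "fmt"

// spreadFor is a hypothetical helper modelling the per-NUMA-node values the
// cpu_manager_numa_allocation_spread metric would report for a single
// allocation, under either packed or distributed placement.
func spreadFor(requested, numNodes int, distributed bool) map[int]int {
	spread := make(map[int]int, numNodes)
	for node := 0; node < numNodes; node++ {
		spread[node] = 0
	}
	if distributed {
		for i := 0; i < requested; i++ {
			spread[i%numNodes]++ // stripe CPUs round-robin across nodes
		}
	} else {
		spread[0] = requested // packed: everything lands on the first node
	}
	return spread
}

func main() {
	// 8 exclusive CPUs on a 2-NUMA-node machine, matching the example above.
	fmt.Println(spreadFor(8, 2, false)) // packed:      map[0:8 1:0]
	fmt.Println(spreadFor(8, 2, true))  // distributed: map[0:4 1:4]
}
```

The sharp peak on one node versus the flat spread is exactly the signal users are expected to read off the metric.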
```diff
@@ -319,7 +350,10 @@ No impact. The behavior of the feature does not change when API Server and/or et
 
 ###### What are other known failure modes?
 
-No known failure modes.
+Because of the existing distribution of CPU resources across NUMA nodes, a distributed allocation might not be possible, e.g. if all available CPUs are present
+on the same NUMA node.
+
+In that case we resort back to a best-effort fit of packing CPUs into NUMA nodes wherever they can fit.
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
```
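The feasibility check behind this failure mode can be sketched as follows; `canDistribute` is a hypothetical helper, not kubelet code, and it only models the precondition that every node must have enough free CPUs for its share:

```go
package main

import "fmt"

// canDistribute is a hypothetical sketch of the failure mode described above:
// an even distribution needs roughly requested/len(freePerNode) CPUs free on
// each NUMA node. If, say, all free CPUs sit on one node, the policy has to
// fall back to best-effort packing instead.
func canDistribute(freePerNode []int, requested int) bool {
	numNodes := len(freePerNode)
	base := requested / numNodes
	remainder := requested % numNodes
	for node, free := range freePerNode {
		want := base
		if node < remainder { // the first nodes absorb the striped remainder
			want++
		}
		if free < want {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canDistribute([]int{4, 4}, 8)) // true: 4 CPUs free on each node
	fmt.Println(canDistribute([]int{8, 0}, 8)) // false: all free CPUs on one NUMA node
}
```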