
Conversation

@klihub
Collaborator

@klihub klihub commented Dec 3, 2025

Note: this PR is stacked on #601, which should be reviewed first.

This PR updates the topology-aware policy accounting and allocation algorithm to allow slicing an idle shared CPU pool empty for exclusive CPU allocations.
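
For illustration only, here is a minimal Go sketch of the accounting idea (the type, field, and function names are hypothetical simplifications, not the actual topology-aware policy code): an idle shared pool, one with no outstanding shared grants, may be sliced empty for exclusive CPUs, while a pool with shared users keeps enough whole CPUs behind to cover its grants.

```go
// Minimal sketch of the idle-pool slicing rule; hypothetical types and
// names, not the real topology-aware policy implementation.
package main

import "fmt"

// pool captures just the accounting bits relevant here: the CPUs in the
// shared pool and the milli-CPUs currently granted out of it as shared CPU.
type pool struct {
	name             string
	sharableCPUs     int // whole CPUs in the shared pool
	sharedGrantMilli int // milli-CPUs granted from the shared pool
}

// sliceableCPUs returns how many whole CPUs may be carved out of the shared
// pool for exclusive allocations. An idle pool (no shared grants) may be
// sliced empty; otherwise enough whole CPUs must stay behind to cover the
// outstanding shared grants.
func sliceableCPUs(p pool) int {
	if p.sharedGrantMilli == 0 {
		return p.sharableCPUs // idle: the pool may be sliced empty
	}
	keep := (p.sharedGrantMilli + 999) / 1000 // round up to whole CPUs
	if keep >= p.sharableCPUs {
		return 0
	}
	return p.sharableCPUs - keep
}

func main() {
	idle := pool{name: "socket #7", sharableCPUs: 1, sharedGrantMilli: 0}
	busy := pool{name: "socket #0", sharableCPUs: 4, sharedGrantMilli: 3000}
	fmt.Printf("%s: %d sliceable\n", idle.name, sliceableCPUs(idle)) // 1 sliceable
	fmt.Printf("%s: %d sliceable\n", busy.name, sliceableCPUs(busy)) // 1 sliceable
}
```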

@klihub klihub force-pushed the devel/allow-slicing-idle-pools-empty branch from 40ee962 to 93489be on December 3, 2025 10:35
@klihub klihub force-pushed the devel/allow-slicing-idle-pools-empty branch from 93489be to 5f734aa on December 4, 2025 07:33
@klihub klihub marked this pull request as ready for review December 5, 2025 08:51
Collaborator

@askervin askervin left a comment


I ran one more test in my environment to validate a case where burstable containers are assigned to a socket-level shared pool instead of leaf (NUMA) nodes, and guaranteed containers then flow in:

CONTCOUNT=2 CPUREQ=3000m CPULIM=6000m create burstable
CPU=4 MEM=100M CONTCOUNT=2 create guaranteed
report allowed
verify 'disjoint_sets(cpus["pod0c0"],cpus["pod0c1"],cpus["pod1c0"],cpus["pod1c1"])'

This works as expected. Finally, I pushed it over the limit by changing the guaranteed pod to CPU=3 MEM=100M CONTCOUNT=3 create guaranteed, which I first thought would succeed. But then again, the last container in this pod would empty a socket-level shared pool that is running a burstable container, so in the end it was expected to fail.

The reason I mention these tests is that if you find the passing test would truly add value for catching regressions in the future, let's have it, too. But if not, I'm happy to merge #601 and #602 as is.

In other words, LGTM.

Thanks for these patches @klihub! I think this is better than just lipstick. :)

@askervin
Collaborator

askervin commented Dec 5, 2025

On top of these PRs, the e2e test in #599 fails, whereas on top of #598 it passed (most likely because the reserved shared pool CPUs ended up empty).

The reason is that kube-proxy, which is the only besteffort reserved pod running in our initial setup, gets assigned to the shared root pool instead of the reserved pool (cpuset:1535) like all the other reserved containers. Therefore allocating 5 exclusive CPUs fails, as it should.

D: [              policy              ] <post-alloc><virtual root>
D: [              policy              ] <post-alloc>  - <root capacity: CPU: reserved:1535 (1000m), sharable:0-2,511,4095 (5000m), MemLimit: 13.65G>
D: [              policy              ] <post-alloc>  - <root allocatable: CPU: reserved:1535 (allocatable: 101m), grantedReserved:899m, sharable:0-2,511,4095 (allocatable:4999m)/sliceable:0-2,511,4095 (5000m), MemLimit: 13.52G>
D: [              policy              ] <post-alloc>  - normal memory: 0,2,7
D: [              policy              ] <post-alloc>  - PMEM memory: 1,3,4,5,6
D: [              policy              ] <post-alloc>    + <grant for kube-system/kube-apiserver-s8c4k-fedora-42-containerd/kube-apiserver from root: cputype: reserved, reserved: 1535 (250m), shared: 0-2,511,4095 (0m), memory: nodes{0-7} (0.00)>
D: [              policy              ] <post-alloc>    + <grant for kube-system/kube-scheduler-s8c4k-fedora-42-containerd/kube-scheduler from root: cputype: reserved, reserved: 1535 (100m), shared: 0-2,511,4095 (0m), memory: nodes{0-7} (0.00)>
D: [              policy              ] <post-alloc>    + <grant for kube-system/kube-controller-manager-s8c4k-fedora-42-containerd/kube-controller-manager from root: cputype: reserved, reserved: 1535 (199m), shared: 0-2,511,4095 (0m), memory: nodes{0-7} (0.00)>
D: [              policy              ] <post-alloc>    + <grant for kube-system/etcd-s8c4k-fedora-42-containerd/etcd from root: cputype: reserved, reserved: 1535 (100m), shared: 0-2,511,4095 (0m), memory: nodes{0-7} (0.00)>
D: [              policy              ] <post-alloc>    + <grant for kube-system/kube-proxy-7gjs6/kube-proxy from root: **cputype: reserved, shared: 0-2,511,4095** (0m), memory: nodes{0-7} (0.00)>
D: [              policy              ] <post-alloc>    + <grant for kube-system/coredns-66bc5c9577-4v4mt/coredns from root: cputype: reserved, reserved: 1535 (100m), shared: 0-2,511,4095 (0m), memory: nodes{0-7} (69.90M)>
D: [              policy              ] <post-alloc>    + <grant for kube-system/coredns-66bc5c9577-dq845/coredns from root: cputype: reserved, reserved: 1535 (100m), shared: 0-2,511,4095 (0m), memory: nodes{0-7} (69.90M)>
D: [              policy              ] <post-alloc>    + <grant for kube-system/nri-resource-policy-topology-aware-7g4hb/nri-resource-policy-topology-aware from root: cputype: reserved, reserved: 1535 (50m), shared: 0-2,511,4095 (0m), memory: nodes{0-7} (0.00)>
D: [              policy              ] <post-alloc>  - children:
D: [              policy              ] <post-alloc>    <socket #0>
D: [              policy              ] <post-alloc>      - <socket #0 capacity: CPU: sharable:0-2,511 (4000m), MemLimit: 10.23G>
D: [              policy              ] <post-alloc>      - <socket #0 allocatable: CPU: sharable:0-2,511 (allocatable:4000m)/sliceable:0-2,511 (4000m), MemLimit: 10.23G>
D: [              policy              ] <post-alloc>      - normal memory: 0
D: [              policy              ] <post-alloc>      - PMEM memory: 1,3,4,5,6
D: [              policy              ] <post-alloc>      - parent: <root>
D: [              policy              ] <post-alloc>    <socket #2>
D: [              policy              ] <post-alloc>      - <socket #2 capacity: CPU: reserved:1535 (1000m), MemLimit: 10.73G>
D: [              policy              ] <post-alloc>      - <socket #2 allocatable: CPU: reserved:1535 (allocatable: 101m), MemLimit: 10.73G>
D: [              policy              ] <post-alloc>      - normal memory: 2
D: [              policy              ] <post-alloc>      - PMEM memory: 1,3,4,5,6
D: [              policy              ] <post-alloc>      - parent: <root>
D: [              policy              ] <post-alloc>    <socket #7>
D: [              policy              ] <post-alloc>      - <socket #7 capacity: CPU: sharable:4095 (1000m), MemLimit: 10.56G>
D: [              policy              ] <post-alloc>      - <socket #7 allocatable: CPU: sharable:4095 (allocatable:1000m)/sliceable:4095 (1000m), MemLimit: 10.56G>
D: [              policy              ] <post-alloc>      - normal memory: 7
D: [              policy              ] <post-alloc>      - PMEM memory: 1,3,4,5,6
D: [              policy              ] <post-alloc>      - parent: <root>

Giving kube-proxy a CPU request, for instance 10m, would solve the problem. Then it runs on the reserved CPU, too, and allocating all 5 free CPUs works fine.

Probably I'll just need to modify the test... unless we want to run besteffort reserved containers on reserved CPUs, too.
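
As a rough sketch of the option mentioned above (running besteffort reserved containers on reserved CPUs, too), the classification could look something like the following; the names and the namespace-based check are assumptions for illustration, not the actual policy implementation:

```go
// Sketch of classifying besteffort containers in reserved namespaces as
// reserved-CPU workloads; hypothetical names, not the real policy code.
package main

import "fmt"

type cpuClass int

const (
	cpuNormal cpuClass = iota
	cpuReserved
)

type container struct {
	namespace  string
	cpuRequest int // milli-CPUs; zero for a besteffort container
}

// reservedNamespaces lists namespaces whose workloads belong on reserved
// CPUs (kube-system in a typical setup).
var reservedNamespaces = map[string]bool{"kube-system": true}

// cpuClassFor picks the CPU class for a container. The point of the sketch:
// a besteffort container (zero CPU request) in a reserved namespace still
// lands in the reserved class instead of falling through to the shared
// root pool.
func cpuClassFor(c container) cpuClass {
	if reservedNamespaces[c.namespace] {
		return cpuReserved // even when c.cpuRequest == 0
	}
	return cpuNormal
}

func main() {
	kubeProxy := container{namespace: "kube-system", cpuRequest: 0}
	fmt.Println("kube-proxy reserved:", cpuClassFor(kubeProxy) == cpuReserved) // true
}
```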

@klihub
Collaborator Author

klihub commented Dec 5, 2025

Probably I'll just need to modify the test... unless we want to run besteffort reserved containers on reserved CPUs, too.

@askervin I think we want to do that, and I thought we already did. I'll try to check why it does not end up there...

@askervin
Collaborator

askervin commented Dec 12, 2025

@klihub, I merged #601, but it seems I can't cleanly edit/merge this PR so that GitHub would realize the stacked commits are already there.

(I don't dare to try what the "web editor" conflict resolution would look like... I'm afraid it might create duplicates of the already merged stacked commits, with very questionable-looking changes in them.)

But I think we could merge this once it can be done cleanly, and handle the issue of the besteffort-reserved container not going to reserved CPUs separately.

@klihub
Collaborator Author

klihub commented Dec 12, 2025

@klihub, I merged #601, but it seems I can't cleanly edit/merge this PR so that GitHub would realize the stacked commits are already there.

(I don't dare to try what the "web editor" conflict resolution would look like... I'm afraid it might create duplicates of the already merged stacked commits, with very questionable-looking changes in them.)

But I think we could merge this once it can be done cleanly, and handle the issue of the besteffort-reserved container not going to reserved CPUs separately.

@askervin Thanks! Just gimme a sec and I'll rebase.

Allow slicing idle shared pools empty for exclusive allocations.

Signed-off-by: Krisztian Litkey <[email protected]>
@klihub klihub force-pushed the devel/allow-slicing-idle-pools-empty branch from 5f734aa to f0b7b26 on December 12, 2025 06:59
Collaborator

@marquiz marquiz left a comment


LGTM

@askervin askervin merged commit f8824ba into containers:main Dec 12, 2025
9 checks passed
@klihub klihub deleted the devel/allow-slicing-idle-pools-empty branch December 16, 2025 11:15