Commit 1db2891

Blog post: New conversion from cgroup v1 CPU shares to v2 CPU weight
Signed-off-by: Itamar Holder <[email protected]>
1 parent e768588 commit 1db2891

5 files changed

+179
-0
lines changed
@@ -0,0 +1,179 @@
1+
---
2+
layout: blog
3+
title: 'New conversion from cgroup v1 CPU shares to v2 CPU weight'
4+
date: 2025-10-25T05:00:00-08:00
5+
draft: true
6+
slug: new-cgroup-v1-to-v2-cpu-conversion-formula
7+
author: >
8+
[Itamar Holder](https://github.com/iholder101) (Red Hat)
9+
---
10+
11+
I'm excited to announce the implementation of an improved conversion formula
12+
from cgroup v1 CPU shares to cgroup v2 CPU weight. This enhancement addresses
13+
critical issues with CPU priority allocation for Kubernetes workloads when
14+
running on systems with cgroup v2.
15+
16+
## Background
17+
18+
Kubernetes was originally designed with cgroup v1 in mind, where CPU shares
19+
were derived directly from the container's CPU request: the request in
20+
millicpu is scaled by 1024/1000, so that one full CPU maps to 1024 shares.
21+
22+
For example, a container requesting 1 CPU (1000m) would get
23+
`cpu.shares = 1024`.
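As a minimal sketch of that scaling (the function name `milli_cpu_to_shares` is hypothetical, and clamping to the cgroup v1 range of 2 to 262144 is an assumption of this sketch rather than a quote of the kubelet code):

```python
MIN_SHARES = 2
MAX_SHARES = 262144


def milli_cpu_to_shares(milli_cpu: int) -> int:
    """Scale a CPU request in millicpu to cgroup v1 cpu.shares (1000m -> 1024)."""
    shares = milli_cpu * 1024 // 1000
    # Keep the result inside the range cgroup v1 accepts for cpu.shares.
    return max(MIN_SHARES, min(shares, MAX_SHARES))


print(milli_cpu_to_shares(1000))  # 1024
print(milli_cpu_to_shares(100))   # 102
```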
24+
25+
Over time, cgroup v1 started being replaced by its successor,
26+
cgroup v2. In cgroup v2, the concept of CPU shares (which ranges from 2 to
27+
262144, or from \(2^1\) to \(2^{18}\)) was replaced with CPU weight (which ranges
28+
from 1 to 10000, or from \(10^0\) to \(10^4\)).
29+
30+
With the transition to cgroup v2,
31+
[KEP-2254](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2254-cgroup-v2)
32+
introduced a conversion formula to map cgroup v1 CPU shares to cgroup v2 CPU
33+
weight. The conversion formula was defined as:
34+
35+
```
36+
cpu.weight = (1 + ((cpu.shares - 2) * 9999) / 262142)
37+
```
38+
39+
<!-- convert from [2-262144] to [1-10000] -->
40+
41+
This formula linearly maps values from \([2^1, 2^{18}]\) to \([10^0, 10^4]\).
42+
![2025-10-25-new-cgroup-v1-to-v2-conversion-formula-linear-conversion.png](2025-10-25-new-cgroup-v1-to-v2-conversion-formula-linear-conversion.png)
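The linear formula can be tried out with a short Python sketch. The function name `linear_weight` is hypothetical, and the use of integer (floor) division is an assumption that matches the integer values quoted in this post:

```python
def linear_weight(shares: int) -> int:
    """Linear map from cpu.shares in [2, 262144] to cpu.weight in [1, 10000]."""
    return 1 + ((shares - 2) * 9999) // 262142


# Minimum, a 100m request (102 shares), a 1 CPU request (1024 shares), maximum.
for shares in (2, 102, 1024, 262144):
    print(shares, linear_weight(shares))
```

For these inputs the sketch prints weights of 1, 4, 39 and 10000.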
43+
44+
While this approach is simple, the linear mapping causes a few significant
45+
problems that affect both performance and configuration granularity.
46+
47+
## Problems with Previous Conversion Formula
48+
49+
The previous conversion formula creates two major issues:
50+
51+
### 1. Reduced Priority Against Non-Kubernetes Workloads
52+
53+
In cgroup v1, the default CPU shares value is `1024`, meaning a container
54+
requesting 1 CPU has equal priority with system processes that live outside
55+
of Kubernetes' scope.
56+
However, in cgroup v2, the default CPU weight is `100`, but the previous
57+
formula converts 1 CPU (1000m, or 1024 shares) to a weight of only `39`, less
58+
than 40% of the default.
59+
60+
**Example:**
61+
- Container requesting 1 CPU (1000m)
62+
- cgroup v1: `cpu.shares = 1024` (equal to default)
63+
- cgroup v2 (previous formula): `cpu.weight = 39` (much lower than default 100)
64+
65+
This means that after moving to cgroup v2, Kubernetes workloads would
66+
de facto have lower CPU priority relative to non-Kubernetes processes. The
67+
problem can be severe for setups with many system daemons that run
68+
outside of Kubernetes' scope and expect Kubernetes workloads to have
69+
priority, especially in situations of resource starvation.
70+
71+
### 2. Unmanageable Granularity
72+
73+
The previous formula produces very low values for small CPU requests,
74+
limiting the ability to create sub-cgroups within containers for
75+
fine-grained resource distribution (which will possibly be much easier moving
76+
forward, see [KEP #5474](https://github.com/kubernetes/enhancements/issues/5474) for more info).
77+
78+
**Example:**
79+
- Container requesting 100m CPU
80+
- cgroup v1: `cpu.shares = 102`
81+
- cgroup v2 (previous formula): `cpu.weight = 4` (too low for sub-cgroup
82+
configuration)
83+
84+
With cgroup v1, requesting 100m CPU, which led to 102 CPU shares, was manageable
85+
in the sense that sub-cgroups could be created inside the main
86+
container, assigning fine-grained CPU priorities to different groups of
87+
processes. With cgroup v2, however, a weight of 4 is very hard to
88+
distribute between sub-cgroups since it's not granular enough.
89+
90+
With plans to allow [writable cgroups for unprivileged containers](https://github.com/kubernetes/enhancements/issues/5474),
91+
this becomes even
92+
more relevant.
93+
94+
## New Conversion Formula
95+
96+
### Description
97+
The new formula is more complicated, but does a much better job mapping
98+
between cgroup v1 CPU shares and cgroup v2 CPU weight:
99+
100+
```
101+
cpu.weight = ⌈10^(L²/612 + 125L/612 - 7/34)⌉
102+
```
103+
104+
where `L = log₂(cpu.shares)`
105+
106+
The exponent is a quadratic function of `L`, chosen so that the conversion passes through the following (`cpu.shares`, `cpu.weight`) points:
107+
- (2, 1): The minimum values for both ranges.
108+
- (1024, 100): The default values for both ranges.
109+
- (262144, 10000): The maximum values for both ranges.
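The coefficients can be recovered from these three points. As a worked sketch (the symbols \(a\), \(b\), \(c\) and \(f\) are introduced here for illustration only), write the exponent as \(f(L) = aL^2 + bL + c\) with \(L = \log_2(\text{cpu.shares})\) and \(f(L) = \log_{10}(\text{cpu.weight})\). The three points become \(f(1) = 0\), \(f(10) = 2\) and \(f(18) = 4\):

\[
\begin{aligned}
a + b + c &= 0 \\
100a + 10b + c &= 2 \\
324a + 18b + c &= 4
\end{aligned}
\]

Subtracting the first equation from the other two gives \(99a + 9b = 2\) and \(323a + 17b = 4\). Eliminating \(b\) yields \(1224a = 2\), so:

\[
a = \frac{1}{612}, \qquad b = \frac{2 - 99a}{9} = \frac{125}{612}, \qquad c = -(a + b) = -\frac{126}{612} = -\frac{7}{34}
\]

These are the coefficients that appear in the formula above.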
110+
111+
Visually, the new function looks as follows:
112+
![2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion.png](2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion.png)
113+
114+
And if we zoom in to the important part:
115+
![2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion-zoom.png](2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion-zoom.png)
116+
117+
The new formula is "close to linear", yet it is carefully designed to
118+
map the ranges so that the curve passes through the three important points
119+
above.
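The formula can be checked numerically with a small Python sketch. This is an illustration of the formula as written in this post, not the code of any OCI runtime; the function name `quadratic_weight` and the explicit clamping at both ends of the range are assumptions of the sketch:

```python
import math


def quadratic_weight(shares: int) -> int:
    """ceil(10 ** (L^2/612 + 125L/612 - 7/34)) with L = log2(shares)."""
    # Assumed clamping: values at or beyond the range ends map to the range ends.
    if shares <= 2:
        return 1
    if shares >= 262144:
        return 10000
    l = math.log2(shares)
    exponent = (l * l + 125 * l) / 612 - 7 / 34
    return math.ceil(10 ** exponent)


# The three anchor points, plus the 102 shares produced by a 100m request.
for shares in (2, 102, 1024, 262144):
    print(shares, quadratic_weight(shares))
```

For these inputs the sketch prints weights of 1, 17, 100 and 10000.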
120+
121+
### How It Solves the Problems
122+
123+
1. **Better Priority Alignment:**
124+
- A container requesting 1 CPU (1000m, `cpu.shares = 1024`) will now get
125+
`cpu.weight = 100`, which matches cgroup v2's default of 100.
126+
- This restores the intended priority relationship between Kubernetes
127+
workloads and system processes.
128+
129+
2. **Improved Granularity:**
130+
- A container requesting 100m CPU will get `cpu.weight = 17` (see
131+
[here](https://go.dev/play/p/sLlAfCg54Eg)).
132+
- Enables better fine-grained resource distribution within containers.
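To compare both conversions side by side, the following Python sketch computes the shares and both weights for a given CPU request. All names here (`compare`, `linear_weight`, `quadratic_weight`) are hypothetical, and the 1024/1000 scaling from millicpu to shares is assumed:

```python
import math


def linear_weight(shares: int) -> int:
    """Previous linear conversion."""
    return 1 + ((shares - 2) * 9999) // 262142


def quadratic_weight(shares: int) -> int:
    """New conversion, with assumed clamping at the range ends."""
    if shares <= 2:
        return 1
    if shares >= 262144:
        return 10000
    l = math.log2(shares)
    return math.ceil(10 ** ((l * l + 125 * l) / 612 - 7 / 34))


def compare(milli_cpu: int) -> tuple[int, int, int]:
    """Return (cpu.shares, previous cpu.weight, new cpu.weight) for a request."""
    shares = max(2, min(milli_cpu * 1024 // 1000, 262144))
    return shares, linear_weight(shares), quadratic_weight(shares)


for request in (100, 500, 1000, 4000):
    shares, old, new = compare(request)
    print(f"{request}m: shares={shares} previous={old} new={new}")
```

For the two requests discussed in this post, `compare(100)` returns `(102, 4, 17)` and `compare(1000)` returns `(1024, 39, 100)`.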
133+
134+
## Adoption and integration
135+
136+
This change was implemented at the OCI runtime layer.
137+
In other words, it is not implemented in Kubernetes itself, so the
138+
adoption of the new conversion formula depends solely on OCI runtime
139+
adoption.
140+
141+
For example:
142+
- runc: The new formula is enabled from [version 1.4.0-rc.1](https://github.com/opencontainers/runc/releases/tag/v1.4.0-rc.1),
143+
or [version 1.3.2](https://github.com/opencontainers/runc/releases/tag/v1.3.2).
144+
For containerd users, this version will probably be supported on Kubernetes v1.35+.
145+
- crun: The new formula is enabled from [version 1.23](https://github.com/containers/crun/releases/tag/1.23).
146+
For CRI-O users, this version is supported on Kubernetes v1.34+.
147+
148+
### Impact on Existing Deployments
149+
150+
**Important:** Some consumers may be affected if they assume the older linear conversion formula.
151+
Applications or monitoring tools that directly calculate expected CPU weight values based on the
152+
previous formula may need updates to account for the new quadratic conversion.
153+
This is particularly relevant for:
154+
155+
- Custom resource management tools that predict CPU weight values.
156+
- Monitoring systems that validate or expect specific weight values.
157+
- Applications that programmatically set or verify CPU weight values.
158+
159+
We recommend testing the new conversion formula in non-production environments before
160+
upgrading OCI runtimes to ensure compatibility with existing tooling.
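Tooling that validates `cpu.weight` may need to accept either conversion while runtimes are being upgraded. One possible approach, sketched here under the assumption that the tool knows the container's `cpu.shares` and reads the observed weight itself (the helper `matching_formulas` and the other names are hypothetical):

```python
import math


def linear_weight(shares: int) -> int:
    """Previous linear conversion."""
    return 1 + ((shares - 2) * 9999) // 262142


def quadratic_weight(shares: int) -> int:
    """New conversion, with assumed clamping at the range ends."""
    if shares <= 2:
        return 1
    if shares >= 262144:
        return 10000
    l = math.log2(shares)
    return math.ceil(10 ** ((l * l + 125 * l) / 612 - 7 / 34))


def matching_formulas(shares: int, observed_weight: int) -> list[str]:
    """Name every conversion whose result equals the observed cpu.weight."""
    candidates = {"linear": linear_weight, "quadratic": quadratic_weight}
    return [name for name, fn in candidates.items() if fn(shares) == observed_weight]


print(matching_formulas(1024, 39))   # ['linear']
print(matching_formulas(1024, 100))  # ['quadratic']
```

An empty list would indicate that the observed weight matches neither formula, which is worth flagging separately from a formula mismatch.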
161+
162+
## Where Can I Learn More?
163+
164+
For those interested in this enhancement:
165+
166+
- [Kubernetes GitHub Issue #131216](https://github.com/kubernetes/kubernetes/issues/131216) - Detailed technical
167+
analysis and examples, including discussions and reasoning for choosing the
168+
above formula.
169+
- [KEP-2254: cgroup v2](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2254-cgroup-v2) -
170+
Original cgroup v2 implementation in Kubernetes.
171+
- [Kubernetes cgroup documentation](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) -
172+
Current resource management guidance.
173+
174+
## How Do I Get Involved?
175+
176+
For those interested in getting involved with Kubernetes node-level
177+
features, join the [Kubernetes Node Special Interest Group](https://github.com/kubernetes/community/tree/master/sig-node).
178+
We always welcome new contributors and diverse perspectives on resource management
179+
challenges.
