Commit 5ad59ff

Blog post: New conversion from cgroup v1 CPU shares to v2 CPU weight
Signed-off-by: Itamar Holder <iholder@redhat.com>
---
layout: blog
title: 'New conversion from cgroup v1 CPU shares to v2 CPU weight'
date: 2025-10-25T05:00:00-08:00
slug: new-cgroup-v1-to-v2-cpu-conversion-formula
author: >
  [Itamar Holder](https://github.com/iholder101) (Red Hat)
---

We're excited to announce the implementation of an improved conversion formula
from cgroup v1 CPU shares to cgroup v2 CPU weight. This enhancement addresses
critical issues with CPU priority allocation for Kubernetes workloads when
running on systems with cgroup v2.

## Background

Kubernetes was originally designed with cgroup v1 in mind, where CPU shares
were derived directly from the container's CPU request in millicpu, scaled so
that 1 CPU (1000m) corresponds to 1024 shares.

For example, a container requesting 1 CPU (1000m) would get
`cpu.shares = 1024`.
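
To make the cgroup v1 mapping concrete, here is a minimal Go sketch of the millicpu-to-shares conversion. It assumes 1024 shares per CPU and the cgroup v1 bounds of 2 and 262144 shares; the function and constant names are hypothetical and chosen for illustration, not taken from the kubelet source.

```go
package main

import "fmt"

// Hypothetical constants for the cgroup v1 convention described above.
const (
	sharesPerCPU  = 1024   // 1 CPU corresponds to 1024 cpu.shares
	milliCPUToCPU = 1000   // 1 CPU = 1000 millicpu
	minShares     = 2      // cgroup v1 lower bound for cpu.shares
	maxShares     = 262144 // cgroup v1 upper bound for cpu.shares
)

// milliCPUToShares converts a CPU request in millicpu to cgroup v1 CPU shares,
// clamping the result to the valid [2, 262144] range.
func milliCPUToShares(milliCPU int64) uint64 {
	shares := milliCPU * sharesPerCPU / milliCPUToCPU
	if shares < minShares {
		return minShares
	}
	if shares > maxShares {
		return maxShares
	}
	return uint64(shares)
}

func main() {
	for _, m := range []int64{100, 1000} {
		fmt.Printf("%dm -> cpu.shares = %d\n", m, milliCPUToShares(m))
	}
}
```

Running it prints 102 shares for a 100m request and 1024 shares for a 1000m request, the values used in the examples throughout this post.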

Over time, cgroup v1 began to be replaced by its successor, cgroup v2. In
cgroup v2, the concept of CPU shares (which range from 2 to 262144, or from
2^1 to 2^18) was replaced with CPU weight (which ranges from 1 to 10000, or
from 10^0 to 10^4).

With the transition to cgroup v2,
[KEP-2254](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2254-cgroup-v2)
introduced a conversion formula to map cgroup v1 CPU shares to cgroup v2 CPU
weight. The conversion formula is defined as:

`cpu.weight = (1 + ((cpu.shares - 2) * 9999) / 262142) // convert from [2-262144] to [1-10000]`.

This formula linearly maps the range `[2^1, 2^18]` onto `[10^0, 10^4]`.

![2025-10-25-new-cgroup-v1-to-v2-conversion-formula-linear-conversion.png](2025-10-25-new-cgroup-v1-to-v2-conversion-formula-linear-conversion.png)
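
The linear KEP-2254 formula fits in a one-line Go function. This sketch assumes the input is already within cgroup v1's `[2, 262144]` range and, like the formula, relies on integer (truncating) division; the function name is hypothetical.

```go
package main

import "fmt"

// oldSharesToWeight applies the linear KEP-2254 conversion, mapping
// cpu.shares in [2, 262144] onto cpu.weight in [1, 10000].
// Assumes shares >= 2; integer division truncates, as in the formula.
func oldSharesToWeight(shares uint64) uint64 {
	return 1 + ((shares-2)*9999)/262142
}

func main() {
	for _, s := range []uint64{2, 102, 1024, 262144} {
		fmt.Printf("cpu.shares = %6d -> cpu.weight = %5d\n", s, oldSharesToWeight(s))
	}
}
```

For 1024 shares it yields a weight of 39, and for 102 shares (a 100m request) a weight of 4, while the range endpoints map to 1 and 10000 as intended.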

While this approach is simple, the linear mapping introduces significant
problems that affect both performance and configuration granularity.

## Problems with the Current Conversion Formula

The current conversion formula creates two major issues:

### 1. Reduced Priority Against Non-Kubernetes Workloads

In cgroup v1, the default value of CPU shares is `1024`, meaning a container
requesting 1 CPU has the same priority as system processes that live outside
of Kubernetes' scope.

However, in cgroup v2, the default CPU weight is `100`, but the current
formula converts 1 CPU (1000m) to a weight of only `39`, less than 40% of the
default.

**Example:**
- Container requesting 1 CPU (1000m)
- cgroup v1: `cpu.shares = 1024` (equal to default)
- cgroup v2 (current): `cpu.weight = 39` (much lower than default 100)

This means that after moving to cgroup v2, Kubernetes workloads de facto
lose CPU priority relative to non-Kubernetes processes. The problem can be
severe on setups that run many system daemons outside of Kubernetes' scope
while expecting Kubernetes workloads to have priority, especially under
resource starvation.
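
Both cgroup v1 shares and cgroup v2 weights divide contended CPU time among sibling cgroups in proportion to their values. The Go sketch below applies that rule to the example above under a simplifying assumption: the container's cgroup and a single system daemon's cgroup are direct siblings, and the daemon keeps each version's default value. Real node hierarchies nest pods under additional parent cgroups, and the `cpuFraction` helper is hypothetical.

```go
package main

import "fmt"

// cpuFraction returns the fraction of contended CPU time that a cgroup with
// value own receives when competing with the given sibling cgroups. Both
// cgroup v1 cpu.shares and cgroup v2 cpu.weight divide CPU time among
// siblings in proportion to these values.
func cpuFraction(own uint64, siblings ...uint64) float64 {
	total := own
	for _, s := range siblings {
		total += s
	}
	return float64(own) / float64(total)
}

func main() {
	// Assumption: the container's cgroup and one system daemon's cgroup are
	// direct siblings, and the daemon keeps each version's default value.
	v1 := cpuFraction(1024, 1024) // cgroup v1: 1 CPU = 1024 shares vs. default 1024
	v2 := cpuFraction(39, 100)    // cgroup v2, old formula: weight 39 vs. default 100
	fmt.Printf("cgroup v1:               %.1f%%\n", 100*v1)
	fmt.Printf("cgroup v2 (old formula): %.1f%%\n", 100*v2)
}
```

Under these assumptions, the container's share of contended CPU drops from 50% with cgroup v1 to roughly 28% with the old cgroup v2 conversion.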

### 2. Unmanageable Granularity

The current formula produces very low values for small CPU requests,
limiting the ability to create sub-cgroups within containers for
fine-grained resource distribution.

**Example:**
- Container requesting 100m CPU
- cgroup v1: `cpu.shares = 102`
- cgroup v2 (current): `cpu.weight = 4` (too low for sub-cgroup configuration)

With cgroup v1, requesting 100m CPU, which led to 102 CPU shares, was
manageable in the sense that sub-cgroups could be created inside the main
container, assigning fine-grained CPU priorities to different groups of
processes. With cgroup v2, however, a weight of 4 is very hard to distribute
between sub-cgroups because it is not granular enough.

With plans to allow
[writable cgroups for unprivileged containers](https://github.com/kubernetes/enhancements/issues/5474),
this becomes even more relevant.
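
The loss of resolution is also visible when CPU requests are grouped by the weight they end up with. The sketch below assumes the 1000m-to-1024-shares scaling and the KEP-2254 linear formula quoted in the Background section; the helper names are hypothetical. It counts how many distinct cgroup v2 weights the old formula produces for every request from 1m to 1000m.

```go
package main

import "fmt"

// oldSharesToWeight is the KEP-2254 linear conversion (integer division).
func oldSharesToWeight(shares uint64) uint64 {
	return 1 + ((shares-2)*9999)/262142
}

// milliCPUToShares scales a request so that 1000m = 1024 shares, with a floor
// of 2 shares (the cgroup v1 minimum). Requests here stay far below the maximum.
func milliCPUToShares(milliCPU uint64) uint64 {
	shares := milliCPU * 1024 / 1000
	if shares < 2 {
		return 2
	}
	return shares
}

func main() {
	// Count how many requests between 1m and 1000m end up with each weight.
	requestsPerWeight := make(map[uint64]int)
	for m := uint64(1); m <= 1000; m++ {
		requestsPerWeight[oldSharesToWeight(milliCPUToShares(m))]++
	}
	fmt.Printf("1000 requests -> %d distinct weights\n", len(requestsPerWeight))
	fmt.Printf("requests sharing weight 4 (the weight of 100m): %d\n", requestsPerWeight[4])
}
```

Since 1024 shares map to 39, at most 39 distinct weights exist for the entire 1m to 1000m range, and every request from 80m to 104m lands on the same weight of 4.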

## New Conversion Formula

### Description
The new formula is more complicated, but does a much better job of mapping
between cgroup v1 CPU shares and cgroup v2 CPU weight:

`cpu.weight = ⌈10^(L²/612 + 125L/612 - 7/34)⌉` where `L = log₂(cpu.shares)`.

The idea is that the exponent is a quadratic function of `L`, chosen so that
the curve passes through the following points:
- (2, 1): The minimum values for both ranges.
- (1024, 100): The default values for both ranges.
- (262144, 10000): The maximum values for both ranges.
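
The new formula translates almost directly into Go. As a quick check of the three design points, the exponent evaluates to 0, 2, and 4 at L = 1, 10, and 18 (that is, 2, 1024, and 262144 shares), giving weights of 1, 100, and 10000. This is a minimal sketch, not the code used by runc or crun: the function name is hypothetical, inputs at or beyond the cgroup v1 bounds are clamped to the ends of the cgroup v2 range, and no special handling of an unset (zero) value is modeled.

```go
package main

import (
	"fmt"
	"math"
)

// newSharesToWeight computes
//   cpu.weight = ceil(10^(L²/612 + 125L/612 - 7/34)),  L = log2(cpu.shares)
// Values at or beyond the cgroup v1 bounds are clamped to the cgroup v2 bounds.
func newSharesToWeight(shares uint64) uint64 {
	if shares <= 2 {
		return 1
	}
	if shares >= 262144 {
		return 10000
	}
	l := math.Log2(float64(shares))
	// The exponent is quadratic in L (not in the share value itself).
	exponent := (l*l+125*l)/612 - 7.0/34
	return uint64(math.Ceil(math.Pow(10, exponent)))
}

func main() {
	for _, s := range []uint64{2, 102, 1024, 262144} {
		fmt.Printf("cpu.shares = %6d -> cpu.weight = %5d\n", s, newSharesToWeight(s))
	}
}
```

Compared with the linear formula, 1024 shares now map to the cgroup v2 default of 100, and 102 shares (a 100m request) map to 17 rather than 4.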

Visually, the new function looks as follows:

![2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion.png](2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion.png)

And if we zoom in on the important part:

![2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion-zoom.png](2025-10-25-new-cgroup-v1-to-v2-conversion-formula-new-conversion-zoom.png)

The new formula is "close to linear", yet it is carefully designed to map
the ranges so that the curve passes through all three important points above.

### How It Solves the Problems

**1. Better Priority Alignment:**
- A container requesting 1 CPU (1000m, or 1024 shares) will now get
  `cpu.weight = 100`, matching cgroup v2's default of 100.
- This restores the intended priority relationship between Kubernetes
  workloads and system processes.

**2. Improved Granularity:**
- A container requesting 100m CPU will now get `cpu.weight = 17` (see
  [this Go playground example](https://go.dev/play/p/sLlAfCg54Eg)).
- This enables better fine-grained resource distribution within containers.

## Adoption and Integration

This change was implemented at the OCI runtime level rather than in
Kubernetes itself. Therefore, adoption of the new conversion formula depends
solely on the OCI runtime in use.

For example:
- runc: The new formula is enabled starting with
  [version 1.4.0-rc.1](https://github.com/opencontainers/runc/releases/tag/v1.4.0-rc.1).
- crun: The new formula is enabled starting with
  [version 1.23](https://github.com/containers/crun/releases/tag/1.23).

## Where Can I Learn More?

For those interested in this enhancement:

- [Kubernetes GitHub Issue #131216](https://github.com/kubernetes/kubernetes/issues/131216) -
  Detailed technical analysis and examples, including discussions and
  reasoning for choosing the above formula.
- [KEP-2254: cgroup v2](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2254-cgroup-v2) -
  Original cgroup v2 implementation in Kubernetes.
- [Kubernetes cgroup documentation](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/) -
  Current resource management guidance.

## How Do I Get Involved?

157+
For those interested in getting involved with Kubernetes node-level
158+
features, join the [Kubernetes Node Special Interest Group]
159+
(https://github.com/kubernetes/community/tree/master/sig-node). We always
160+
welcome new contributors and diverse perspectives on resource management
161+
challenges.
