Commit 08c5377

Merge pull request #45354 from kinvolk/rata/blog-userns-1.30
Feature blog 1.30: user namespaces
2 parents 085b35c + 052cf7e commit 08c5377

3 files changed, with 157 additions and 0 deletions

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@
1+
---
2+
layout: blog
3+
title: "Kubernetes 1.30: Beta Support For Pods With User Namespaces"
4+
date: 2024-04-22
5+
slug: userns-beta
6+
---
7+
8+
**Authors:** Rodrigo Campos Catelin (Microsoft), Giuseppe Scrivano (Red Hat), Sascha Grunert (Red Hat)
9+
10+
Linux provides different namespaces to isolate processes from each other. For
11+
example, a typical Kubernetes pod runs within a network namespace to isolate the
12+
network identity and a PID namespace to isolate the processes.
13+
14+
One Linux namespace that was left behind is the [user
15+
namespace](https://man7.org/linux/man-pages/man7/user_namespaces.7.html). This
16+
namespace allows us to isolate the user and group identifiers (UIDs and GIDs) we
17+
use inside the container from the ones on the host.
18+
19+
This is a powerful abstraction that allows us to run containers as "root": we
20+
are root inside the container and can do everything root can inside the pod,
21+
but our interactions with the host are limited to what a non-privileged user can
22+
do. This is great for limiting the impact of a container breakout.
23+
24+
A container breakout is when a process inside a container can break out
25+
onto the host using some unpatched vulnerability in the container runtime or the
26+
kernel and can access/modify files on the host or other containers. If we
27+
run our pods with user namespaces, the privileges the container has over the
28+
rest of the host are reduced, and the files outside the container it can access
29+
are limited too.
30+
31+
In Kubernetes v1.25, we introduced support for user namespaces only for stateless
32+
pods. Kubernetes 1.28 lifted that restriction, and now, with Kubernetes 1.30, we
33+
are moving to beta!
34+
35+
## What is a user namespace?
36+
37+
Note: Linux user namespaces are a different concept from [Kubernetes
38+
namespaces](/docs/concepts/overview/working-with-objects/namespaces/).
39+
The former is a Linux kernel feature; the latter is a Kubernetes feature.
40+
41+
User namespaces are a Linux feature that isolates the UIDs and GIDs of the
42+
containers from the ones on the host. The identifiers in the container can be
43+
mapped to identifiers on the host in a way where the host UID/GIDs used for
44+
different containers never overlap. Furthermore, the identifiers can be mapped
45+
to unprivileged, non-overlapping UIDs and GIDs on the host. This brings two key
46+
benefits:
47+
48+
* _Prevention of lateral movement_: As the UIDs and GIDs for different
49+
containers are mapped to different UIDs and GIDs on the host, containers have a
50+
harder time attacking each other, even if they escape the container boundaries.
51+
For example, suppose container A runs with different UIDs and GIDs on the host
52+
than container B. In that case, the operations it can do on container B's files and processes
53+
are limited: it can only read/write what a file allows to others, as it will never
54+
have owner or group permissions (the UIDs/GIDs on the host are
55+
guaranteed to be different for different containers).
56+
57+
* _Increased host isolation_: As the UIDs and GIDs are mapped to unprivileged
58+
users on the host, if a container escapes the container boundaries, even if it
59+
runs as root inside the container, it has no privileges on the host. This
60+
greatly limits which host files it can read/write, which processes it can send
61+
signals to, etc. Furthermore, capabilities granted are only valid inside the
62+
user namespace and not on the host, limiting the impact a container
63+
escape can have.
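
The mapping described above can be sketched in a few lines of Python. This is only an illustration, not how the kubelet or any runtime implements it. The three-column format (ID inside the namespace, ID outside the namespace, length) is the one the kernel uses for `uid_map` and `gid_map`, as described in the user_namespaces(7) man page. The concrete ranges for the two pods, and the names `POD_A_MAP` and `POD_B_MAP`, are assumptions made up for this example.

```python
from typing import List, NamedTuple, Optional


class IdMapEntry(NamedTuple):
    inside_start: int   # first ID inside the user namespace
    outside_start: int  # first ID it maps to outside the namespace (host)
    length: int         # number of consecutive IDs covered by this entry


def parse_id_map(text: str) -> List[IdMapEntry]:
    """Parse text in the kernel's uid_map/gid_map format (3 numbers per line)."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        entries.append(IdMapEntry(*(int(field) for field in fields)))
    return entries


def to_host_id(container_id: int, id_map: List[IdMapEntry]) -> Optional[int]:
    """Translate an ID seen inside the container to the ID seen on the host."""
    for entry in id_map:
        if entry.inside_start <= container_id < entry.inside_start + entry.length:
            return entry.outside_start + (container_id - entry.inside_start)
    return None  # the ID is not mapped in this namespace


# Hypothetical, non-overlapping mappings for two pods (values are assumptions).
POD_A_MAP = parse_id_map("0 65536 65536")
POD_B_MAP = parse_id_map("0 131072 65536")

print(to_host_id(0, POD_A_MAP), to_host_id(0, POD_B_MAP))
```

Under these assumed ranges, root (UID 0) in each pod lands on a different, unprivileged host UID, which is the property both bullet points rely on.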
64+
65+
{{< figure src="/images/blog/2024-04-22-userns-beta/userns-ids.png" alt="Image showing IDs 0-65535 are reserved to the host, pods use higher IDs" title="User namespace IDs allocation" >}}
66+
67+
68+
Without a user namespace, a container that runs as root and breaks out has
69+
root privileges on the node. If some capabilities
70+
were granted to the container, the capabilities are valid on the host too. None
71+
of this is true when using user namespaces (modulo bugs, of course 🙂).
72+
73+
## Changes in 1.30
74+
75+
In Kubernetes 1.30, besides moving user namespaces to beta, the contributors
76+
working on this feature:
77+
78+
* Introduced a way for the kubelet to use custom ranges for the UIDs/GIDs mapping.
79+
* Added a way for Kubernetes to enforce that the runtime supports all the features
80+
needed for user namespaces. If they are not supported, Kubernetes will show a
81+
clear error when trying to create a pod with user namespaces. Before 1.30, if
82+
the container runtime didn't support user namespaces, the pod could be created
83+
without a user namespace.
84+
* Added more tests, including [tests in the
85+
cri-tools](https://github.com/kubernetes-sigs/cri-tools/pull/1354)
86+
repository.
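
For context, a pod asks for a user namespace by setting `hostUsers: false` in its spec. The sketch below builds such a manifest in Python and prints it as JSON. The pod name, image, and command are hypothetical placeholders chosen for this example, and depending on the cluster version the `UserNamespacesSupport` feature gate may also need to be enabled.

```python
import json


def userns_pod_manifest(name: str, image: str) -> dict:
    """Build a minimal Pod manifest that requests a user namespace."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # False means: do not share the host's user namespace.
            "hostUsers": False,
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "command": ["sleep", "infinity"],
                }
            ],
        },
    }


# "userns-demo" and "debian" are placeholder values for illustration only.
manifest = userns_pod_manifest("userns-demo", "debian")
print(json.dumps(manifest, indent=2))
```

Since `kubectl` accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -` on a cluster that meets the requirements described later in this post.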
87+
88+
You can check the
89+
[documentation](/docs/concepts/workloads/pods/user-namespaces/#set-up-a-node-to-support-user-namespaces)
90+
on user namespaces for how to configure custom ranges for the mapping.
91+
92+
## Demo
93+
94+
A few months ago, [CVE-2024-21626][runc-cve] was disclosed. This **vulnerability
95+
score is 8.6 (HIGH)**. It allows an attacker to escape a container and
96+
**read/write to any path on the node and other pods hosted on the same node**.
97+
98+
Rodrigo created a demo that exploits [CVE-2024-21626][runc-cve] and shows how
99+
the exploit, which works without user namespaces, **is mitigated when user
100+
namespaces are in use.**
101+
102+
{{< youtube id="07y5bl5UDdA" title="Mitigation of CVE-2024-21626 on Kubernetes by enabling User Namespace support" class="youtube-quote-sm" >}}
103+
104+
Please note that with user namespaces, an attacker can still do on the host file system
105+
what the permission bits for "others" allow. Therefore, the CVE is not
106+
completely prevented, but the impact is greatly reduced.
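
To make the "others" permission bits concrete, here is a minimal sketch of the classic Unix permission check. It is a simplified model that ignores ACLs, capabilities, and security modules, and the UIDs and GIDs used in the examples are assumptions (an escaped process mapped to host UID 65536, files owned by host root).

```python
import stat
from typing import Set


def allowed_access(mode: int, file_uid: int, file_gid: int,
                   proc_uid: int, proc_gids: Set[int]) -> str:
    """Return the subset of 'rwx' that the permission bits grant the process."""
    if proc_uid == file_uid:
        bits = (stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR)  # owner class
    elif file_gid in proc_gids:
        bits = (stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP)  # group class
    else:
        bits = (stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH)  # "others" class
    return "".join(ch for ch, bit in zip("rwx", bits) if mode & bit)


# Escaped process with a hypothetical host UID/GID of 65536,
# looking at files owned by host root (0:0).
print(repr(allowed_access(0o640, 0, 0, 65536, {65536})))  # no "others" bits set
print(repr(allowed_access(0o644, 0, 0, 65536, {65536})))  # world-readable file
```

Because the host UID and GID of the escaped process match neither the owner nor the group of host files, only the last branch applies, which is why world-readable or world-writable files remain reachable.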
107+
108+
[runc-cve]: https://github.com/opencontainers/runc/security/advisories/GHSA-xr7r-f8xq-vfvv
109+
110+
## Node system requirements
111+
112+
There are requirements on the Linux kernel version and the container
113+
runtime to use this feature.
114+
115+
You need Linux kernel 6.3 or greater. This is because the feature relies on a
116+
kernel feature named idmap mounts, and support for using idmap mounts with tmpfs
117+
was merged in Linux 6.3.
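
A quick way to check a node against the kernel requirement is to compare its release string with 6.3. The helper below is a rough sketch: the function name is made up for this example, and it only looks at the leading `major.minor` part of the release string, so it says nothing about distribution backports.

```python
import platform
import re


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Return True if a release string such as '6.5.0-generic' is >= major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Compare numerically so that, for example, 6.10 counts as newer than 6.3.
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


if platform.system() == "Linux":
    release = platform.release()
    print(release, "meets the 6.3 requirement:", kernel_at_least(release, 6, 3))
```

Comparing integer tuples rather than strings avoids the common mistake of treating 6.10 as older than 6.3.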
118+
119+
If you are using [CRI-O][crio] with crun, as always, you can expect support for
120+
Kubernetes 1.30 with CRI-O 1.30. Please note you also need [crun][crun] 1.9 or
121+
greater. If you are using CRI-O with [runc][runc], this is still not supported.
122+
123+
Containerd support is currently targeted for [containerd][containerd] 2.0, and
124+
the same crun version requirements apply. If you are using containerd with runc,
125+
this is still not supported.
126+
127+
Please note that containerd 1.7 added _experimental_ support for user
128+
namespaces, as implemented in Kubernetes 1.25 and 1.26. We did a redesign in
129+
Kubernetes 1.27, which requires changes in the container runtime. Those changes
130+
are not present in containerd 1.7, so it only works with user namespaces
131+
support in Kubernetes 1.25 and 1.26.
132+
133+
Another limitation of containerd 1.7 is that it needs to change the
134+
ownership of every file and directory inside the container image during Pod
135+
startup. This has a storage overhead and can significantly impact the
136+
container startup latency. Containerd 2.0 will probably include an implementation
137+
that will eliminate the added startup latency and storage overhead. Consider
138+
this if you plan to use containerd 1.7 with user namespaces in
139+
production.
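
To give a feel for why changing ownership of every file is costly, the sketch below walks a directory tree and lists the ownership change each entry would need. It is a dry run written for illustration only, not containerd's actual implementation. The function name, the demo files, and the offset of 65536 are assumptions.

```python
import os
import tempfile


def plan_ownership_shift(rootfs: str, offset: int):
    """Dry run: return (path, new_uid, new_gid) for every entry under rootfs."""
    plan = []
    for dirpath, dirnames, filenames in os.walk(rootfs):
        # os.walk does not descend into symlinked directories, but the links
        # themselves would still need new ownership, so list them explicitly.
        links = [d for d in dirnames if os.path.islink(os.path.join(dirpath, d))]
        entries = [dirpath] + [os.path.join(dirpath, n) for n in filenames + links]
        for path in entries:
            st = os.lstat(path)  # lstat: inspect the link, not its target
            plan.append((path, st.st_uid + offset, st.st_gid + offset))
    return plan


# Build a tiny stand-in for an unpacked image and plan the shift for it.
with tempfile.TemporaryDirectory() as rootfs:
    os.makedirs(os.path.join(rootfs, "etc"))
    for rel in ("etc/passwd", "hello.txt"):
        with open(os.path.join(rootfs, rel), "w") as f:
            f.write("demo\n")
    demo_owner_uid = os.lstat(rootfs).st_uid
    demo_plan = plan_ownership_shift(rootfs, offset=65536)

print(len(demo_plan), "entries would need an ownership change")
```

The work grows with the number of files and directories in the image, which is the intuition behind the startup latency mentioned above. A real implementation would also have to apply the changes (which requires privileges) and store the resulting copy.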
140+
141+
None of these containerd 1.7 limitations apply to CRI-O.
142+
143+
[crio]: https://cri-o.io/
144+
[crun]: https://github.com/containers/crun
145+
[runc]: https://github.com/opencontainers/runc/
146+
[containerd]: https://containerd.io/
147+
148+
## How do I get involved?
149+
150+
You can reach SIG Node by several means:
151+
- Slack: [#sig-node](https://kubernetes.slack.com/messages/sig-node)
152+
- [Mailing list](https://groups.google.com/forum/#!forum/kubernetes-sig-node)
153+
- [Open Community Issues/PRs](https://github.com/kubernetes/community/labels/sig%2Fnode)
154+
155+
You can also contact us directly:
156+
- GitHub: @rata @giuseppe @saschagrunert
157+
- Slack: @rata @giuseppe @sascha
Binary file not shown.
43.2 KB