
Commit c7cd6c5

Merge pull request #45178 from kinvolk/rata/userns-1.30
User namespaces doc changes for 1.30
2 parents 753073b + 69b9e71 commit c7cd6c5


3 files changed (+108 -27 lines changed)


content/en/docs/concepts/workloads/pods/user-namespaces.md

Lines changed: 78 additions & 12 deletions
@@ -7,7 +7,7 @@ min-kubernetes-server-version: v1.25
 ---
 
 <!-- overview -->
-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.30" state="beta" >}}
 
 This page explains how user namespaces are used in Kubernetes pods. A user
 namespace isolates the user running inside the container from the one
@@ -46,7 +46,26 @@ tmpfs, Secrets use a tmpfs, etc.)
 Some popular filesystems that support idmap mounts in Linux 6.3 are: btrfs,
 ext4, xfs, fat, tmpfs, overlayfs.
 
-In addition, support is needed in the
+In addition, the container runtime and its underlying OCI runtime must support
+user namespaces. The following OCI runtimes offer support:
+
+* [crun](https://github.com/containers/crun) version 1.9 or greater (version 1.13+ is recommended).
+
+<!-- ideally, update this if a newer minor release of runc comes out, whether or not it includes the idmap support -->
+{{< note >}}
+Many OCI runtimes do not include the support needed for using user namespaces in
+Linux pods. If you use a managed Kubernetes service, or have downloaded it from
+packages and set it up, it's likely that nodes in your cluster use a runtime that
+doesn't include this support. For example, the most widely used OCI runtime is
+`runc`, and version `1.1.z` of runc doesn't support all the features needed by
+the Kubernetes implementation of user namespaces.
+
+If there is a newer release of runc than 1.1 available for use, check its
+documentation and release notes for compatibility (look for idmap mounts support
+in particular, because that is the missing feature).
+{{< /note >}}
+
+You also need a CRI
 {{< glossary_tooltip text="container runtime" term_id="container-runtime" >}}
 to use this feature with Kubernetes pods:
 
@@ -137,20 +156,67 @@ use, see `man 7 user_namespaces`.
 
 ## Set up a node to support user namespaces
 
-It is recommended that the host's files and host's processes use UIDs/GIDs in
-the range of 0-65535.
+By default, the kubelet assigns pods UIDs/GIDs above the range 0-65535, based on
+the assumption that the host's files and processes use UIDs/GIDs within this
+range, which is standard for most Linux distributions. This approach prevents
+any overlap between the UIDs/GIDs of the host and those of the pods.
+
+Avoiding the overlap is important to mitigate the impact of vulnerabilities such
+as [CVE-2021-25741][CVE-2021-25741], where a pod can potentially read arbitrary
+files in the host. If the UIDs/GIDs of the pod and the host don't overlap, what
+a pod is able to do is limited: the pod UID/GID won't match the host's
+file owner/group.
+
+The kubelet can use a custom range for user IDs and group IDs for pods. To
+configure a custom range, the node needs to have:
+
+* A user `kubelet` in the system (you cannot use any other username here).
+* The binary `getsubids` installed (part of [shadow-utils][shadow-utils]) and
+  in the `PATH` for the kubelet binary.
+* A configuration of subordinate UIDs/GIDs for the `kubelet` user (see
+  [`man 5 subuid`](https://man7.org/linux/man-pages/man5/subuid.5.html) and
+  [`man 5 subgid`](https://man7.org/linux/man-pages/man5/subgid.5.html)).
+
+This setting only gathers the UID/GID range configuration and does not change
+the user executing the `kubelet`.
+
+You must follow some constraints for the subordinate ID range that you assign
+to the `kubelet` user:
+
+* The subordinate user ID that starts the UID range for Pods **must** be a
+  multiple of 65536 and must also be greater than or equal to 65536. In other
+  words, you cannot use any ID from the range 0-65535 for Pods; the kubelet
+  imposes this restriction to make it difficult to create an accidentally insecure
+  configuration.
+
+* The subordinate ID count must be a multiple of 65536.
+
+* The subordinate ID count must be at least `65536 x <maxPods>` where `<maxPods>`
+  is the maximum number of pods that can run on the node.
+
+* You must assign the same range for both user IDs and group IDs. It doesn't
+  matter if other users have user ID ranges that don't align with the group ID
+  ranges.
+
+* None of the assigned ranges should overlap with any other assignment.
+
+* The subordinate configuration must be a single line. In other words, you can't
+  have multiple ranges.
 
-The kubelet will assign UIDs/GIDs higher than that to pods. Therefore, to
-guarantee as much isolation as possible, the UIDs/GIDs used by the host's files
-and host's processes should be in the range 0-65535.
+For example, you could define `/etc/subuid` and `/etc/subgid` to both have
+this entry for the `kubelet` user:
 
-Note that this recommendation is important to mitigate the impact of CVEs like
-[CVE-2021-25741][CVE-2021-25741], where a pod can potentially read arbitrary
-files in the hosts. If the UIDs/GIDs of the pod and the host don't overlap, it
-is limited what a pod would be able to do: the pod UID/GID won't match the
-host's file owner/group.
+```
+# The format is
+#   name:firstID:count of IDs
+# where
+# - firstID is 65536 (the minimum value possible)
+# - count of IDs is 110 (default limit for number of pods) * 65536
+kubelet:65536:7208960
+```
 
 [CVE-2021-25741]: https://github.com/kubernetes/kubernetes/issues/104980
+[shadow-utils]: https://github.com/shadow-maint/shadow
 
 ## Integration with Pod security admission checks
 
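The subordinate-ID constraints listed in this change can be checked mechanically before restarting the kubelet. What follows is a minimal POSIX shell sketch under stated assumptions: the function names `check_kubelet_subid_entry` and `check_kubelet_subid_files` are hypothetical and not part of Kubernetes or shadow-utils, the default of 110 for `maxPods` is taken from the example comment above, and the "no overlap with any other assignment" rule is not checked. The kubelet's own validation may differ in detail.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Kubernetes) that check a subordinate-ID
# setup for the kubelet user against the constraints in the docs change.

# Size of the ID range the kubelet gives each pod.
SUBID_BLOCK=65536

# check_kubelet_subid_entry ENTRY [MAX_PODS]
# ENTRY has the form "name:firstID:count"; MAX_PODS defaults to 110.
# Prints "ok" and returns 0, or prints an error and returns 1.
check_kubelet_subid_entry() {
    entry=$1
    max_pods=${2:-110}

    # Split "name:firstID:count" with parameter expansion.
    name=${entry%%:*}
    rest=${entry#*:}
    first=${rest%%:*}
    count=${rest#*:}

    if [ "$name" != "kubelet" ]; then
        echo "error: the entry must be for the kubelet user"
        return 1
    fi
    # Both numeric fields must be plain decimal integers without leading
    # zeros (shell arithmetic would read a leading zero as octal).
    for n in "$first" "$count"; do
        case $n in
            ''|*[!0-9]*|0?*)
                echo "error: firstID and count must be decimal integers"
                return 1 ;;
        esac
    done
    # firstID: a multiple of 65536 and never inside 0-65535.
    if [ "$first" -lt "$SUBID_BLOCK" ] || [ $((first % SUBID_BLOCK)) -ne 0 ]; then
        echo "error: firstID must be a multiple of $SUBID_BLOCK and >= $SUBID_BLOCK"
        return 1
    fi
    # count: a multiple of 65536, large enough for MAX_PODS pods.
    if [ $((count % SUBID_BLOCK)) -ne 0 ]; then
        echo "error: count must be a multiple of $SUBID_BLOCK"
        return 1
    fi
    if [ "$count" -lt $((SUBID_BLOCK * max_pods)) ]; then
        echo "error: count must be at least $SUBID_BLOCK x $max_pods"
        return 1
    fi
    echo ok
}

# check_kubelet_subid_files SUBUID_FILE SUBGID_FILE [MAX_PODS]
# The kubelet user must have exactly one line in each file, both lines
# must describe the same range, and that range must pass the entry check.
check_kubelet_subid_files() {
    for f in "$1" "$2"; do
        if [ ! -r "$f" ]; then
            echo "error: cannot read $f"
            return 1
        fi
        if [ "$(grep -c '^kubelet:' "$f")" -ne 1 ]; then
            echo "error: $f must contain exactly one kubelet line"
            return 1
        fi
    done
    uid_entry=$(grep '^kubelet:' "$1")
    gid_entry=$(grep '^kubelet:' "$2")
    if [ "$uid_entry" != "$gid_entry" ]; then
        echo "error: the kubelet user needs the same range for UIDs and GIDs"
        return 1
    fi
    check_kubelet_subid_entry "$uid_entry" "$3"
}
```

After sourcing the functions on a node, a call such as `check_kubelet_subid_files /etc/subuid /etc/subgid 110` prints `ok` for the example entry shown above. Pass the node's actual `maxPods` value as the third argument if it differs from 110.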
content/en/docs/reference/command-line-tools-reference/feature-gates/user-namespaces-support.md

Lines changed: 5 additions & 1 deletion
@@ -6,8 +6,12 @@ _build:
   render: false
 
 stages:
-  - stage: alpha
+  - stage: alpha
     defaultValue: false
     fromVersion: "1.28"
+    toVersion: "1.29"
+  - stage: beta
+    defaultValue: false
+    fromVersion: "1.30"
 ---
 Enable user namespace support for Pods.

content/en/docs/tasks/configure-pod-container/user-namespaces.md

Lines changed: 25 additions & 14 deletions
@@ -7,7 +7,7 @@ min-kubernetes-server-version: v1.25
 ---
 
 <!-- overview -->
-{{< feature-state for_k8s_version="v1.25" state="alpha" >}}
+{{< feature-state for_k8s_version="v1.30" state="beta" >}}
 
 This page shows how to configure a user namespace for pods. This allows you to
 isolate the user running inside the container from the one in the host.
@@ -57,10 +57,6 @@ If you have a mixture of nodes and only some of the nodes provide user namespace
 Pods, you also need to ensure that the user namespace Pods are
 [scheduled](/docs/concepts/scheduling-eviction/assign-pod-node/) to suitable nodes.
 
-Please note that **if your container runtime doesn't support user namespaces, the
-`hostUsers` field in the pod spec will be silently ignored and the pod will be
-created without user namespaces.**
-
 <!-- steps -->
 
 ## Run a Pod that uses a user namespace {#create-pod}
@@ -82,27 +78,42 @@ to `false`. For example:
 kubectl attach -it userns bash
 ```
 
-And run the command. The output is similar to this:
+Run this command:
 
-```none
+```shell
 readlink /proc/self/ns/user
+```
+
+The output is similar to:
+
+```none
 user:[4026531837]
+```
+
+Also run:
+
+```shell
 cat /proc/self/uid_map
-0 0 4294967295
 ```
 
-Then, open a shell in the host and run the same command.
+The output is similar to:
+```none
+0 833617920 65536
+```
+
+Then, open a shell in the host and run the same commands.
+
+The `readlink` command shows the user namespace the process is running in. It
+should be different when it is run on the host and inside the container.
 
-The output must be different. This means the host and the pod are using a
-different user namespace. When user namespaces are not enabled, the host and the
-pod use the same user namespace.
+The last number of the `uid_map` file must be 65536 inside the container; on
+the host, it must be a bigger number.
 
 If you are running the kubelet inside a user namespace, you need to compare the
 output from running the command in the pod to the output of running in the host:
 
-```none
+```shell
 readlink /proc/$pid/ns/user
-user:[4026534732]
 ```
 
 replacing `$pid` with the kubelet PID.
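The `uid_map` check described in the task page can also be scripted. This POSIX shell sketch is hypothetical: the function name `describe_uid_map` is not part of any Kubernetes tool, it looks only at the first mapping line, and it treats a length of exactly 65536 as a pod user namespace, following the rule stated in this change rather than any general kernel guarantee.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kubernetes): interpret the first line of a
# uid_map file, whose fields are "ID-inside-ns ID-outside-ns length".
describe_uid_map() {
    # read splits on whitespace, which also drops the kernel's column padding.
    read -r inside outside length _ <<EOF
$1
EOF
    for n in "$inside" "$outside" "$length"; do
        case $n in
            ''|*[!0-9]*)
                echo "error: expected three non-negative integers"
                return 1 ;;
        esac
    done
    # Per the docs change: the last field is 65536 inside a pod that has its
    # own user namespace, and a bigger number on the host.
    if [ "$length" -eq 65536 ]; then
        echo "pod user namespace: IDs $inside-$((inside + length - 1)) map to host IDs $outside-$((outside + length - 1))"
    else
        echo "not a pod user namespace mapping (length $length)"
    fi
}
```

Inside the pod, and again on the host, you could run `describe_uid_map "$(cat /proc/self/uid_map)"` and compare the two results.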
