Commit 2860fbf

Add KubeletInUserNamespace feature gate
Enables support for running kubelet in a user namespace.

The user namespace has to be created before running kubelet.
All the node components such as CRI need to be running in the same user namespace.

- Tracking issue: kubernetes/enhancements issue 2033
- KEP: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2033-kubelet-in-userns-aka-rootless
- Implementation: kubernetes/kubernetes PR 92863

Signed-off-by: Akihiro Suda <[email protected]>
1 parent cfce358 commit 2860fbf

3 files changed: +283 -0 lines changed

content/en/docs/reference/command-line-tools-reference/feature-gates.md

Lines changed: 3 additions & 0 deletions
@@ -138,6 +138,7 @@ different Kubernetes components.
 | `LocalStorageCapacityIsolationFSQuotaMonitoring` | `false` | Alpha | 1.15 | |
 | `LogarithmicScaleDown` | `false` | Alpha | 1.21 | |
 | `LogarithmicScaleDown` | `true` | Beta | 1.22 | |
+| `KubeletInUserNamespace` | `false` | Alpha | 1.22 | |
 | `KubeletPodResourcesGetAllocatable` | `false` | Alpha | 1.21 | |
 | `MemoryManager` | `false` | Alpha | 1.21 | 1.21 |
 | `MemoryManager` | `true` | Beta | 1.22 | |
@@ -785,6 +786,8 @@ Each feature gate is designed for enabling/disabling a specific feature:
 See [setting kubelet parameters via a config file](/docs/tasks/administer-cluster/kubelet-config-file/)
 for more details.
 - `KubeletCredentialProviders`: Enable kubelet exec credential providers for image pull credentials.
+- `KubeletInUserNamespace`: Enables support for running kubelet in a {{<glossary_tooltip text="user namespace" term_id="userns">}}.
+See [Running Kubernetes Node Components as a Non-root User](/docs/tasks/administer-cluster/kubelet-in-userns/).
 - `KubeletPluginsWatcher`: Enable probe-based plugin watcher utility to enable kubelet
 to discover plugins such as [CSI volume drivers](/docs/concepts/storage/volumes/#csi).
 - `KubeletPodResources`: Enable the kubelet's pod resources gRPC endpoint. See
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
---
title: user namespace
id: userns
date: 2021-07-13
full_link: https://man7.org/linux/man-pages/man7/user_namespaces.7.html
short_description: >
  A Linux kernel feature to emulate superuser privilege for unprivileged users.

aka:
tags:
- security
---

A kernel feature to emulate root. Used for "rootless containers".

<!--more-->

User namespaces are a Linux kernel feature that allows a non-root user to
emulate superuser ("root") privileges,
for example in order to run containers without being a superuser outside the container.

User namespaces are effective at mitigating the damage of potential container break-out attacks.

In the context of user namespaces, the namespace is a Linux kernel feature, and not a
{{< glossary_tooltip text="namespace" term_id="namespace" >}} in the Kubernetes sense
of the term.

<!-- TODO: https://kinvolk.io/blog/2020/12/improving-kubernetes-and-container-security-with-user-namespaces/ -->
Lines changed: 252 additions & 0 deletions
@@ -0,0 +1,252 @@
---
title: Running Kubernetes Node Components as a Non-root User
content_type: task
min-kubernetes-server-version: 1.22
---

<!-- overview -->

{{< feature-state for_k8s_version="v1.22" state="alpha" >}}

This document describes how to run Kubernetes Node components such as kubelet, CRI, OCI, and CNI
without root privileges, by using a {{< glossary_tooltip text="user namespace" term_id="userns" >}}.

This technique is also known as _rootless mode_.

{{< note >}}
This document describes how to run Kubernetes Node components (and hence pods) as a non-root user.

If you are just looking for how to run a pod as a non-root user, see [SecurityContext](/docs/tasks/configure-pod-container/security-context/).
{{< /note >}}

## {{% heading "prerequisites" %}}

{{% version-check %}}

* [Enable Cgroup v2](https://rootlesscontaine.rs/getting-started/common/cgroup2/)
* [Enable systemd with user session](https://rootlesscontaine.rs/getting-started/common/login/)
* [Configure several sysctl values, depending on host Linux distribution](https://rootlesscontaine.rs/getting-started/common/sysctl/)
* [Ensure that your unprivileged user is listed in `/etc/subuid` and `/etc/subgid`](https://rootlesscontaine.rs/getting-started/common/subuid/)

* Enable the `KubeletInUserNamespace` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/)
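
As a rough illustration of two of the prerequisites above, the following Python sketch checks whether a host appears to use cgroup v2 and whether a user has ranges in `/etc/subuid` or `/etc/subgid`. The function names are hypothetical, and the cgroup v2 check assumes that a unified hierarchy exposes a `cgroup.controllers` file at the root of `/sys/fs/cgroup`. It is not a substitute for the linked guides.

```python
import os


def has_cgroup_v2(cgroup_root="/sys/fs/cgroup"):
    """Return True if cgroup_root looks like a cgroup v2 (unified) mount.

    Assumption: on cgroup v2 the root of the hierarchy contains a
    cgroup.controllers file.
    """
    return os.path.isfile(os.path.join(cgroup_root, "cgroup.controllers"))


def subid_ranges(text, user):
    """Parse the content of /etc/subuid or /etc/subgid.

    Each line has the form "name:start:count". Returns the list of
    (start, count) pairs that belong to the given user.
    """
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 3 or fields[0] != user:
            continue
        try:
            ranges.append((int(fields[1]), int(fields[2])))
        except ValueError:
            # Skip malformed entries instead of failing the whole check.
            continue
    return ranges


def read_text(path):
    """Return the file content, or an empty string if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""
```

For example, `subid_ranges(read_text("/etc/subuid"), "alice")` returns an empty list when the hypothetical user `alice` has no subordinate UID range configured.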

<!-- steps -->

## Running Kubernetes inside Rootless Docker/Podman

[kind](https://kind.sigs.k8s.io/) supports running Kubernetes inside Rootless Docker or Rootless Podman.

See [Running kind with Rootless Docker](https://kind.sigs.k8s.io/docs/user/rootless/).

<!--
[minikube](https://minikube.sigs.k8s.io/docs/) also plans to support Rootless Docker/Podman drivers.
See [minikube issue #10836](https://github.com/kubernetes/minikube/issues/10836) to track the progress.
-->

## Running Rootless Kubernetes directly on a host

{{% thirdparty-content %}}

### K3s

[K3s](https://k3s.io/) experimentally supports rootless mode.

See [Running K3s with Rootless mode](https://rancher.com/docs/k3s/latest/en/advanced/#running-k3s-with-rootless-mode-experimental) for usage.

### Usernetes
[Usernetes](https://github.com/rootless-containers/usernetes) is a reference distribution of Kubernetes that can be installed under the `$HOME` directory without root privileges.

Usernetes supports both containerd and CRI-O as CRI runtimes.
Usernetes supports multi-node clusters using Flannel (VXLAN).

See [the Usernetes repo](https://github.com/rootless-containers/usernetes) for usage.

## Manually deploy a node that runs the kubelet in a user namespace {#userns-the-hard-way}

This section provides hints for running Kubernetes in a user namespace manually.

{{< note >}}
This section is intended to be read by developers of Kubernetes distributions, not by end users.
{{< /note >}}

### Creating a user namespace

The first step is to create a {{< glossary_tooltip text="user namespace" term_id="userns" >}}.

If you are trying to run Kubernetes in a user-namespaced container such as
Rootless Docker/Podman or LXC/LXD, you are all set, and you can go to the next subsection.

Otherwise you have to create a user namespace by yourself, by calling `unshare(2)` with `CLONE_NEWUSER`.

A user namespace can also be unshared by using command line tools such as:
- [RootlessKit](https://github.com/rootless-containers/rootlesskit)
- [become-root](https://github.com/giuseppe/become-root)
- [`unshare(1)`](https://man7.org/linux/man-pages/man1/unshare.1.html)

After unsharing the user namespace, you will also have to unshare other namespaces, such as the mount namespace.

You do *not* need to call `chroot()` or `pivot_root()` after unsharing the mount namespace;
however, you have to mount writable filesystems on several directories *in* the namespace.

At least the following directories need to be writable *in* the namespace (not *outside* the namespace):

- `/etc`
- `/run`
- `/var/logs`
- `/var/lib/kubelet`
- `/var/lib/cni`
- `/var/lib/containerd` (for containerd)
- `/var/lib/containers` (for CRI-O)
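
The list above can be checked from inside the namespace with a small script. The following Python sketch is only an illustration: the names `REQUIRED_DIRS`, `RUNTIME_DIRS`, and `unwritable_dirs` are hypothetical, and the check relies on `os.access`, which reports permissions for the calling process and does not prove that every later write will succeed.

```python
import os

# Directories taken from the list above.
REQUIRED_DIRS = [
    "/etc",
    "/run",
    "/var/logs",
    "/var/lib/kubelet",
    "/var/lib/cni",
]

# Only one of these is needed, depending on the container runtime in use.
RUNTIME_DIRS = {
    "containerd": "/var/lib/containerd",
    "cri-o": "/var/lib/containers",
}


def unwritable_dirs(paths):
    """Return the paths that are not existing, writable directories."""
    missing = []
    for path in paths:
        writable = os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)
        if not writable:
            missing.append(path)
    return missing
```

Run inside the namespace, `unwritable_dirs(REQUIRED_DIRS + [RUNTIME_DIRS["containerd"]])` returns the directories that still need a writable filesystem mounted on them.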

### Creating a delegated cgroup tree

In addition to the user namespace, you also need to have a writable cgroup tree with cgroup v2.

{{< note >}}
Kubernetes support for running Node components in user namespaces requires cgroup v2.
Cgroup v1 is not supported.
{{< /note >}}

If you are trying to run Kubernetes in Rootless Docker/Podman or LXC/LXD on a systemd-based host, you are all set.

Otherwise you have to create a systemd unit with the `Delegate=yes` property to delegate a writable cgroup tree.

On your node, systemd must already be configured to allow delegation; for more details, see
[cgroup v2](https://rootlesscontaine.rs/getting-started/common/cgroup2/) in the Rootless
Containers documentation.
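
To see what was actually delegated, you can inspect the cgroup that the current process runs in. The following Python sketch is a hedged illustration with hypothetical function names. It assumes the cgroup v2 layout, where `/proc/self/cgroup` contains a line of the form `0::/path` and each cgroup directory has a `cgroup.controllers` file listing the available controllers. Which controllers your distribution needs is not decided by this sketch.

```python
import os


def current_cgroup_dir(proc_cgroup_text, cgroup_root="/sys/fs/cgroup"):
    """Map the cgroup v2 entry ("0::/path") of /proc/self/cgroup to a directory.

    Returns None if no cgroup v2 entry is present (for example on cgroup v1).
    """
    for line in proc_cgroup_text.splitlines():
        hierarchy_id, _, rest = line.partition(":")
        controllers, _, path = rest.partition(":")
        if hierarchy_id == "0" and controllers == "":
            return os.path.join(cgroup_root, path.lstrip("/"))
    return None


def delegated_controllers(cgroup_dir):
    """Return the set of controllers listed in <cgroup_dir>/cgroup.controllers."""
    with open(os.path.join(cgroup_dir, "cgroup.controllers")) as f:
        return set(f.read().split())


def is_writable_cgroup(cgroup_dir):
    """Return True if the calling process may create child cgroups here."""
    return os.path.isdir(cgroup_dir) and os.access(cgroup_dir, os.W_OK)
```

A typical use is to read `/proc/self/cgroup`, pass its content to `current_cgroup_dir`, and then call `delegated_controllers` and `is_writable_cgroup` on the result.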

### Configuring network

{{% thirdparty-content %}}

The network namespace of the Node components has to have a non-loopback interface, which can be configured with, for example,
slirp4netns, VPNKit, or lxc-user-nic.

The network namespaces of the Pods can be configured with regular CNI plugins.
For multi-node networking, Flannel (VXLAN, 8472/UDP) is known to work.

Ports such as the kubelet port (10250/TCP) and `NodePort` service ports have to be exposed from the Node network namespace to
the host with an external port forwarder, such as RootlessKit, slirp4netns, or socat.

You can use the port forwarder from K3s; see https://github.com/k3s-io/k3s/blob/v1.21.2+k3s1/pkg/rootlessports/controller.go

### Configuring CRI

The kubelet relies on a container runtime. You should deploy a container runtime such as containerd or CRI-O and ensure that it is running within the user namespace before the kubelet starts.

{{< tabs name="cri" >}}
{{% tab name="containerd" %}}

Running the CRI plugin of containerd in a user namespace is supported since containerd 1.4.

Running containerd within a user namespace requires the following configuration:

```toml
version = 2

[plugins."io.containerd.grpc.v1.cri"]
  # Disable AppArmor
  disable_apparmor = true
  # Ignore an error when setting oom_score_adj
  restrict_oom_score_adj = true
  # Disable hugetlb cgroup v2 controller (because systemd does not support delegating hugetlb controller)
  disable_hugetlb_controller = true

[plugins."io.containerd.grpc.v1.cri".containerd]
  # Using non-fuse overlayfs is also possible for kernel >= 5.11, but requires SELinux to be disabled
  snapshotter = "fuse-overlayfs"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  # We use cgroupfs that is delegated by systemd, so we do not use SystemdCgroup driver
  # (unless you run another systemd in the namespace)
  SystemdCgroup = false
```

{{% /tab %}}
{{% tab name="CRI-O" %}}

Running CRI-O in a user namespace is supported since CRI-O 1.22.

CRI-O requires the environment variable `_CRIO_ROOTLESS=1` to be set.

The following configuration is also recommended:

```toml
[crio]
  storage_driver = "overlay"
  # Using non-fuse overlayfs is also possible for kernel >= 5.11, but requires SELinux to be disabled
  storage_option = ["overlay.mount_program=/usr/local/bin/fuse-overlayfs"]

[crio.runtime]
  # We use cgroupfs that is delegated by systemd, so we do not use "systemd" driver
  # (unless you run another systemd in the namespace)
  cgroup_manager = "cgroupfs"
```

{{% /tab %}}
{{< /tabs >}}

### Configuring kubelet

Running kubelet in a user namespace requires the following configuration:

```yaml
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
featureGates:
  KubeletInUserNamespace: true
# We use cgroupfs that is delegated by systemd, so we do not use "systemd" driver
# (unless you run another systemd in the namespace)
cgroupDriver: "cgroupfs"
```

When the `KubeletInUserNamespace` feature gate is enabled, the kubelet ignores errors that may occur while setting the following sysctl values:

- `vm.overcommit_memory`
- `vm.panic_on_oom`
- `kernel.panic`
- `kernel.panic_on_oops`
- `kernel.keys.root_maxkeys`
- `kernel.keys.root_maxbytes`

(These are sysctl values for the host, not for the containers.)
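
Because these errors are ignored, the host keeps whatever values it already has. The following Python sketch, with hypothetical names, shows one way to read the current values by translating each dotted sysctl name into its file under `/proc/sys`. The simple dot-to-slash translation is an assumption that holds for the names listed here but not for every sysctl (some keys contain dots in interface names).

```python
import os

# The sysctl names listed above.
HOST_SYSCTLS = [
    "vm.overcommit_memory",
    "vm.panic_on_oom",
    "kernel.panic",
    "kernel.panic_on_oops",
    "kernel.keys.root_maxkeys",
    "kernel.keys.root_maxbytes",
]


def sysctl_path(name, proc_sys="/proc/sys"):
    """Translate a dotted sysctl name into its file under proc_sys."""
    return os.path.join(proc_sys, *name.split("."))


def read_sysctl(name, proc_sys="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        with open(sysctl_path(name, proc_sys)) as f:
            return f.read().strip()
    except OSError:
        return None


def report(names=HOST_SYSCTLS, proc_sys="/proc/sys"):
    """Return a dict mapping each sysctl name to its current value (or None)."""
    return {name: read_sysctl(name, proc_sys) for name in names}
```

Calling `report()` on the node gives a quick view of the values that the kubelet would otherwise have tried to set.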

Within a user namespace, the kubelet also ignores any error raised from trying to open `/dev/kmsg`.
This feature gate also allows kube-proxy to ignore an error when setting `RLIMIT_NOFILE`.

The `KubeletInUserNamespace` feature gate was introduced in Kubernetes v1.22 with "alpha" status.

Running kubelet in a user namespace without using this feature gate is also possible by mounting a specially crafted proc filesystem,
but this is not officially supported.

### Configuring kube-proxy

Running kube-proxy in a user namespace requires the following configuration:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "iptables" # or "userspace"
conntrack:
  # Skip setting sysctl value "net.netfilter.nf_conntrack_max"
  maxPerCore: 0
  # Skip setting "net.netfilter.nf_conntrack_tcp_timeout_established"
  tcpEstablishedTimeout: 0s
  # Skip setting "net.netfilter.nf_conntrack_tcp_timeout_close"
  tcpCloseWaitTimeout: 0s
```

## Caveats

- Most "non-local" volume drivers such as `nfs` and `iscsi` do not work.
  Local volumes like `local`, `hostPath`, `emptyDir`, `configMap`, `secret`, and `downwardAPI` are known to work.

- Some CNI plugins may not work. Flannel (VXLAN) is known to work.

For more on this, see the [Caveats and Future work](https://rootlesscontaine.rs/caveats/) page
on the rootlesscontaine.rs website.

## {{% heading "seealso" %}}

- [rootlesscontaine.rs](https://rootlesscontaine.rs/)
- [Rootless Containers 2020 (KubeCon NA 2020)](https://www.slideshare.net/AkihiroSuda/kubecon-na-2020-containerd-rootless-containers-2020)
- [Running kind with Rootless Docker](https://kind.sigs.k8s.io/docs/user/rootless/)
- [Usernetes](https://github.com/rootless-containers/usernetes)
- [Running K3s with rootless mode](https://rancher.com/docs/k3s/latest/en/advanced/#running-k3s-with-rootless-mode-experimental)
- [KEP-2033: Kubelet-in-UserNS (aka Rootless mode)](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2033-kubelet-in-userns-aka-rootless)
