| 1 | +--- |
| 2 | +title: Running Kubernetes Node Components as a Non-root User |
| 3 | +content_type: task |
| 4 | +min-kubernetes-server-version: 1.22 |
| 5 | +--- |
| 6 | + |
| 7 | +<!-- overview --> |
| 8 | + |
| 9 | +{{< feature-state for_k8s_version="v1.22" state="alpha" >}} |
| 10 | + |
| 11 | +This document describes how to run Kubernetes Node components such as kubelet, CRI, OCI, and CNI |
| 12 | +without root privileges, by using a {{< glossary_tooltip text="user namespace" term_id="userns" >}}. |
| 13 | + |
| 14 | +This technique is also known as _rootless mode_. |
| 15 | + |
| 16 | +{{< note >}} |
| 17 | +This document describes how to run Kubernetes Node components (and hence pods) as a non-root user. |
| 18 | + |
| 19 | +If you are just looking for how to run a pod as a non-root user, see [SecurityContext](/docs/tasks/configure-pod-container/security-context/). |
| 20 | +{{< /note >}} |
| 21 | + |
| 22 | +## {{% heading "prerequisites" %}} |
| 23 | + |
| 24 | +{{% version-check %}} |
| 25 | + |
| 26 | +* [Enable Cgroup v2](https://rootlesscontaine.rs/getting-started/common/cgroup2/) |
| 27 | +* [Enable systemd with user session](https://rootlesscontaine.rs/getting-started/common/login/) |
| 28 | +* [Configure several sysctl values, depending on host Linux distribution](https://rootlesscontaine.rs/getting-started/common/sysctl/) |
| 29 | +* [Ensure that your unprivileged user is listed in `/etc/subuid` and `/etc/subgid`](https://rootlesscontaine.rs/getting-started/common/subuid/) |
| 30 | + |
| 31 | +* `KubeletInUserNamespace` [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) |
| 32 | + |
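Some of the prerequisites above can be checked from a shell before going further. The following is a minimal sketch, not an official tool: the function names `has_subid_entry`, `has_cgroup_v2`, and `preflight_report` are hypothetical, and the check assumes that `/etc/subuid` and `/etc/subgid` list the user by name (entries keyed by numeric UID are not matched).

```shell
#!/bin/sh
# Hypothetical helpers; none of these names come from Kubernetes itself.

# Return 0 if FILE (for example /etc/subuid) has an entry for USER, else non-zero.
has_subid_entry() {
  user="$1"
  file="$2"
  grep -q "^${user}:" "$file" 2>/dev/null
}

# Return 0 if the unified cgroup v2 hierarchy is mounted at /sys/fs/cgroup.
has_cgroup_v2() {
  [ -f /sys/fs/cgroup/cgroup.controllers ]
}

# Print a short report for the current user.
preflight_report() {
  me="$(id -un)"
  for f in /etc/subuid /etc/subgid; do
    if has_subid_entry "$me" "$f"; then
      echo "ok: $me is listed in $f"
    else
      echo "missing: $me is not listed in $f"
    fi
  done
  if has_cgroup_v2; then
    echo "ok: cgroup v2 is mounted"
  else
    echo "missing: cgroup v2 is not mounted at /sys/fs/cgroup"
  fi
}
```

Calling `preflight_report` as the unprivileged user prints one line per check. It does not verify the sysctl values or the systemd user session.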
| 33 | +<!-- steps --> |
| 34 | + |
| 35 | +## Running Kubernetes inside Rootless Docker/Podman |
| 36 | + |
| 37 | +[kind](https://kind.sigs.k8s.io/) supports running Kubernetes inside a Rootless Docker or Rootless Podman. |
| 38 | + |
| 39 | +See [Running kind with Rootless Docker](https://kind.sigs.k8s.io/docs/user/rootless/). |
| 40 | + |
| 41 | +<!-- |
| 42 | +[minikube](https://minikube.sigs.k8s.io/docs/) also plans to support Rootless Docker/Podman drivers. |
| 43 | +See [minikube issue #10836](https://github.com/kubernetes/minikube/issues/10836) to track the progress. |
| 44 | +--> |
| 45 | + |
| 46 | +## Running Rootless Kubernetes directly on a host |
| 47 | + |
| 48 | +{{% thirdparty-content %}} |
| 49 | + |
| 50 | +### K3s |
| 51 | + |
| 52 | +[K3s](https://k3s.io/) experimentally supports rootless mode. |
| 53 | + |
| 54 | +See [Running K3s with Rootless mode](https://rancher.com/docs/k3s/latest/en/advanced/#running-k3s-with-rootless-mode-experimental) for usage instructions. |
| 55 | + |
| 56 | +### Usernetes |
| 57 | +[Usernetes](https://github.com/rootless-containers/usernetes) is a reference distribution of Kubernetes that can be installed under the `$HOME` directory without root privileges. |
| 58 | + |
| 59 | +Usernetes supports both containerd and CRI-O as CRI runtimes. |
| 60 | +Usernetes supports multi-node clusters using Flannel (VXLAN). |
| 61 | + |
| 62 | +See [the Usernetes repo](https://github.com/rootless-containers/usernetes) for usage instructions. |
| 63 | + |
| 64 | +## Manually deploy a node that runs the kubelet in a user namespace {#userns-the-hard-way} |
| 65 | + |
| 66 | +This section provides hints for running Kubernetes in a user namespace manually. |
| 67 | + |
| 68 | +{{< note >}} |
| 69 | +This section is intended to be read by developers of Kubernetes distributions, not by end users. |
| 70 | +{{< /note >}} |
| 71 | + |
| 72 | +### Creating a user namespace |
| 73 | + |
| 74 | +The first step is to create a {{< glossary_tooltip text="user namespace" term_id="userns" >}}. |
| 75 | + |
| 76 | +If you are trying to run Kubernetes in a user-namespaced container such as |
| 77 | +Rootless Docker/Podman or LXC/LXD, you are all set, and you can go to the next subsection. |
| 78 | + |
| 79 | +Otherwise you have to create a user namespace by yourself, by calling `unshare(2)` with `CLONE_NEWUSER`. |
| 80 | + |
| 81 | +A user namespace can also be unshared by using command line tools such as: |
| 82 | +- [RootlessKit](https://github.com/rootless-containers/rootlesskit) |
| 83 | +- [become-root](https://github.com/giuseppe/become-root) |
| 84 | +- [`unshare(1)`](https://man7.org/linux/man-pages/man1/unshare.1.html) |
| 85 | + |
| 86 | +After unsharing the user namespace, you will also have to unshare other namespaces, such as the mount namespace. |
| 87 | + |
| 88 | +You do *not* need to call `chroot()` or `pivot_root()` after unsharing the mount namespace. |
| 89 | +However, you have to mount writable filesystems on several directories *in* the namespace. |
| 90 | + |
| 91 | +At a minimum, the following directories need to be writable *in* the namespace (not *outside* the namespace): |
| 92 | + |
| 93 | +- `/etc` |
| 94 | +- `/run` |
| 95 | +- `/var/log` |
| 96 | +- `/var/lib/kubelet` |
| 97 | +- `/var/lib/cni` |
| 98 | +- `/var/lib/containerd` (for containerd) |
| 99 | +- `/var/lib/containers` (for CRI-O) |
| 100 | + |
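As an illustration of the steps above, the sketch below wraps `unshare(1)` from util-linux to create the user and mount namespaces together. The wrapper name `run_in_userns` is hypothetical, and the commented example only shows the idea for a single directory; a real setup would repeat it for each directory that has to be writable and would typically use persistent storage rather than a tmpfs.

```shell
#!/bin/sh
# Hypothetical wrapper: run COMMAND... in new user and mount namespaces,
# with the calling user mapped to root inside the user namespace.
run_in_userns() {
  unshare --user --map-root-user --mount -- "$@"
}

# Example (not executed here): make /run writable inside the namespace only,
# by mounting a tmpfs over it. The host's /run is left untouched.
#   run_in_userns sh -c 'mount -t tmpfs tmpfs /run && touch /run/probe'
```

This only works on hosts where unprivileged user namespaces are enabled. Tools such as RootlessKit additionally set up multi-ID mappings from `/etc/subuid` and `/etc/subgid`, which this sketch does not do.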
| 101 | +### Creating a delegated cgroup tree |
| 102 | + |
| 103 | +In addition to the user namespace, you also need to have a writable cgroup tree with cgroup v2. |
| 104 | + |
| 105 | +{{< note >}} |
| 106 | +Kubernetes support for running Node components in user namespaces requires cgroup v2. |
| 107 | +Cgroup v1 is not supported. |
| 108 | +{{< /note >}} |
| 109 | + |
| 110 | +If you are trying to run Kubernetes in Rootless Docker/Podman or LXC/LXD on a systemd-based host, you are all set. |
| 111 | + |
| 112 | +Otherwise you have to create a systemd unit with the `Delegate=yes` property to delegate a cgroup tree with write permission. |
| 113 | + |
| 114 | +On your node, systemd must already be configured to allow delegation; for more details, see |
| 115 | +[cgroup v2](https://rootlesscontaine.rs/getting-started/common/cgroup2/) in the Rootless |
| 116 | +Containers documentation. |
| 117 | + |
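To see which controllers have been delegated, you can read the `cgroup.controllers` file of the user's systemd instance. The sketch below assumes the usual systemd layout under `/sys/fs/cgroup/user.slice/`; the function names `delegated_controllers` and `has_controller` are hypothetical, and the path can be overridden if your host uses a different layout.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting cgroup v2 delegation.

# Print the controllers delegated to the current user's systemd instance.
# An alternative path to a cgroup.controllers file can be passed as $1.
delegated_controllers() {
  uid="$(id -u)"
  path="${1:-/sys/fs/cgroup/user.slice/user-${uid}.slice/user@${uid}.service/cgroup.controllers}"
  cat "$path" 2>/dev/null
}

# Return 0 if controller NAME is in the delegated list, else non-zero.
# Usage: has_controller NAME [PATH]
has_controller() {
  name="$1"
  shift
  delegated_controllers "$@" | tr ' ' '\n' | grep -qx "$name"
}
```

For example, `has_controller memory && echo delegated` reports whether the memory controller is available to the unprivileged user. A transient unit with delegation can be started with `systemd-run --user --scope -p Delegate=yes <command>`, where `<command>` is a placeholder for the process you want to run.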
| 118 | +### Configuring network |
| 119 | +{{% thirdparty-content %}} |
| 120 | + |
| 121 | +The network namespace of the Node components has to have a non-loopback interface, which can be configured with, for example, |
| 122 | +slirp4netns, VPNKit, or lxc-user-nic. |
| 123 | + |
| 124 | +The network namespaces of the Pods can be configured with regular CNI plugins. |
| 125 | +For multi-node networking, Flannel (VXLAN, 8472/UDP) is known to work. |
| 126 | + |
| 127 | +Ports such as the kubelet port (10250/TCP) and `NodePort` service ports have to be exposed from the Node network namespace to |
| 128 | +the host with an external port forwarder, such as RootlessKit, slirp4netns, or socat. |
| 129 | + |
| 130 | +You can use the port forwarder from K3s; see https://github.com/k3s-io/k3s/blob/v1.21.2+k3s1/pkg/rootlessports/controller.go |
| 131 | + |
| 132 | +### Configuring CRI |
| 133 | + |
| 134 | +The kubelet relies on a container runtime. You should deploy a container runtime such as containerd or CRI-O and ensure that it is running within the user namespace before the kubelet starts. |
| 135 | + |
| 136 | +{{< tabs name="cri" >}} |
| 137 | +{{% tab name="containerd" %}} |
| 138 | + |
| 139 | +Running the CRI plugin of containerd in a user namespace has been supported since containerd 1.4. |
| 140 | + |
| 141 | +Running containerd within a user namespace requires the following configuration: |
| 142 | + |
| 143 | +```toml |
| 144 | +version = 2 |
| 145 | + |
| 146 | +[plugins."io.containerd.grpc.v1.cri"] |
| 147 | +# Disable AppArmor |
| 148 | + disable_apparmor = true |
| 149 | +# Ignore an error during setting oom_score_adj |
| 150 | + restrict_oom_score_adj = true |
| 151 | +# Disable hugetlb cgroup v2 controller (because systemd does not support delegating hugetlb controller) |
| 152 | + disable_hugetlb_controller = true |
| 153 | + |
| 154 | +[plugins."io.containerd.grpc.v1.cri".containerd] |
| 155 | +# Using non-fuse overlayfs is also possible for kernel >= 5.11, but requires SELinux to be disabled |
| 156 | + snapshotter = "fuse-overlayfs" |
| 157 | + |
| 158 | +[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options] |
| 159 | +# We use cgroupfs that is delegated by systemd, so we do not use SystemdCgroup driver |
| 160 | +# (unless you run another systemd in the namespace) |
| 161 | + SystemdCgroup = false |
| 162 | +``` |
| 163 | + |
| 164 | +{{% /tab %}} |
| 165 | +{{% tab name="CRI-O" %}} |
| 166 | + |
| 167 | +Running CRI-O in a user namespace has been supported since CRI-O 1.22. |
| 168 | + |
| 169 | +CRI-O requires an environment variable `_CRIO_ROOTLESS=1` to be set. |
| 170 | + |
| 171 | +The following configuration is also recommended: |
| 172 | + |
| 173 | +```toml |
| 174 | +[crio] |
| 175 | + storage_driver = "overlay" |
| 176 | +# Using non-fuse overlayfs is also possible for kernel >= 5.11, but requires SELinux to be disabled |
| 177 | + storage_option = ["overlay.mount_program=/usr/local/bin/fuse-overlayfs"] |
| 178 | + |
| 179 | +[crio.runtime] |
| 180 | +# We use cgroupfs that is delegated by systemd, so we do not use "systemd" driver |
| 181 | +# (unless you run another systemd in the namespace) |
| 182 | + cgroup_manager = "cgroupfs" |
| 183 | +``` |
| 184 | + |
| 185 | +{{% /tab %}} |
| 186 | +{{< /tabs >}} |
| 187 | + |
| 188 | +### Configuring kubelet |
| 189 | + |
| 190 | +Running kubelet in a user namespace requires the following configuration: |
| 191 | + |
| 192 | +```yaml |
| 193 | +kind: KubeletConfiguration |
| 194 | +apiVersion: kubelet.config.k8s.io/v1beta1 |
| 195 | +featureGates: |
| 196 | + KubeletInUserNamespace: true |
| 197 | +# We use cgroupfs that is delegated by systemd, so we do not use "systemd" driver |
| 198 | +# (unless you run another systemd in the namespace) |
| 199 | +cgroupDriver: "cgroupfs" |
| 200 | +``` |
| 201 | + |
| 202 | +When the `KubeletInUserNamespace` feature gate is enabled, the kubelet ignores errors that may occur while setting the following sysctl values: |
| 203 | +- `vm.overcommit_memory` |
| 204 | +- `vm.panic_on_oom` |
| 205 | +- `kernel.panic` |
| 206 | +- `kernel.panic_on_oops` |
| 207 | +- `kernel.keys.root_maxkeys` |
| 208 | +- `kernel.keys.root_maxbytes` |
| 209 | + |
| 209 | +These are sysctl values for the host, not for the containers. |
| 210 | + |
| 211 | +Within a user namespace, the kubelet also ignores any error raised from trying to open `/dev/kmsg`. |
| 212 | +This feature gate also allows kube-proxy to ignore an error that occurs while setting `RLIMIT_NOFILE`. |
| 213 | + |
| 214 | +The `KubeletInUserNamespace` feature gate was introduced in Kubernetes v1.22 with "alpha" status. |
| 215 | + |
| 216 | +Running the kubelet in a user namespace without this feature gate is also possible by mounting a specially crafted proc filesystem, |
| 217 | +but this approach is not officially supported. |
| 218 | + |
| 219 | +### Configuring kube-proxy |
| 220 | + |
| 221 | +Running kube-proxy in a user namespace requires the following configuration: |
| 222 | + |
| 223 | +```yaml |
| 224 | +apiVersion: kubeproxy.config.k8s.io/v1alpha1 |
| 225 | +kind: KubeProxyConfiguration |
| 226 | +mode: "iptables" # or "userspace" |
| 227 | +conntrack: |
| 228 | +# Skip setting sysctl value "net.netfilter.nf_conntrack_max" |
| 229 | + maxPerCore: 0 |
| 230 | +# Skip setting "net.netfilter.nf_conntrack_tcp_timeout_established" |
| 231 | + tcpEstablishedTimeout: 0s |
| 232 | +# Skip setting "net.netfilter.nf_conntrack_tcp_timeout_close" |
| 233 | + tcpCloseWaitTimeout: 0s |
| 234 | +``` |
| 235 | + |
| 236 | +## Caveats |
| 237 | + |
| 238 | +- Most "non-local" volume drivers such as `nfs` and `iscsi` do not work. |
| 239 | + Local volumes like `local`, `hostPath`, `emptyDir`, `configMap`, `secret`, and `downwardAPI` are known to work. |
| 240 | + |
| 241 | +- Some CNI plugins may not work. Flannel (VXLAN) is known to work. |
| 242 | + |
| 243 | +For more on this, see the [Caveats and Future work](https://rootlesscontaine.rs/caveats/) page |
| 244 | +on the rootlesscontaine.rs website. |
| 245 | + |
| 246 | +## {{% heading "seealso" %}} |
| 247 | +- [rootlesscontaine.rs](https://rootlesscontaine.rs/) |
| 248 | +- [Rootless Containers 2020 (KubeCon NA 2020)](https://www.slideshare.net/AkihiroSuda/kubecon-na-2020-containerd-rootless-containers-2020) |
| 249 | +- [Running kind with Rootless Docker](https://kind.sigs.k8s.io/docs/user/rootless/) |
| 250 | +- [Usernetes](https://github.com/rootless-containers/usernetes) |
| 251 | +- [Running K3s with rootless mode](https://rancher.com/docs/k3s/latest/en/advanced/#running-k3s-with-rootless-mode-experimental) |
| 252 | +- [KEP-2033: Kubelet-in-UserNS (aka Rootless mode)](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2033-kubelet-in-userns-aka-rootless) |