Rootless mode allows running the BuildKit daemon as a non-root user.
- Using the `overlayfs` snapshotter requires kernel >= 5.11 or an Ubuntu kernel. On kernel >= 4.18, the `fuse-overlayfs` snapshotter is used instead of `overlayfs`. On kernel < 4.18, the `native` snapshotter is used.
- Network mode is always set to `network.host`.
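The snapshotter rules in the first item can be written as a small decision function. This is an illustrative sketch only. `pick_snapshotter` is a hypothetical name, and it is not how BuildKit itself detects the kernel. The caller passes in whether the kernel is an Ubuntu kernel; the function does not detect it.

```shell
# Hypothetical helper (not BuildKit code): encodes the documented rules that map
# a kernel release, as printed by `uname -r`, to the snapshotter rootless
# BuildKit ends up using. Second argument: "yes" if this is an Ubuntu kernel.
pick_snapshotter() (
  release=$1
  ubuntu=${2:-no}
  # Keep the "MAJOR.MINOR" prefix: "5.15.0-91-generic" -> major=5, minor=15
  version=${release%%-*}
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  v=$(( major * 1000 + minor ))
  if [ "$ubuntu" = yes ] || [ "$v" -ge 5011 ]; then
    echo overlayfs          # kernel >= 5.11 or Ubuntu kernel
  elif [ "$v" -ge 4018 ]; then
    echo fuse-overlayfs     # 4.18 <= kernel < 5.11
  else
    echo native             # kernel < 4.18
  fi
)
```

For example, `pick_snapshotter "$(uname -r)"` prints the snapshotter these rules predict for the running kernel. On an Ubuntu kernel, pass `yes` as the second argument.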
RootlessKit needs to be installed.
```shell
rootlesskit buildkitd
```
```shell
buildctl --addr unix:///run/user/$UID/buildkit/buildkitd.sock build ...
```
Tip
To isolate the BuildKit daemon's network namespace from the host (recommended):
```shell
rootlesskit --net=slirp4netns --copy-up=/etc --disable-host-loopback buildkitd
```
To run BuildKit with the containerd worker in rootless mode, RootlessKit needs to be installed as well.
Run containerd in rootless mode using RootlessKit, following containerd's documentation.
```shell
containerd-rootless.sh
```
```shell
CONTAINERD_NAMESPACE=default containerd-rootless-setuptool.sh install-buildkit-containerd
```
Advanced guide
Alternatively, you can specify the full command line flags as follows:
```shell
containerd-rootless.sh --config /path/to/config.toml
```
```shell
containerd-rootless-setuptool.sh nsenter -- buildkitd --oci-worker=false --containerd-worker=true
```
To run rootless BuildKit in a Docker container instead, use the `moby/buildkit:rootless` image:
```shell
docker run \
  --name buildkitd \
  -d \
  --security-opt seccomp=unconfined \
  --security-opt apparmor=unconfined \
  --security-opt systempaths=unconfined \
  moby/buildkit:rootless
```
```shell
buildctl --addr docker-container://buildkitd build ...
```
Tip
If you don't mind using `--privileged` (almost safe for rootless), the `docker run` flags can be shortened as follows:
```shell
docker run --name buildkitd -d --privileged moby/buildkit:rootless
```
Justification of the `--security-opt` flags:
- `seccomp=unconfined`: For allowing several syscalls such as `unshare` (used by runc) and `mount` (used by snapshotters, etc.).
- `apparmor=unconfined`: For allowing mounting filesystems, etc. This flag is not needed when the host operating system does not use AppArmor.
- `systempaths=unconfined`: For disabling the masks for the `/proc` mount in the container, so that each `ExecOp` (corresponding to a `RUN` instruction in a Dockerfile) can have a dedicated `/proc` filesystem. `systempaths=unconfined` potentially allows reading and writing dangerous kernel files from a container, but it is safe when you are running `buildkitd` as non-root.
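The sketch below shows how the flags above fit together. The hypothetical `buildkitd_security_opts` function prints them one per line. It leaves out `apparmor=unconfined` when AppArmor does not appear to be enabled. It assumes that `/sys/module/apparmor/parameters/enabled` reads `Y` on hosts where AppArmor is active; treat that check as a heuristic, not an authoritative test.

```shell
# Hypothetical helper: prints the docker run --security-opt flags listed above,
# one per line. apparmor=unconfined is only emitted when the AppArmor flag file
# (an assumed heuristic; overridable via $1 for testing) contains "Y".
buildkitd_security_opts() (
  apparmor_flag=${1:-/sys/module/apparmor/parameters/enabled}
  printf '%s\n' --security-opt seccomp=unconfined
  if [ -r "$apparmor_flag" ] && [ "$(cat "$apparmor_flag")" = Y ]; then
    printf '%s\n' --security-opt apparmor=unconfined
  fi
  printf '%s\n' --security-opt systempaths=unconfined
)
```

A possible invocation is `docker run --name buildkitd -d $(buildkitd_security_opts) moby/buildkit:rootless`. This relies on word splitting of the unquoted substitution, which is safe here because none of the flags contain spaces.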
Tip
Instead of `--security-opt systempaths=unconfined`, `buildkitd` can also be executed with `--oci-worker-no-process-sandbox` (a flag of `buildkitd`, not `docker`) to avoid creating a new PID namespace and mounting a new `/proc` for it.
Using `--oci-worker-no-process-sandbox` is discouraged, as it cannot terminate processes that did not exit during an `ExecOp`.
Also, `--oci-worker-no-process-sandbox` allows `ExecOp` containers to kill (and potentially ptrace, depending on the seccomp configuration) an arbitrary process in the BuildKit daemon container.
Despite these caveats, the Kubernetes examples use `--oci-worker-no-process-sandbox`, as Kubernetes lacks an equivalent of `systempaths=unconfined`.
(`securityContext.procMount=Unmasked` is similar, but differs in that it depends on `hostUsers: false`.)
The moby/buildkit:rootless image has the following UID/GID configuration:
| Actual ID (shown in the host and the BuildKit daemon container) | Mapped ID (shown in build executor containers) |
|---|---|
| 1000 | 0 |
| 100000 | 1 |
| ... | ... |
| 165535 | 65536 |
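The table follows from the daemon user's subordinate ID range. The daemon's own UID appears as 0, and the `start:count` range from `/etc/subuid` appears as 1 through `count`. The hypothetical `map_to_executor_id` function below is a simplified model of that mapping for a single subuid entry. It only makes the arithmetic explicit and does not query a real user namespace.

```shell
# Hypothetical helper: simplified model of the table above.
#   $1 = actual (host-side) ID, $2 = daemon user's UID,
#   $3 = that user's subuid entry "name:start:count".
# The daemon UID maps to 0; start..start+count-1 maps to 1..count.
map_to_executor_id() (
  actual=$1 owner=$2 subuid_entry=$3
  # Split "name:start:count" into start and count
  rest=${subuid_entry#*:}
  start=${rest%%:*}
  count=${rest#*:}
  if [ "$actual" -eq "$owner" ]; then
    echo 0
  elif [ "$actual" -ge "$start" ] && [ "$actual" -lt $(( start + count )) ]; then
    echo $(( actual - start + 1 ))
  else
    echo unmapped
  fi
)
```

With the image's values, `map_to_executor_id 165535 1000 user:100000:65536` reproduces the last row of the table.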
```console
$ docker exec buildkitd id
uid=1000(user) gid=1000(user)
$ docker exec buildkitd ps aux
PID USER TIME COMMAND
1 user 0:00 rootlesskit buildkitd --addr tcp://0.0.0.0:1234
13 user 0:00 /proc/self/exe buildkitd --addr tcp://0.0.0.0:1234
21 user 0:00 buildkitd --addr tcp://0.0.0.0:1234
29 user 0:00 ps aux
$ docker exec buildkitd cat /etc/subuid
user:100000:65536
```
To change the UID/GID configuration, you need to modify and build the BuildKit image manually.
```shell
vi Dockerfile
make images
docker run ... moby/buildkit:local-rootless ...
```
If you encounter an error related to the `overlayfs` snapshotter, try running `buildkitd` with `--oci-worker-snapshotter=fuse-overlayfs`:
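```console
$ rootlesskit buildkitd --oci-worker-snapshotter=fuse-overlayfs
```
If you encounter an error related to `fuse-overlayfs` while running BuildKit in a container, run `docker run` with `--device /dev/fuse`.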
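The `fuse-overlayfs` fallback needs the `/dev/fuse` device. The minimal sketch below suggests one of the two fallback snapshotters on that basis. `suggest_fallback_snapshotter` is a hypothetical name. It only checks for the character device, not whether the `fuse-overlayfs` binary is installed.

```shell
# Hypothetical helper: suggests a value for --oci-worker-snapshotter.
# fuse-overlayfs needs the /dev/fuse character device (e.g. passed in with
# `docker run --device /dev/fuse`); without it, fall back to native.
# The device path can be overridden via $1 for testing.
suggest_fallback_snapshotter() (
  fuse_dev=${1:-/dev/fuse}
  if [ -c "$fuse_dev" ]; then
    echo fuse-overlayfs
  else
    echo native
  fi
)
```

For example: `rootlesskit buildkitd --oci-worker-snapshotter="$(suggest_fallback_snapshotter)"`.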
Also try running `buildkitd` with `--oci-worker-snapshotter=native`:
```console
$ rootlesskit buildkitd --oci-worker-snapshotter=native
```
For errors related to `/etc/subuid`, see https://rootlesscontaine.rs/getting-started/common/subuid/
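A quick check for a missing range is to look for the user in `/etc/subuid`. The sketch below assumes the usual `name-or-uid:start:count` entry format. `has_subid_range` is a hypothetical helper, and the same check applies to `/etc/subgid`.

```shell
# Hypothetical helper: succeeds if a subuid/subgid-style file has an entry
# with a non-zero count for the given user name or numeric UID.
#   $1 = user name, $2 = numeric UID, $3 = file (default /etc/subuid)
has_subid_range() (
  name=$1 uid=$2 file=${3:-/etc/subuid}
  [ -r "$file" ] || exit 1
  awk -F: -v n="$name" -v u="$uid" \
    '($1 == n || $1 == u) && $3 > 0 { found = 1 } END { exit !found }' "$file"
)
```

For example: `has_subid_range "$(id -un)" "$(id -u)" /etc/subuid && has_subid_range "$(id -un)" "$(id -u)" /etc/subgid`.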
Make sure to mount an `emptyDir` volume on `/home/user/.local/share/buildkit`.
If you see the error `fork/exec /proc/self/exe: no space left on device` along with `level=warning msg="/proc/sys/user/max_user_namespaces needs to be set to non-zero."`, run `sysctl -w user.max_user_namespaces=N` (where N is a positive integer, such as 63359) on the host nodes.
See ../examples/kubernetes/sysctl-userns.privileged.yaml.
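This condition can be caught before `buildkitd` starts. The hypothetical `check_max_user_namespaces` function below is a sketch of such a pre-flight check. It reads `/proc/sys/user/max_user_namespaces` by default. The value 63359 in its message is the example from above, not a requirement.

```shell
# Hypothetical pre-flight check: succeeds when max_user_namespaces is a
# positive integer; otherwise prints the sysctl command to run on the host.
# The path can be overridden via $1 for testing.
check_max_user_namespaces() (
  path=${1:-/proc/sys/user/max_user_namespaces}
  value=$(cat "$path" 2>/dev/null) || value=0
  # `[ -gt ]` fails on non-numeric input; treat that as "not set"
  if [ "$value" -gt 0 ] 2>/dev/null; then
    exit 0
  fi
  echo "user.max_user_namespaces is '$value'; on the host, run: sysctl -w user.max_user_namespaces=63359" >&2
  exit 1
)
```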
If you see the error `fork/exec /proc/self/exe: permission denied` along with `This error might have happened because /proc/sys/kernel/apparmor_restrict_unprivileged_userns is set to 1`, add `kernel.apparmor_restrict_unprivileged_userns=0` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl -p`.
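A sketch of the `/etc/sysctl.d` variant follows. The function name and the drop-in file name `99-rootless-buildkit.conf` are hypothetical choices. `sysctl -p` without an argument only reads `/etc/sysctl.conf`, so load a drop-in file with `sysctl -p FILE` or `sysctl --system`.

```shell
# Hypothetical helper: writes a sysctl drop-in that lifts the AppArmor
# restriction on unprivileged user namespaces and prints the file's path.
# The directory defaults to /etc/sysctl.d (requires root); pass a scratch
# directory to try it out safely.
write_userns_sysctl_dropin() (
  dir=${1:-/etc/sysctl.d}
  file="$dir/99-rootless-buildkit.conf"   # file name is an arbitrary choice
  printf '%s\n' 'kernel.apparmor_restrict_unprivileged_userns=0' > "$file" || exit 1
  echo "$file"
)
```

In a root shell, `sysctl -p "$(write_userns_sysctl_dropin)"` writes the file and loads it.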
Errors when mounting `/proc` are known to happen when BuildKit is executed in a container without the `--security-opt systempaths=unconfined` flag.
Make sure to specify it (see above).
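Using an Ubuntu kernel is recommended.
On Ubuntu releases that restrict unprivileged user namespaces through AppArmor (such as Ubuntu 24.04), add `kernel.apparmor_restrict_unprivileged_userns=0` to `/etc/sysctl.conf` (or `/etc/sysctl.d`) and run `sudo sysctl -p`.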
On Google's Container-Optimized OS, make sure to have the following `emptyDir` volume:
```yaml
spec:
  containers:
    - name: buildkitd
      volumeMounts:
        # Dockerfile has `VOLUME /home/user/.local/share/buildkit` by default too,
        # but the default VOLUME does not work with rootless on Google's Container-Optimized OS
        # as it is mounted with `nosuid,nodev`.
        # https://github.com/moby/buildkit/issues/879#issuecomment-1240347038
        - mountPath: /home/user/.local/share/buildkit
          name: buildkitd
  volumes:
    - name: buildkitd
      emptyDir: {}
```
See also the example manifests.
On Bottlerocket OS, the maximum number of user namespaces needs to be set to a positive integer through the API settings:
```toml
[settings.kernel.sysctl]
"user.max_user_namespaces" = "16384"
```
See `../examples/eksctl/bottlerocket.yaml` for an example of configuring a Node Group in EKS.
Old distributions
Add kernel.unprivileged_userns_clone=1 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl -p.
This step is not needed for Debian GNU/Linux 11 and later.
Add user.max_user_namespaces=28633 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl -p.
This step is not needed for RHEL/CentOS 8 and later.
You may have to disable SELinux, or run BuildKit with --oci-worker-snapshotter=fuse-overlayfs.
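The two sysctl notes above for older distributions can be summarized as a lookup on `/etc/os-release`. The hypothetical helper `old_distro_sysctl` is a sketch. It assumes that `ID` and `VERSION_ID` identify the release, and it covers only the Debian and RHEL/CentOS notes, not the SELinux one.

```shell
# Hypothetical helper: prints the sysctl line the notes above call for on
# Debian < 11 or RHEL/CentOS < 8, based on an os-release file (path
# overridable via $1). Prints nothing otherwise, including when VERSION_ID
# is missing (e.g. Debian testing/sid).
old_distro_sysctl() (
  os_release=${1:-/etc/os-release}
  ID="" VERSION_ID=""
  # os-release is a shell-compatible KEY=value file; sourcing it inside this
  # subshell body keeps its variables from leaking into the caller
  . "$os_release"
  major=${VERSION_ID%%.*}
  case "$ID" in
    debian)
      if [ -n "$major" ] && [ "$major" -lt 11 ]; then
        echo 'kernel.unprivileged_userns_clone=1'
      fi ;;
    rhel|centos)
      if [ -n "$major" ] && [ "$major" -lt 8 ]; then
        echo 'user.max_user_namespaces=28633'
      fi ;;
  esac
)
```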