---
layout: blog
title: "Kubernetes 1.27: Efficient SELinux volume relabeling (Beta)"
date: 2023-04-18T10:00:00-08:00
slug: kubernetes-1-27-efficient-selinux-relabeling-beta
---

**Author:** Jan Šafránek (Red Hat)

# The problem

On Linux with Security-Enhanced Linux (SELinux) enabled, it's traditionally
the container runtime that applies SELinux labels to a Pod and all its volumes.
Kubernetes only passes the SELinux label from a Pod's `securityContext` fields
to the container runtime.

The container runtime then recursively changes the SELinux label on all files
that are visible to the Pod's containers. This can be time-consuming if there
are many files on the volume, especially when the volume is on a remote
filesystem.

{{% alert title="Note" color="info" %}}
If a container uses a `subPath` of a volume, only that `subPath` of the whole
volume is relabeled. This allows two Pods that have two different SELinux labels
to use the same volume, as long as they use different subpaths of it.
{{% /alert %}}
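
To make the cost concrete, here is a minimal Python sketch of the walk described above. It is not any container runtime's real code: the hypothetical `relabel_targets` function only collects the paths that would be relabeled instead of writing the `security.selinux` extended attribute on each one, so it runs without SELinux or root. The number of operations grows with the number of files, and passing a `sub_path` restricts the walk to that subtree, as in the note.

```python
import os
import tempfile
from typing import List, Optional


def relabel_targets(volume_root: str, sub_path: Optional[str] = None) -> List[str]:
    """List every path a recursive relabel of the volume would touch.

    Hypothetical helper: a real runtime sets the label on each path; this
    sketch only records the paths, so the cost shows up as the list length.
    """
    root = os.path.join(volume_root, sub_path) if sub_path else volume_root
    targets = [root]  # the top directory is relabeled too
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            targets.append(os.path.join(dirpath, name))
    return targets


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as volume:
        for pod in ("pod-a", "pod-b"):
            os.makedirs(os.path.join(volume, pod))
            for i in range(3):
                open(os.path.join(volume, pod, f"file{i}"), "w").close()
        print(len(relabel_targets(volume)))           # 9: whole volume
        print(len(relabel_targets(volume, "pod-a")))  # 4: one subPath only
```

With millions of files, or with every operation crossing the network to a remote filesystem, this linear walk is what makes Pod startup slow.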

If a Pod does not have any SELinux label assigned in the Kubernetes API, the
container runtime assigns a unique random one, so that a process that
potentially escapes the container boundary cannot access data of any other
container on the host. The container runtime still recursively relabels all
Pod volumes with this random SELinux label.
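
The randomness typically lives in the MCS level of the label: a sensitivity such as `s0` followed by a pair of categories. The sketch below assumes the common `s0:cN,cM` format with categories from `c0` to `c1023`. The `random_mcs_level` function is hypothetical and much simpler than a real runtime's allocator, which also keeps track of pairs already handed out on the host.

```python
import random


def random_mcs_level(rng: random.Random, max_category: int = 1023) -> str:
    """Return an MCS level with two distinct, ordered categories.

    Simplified sketch: unlike a real runtime, it does not remember which
    category pairs are already in use on the host.
    """
    low, high = sorted(rng.sample(range(max_category + 1), 2))
    return f"s0:c{low},c{high}"


if __name__ == "__main__":
    # Prints a level of the form s0:cN,cM; the pair changes between runs.
    print(random_mcs_level(random.Random()))
```

Because this level is only chosen when the container starts, it is not known when the volume is mounted, so the runtime has to fall back to relabeling files one by one.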

# Improvement using mount options

If a Pod and its volume meet **all** of the following conditions, Kubernetes
_mounts_ the volume directly with the right SELinux label. Such a mount happens
in constant time, and the container runtime does not need to recursively
relabel any files on it.

1. The operating system must support SELinux.

   If SELinux support is not detected, kubelet and the container runtime do
   nothing with regard to SELinux.

1. The [feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
   `ReadWriteOncePod` and `SELinuxMountReadWriteOncePod` must be enabled.
   These feature gates are Beta in Kubernetes 1.27 and were Alpha in 1.25.

   If either of these feature gates is disabled, SELinux labels are always
   applied by the container runtime by recursively walking the volume
   (or its subPaths).

1. The Pod must have at least `seLinuxOptions.level` assigned in its
   [Pod Security Context](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context),
   or all Pod containers must have it set in their
   [Security Contexts](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1).
   Kubernetes reads the default `user`, `role` and `type` from the operating
   system defaults (typically `system_u`, `system_r` and `container_t`).

   If Kubernetes does not know at least the SELinux `level`, the container
   runtime assigns a random one _after_ the volumes are mounted. In that case,
   the container runtime still relabels the volumes recursively.

1. The volume must be a Persistent Volume with
   [Access Mode](/docs/concepts/storage/persistent-volumes/#access-modes)
   `ReadWriteOncePod`.

   This is a limitation of the initial implementation. As described above,
   two Pods can have different SELinux labels and still use the same volume,
   as long as they use different subpaths of it. This use case is not
   possible when the volumes are _mounted_ with the SELinux label, because the
   whole volume is mounted and most filesystems don't support mounting a single
   volume multiple times with multiple SELinux labels.

   If running two Pods with two different SELinux contexts and using
   different subpaths of the same volume is necessary in your deployments,
   please comment in the [KEP](https://github.com/kubernetes/enhancements/issues/1710)
   issue (or upvote any existing comment; it's best not to duplicate).
   Such Pods may not run when the feature is extended to cover all volume access modes.

1. The volume plugin or the CSI driver responsible for the volume must support
   mounting with SELinux mount options.

   These in-tree volume plugins support mounting with SELinux mount options:
   `fc`, `iscsi`, and `rbd`.

   CSI drivers that support mounting with SELinux mount options must announce
   that in their
   [CSIDriver](/docs/reference/kubernetes-api/config-and-storage-resources/csi-driver-v1/)
   instance by setting the `seLinuxMount` field to `true`.

   Volumes managed by other volume plugins, or by CSI drivers that don't
   set `seLinuxMount: true`, are recursively relabeled by the container
   runtime.
| 91 | + runtime. |
| 92 | + |
| 93 | +## Mounting with SELinux context |
| 94 | + |
| 95 | +When all aforementioned conditions are met, kubelet will |
| 96 | +pass `-o context=<SELinux label>` mount option to the volume plugin or CSI |
| 97 | +driver. CSI driver vendors must ensure that this mount option is supported |
| 98 | +by their CSI driver and, if necessary, the CSI driver appends other mount |
| 99 | +options that are needed for `-o context` to work. |
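
As an illustration of the option itself, the sketch below joins the four label parts and wraps them in `context=`. The `selinux_context_option` helper is hypothetical, and so are the example parts: for mounts a file label such as `system_u:object_r:container_file_t:<level>` is commonly used rather than the process type `container_t`, but treat those values as assumptions. The label is quoted because an MCS level such as `s0:c10,c20` contains a comma, which would otherwise be parsed as a separator between two mount options; the mount(8) manual page uses the same quoting in its `context=` example.

```python
def selinux_context_option(user: str, role: str, type_: str, level: str) -> str:
    """Build a quoted `context=` mount option from SELinux label parts.

    Hypothetical helper; the caller decides which user, role and type to use.
    """
    if not level:
        raise ValueError("an SELinux level is required to mount with a label")
    label = f"{user}:{role}:{type_}:{level}"
    # Quote the label: the comma inside an MCS level must not split the option.
    return f'context="{label}"'


if __name__ == "__main__":
    print(selinux_context_option("system_u", "object_r", "container_file_t", "s0:c10,c20"))
    # context="system_u:object_r:container_file_t:s0:c10,c20"
```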

For example, NFS may need `-o context=<SELinux label>,nosharecache`, so that
each volume mounted from the same NFS server can have a different SELinux label
value. Similarly, CIFS may need `-o context=<SELinux label>,nosharesock`.
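
On the driver side, adding those extra options might look like the following. This is a hypothetical sketch of a CSI driver's mount path, not any real driver's code. `driver_mount_options` and its `fs_type` argument are assumptions, and the option table contains only the NFS and CIFS examples above.

```python
from typing import Dict, List

# Extra options that, per the examples in this post, let each volume from the
# same server carry its own SELinux label when mounted with context=.
EXTRA_OPTIONS_FOR_CONTEXT: Dict[str, List[str]] = {
    "nfs": ["nosharecache"],
    "cifs": ["nosharesock"],
}


def driver_mount_options(fs_type: str, requested: List[str]) -> List[str]:
    """Return the mount options after adding what `context=` needs for fs_type."""
    options = list(requested)  # do not modify the caller's list
    if any(opt.startswith("context=") for opt in options):
        for extra in EXTRA_OPTIONS_FOR_CONTEXT.get(fs_type, []):
            if extra not in options:
                options.append(extra)
    return options


if __name__ == "__main__":
    ctx = 'context="system_u:object_r:container_file_t:s0:c10,c20"'
    print(",".join(driver_mount_options("nfs", [ctx])))
    # context="system_u:object_r:container_file_t:s0:c10,c20",nosharecache
```

Keeping the extras behind the `context=` check means volumes mounted without an SELinux label keep the filesystem's default sharing behavior.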

It's up to the CSI driver vendor to test their CSI driver in an SELinux-enabled
environment before setting `seLinuxMount: true` in the CSIDriver instance.

# How can I learn more?

For SELinux in containers, see the excellent
[visual SELinux guide](https://opensource.com/business/13/11/selinux-policy-guide)
by Daniel J Walsh. Note that the guide is older than Kubernetes; it describes
*Multi-Category Security* (MCS) mode using virtual machines as an example,
but a similar concept is used for containers.

See this series of blog posts for details on how exactly container runtimes
apply SELinux to containers:
* [How SELinux separates containers using Multi-Level Security](https://www.redhat.com/en/blog/how-selinux-separates-containers-using-multi-level-security)
* [Why you should be using Multi-Category Security for your Linux containers](https://www.redhat.com/en/blog/why-you-should-be-using-multi-category-security-your-linux-containers)

Read the KEP: [Speed up SELinux volume relabeling using mounts](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1710-selinux-relabeling)