Commit 49be5fd

Merge pull request #39836 from jsafrane/add-selinux-blog
Add blog for Efficient SELinux volume relabeling (Beta)
2 parents 88c9abe + 357e5a6 commit 49be5fd

1 file changed, 120 additions and 0 deletions

---
layout: blog
title: "Kubernetes 1.27: Efficient SELinux volume relabeling (Beta)"
date: 2023-04-18T10:00:00-08:00
slug: kubernetes-1-27-efficient-selinux-relabeling-beta
---

**Author:** Jan Šafránek (Red Hat)

# The problem

On Linux with Security-Enhanced Linux (SELinux) enabled, it's traditionally
the container runtime that applies SELinux labels to a Pod and all its volumes.
Kubernetes only passes the SELinux label from a Pod's `securityContext` fields
to the container runtime.

The container runtime then recursively changes the SELinux label on all files that
are visible to the Pod's containers. This can be time-consuming if there are
many files on the volume, especially when the volume is on a remote filesystem.

{{% alert title="Note" color="info" %}}
If a container uses a `subPath` of a volume, only that `subPath` of the whole
volume is relabeled. This allows two pods that have two different SELinux labels
to use the same volume, as long as they use different subpaths of it.
{{% /alert %}}

If a Pod does not have any SELinux label assigned in the Kubernetes API, the
container runtime assigns a unique random one, so a process that potentially
escapes the container boundary cannot access data of any other container on the
host. The container runtime still recursively relabels all pod volumes with this
random SELinux label.

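As an illustration of the `subPath` case in the note above, here is a minimal
sketch of two Pods with different SELinux levels that share one volume. The Pod
names, image, claim name, and MCS levels are all hypothetical and chosen only
for the example.

```yaml
# Hypothetical example: names, image, claim, and levels are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: app-a
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c10,c20"        # SELinux level for this Pod
  containers:
    - name: app
      image: registry.example/app:latest
      volumeMounts:
        - name: data
          mountPath: /data
          subPath: app-a         # only this subdirectory is relabeled
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: shared-data
---
apiVersion: v1
kind: Pod
metadata:
  name: app-b
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c30,c40"        # a different level than app-a
  containers:
    - name: app
      image: registry.example/app:latest
      volumeMounts:
        - name: data
          mountPath: /data
          subPath: app-b         # a different subdirectory of the same volume
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: shared-data
```

With recursive relabeling, each Pod's files receive that Pod's own label
because the container runtime only walks the `subPath` the Pod actually uses.
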
# Improvement using mount options

If a Pod and its volume meet **all** of the following conditions, Kubernetes will
_mount_ the volume directly with the right SELinux label. Such a mount happens
in constant time, and the container runtime does not need to recursively
relabel any files on it.

1. The operating system must support SELinux.

   If SELinux support is not detected, the kubelet and the container runtime
   do nothing with regard to SELinux.

1. The [feature gates](/docs/reference/command-line-tools-reference/feature-gates/)
   `ReadWriteOncePod` and `SELinuxMountReadWriteOncePod` must be enabled.
   Both feature gates are Beta in Kubernetes 1.27; `SELinuxMountReadWriteOncePod`
   was introduced as Alpha in 1.25.

   If either of these feature gates is disabled, SELinux labels are always
   applied by the container runtime through a recursive walk of the volume
   (or its subPaths).

1. The Pod must have at least `seLinuxOptions.level` assigned in its [Pod Security Context](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context) or all Pod containers must have it set in their [Security Contexts](/docs/reference/kubernetes-api/workload-resources/pod-v1/#security-context-1).
   Kubernetes will read the default `user`, `role` and `type` from the operating
   system defaults (typically `system_u`, `system_r` and `container_t`).

   Without Kubernetes knowing at least the SELinux `level`, the container
   runtime will assign a random one _after_ the volumes are mounted. The
   container runtime will still relabel the volumes recursively in that case.

1. The volume must be a Persistent Volume with
   [Access Mode](/docs/concepts/storage/persistent-volumes/#access-modes)
   `ReadWriteOncePod`.

   This is a limitation of the initial implementation. As described above,
   two Pods can have a different SELinux label and still use the same volume,
   as long as they use a different `subPath` of it. This use case is not
   possible when the volumes are _mounted_ with the SELinux label, because the
   whole volume is mounted and most filesystems don't support mounting a single
   volume multiple times with multiple SELinux labels.

   If running two Pods with two different SELinux contexts and using
   different `subPaths` of the same volume is necessary in your deployments,
   please comment in the [KEP](https://github.com/kubernetes/enhancements/issues/1710)
   issue (or upvote any existing comment - it's best not to duplicate).
   Such pods may not run when the feature is extended to cover all volume access modes.

1. The volume plugin or the CSI driver responsible for the volume must support
   mounting with SELinux mount options.

   These in-tree volume plugins support mounting with SELinux mount options:
   `fc`, `iscsi`, and `rbd`.

   CSI drivers that support mounting with SELinux mount options must announce
   that in their
   [CSIDriver](/docs/reference/kubernetes-api/config-and-storage-resources/csi-driver-v1/)
   instance by setting the `seLinuxMount` field.

   Volumes managed by other volume plugins or CSI drivers that don't
   set `seLinuxMount: true` will be recursively relabeled by the container
   runtime.

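The Pod-side and volume-side conditions above can be sketched as a pair of
manifests: a PersistentVolumeClaim that requests the `ReadWriteOncePod` access
mode and a Pod that sets `seLinuxOptions.level`. All names, the image, the
storage size, and the MCS level are hypothetical. The remaining conditions
(SELinux support on the node, the feature gates, and volume plugin or CSI
driver support) are not expressed in these manifests and still have to hold.

```yaml
# Hypothetical example: names, image, size, and level are assumptions.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-pod-data
spec:
  accessModes:
    - ReadWriteOncePod           # the volume may be used by a single Pod only
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: selinux-mount-demo
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"      # at least the level must be set
  containers:
    - name: app
      image: registry.example/app:latest
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: single-pod-data
```

Only `level` is set here; as described in the list above, Kubernetes reads the
default `user`, `role` and `type` from the operating system.
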
## Mounting with SELinux context

When all aforementioned conditions are met, kubelet will
pass the `-o context=<SELinux label>` mount option to the volume plugin or CSI
driver. CSI driver vendors must ensure that this mount option is supported
by their CSI driver and that, if necessary, the CSI driver appends other mount
options that are needed for `-o context` to work.

For example, NFS may need `-o context=<SELinux label>,nosharecache`, so each
volume mounted from the same NFS server can have a different SELinux label
value. Similarly, CIFS may need `-o context=<SELinux label>,nosharesock`.

It's up to the CSI driver vendor to test their CSI driver in an SELinux-enabled
environment before setting `seLinuxMount: true` in the CSIDriver instance.

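For reference, a CSI driver could announce this support with a CSIDriver
object along the following lines. This is only a sketch: the driver name is
hypothetical, and a real driver's manifest would typically set other fields
as well.

```yaml
# Hypothetical example: the driver name is an assumption.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: csi.example.com
spec:
  # Announces that the driver supports the `-o context=<SELinux label>`
  # mount option passed by the kubelet.
  seLinuxMount: true
```
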
# How can I learn more?

SELinux in containers: see the excellent
[visual SELinux guide](https://opensource.com/business/13/11/selinux-policy-guide)
by Daniel J Walsh. Note that the guide is older than Kubernetes; it describes
*Multi-Category Security* (MCS) mode using virtual machines as an example,
but a similar concept is used for containers.

See a series of blog posts for details on how exactly SELinux is applied to
containers by container runtimes:
* [How SELinux separates containers using Multi-Level Security](https://www.redhat.com/en/blog/how-selinux-separates-containers-using-multi-level-security)
* [Why you should be using Multi-Category Security for your Linux containers](https://www.redhat.com/en/blog/why-you-should-be-using-multi-category-security-your-linux-containers)

Read the KEP: [Speed up SELinux volume relabeling using mounts](https://github.com/kubernetes/enhancements/tree/master/keps/sig-storage/1710-selinux-relabeling)
