@@ -677,8 +677,8 @@ To assign SELinux labels, the SELinux security module must be loaded on the host
Kubernetes v1.27 introduced an early limited form of this behavior that was only applicable
to volumes (and PersistentVolumeClaims) using the `ReadWriteOncePod` access mode.

- As an alpha feature, you can enable the `SELinuxMount`
- [feature gate](/docs/reference/command-line-tools-reference/feature-gates/) to widen that
+ As an alpha feature, you can enable the `SELinuxMount` and `SELinuxChangePolicy`
+ [feature gates](/docs/reference/command-line-tools-reference/feature-gates/) to widen that
performance improvement to other kinds of PersistentVolumeClaims, as explained in detail
below.
{{< /note >}}
@@ -694,7 +694,9 @@ To benefit from this speedup, all these conditions must be met:
and `SELinuxMountReadWriteOncePod` must be enabled.
* Pod must use PersistentVolumeClaim with applicable `accessModes` and [feature gates](/docs/reference/command-line-tools-reference/feature-gates/):
  * Either the volume has `accessModes: ["ReadWriteOncePod"]`, and feature gate `SELinuxMountReadWriteOncePod` is enabled.
-   * Or the volume can use any other access modes and both feature gates `SELinuxMountReadWriteOncePod` and `SELinuxMount` must be enabled.
+   * Or the volume can use any other access modes and all of the feature gates
+     `SELinuxMountReadWriteOncePod`, `SELinuxChangePolicy` and `SELinuxMount` must be enabled
+     and the Pod has `spec.securityContext.seLinuxChangePolicy` either nil (default) or `MountOption`.
* Pod (or all its Containers that use the PersistentVolumeClaim) must
  have `seLinuxOptions` set.
* The corresponding PersistentVolume must be either:
@@ -706,7 +708,48 @@ To benefit from this speedup, all these conditions must be met:
For any other volume types, SELinux relabelling happens another way: the container
runtime recursively changes the SELinux label for all inodes (files and directories)
in the volume.
- The more files and directories in the volume, the longer that relabelling takes.
+
+ {{< feature-state feature_gate_name="SELinuxChangePolicy" >}}
+ Pods that want to opt out of relabeling using mount options can set
+ `spec.securityContext.seLinuxChangePolicy` to `Recursive`. This is required
+ when multiple pods share a single volume on the same node, but they run with
+ different SELinux labels that allow simultaneous access to the volume. For example, a privileged pod
+ running with label `spc_t` and an unprivileged pod running with the default label `container_file_t`.
+ With unset `spec.securityContext.seLinuxChangePolicy` (or with the value `MountOption`),
+ only one of such pods is able to run on a node; the other one stays in ContainerCreating with the error
+ `conflicting SELinux labels of volume <name of the volume>: <label of the running pod> and <label of the pod that can't start>`.
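The opt-out described above could look roughly like the following Pod manifest. This is a minimal sketch, not text from the change itself: the Pod name, image, SELinux level and PersistentVolumeClaim name are hypothetical placeholders, and it assumes the `SELinuxChangePolicy` feature gate is enabled so that the field is accepted.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-reader             # hypothetical name
spec:
  securityContext:
    # Opt out of the mount-option optimization; the container runtime
    # relabels the volume recursively instead.
    seLinuxChangePolicy: Recursive
    seLinuxOptions:
      level: "s0:c123,c456"              # hypothetical SELinux level
  containers:
  - name: app
    image: registry.example/app:latest   # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: shared-data             # hypothetical PVC shared with another Pod
```

Leaving `seLinuxChangePolicy` unset, or setting it to `MountOption`, keeps the Pod eligible for the faster mount-based labeling.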
+
+ #### SELinuxWarningController
+ To make it easier to identify Pods that are affected by the change in SELinux volume relabeling,
+ a new controller called `SELinuxWarningController` has been introduced in kube-controller-manager.
+ It is disabled by default and can be enabled by setting the `--controllers=*,selinux-warning-controller` command line flag
+ and the `SELinuxChangePolicy` feature gate.
+ When enabled, the controller observes running Pods and when it detects that two Pods use the same volume
+ with different SELinux labels:
+ 1. It emits an event to both of the Pods. `kubectl describe pod <pod-name>` then shows
+    `SELinuxLabel "<label on the pod>" conflicts with pod <the other pod name> that uses the same volume as this pod
+    with SELinuxLabel "<the other pod label>". If both pods land on the same node, only one of them may access the volume`.
+ 2. It raises the `selinux_warning_controller_selinux_volume_conflict` metric. The metric has both pod
+    names and namespaces as labels to identify the affected pods easily.
+
+ A cluster admin can use this information to identify pods affected by the planned change and
+ proactively opt Pods out of the optimization (i.e. set `spec.securityContext.seLinuxChangePolicy: Recursive`).
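As an illustration of the flag and feature gate named above, the fragment below sketches how they might appear in a kube-controller-manager static Pod manifest. The static Pod layout is an assumption (common in kubeadm-style clusters, but your control plane may be configured differently); only the two flags shown come from the text.

```yaml
# Fragment of a kube-controller-manager static Pod manifest (assumed layout).
spec:
  containers:
  - name: kube-controller-manager
    command:
    - kube-controller-manager
    # Keep the default controllers and add the warning controller.
    - --controllers=*,selinux-warning-controller
    # The controller also needs its feature gate.
    - --feature-gates=SELinuxChangePolicy=true
    # ...remaining flags unchanged
```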
+
+ #### Feature gates
+
+ The following feature gates control the behavior of SELinux volume relabeling:
+
+ * `SELinuxMountReadWriteOncePod`: enables the optimization for volumes with `accessModes: ["ReadWriteOncePod"]`.
+   This is a very safe feature gate to enable, because two pods cannot share a single volume with
+   this access mode. This feature gate is enabled by default since v1.28.
+ * `SELinuxChangePolicy`: enables the `spec.securityContext.seLinuxChangePolicy` field in Pod and the related SELinuxWarningController
+   in kube-controller-manager. This feature can be used before enabling `SELinuxMount` to check Pods running on a cluster,
+   and to proactively opt Pods out of the optimization.
+   This feature gate requires `SELinuxMountReadWriteOncePod` to be enabled. It is alpha and disabled by default in 1.32.
+ * `SELinuxMount`: enables the optimization for all eligible volumes. Since it can break existing workloads, we recommend
+   enabling the `SELinuxChangePolicy` feature gate and the SELinuxWarningController first to check the impact of the change.
+   This feature gate requires `SELinuxMountReadWriteOncePod` and `SELinuxChangePolicy` to be enabled. It is alpha and disabled
+   by default in 1.32.
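For reference, the dependency chain between the three gates could be expressed in a kubelet configuration file roughly as follows. This is a sketch under the assumption that the kubelet is configured through a `KubeletConfiguration` file. The same gates would likely also need to be set on other components such as kube-apiserver and kube-controller-manager (through their `--feature-gates` flag), which you should confirm for your Kubernetes version.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  SELinuxMountReadWriteOncePod: true   # prerequisite for the other two gates
  SELinuxChangePolicy: true            # enable first to assess the impact
  SELinuxMount: true                   # enable last; may break existing workloads
```

Enabling `SELinuxChangePolicy` on its own first, and adding `SELinuxMount` only after reviewing the warnings, follows the order recommended in the list above.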

## Managing access to the `/proc` filesystem {#proc-access}
