Commit abe2db3

Merge pull request #5096 from jsafrane/1.33-selinux
1710: selinux: Update the KEP for 1.33 and graduate to Beta
2 parents 08f6989 + ff1eb5d commit abe2db3

3 files changed

+52
-17
lines changed

keps/prod-readiness/sig-storage/1710.yaml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -5,3 +5,6 @@ alpha:
55
# @deads2k for SELinuxChangePolicy alpha 1.32
66
beta:
77
approver: "@deads2k"
8+
# SELinuxMountReadWriteOncePod is beta since 1.29
9+
# @deads2k for SELinuxMount beta in 1.33
10+
# @deads2k for SELinuxChangePolicy beta 1.33

keps/sig-storage/1710-selinux-relabeling/README.md

Lines changed: 45 additions & 14 deletions
@@ -118,6 +118,10 @@ Further in this KEP we assume that the SELinux is enabled on the system. This KE
118118

119119
See [SELinux documentation](https://selinuxproject.org/page/NB_MLS) for more details.
120120

121+
In this document we use `container_t` and `container_file_t` labels for container processes / files, which are the default labels on Fedora based distributions (AlmaLinux, CentOS, Red Hat Enterprise Linux, Rocky Linux, ...).
122+
For example, Debian uses `svirt_lxc_net_t` and `svirt_lxc_file_t` as the default labels for containers, but the principles are the same.
123+
The implementation of this KEP does not depend on the actual labels used in the system.
124+
121125
### SELinux label assignment
122126
In Kubernetes, the SELinux label of a pod is assigned in two ways:
123127
1. Either it is set by user in PodSpec or Container: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/.
@@ -465,13 +469,13 @@ spec:
465469
* Same as the previous story. Kubelet mounts the volume without any SELinux option + the container runtime relabels the volumes recursively.
466470

467471
**Feature gates `SELinuxMountReadWriteOncePod == true` && `SELinuxMount == false`**:
468-
* If `myclaim` is a RWOP volume (`Spec.AccessModes == ["ReadWriteOncePod']`) *and* the corresponding CSI drivers support SELinux mount, kubelet mounts the volume with `-o context=system_u:object_r:container_file_t:s0:c10,c0`.
472+
* If `myclaim` is a RWOP volume (`Spec.AccessModes == ["ReadWriteOncePod"]`) *and* the corresponding CSI drivers support SELinux mount, kubelet fills in the blanks in the `seLinuxOptions` from the system defaults (`user: system_u`, `role: object_r`, `type: container_t` on Fedora based distros), translates them to a file label (`container_t` -> `container_file_t`) and mounts the volume with `-o context=system_u:object_r:container_file_t:s0:c10,c0`.
469473
* If `myclaim` is any other volume, kubelet mounts the volume without any SELinux option + the container runtime relabels the volume recursively.
470474
* The secret token volume is relabeled by the container runtime, because Secret and Projected volumes do not support SELinux mount.
471475

472476
**Feature gates `SELinuxMountReadWriteOncePod == true` && `SELinuxMount == true`**:
473477
* Since there is no `SELinuxChangePolicy` set, kubelet implies `MountOption`.
474-
If the corresponding CSI driver (or in-tree volume plugin) support SELinux mount, the volume is mounted with `-o context=system_u:object_r:container_file_t:s0:c10,c0`.
478+
If the corresponding CSI driver (or in-tree volume plugin) supports SELinux mount, kubelet fills in the blanks in the `seLinuxOptions` from the system defaults as described above and the volume is mounted with `-o context=system_u:object_r:container_file_t:s0:c10,c0`.
475479
* Otherwise, kubelet mounts the volume without any SELinux option + the container runtime relabels the volume recursively.
476480
* The secret token volume is relabeled by the container runtime, because Secret and Projected volumes do not support SELinux mount.
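
The label handling described in the feature gate cases above can be sketched in Go. This is only an illustration under stated assumptions: the `SELinuxOptions` struct, the `fedoraDefaults` values, the `fileTypes` map and both helper functions are hypothetical names, not kubelet's actual code. A real kubelet reads the defaults from the host instead of hard-coding them.

```go
package main

import (
	"fmt"
	"strings"
)

// SELinuxOptions mirrors the four seLinuxOptions fields of a Pod (hypothetical type).
type SELinuxOptions struct {
	User, Role, Type, Level string
}

// fedoraDefaults hard-codes the defaults quoted in the KEP for Fedora based
// distros. Assumption: a real implementation reads them from the host.
var fedoraDefaults = SELinuxOptions{User: "system_u", Role: "object_r", Type: "container_t"}

// fileTypes maps a process type to its file type (assumed mapping, only the
// pair mentioned in the KEP text is listed).
var fileTypes = map[string]string{"container_t": "container_file_t"}

// fillDefaults copies a default into every field that the Pod left empty.
func fillDefaults(opts, defaults SELinuxOptions) SELinuxOptions {
	if opts.User == "" {
		opts.User = defaults.User
	}
	if opts.Role == "" {
		opts.Role = defaults.Role
	}
	if opts.Type == "" {
		opts.Type = defaults.Type
	}
	if opts.Level == "" {
		opts.Level = defaults.Level
	}
	return opts
}

// mountOption fills in the blanks, translates the process type to a file
// type and renders the value passed to `mount -o`.
func mountOption(opts SELinuxOptions) string {
	filled := fillDefaults(opts, fedoraDefaults)
	if ft, ok := fileTypes[filled.Type]; ok {
		filled.Type = ft
	}
	return "context=" + strings.Join([]string{filled.User, filled.Role, filled.Type, filled.Level}, ":")
}

func main() {
	// A Pod that sets only the level, as in the user story.
	fmt.Println(mountOption(SELinuxOptions{Level: "s0:c10,c0"}))
	// prints "context=system_u:object_r:container_file_t:s0:c10,c0"
}
```

Keeping the defaults in a single value makes the distribution-specific part easy to swap, which matches the statement that the implementation does not depend on the actual labels used in the system.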
477481

@@ -602,7 +606,12 @@ Drawbacks:
602606
* The controller may report a conflict when two Pods are scheduled to the same node, but they will run serially there.
603607
For example, one pod is already being deleted and the other has just been scheduled there.
604608
Kubelet's `volume_manager_selinux_volume_context_mismatch_warnings_total` metric is more accurate in this case.
605-
609+
* The controller cannot read the SELinux default container labels from the operating system.
610+
KCM often runs in a container and does not have access to `/etc/selinux` on the worker nodes.
611+
As a consequence, two labels that are equivalent from the SELinux point of view may be reported as different, such as these two `seLinuxOptions` snippets: `{"type": "container_t", "level": "s0:c10,c0"}` and `{"level": "s0:c10,c0"}`.
612+
`container_t` is the default type label for containers on Fedora, so kubelet is able to fill it in the `seLinuxOptions` when it is not set and see they're equivalent.
613+
KCM does not know the default on nodes and treats empty fields in `seLinuxOptions` as *uncomparable* - it does not emit any event in the above example.
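
A minimal Go sketch of the comparison rule described above, in which empty fields are treated as uncomparable. The `SELinuxOptions` type and the `conflicts` function are hypothetical names for illustration and are not taken from the controller's source.

```go
package main

import "fmt"

// SELinuxOptions mirrors the four seLinuxOptions fields of a Pod (hypothetical type).
type SELinuxOptions struct {
	User, Role, Type, Level string
}

// conflicts reports whether two labels definitely differ. A field that is
// empty on either side is uncomparable and is skipped, because the
// controller does not know the node's defaults.
func conflicts(a, b SELinuxOptions) bool {
	pairs := [][2]string{
		{a.User, b.User},
		{a.Role, b.Role},
		{a.Type, b.Type},
		{a.Level, b.Level},
	}
	for _, p := range pairs {
		if p[0] == "" || p[1] == "" {
			continue // uncomparable, no verdict for this field
		}
		if p[0] != p[1] {
			return true
		}
	}
	return false
}

func main() {
	a := SELinuxOptions{Type: "container_t", Level: "s0:c10,c0"}
	b := SELinuxOptions{Level: "s0:c10,c0"}
	c := SELinuxOptions{Level: "s0:c10,c1"}
	fmt.Println(conflicts(a, b)) // prints "false": type is uncomparable, levels match
	fmt.Println(conflicts(a, c)) // prints "true": both levels are set and differ
}
```

With this rule the controller can miss a real conflict when a field is empty on one side, but it never reports one that exists only because a default was not filled in.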
614+
606615
### Implementation phases
607616

608617
Due to change of Kubernetes behavior, we will implement the feature only for cases where it can't break anything first.
@@ -647,32 +656,47 @@ No existing / new tests for volume mounting there.
647656

648657
* Check no recursive `chcon` is done on a volume when not needed.
649658
* Check recursive `chcon` is done on a volume when needed.
650-
* Check that proper metric is emitted when kubelet can't start two pods with different SELinux labels using the same volume on the same node._
651-
* These tests might use only CSI volumes, GCE PD in-tree volume plugin that we use for e2e tests might be already migrated to CSI by that time.
659+
* Check that kubelet emits proper metrics when it can't start two pods with different SELinux labels using the same volume on the same node.
660+
* Check that the SELinux warning controller emits events when pods conflict and emits the described metrics.
652661
* Prepare e2e job that runs with SELinux in Enforcing mode.
653662
* Done:
654663
* https://testgrid.k8s.io/kops-k8s-ci#kops-aws-selinux: for features enabled by default.
655-
* https://testgrid.k8s.io/kops-k8s-ci#kops-aws-selinux-alpha: for alpha features.
664+
* https://testgrid.k8s.io/kops-k8s-ci#kops-aws-selinux-alpha: for all alpha features enabled.
665+
* https://testgrid.k8s.io/kops-distro-rhel8#kops-aws-selinux-changepolicy: for `SELinuxChangePolicy` enabled + `SELinuxMount` disabled.
656666
* https://testgrid.k8s.io/presubmits-kubernetes-nonblocking#pull-kubernetes-e2e-gce-storage-selinux: for PRs (needs explicit `/test ` in a PR).
657667

668+
All these e2e tests use only CSI volumes. All in-tree volume types that support SELinux and dynamic provisioning were migrated to CSI already.
669+
658670
### Graduation Criteria
659671

660672
* Alpha of Phase 1:
661673
* Provided all tests defined above are passing and gated by the feature gate `SELinuxMountReadWriteOncePod` and set to a default of `false`.
662674
* Documentation exists.
663675
* Beta of Phase 1:
676+
* E2e tests implemented + green.
664677
* The feature gate is `true` by default.
665678
* Evaluation:
666679
* During the next release after Phase 1 is beta (= the feature is enabled by default), collect reports from users about possible breakage.
667680
* KEP author has access to usage data from OpenShift, a Kubernetes distro that runs with SELinux in enforcing mode.
668681
* Alpha of Phase 2 + 3:
669682
* Implemented `SELinuxChangePolicy` **with a separate alpha feature gate `SELinuxChangePolicy`** as preparation for `SELinuxMount` feature gate graduation.
670683
* Implemented SELinuxController.
671-
* Beta of Phase 2, alpha of phase 3:
684+
* Beta of Phase 2 + 3 (`SELinuxChangePolicy` is beta and enabled by default; `SELinuxMount` is beta, but disabled by default).
685+
* E2e tests implemented + green.
672686
* Telemetry numbers from OpenShift show that <5% of clusters would need to change any of their Pods.
673-
* GA:
687+
* This phase signals that the feature is ready for real testing.
688+
Only non-breaking parts (`SELinuxChangePolicy`) are enabled by default.
689+
Users willing to test `SELinuxMount` must enable it explicitly.
690+
* GA of Phase 2 (`SELinuxChangePolicy` + `SELinuxMountReadWriteOncePod` are GA and locked to default, `SELinuxMount` is beta and disabled by default):
674691
* All known issues fixed. Otherwise, we will GA Phase 1 only.
692+
* Users can update their clusters safely; there is no breaking change yet.
693+
Users willing to test `SELinuxMount` must enable it explicitly.
694+
* This phase allows production clusters to check which Pods (Deployments, StatefulSets) need an update and fix them before the breaking part (`SELinuxMount`) is enabled by default in the next phase.
695+
* GA of Phase 3 (`SELinuxMount` is GA and locked to default):
696+
* At least 1 release after `SELinuxChangePolicy` is GA to give cluster admins enough time to apply `SELinuxChangePolicy` to their Pods.
675697
* Telemetry numbers from OpenShift show that <2% of clusters would need to change any of their Pods (i.e. most clusters already applied opt-out).
698+
* This is the phase that may break existing applications during cluster upgrade.
699+
Users that use SELinux should carefully evaluate the metrics emitted by kubelet and SELinuxWarningController and fix their workloads before upgrading to this version.
676700

677701
### Upgrade / Downgrade Strategy
678702

@@ -711,9 +735,9 @@ _This section must be completed when targeting alpha to a release._
711735
* **How can this feature be enabled / disabled in a live cluster?**
712736
- [X] Feature gate (also fill in values in `kep.yaml`)
713737
- Feature gate name: `SELinuxMountReadWriteOncePod` (beta in 1.28)
714-
- Feature gate name: `SELinuxChangePolicy` (alpha in 1.30)
738+
- Feature gate name: `SELinuxChangePolicy` (alpha in 1.30, proposing beta in 1.33)
715739
- To enable `SELinuxChangePolicy` feature gate, `SELinuxMountReadWriteOncePod` **must** be enabled too.
716-
- Feature gate name: `SELinuxMount` (alpha in 1.30)
740+
- Feature gate name: `SELinuxMount` (alpha in 1.30, proposing beta in 1.33)
717741
- To enable `SELinuxMount` feature gate, `SELinuxMountReadWriteOncePod` and `SELinuxChangePolicy` **must** be enabled too.
718742
- Components depending on the feature gate: apiserver (API validation only), kubelet
719743
- [ ] Other
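
The feature gate dependencies listed above can be written as a small check. This is a sketch only: `validateGates` is a hypothetical helper, and it is an assumption that the dependency would be checked this way. It is not how Kubernetes components actually validate feature gates.

```go
package main

import (
	"errors"
	"fmt"
)

// validateGates checks the dependency chain stated in the KEP:
// SELinuxChangePolicy needs SELinuxMountReadWriteOncePod, and SELinuxMount
// needs both of the other gates. A gate missing from the map counts as disabled.
func validateGates(gates map[string]bool) error {
	if gates["SELinuxChangePolicy"] && !gates["SELinuxMountReadWriteOncePod"] {
		return errors.New("SELinuxChangePolicy requires SELinuxMountReadWriteOncePod")
	}
	if gates["SELinuxMount"] && !(gates["SELinuxMountReadWriteOncePod"] && gates["SELinuxChangePolicy"]) {
		return errors.New("SELinuxMount requires SELinuxMountReadWriteOncePod and SELinuxChangePolicy")
	}
	return nil
}

func main() {
	// SELinuxMount alone is rejected because its prerequisites are off.
	fmt.Println(validateGates(map[string]bool{"SELinuxMount": true}))
	// prints "SELinuxMount requires SELinuxMountReadWriteOncePod and SELinuxChangePolicy"
}
```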
@@ -728,6 +752,7 @@ _This section must be completed when targeting alpha to a release._
728752
automations, so be extremely careful here.
729753

730754
**Yes.** See [Conflict with other Pods](#conflicts-with-other-pods) for details.
755+
We offer metrics + events + proactive opt-out per Pod before the breaking part (`SELinuxMount`) is enabled by default.
731756

732757
* **Can the feature be disabled once it has been enabled (i.e. can we rollback
733758
the enablement)?**
@@ -896,7 +921,8 @@ previous answers based on experience in the field._
896921

897922
* **Will enabling / using this feature result in any new API calls?**
898923

899-
No new API calls are required. Kubelet / CSI volume plugin already has CSIDriver informer.
924+
* No new API calls are required in kubelet; its CSI volume plugin already has a CSIDriver informer.
925+
* KCM will emit new events when SELinuxWarningController is enabled. It already has Pod, PV, PVC and CSIDriver informers and makes no other API calls.
900926

901927
* **Will enabling / using this feature result in introducing new API types?**
902928

@@ -909,8 +935,9 @@ previous answers based on experience in the field._
909935

910936
* **Will enabling / using this feature result in increasing size or count of the existing API objects?**
911937

912-
CSIDriver gets one new field. We expect only few CSIDriver objects in a cluster.
913-
PodSpec gets one new field, and we expect it to be `null` for the vast majority of Pods.
938+
* CSIDriver gets one new field. We expect only a few CSIDriver objects in a cluster.
939+
* PodSpec gets one new field, and we expect it to be `null` for the vast majority of Pods.
940+
* Event(s) will be created for every conflicting Pod pair when SELinuxWarningController is enabled.
914941

915942
* **Will enabling / using this feature result in increasing time taken by any
916943
operations covered by [existing SLIs/SLOs][]?**
@@ -927,7 +954,7 @@ previous answers based on experience in the field._
927954
Think through this both in small and large cases, again with respect to the
928955
[supported limits][].
929956

930-
No. Kubelet already has a cache of desired / existing mounts, we need to add
957+
No. KCM and kubelet already have a cache of desired / existing mounts; we need to add
931958
a string with SELinux label to each one, which should be negligible.
932959

933960
* **Can enabling / using this feature result in resource exhaustion of some node
@@ -968,6 +995,7 @@ _This section must be completed when targeting beta graduation to a release._
968995

969996
- *Kubelet does not start new Pods*
970997
- Detection: `volume_manager_selinux_container_errors_total`, `volume_manager_selinux_pod_context_mismatch_errors_total` or `volume_manager_selinux_volume_context_mismatch_errors_total` grows.
998+
In addition, each such Pod has an event about SELinux label mismatch.
971999
- Mitigations: What can be done to stop the bleeding, especially for already
9721000
running user workloads?
9731001
Workloads that run keep running, only new Pods can't start.
@@ -998,6 +1026,9 @@ _This section must be completed when targeting beta graduation to a release._
9981026
* We discovered that sharing volumes between privileged and unprivileged containers as described [here](#privileged-containers) is a valid use case.
9991027
we cannot mount *all* volumes with `-o context` and it must be an explicit opt-out using `SELinuxChangePolicy: Recursive`.
10001028
* Implement `SELinuxChangePolicy` as an alpha field.
1029+
* 1.33: Graduate `SELinuxMount` to beta / disabled by default, `SELinuxChangePolicy` to beta / enabled by default.
1030+
* Add e2e tests for the SELinuxWarningController.
1031+
* Test on a non-Fedora based Linux distribution (e.g. Debian) with SELinux enabled.
10011032

10021033
## Drawbacks [optional]
10031034

keps/sig-storage/1710-selinux-relabeling/kep.yaml

Lines changed: 4 additions & 3 deletions
@@ -18,16 +18,17 @@ approvers:
1818
- "@saad-ali"
1919
see-also:
2020
- /keps/sig-storage/695-skip-permission-change/README.md
21-
stage: alpha
22-
latest-milestone: "v1.32"
21+
stage: beta
22+
latest-milestone: "v1.33"
2323
milestone:
2424
alpha: "v1.24" # SELinuxMountReadWriteOncePod
2525
beta: "v1.27" # SELinuxMountReadWriteOncePod
2626
stable: "v1.34" # Very optimistic plan for SELinuxMountReadWriteOncePod GA, needs SELinuxMount very close to GA
2727

2828
# alpha: "v1.30" # SELinuxMount
2929
# alpha: "v1.32" # SELinuxChangePolicy
30-
30+
# beta: "v1.33" # SELinuxChangePolicy (enabled by default)
31+
# beta: "v1.33" # SELinuxMount (disabled by default)
3132
feature-gates:
3233
- name: SELinuxMountReadWriteOncePod
3334
components:
