This KEP seeks to provide a way to choose the correct behavior for how container runtimes (containerd and CRI-O) apply `SupplementalGroups` to the first container process. It describes the work needed in Kubernetes or connected projects to give customers a clear migration path, including detection and safe upgrade, if any of their workflows depend on this arguably erroneous behavior.
As described above, the way supplemental groups are attached to the first container process is complicated and not compliant with the OCI image spec.

Moreover, this raises the following security concerns. When a cluster enforces a security policy that restricts the values of `RunAsGroup` and `SupplementalGroups` for pods, the enforcement has limited effect: cluster users can bypass it simply by using a custom image. Most cluster administrators would not expect such a bypass, because it makes the enforcement almost useless. The bypass also results in unexpected file access permissions, which is a security concern in some use cases. For example, it could be a severe problem with `hostPath` volumes, because UIDs/GIDs determine access to files and directories in those volumes.
@@ -254,7 +346,7 @@ message ContainerUser {
 }
 ```

-### User Stories
+### User Stories (Optional)

 <!--
 Detail the things that people will be able to do if this KEP is implemented.
@@ -263,6 +355,7 @@ the system. The goal here is to make this feel real for users without getting
 bogged down.
 -->

+
 #### Story 1: Deploy a Security Policy to enforce `SupplementalGroupsPolicy` field

 Assume a multi-tenant Kubernetes cluster with `hostPath` volumes in the following situation:
@@ -294,6 +387,8 @@ As described in [Summary](#summary) section, `alice` can bypass the restriction

 Please note that a security policy without `supplementalGroupsPolicy` would lead to unexpected groups for the first process in the containers.

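As a rough illustration of the kind of check such a security policy might perform, here is a minimal Go sketch. The `PodSecurity` struct, the `validate` function, and the allowed GID set are hypothetical and simplified. They are not the Kubernetes API types or any real policy engine's interface.

```go
package main

import "fmt"

// PodSecurity is a hypothetical, simplified view of the pod-level
// security settings that the policy inspects.
type PodSecurity struct {
	RunAsGroup               int64
	SupplementalGroups       []int64
	SupplementalGroupsPolicy string // empty means the field is unset
}

// validate returns one message per violation. It requires the Strict
// policy in addition to restricting the group IDs, because restricting
// the IDs alone does not cover groups defined inside the image.
func validate(sc PodSecurity, allowed map[int64]bool) []string {
	var errs []string
	if sc.SupplementalGroupsPolicy != "Strict" {
		errs = append(errs, `supplementalGroupsPolicy must be "Strict"`)
	}
	if !allowed[sc.RunAsGroup] {
		errs = append(errs, fmt.Sprintf("runAsGroup %d is not allowed", sc.RunAsGroup))
	}
	for _, g := range sc.SupplementalGroups {
		if !allowed[g] {
			errs = append(errs, fmt.Sprintf("supplementalGroups entry %d is not allowed", g))
		}
	}
	return errs
}

func main() {
	allowed := map[int64]bool{1000: true, 4000: true}
	withPolicy := PodSecurity{RunAsGroup: 1000, SupplementalGroups: []int64{4000}, SupplementalGroupsPolicy: "Strict"}
	withoutPolicy := PodSecurity{RunAsGroup: 1000, SupplementalGroups: []int64{4000}}
	fmt.Println(len(validate(withPolicy, allowed)))    // 0
	fmt.Println(len(validate(withoutPolicy, allowed))) // 1
}
```

The second pod uses only allowed group IDs but is still rejected, since without `Strict` the first container process could pick up extra groups from the image.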
+<!-- #### Story 2 -->
+
 ### Notes/Constraints/Caveats (Optional)

 <!--
@@ -325,6 +420,13 @@ Consider including folks who also work outside the SIG or subproject.

 ## Design Details

+<!--
+This section should contain enough information that the specifics of your
+change are understandable. This may include API specs (though not always
+required) or even code snippets. If there's any ambiguity about HOW your
+proposal will be implemented, this is the place to discuss them.
+-->
+
 ### Kubernetes API

 #### SupplementalGroupsPolicy in PodSecurityContext
@@ -626,7 +728,7 @@ enhancement:
 CRI or CNI may require updating that component before the kubelet.
 -->

-- CRI must support this feature, especially when using `SupplementalGroupsPolicy=IgnoreGroupsInImage`.
+- CRI must support this feature, especially when using `SupplementalGroupsPolicy=Strict`.
 - kubelet must be at least the version of control-plane components.

 ## Production Readiness Review Questionnaire
@@ -687,6 +789,7 @@ well as the [existing list] of feature gates.
 Any change of default behavior may be surprising to users or break existing
 automations, so be extremely careful here.
 -->
+
 No. This KEP only introduces new API fields in the Pod spec and CRI, which does NOT change the default behavior.

 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
@@ -702,11 +805,11 @@ feature.
 NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
 -->

-Yes. It can be disabled after enabled. However, users should pay attention that gids of container processes in pods with `IgnoreGroupsInImage` policy would change. It means the action might break the application in permission. We plan to provide a way for users to detect which pods are affected.
+Yes. It can be disabled after being enabled. However, users should be aware that the GIDs of container processes in pods with the `Strict` policy would change, which might cause permission errors in the application. We plan to provide a way for users to detect which pods are affected.

 ###### What happens if we reenable the feature if it was previously rolled back?

-Just the policy `IgnoreGroupsInImage` is reenabled. Users should pay attention that gids of containers in pods with `IgnoreGroupsInImage` policy would change. It means that the action might break the application in permission. We plan to provide a way for users to detect which pods are affected.
+Only the `Strict` policy is reenabled. Users should be aware that the GIDs of containers in pods with the `Strict` policy would change, which might cause permission errors in the application. We plan to provide a way for users to detect which pods are affected.

 ###### Are there any tests for feature enablement/disablement?
@@ -919,7 +1022,7 @@ Describe them, providing:
 - Estimated amount of new objects: (e.g., new Object X for every existing Pod)
 -->

-No.
+Strictly speaking, yes, because the KEP introduces new API fields in Pods. However, the size increase is negligible.

 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
@@ -948,6 +1051,18 @@ Think through this both in small and large cases, again with respect to the

 No.

+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+<!--
+Focus not just on happy cases, but primarily on more pathological cases
+(e.g. probes taking a minute instead of milliseconds, failed pods consuming resources, etc.).
+If any of the resources can be exhausted, how this is mitigated with the existing limits
+(e.g. pods per node) or new limits added by this KEP?
+
+Are there any tests that were run/should be run to understand performance characteristics better
+and validate the declared limits?
+-->
+
 ### Troubleshooting

 <!--
@@ -999,6 +1114,8 @@ Major milestones might include:
 Why should this KEP _not_ be implemented?
 -->

+N/A
+
 ## Alternatives

 <!--
@@ -1027,4 +1144,4 @@ new subproject, repos requested, or GitHub details. Listing these here allows a
 SIG to get the process for these resources started right away.