keps/sig-node/3619-supplemental-groups-policy/README.md
Lines changed: 169 additions & 9 deletions
@@ -672,8 +672,44 @@ Because this KEP's core implementation(i.e. `SupplementalGroupsPolicy` handling)
672
672
673
673
### Version Skew Strategy
674
674
675
-
- CRI must support this feature, especially when using `SupplementalGroupsPolicy=Strict`.
676
-
- kubelet must be at least the version of control-plane components.
675
+
Existing pods will still work as intended, as the new field is missing there
676
+
(i.e. no `SupplementalGroupsPolicy` fields in existing Pods' spec).
677
+
678
+
An upgrade does not change any existing behavior. However, if you plan to use the `Strict` SupplementalGroupsPolicy after the upgrade,
679
+
we assume that the CRI runtimes in your cluster also support this feature (see the ["Dependencies"](#dependencies) section).
680
+
If there are some nodes whose CRI runtime does NOT support this feature,
681
+
- the creation of pods with the `Strict` policy will be rejected if the feature level of the upgraded version is beta or above,
682
+
- the `Strict` policy will silently fall back to `Merge` if the feature level of the upgraded version is alpha.
683
+
Please see the matrix below for more details.
684
+
685
+
A downgrade has no effect if the functionality has not been used yet. However, if the functionality, especially the `Strict` SupplementalGroupsPolicy, is already in use, caution is needed:
686
+
- running containers will continue to run with their effective policy as long as they are not recreated.
687
+
- However, when the containers in such pods are recreated on the node, the behavior depends on the downgraded version, the downgraded feature gate value, and the CRI runtime's support status (see the matrix below).
688
+
689
+
The matrix below summarizes what happens for each combination of target (upgraded/downgraded) kubelet version, target feature gate value, and target CRI runtime support status:
690
+
691
+
| Target<br />kubelet version | Target<br/>Feature Gate | Target<br/>CRI runtime<br />supports the feature? | Pod's policy | Effective Policy | Rejected By Kubelet? |`.containerStatuses.user` reported? |
|---|---|---|---|---|---|---|
| <1.31<br/>(does not know the field) | N/A | Yes/No |`Strict`|`Merge`<br />(fallback silently) | NO | NO |
694
+
||||`Merge`<br />/(not set) |`Merge`| NO | NO |
695
+
| 1.31 or 1.32<br/>(Alpha) |`True`| YES |`Strict`|`Strict`| NO | YES |
696
+
||||`Merge`<br />/(not set) |`Merge`| NO | YES |
697
+
||| NO |`Strict`|`Merge`<br />(fallback silently) | NO | NO |
698
+
||||`Merge`<br />/(not set) |`Merge`| NO | NO |
699
+
||`False`| YES |`Strict`<br />(set when the feature was on) |`Strict`| NO | NO |
700
+
||||`Merge`<br />/(not set) |`Merge`| NO | NO |
701
+
||| NO |`Strict`<br />(set when the feature was on) |`Merge`<br />(fallback silently) | NO | NO |
702
+
||||`Merge`<br />/(not set) |`Merge`| NO | NO |
703
+
| >=1.33<br />(Beta or above) |`True`<br />(default) | YES |`Strict`|`Strict`| NO | YES |
704
+
||||`Merge`<br />/(not set) |`Merge`| NO | YES |
705
+
||| NO |`Strict`| - |__REJECTED__(*) | NO |
706
+
||||`Merge`<br />/(not set) |`Merge`| NO | NO |
707
+
||`False`| YES |`Strict`<br />(set when the feature was on) |`Strict`| NO | NO |
708
+
||||`Merge`<br />/(not set) |`Merge`| NO | NO |
709
+
||| NO |`Strict`<br />(set when the feature was on) | - |__REJECTED__(*) | NO |
710
+
||||`Merge`<br />/(not set) |`Merge`| NO | NO |
711
+
712
+
_(*): See ["What specific metrics should inform a rollback?"](#what-specific-metrics-should-inform-a-rollback) for details_
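The `Strict` rows of the matrix can be condensed into a small lookup. The following is only an illustrative sketch: the function names `effective_policy` and `user_reported` and the stage labels `pre`/`alpha`/`beta` are hypothetical and not part of the KEP or any Kubernetes tool. It reflects the observation that, per the matrix, the feature gate value only influences whether `.containerStatuses.user` is reported, not the effective policy.

```bash
#!/usr/bin/env bash
# Hypothetical helpers that encode the matrix above (not part of Kubernetes).
#
# effective_policy STAGE CRI_SUPPORTED POD_POLICY
#   STAGE:         pre (<1.31), alpha (1.31 or 1.32), beta (>=1.33)
#   CRI_SUPPORTED: yes | no
#   POD_POLICY:    Strict | Merge | unset
# Prints "<effective policy> <rejected by kubelet>".
effective_policy() {
  local stage="$1" cri="$2" policy="$3"

  # Merge (or an unset field) always behaves as Merge and is never rejected.
  if [ "$policy" != "Strict" ]; then
    echo "Merge no"
    return
  fi

  case "$stage" in
    pre)
      # The kubelet does not know the field: silent fallback.
      echo "Merge no" ;;
    alpha)
      if [ "$cri" = "yes" ]; then echo "Strict no"; else echo "Merge no"; fi ;;
    beta)
      if [ "$cri" = "yes" ]; then echo "Strict no"; else echo "- yes"; fi ;;
    *)
      echo "unknown stage: $stage" >&2
      return 1 ;;
  esac
}

# user_reported STAGE FEATURE_GATE CRI_SUPPORTED
#   FEATURE_GATE: true | false
# Prints yes when `.containerStatuses.user` is reported according to the matrix.
user_reported() {
  local stage="$1" gate="$2" cri="$3"
  if [ "$stage" != "pre" ] && [ "$gate" = "true" ] && [ "$cri" = "yes" ]; then
    echo "yes"
  else
    echo "no"
  fi
}
```

For example, `effective_policy beta no Strict` corresponds to the `__REJECTED__` rows, while `effective_policy alpha no Strict` corresponds to the silent fallback to `Merge`.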
677
713
678
714
## Production Readiness Review Questionnaire
679
715
@@ -749,11 +785,18 @@ feature.
749
785
NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
750
786
-->
751
787
752
-
Yes. It can be disabled after enabled. However, users should pay attention that gids of container processes in pods with `Strict` policy would change. It means the action might break the application in permission. We plan to provide a way for users to detect which pods are affected.
788
+
Yes. It can be disabled after enabled.
789
+
When disabled, you cannot create pods with `SupplementalGroupsPolicy` fields, and no `.status.containerStatuses[*].user` will be reported in pod status.
790
+
Please note that if there are pods that were created with the `Strict` policy, the policy of the containers in such pods will remain enforced even after the feature is disabled.
791
+
792
+
See ["Version Skew Strategy"](#version-skew-strategy) for more complex cases (including upgrading/downgrading).
753
793
754
794
###### What happens if we reenable the feature if it was previously rolled back?
755
795
756
-
Just the policy `Stcict` is reenabled. Users should pay attention that gids of containers in pods with `Stcict` policy would change. It means that the action might break the application in permission. We plan to provide a way for users to detect which pods are affected.
796
+
The `SupplementalGroupsPolicy` field in pod spec and `.status.containerStatuses[*].user` in pod status will be available again.
797
+
As described in the section above, for pods that were previously created with the `Strict` policy, the policy of the containers in such pods will remain enforced after the feature is re-enabled.
798
+
799
+
See ["Version Skew Strategy"](#version-skew-strategy) for more complex cases (including upgrading/downgrading).
757
800
758
801
###### Are there any tests for feature enablement/disablement?
759
802
@@ -790,13 +833,53 @@ rollout. Similarly, consider large clusters and how enablement/disablement
790
833
will rollout across nodes.
791
834
-->
792
835
836
+
As long as you do not use the `SupplementalGroupsPolicy` fields, rollout or rollback will be safe. There is also no impact on already running workloads because the feature is backward compatible.
837
+
838
+
However, if pods with `SupplementalGroupsPolicy` fields exist at the time of rollout/rollback, caution is needed.
839
+
Please see the matrix in ["Version Skew Strategy"](#version-skew-strategy) section for details.
840
+
793
841
###### What specific metrics should inform a rollback?
794
842
795
843
<!--
796
844
What signals should users be paying attention to when the feature is young
797
845
that might indicate a serious problem?
798
846
-->
799
847
848
+
As long as you do not use the `SupplementalGroupsPolicy` fields, rollout or rollback will be safe, as described in the section above.
849
+
850
+
However, if pods with `SupplementalGroupsPolicy` fields exist at the time of rollout/rollback, pod creation may be rejected when
851
+
- the feature level of the rolled-out/rolled-back version is beta or above, and
852
+
- pods with `Strict` policy (set when the feature gate was on previously) are scheduled to the nodes whose CRI runtime does NOT support this feature.
853
+
854
+
In that case, please look for an event indicating that SupplementalGroupsPolicy is not supported by the node as the rollback signal.
855
+
856
+
```console
857
+
$ kubectl get events -o json -w
858
+
...
859
+
{
860
+
...
861
+
"kind": "Event",
862
+
"message": "Error: SupplementalGroupsPolicy is not supported in this node.",
863
+
...
864
+
}
865
+
...
866
+
```
867
+
868
+
The following kubelet metrics are also useful to check:
869
+
870
+
- `kubelet_running_pods`: Shows the actual number of pods running
871
+
- `kubelet_desired_pods`: The number of pods the kubelet is trying to run
872
+
873
+
If these metrics differ, there are desired pods that cannot reach the running state.
874
+
If that is the case, checking the pod events to see whether they are failing for SupplementalGroupsPolicy reasons
875
+
(like the error shown above) is advised, in which case rolling back is recommended.
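As a rough sketch of the comparison described above, the two metrics can be summed from the kubelet's Prometheus text output. The function name `pod_gap` is hypothetical, and the way the metrics text is fetched (for example `kubectl get --raw "/api/v1/nodes/$NODE/proxy/metrics"`, with `$NODE` as a placeholder) is an assumption about your environment rather than something this KEP prescribes.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads kubelet metrics in Prometheus text format on
# stdin and prints the desired/running pod counts and their difference.
pod_gap() {
  awk '
    # Skip "# HELP" and "# TYPE" comment lines.
    /^#/ { next }
    # Sum all series of each metric, whether or not they carry labels.
    $1 == "kubelet_desired_pods" || index($1, "kubelet_desired_pods{") == 1 { desired += $NF }
    $1 == "kubelet_running_pods" || index($1, "kubelet_running_pods{") == 1 { running += $NF }
    END { printf "desired=%d running=%d gap=%d\n", desired, running, desired - running }
  '
}
```

A non-zero `gap` is only a hint: it is still necessary to inspect the pod events to confirm that SupplementalGroupsPolicy is the cause.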
876
+
877
+
Although this KEP does NOT include kube-scheduler integration that would let the scheduler place pods that require
878
+
the feature (`Strict` policy) on nodes that support it, you can use node labels and
879
+
pod's `nodeSelector`/`nodeAffinity` to mitigate pod rejection or error events. Please see
880
+
["Are there any missing metrics that would be useful to have to improve observability of this feature?"](#are-there-any-missing-metrics-that-would-be-useful-to-have-to-improve-observability-of-this-feature)
881
+
section below for details.
882
+
800
883
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
801
884
802
885
<!--
@@ -805,12 +888,22 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
805
888
are missing a bunch of machinery and tooling and can't do that now.
806
889
-->
807
890
891
+
During the beta phase, the following test will be manually performed:
892
+
- Enable the `SupplementalGroupsPolicy` feature gate for kube-apiserver and kubelet.
893
+
- Create a pod with `supplementalGroupsPolicy` specified.
894
+
- Disable the `SupplementalGroupsPolicy` feature gate for kube-apiserver, and confirm that the pod gets rejected.
895
+
- Enable the `SupplementalGroupsPolicy` feature gate again, and confirm that the pod gets scheduled again.
896
+
- Do the same for kubelet too.
897
+
898
+
808
899
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
809
900
810
901
<!--
811
902
Even if applying deprecation policies, they may still surprise some users.
812
903
-->
813
904
905
+
No.
906
+
814
907
### Monitoring Requirements
815
908
816
909
<!--
@@ -828,6 +921,12 @@ checking if there are objects with field X set) may be a last resort. Avoid
828
921
logs or events for this purpose.
829
922
-->
830
923
924
+
Inspect the `supplementalGroupsPolicy` fields in Pods. For example, check whether the following `jq` command prints a non-zero number:
925
+
926
+
```bash
927
+
kubectl get pods -A -o json | jq '[.items[].spec.securityContext? | select(.supplementalGroupsPolicy)] | length'
928
+
```
929
+
831
930
###### How can someone using this feature know that it is working for their instance?
832
931
833
932
<!--
@@ -841,8 +940,8 @@ Recall that end users cannot usually observe component logs or access metrics.
841
940
842
941
- [ ] Events
843
942
- Event Reason:
844
-
-[] API .status
845
-
- Condition name:
943
+
- [x] API .status
944
+
- Condition name: `containerStatuses.user`
846
945
- Other field:
847
946
- [ ] Other (treat as last resort)
848
947
- Details:
@@ -864,16 +963,24 @@ These goals will help you determine what you need to measure (SLIs) in the next
864
963
question.
865
964
-->
866
965
966
+
- `supplementalGroupsPolicy=Strict`: 100% of pods were scheduled into a node with the feature supported.
967
+
Although this KEP does NOT include scheduler integration, please see
968
+
["Are there any missing metrics that would be useful to have to improve observability of this feature?"](#are-there-any-missing-metrics-that-would-be-useful-to-have-to-improve-observability-of-this-feature) section for this.
969
+
970
+
- `supplementalGroupsPolicy=Merge`: 100% of pods were scheduled into a node with or without the feature supported.
971
+
972
+
- `supplementalGroupsPolicy` is unset: 100% of pods were scheduled into a node with or without the feature supported.
973
+
867
974
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
868
975
869
976
<!--
870
977
Pick one more of these and delete the rest.
871
978
-->
872
979
873
-
-[] Metrics
980
+
- [x] Metrics
874
981
- Metric name:
875
-
-[Optional] Aggregation method:
876
-
- Components exposing the metric:
982
+
- [Optional] Aggregation method: `kubectl get events -o json -w`
983
+
- Components exposing the metric: kubelet -> kube-apiserver
877
984
- [ ] Other (treat as last resort)
878
985
- Details:
879
986
@@ -884,6 +991,24 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
884
991
implementation difficulties, etc.).
885
992
-->
886
993
994
+
Potentially, kube-scheduler could implement a rule to avoid scheduling a pod with `supplementalGroupsPolicy: Strict`
995
+
to a node not supporting this feature.
996
+
997
+
However, this is not covered by this KEP, because a more generic mechanism would be preferable in Kubernetes, so that the scheduler can schedule pods which require node feature X
998
+
to the nodes which support node feature X.
999
+
1000
+
As of v1.33, Kubernetes does not offer such a generic mechanism, but cluster admins can maintain node labels and use `nodeSelector`/`nodeAffinity` in pods instead.
1001
+
1002
+
There are several ways to automate this:
1003
+
1004
+
- By Mutating Webhook:
1005
+
- for nodes, which transforms the `Node.Status.Features.SupplementalGroupsPolicy` field to some node label (say `supplementalgroupspolicy-supported: "true" | "false"`),
1006
+
- for pods, which adds `.spec.nodeSelector: { "supplementalgroupspolicy-supported": "true" }` when the pod specifies the `Strict` policy.
1007
+
- By Mutating Admission Policy:
1008
+
- although the feature is still alpha as of v1.32, you can write the equivalent policy to do this.
1009
+
1010
+
If you manage the node labels and pods' `nodeSelector`/`nodeAffinity` appropriately, error events or pod rejection are not expected to happen. Instead, you will need to watch for `Pending` pods to check whether there are enough nodes supporting SupplementalGroupsPolicy in the cluster.
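For clusters without such automation, the node labels could also be derived by hand from `.status.features`. The sketch below is an assumption-laden illustration: the function name `print_label_commands` is hypothetical, the label key `supplementalgroupspolicy-supported` is just the example name used above, and a node that does not report the field is treated as unsupported. It only prints `kubectl label` commands so that they can be reviewed before being run.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get nodes -o json` on stdin and prints
# one `kubectl label` command per node. Requires jq.
print_label_commands() {
  jq -r '
    .items[]
    | "kubectl label node \(.metadata.name) --overwrite supplementalgroupspolicy-supported=\(.status.features.supplementalGroupsPolicy // false)"
  '
}
```

A possible usage is `kubectl get nodes -o json | print_label_commands`, followed by reviewing the printed commands and executing them manually. Pods with the `Strict` policy would then carry the matching `nodeSelector`.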
1011
+
887
1012
### Dependencies
888
1013
889
1014
<!--
@@ -907,6 +1032,12 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
907
1032
- Impact of its degraded performance or high-error rates on the feature:
908
1033
-->
909
1034
1035
+
Container runtimes supporting [CRI API v0.31.0](https://github.com/kubernetes/cri-api/tree/v0.31.0) or above.
1036
+
1037
+
For example,
1038
+
- containerd: v2.0 or later
1039
+
- CRI-O: v1.31 or later
1040
+
910
1041
### Scalability
911
1042
912
1043
<!--
@@ -919,6 +1050,20 @@ For GA, this section is required: approvers should be able to confirm the
919
1050
previous answers based on experience in the field.
920
1051
-->
921
1052
1053
+
1054
+
A pod with `supplementalGroupsPolicy: Strict` may be rejected by kubelet with the probability of $$B/A$$,
1055
+
where $$A$$ is the number of all the nodes that may potentially accept the pod,
1056
+
and $$B$$ is the number of the nodes that may potentially accept the pod but do not support this feature.
1057
+
This may affect scalability.
1058
+
1059
+
To evaluate this risk, users may run
1060
+
`kubectl get nodes -o json | jq '[.items[].status.features]'`
1061
+
to see how many nodes report `supplementalGroupsPolicy: true` before using `Strict` policy.
1062
+
1063
+
To mitigate this probability, you can also manage node labels and pod's `nodeSelector`/`nodeAffinity` to
1064
+
ensure that pods with the `Strict` policy are scheduled to nodes which support the SupplementalGroupsPolicy feature.
1065
+
Please see ["Are there any missing metrics that would be useful to have to improve observability of this feature?"](#are-there-any-missing-metrics-that-would-be-useful-to-have-to-improve-observability-of-this-feature) section.
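To make the $$B/A$$ estimate concrete, the ratio can be computed from the same node JSON. This is a simplified sketch with a hypothetical function name, `strict_rejection_ratio`. It assumes that every node in the input is a candidate for the pod ($$A$$ is the number of nodes in the input) and that a node not reporting the field does not support the feature. Taints, selectors, and resource constraints are ignored.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get nodes -o json` on stdin and prints
# A (all nodes), B (nodes without support), and the ratio B/A. Requires jq.
strict_rejection_ratio() {
  jq -r '
    [.items[] | (.status.features.supplementalGroupsPolicy // false)] as $s
    | ($s | length) as $a
    | ($s | map(select(. == false)) | length) as $b
    | if $a == 0
      then "A=0 B=0 ratio=n/a"
      else "A=\($a) B=\($b) ratio=\($b / $a)"
      end
  '
}
```

A possible usage is `kubectl get nodes -o json | strict_rejection_ratio`. A ratio of 0 means every node in the input reports support for the feature.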
1066
+
922
1067
###### Will enabling / using this feature result in any new API calls?
923
1068
924
1069
<!--
@@ -1024,6 +1169,8 @@ details). For now, we leave it here.
1024
1169
1025
1170
###### How does this feature react if the API server and/or etcd is unavailable?
1026
1171
1172
+
A pod cannot be created, just as with any other pod.
1173
+
1027
1174
###### What are other known failure modes?
1028
1175
1029
1176
<!--
@@ -1039,8 +1186,21 @@ For each of them, fill in the following information by copying the below templat
1039
1186
- Testing: Are there any tests for failure mode? If not, describe why.
1040
1187
-->
1041
1188
1189
+
None.
1190
+
1042
1191
###### What steps should be taken if SLOs are not being met to determine the problem?
1043
1192
1193
+
- Make sure that the node is running with CRI runtime which supports this feature.
1194
+
- Make sure that `crictl info` (with the latest crictl)
1195
+
reports that `supplemental_groups_policy` is supported.
1196
+
Otherwise upgrade the CRI runtime, and make sure that no relevant error is printed in
1197
+
the CRI runtime's log.
1198
+
- Make sure that `kubectl get nodes -o json | jq '[.items[].status.features]'`
1199
+
(with the latest kubectl and control plane)
1200
+
reports that `supplementalGroupsPolicy` is supported.
1201
+
Otherwise upgrade the CRI runtime, and make sure that no relevant error is printed in