Commit d012a52

KEP-3619: Rewrite "Version Skew Policy"/"Rollout/Rollback Policy" section
1 parent 69d37fe commit d012a52

1 file changed

+96
-29
lines changed
  • keps/sig-node/3619-supplemental-groups-policy


keps/sig-node/3619-supplemental-groups-policy/README.md

Lines changed: 96 additions & 29 deletions
@@ -672,8 +672,44 @@ Because this KEP's core implementation(i.e. `SupplementalGroupsPolicy` handling)
672672

673673
### Version Skew Strategy
674674

675-
- CRI must support this feature, especially when using `SupplementalGroupsPolicy=Strict`.
676-
- kubelet must be at least the version of control-plane components.
675+
Existing pods will still work as intended, as the new field is missing there
676+
(i.e. no `SupplementalGroupsPolicy` fields in existing Pods' spec).
677+
678+
For upgrade, no current behavior changes. However, please note that if you plan to use the `Strict` SupplementalGroupsPolicy after the upgrade,
679+
we assume the CRI runtimes in your cluster also support this feature (see the ["Dependencies"](#dependencies) section).
680+
If there are some nodes whose CRI runtime does NOT support this feature,
681+
- the creation of pods with `Strict` policy will be rejected if the feature level of the upgraded version is beta or above,
682+
- the `Strict` policy will silently fall back to `Merge` if the feature level of the upgraded version is alpha.
683+
Please see the matrix below for more details.
684+
685+
For downgrade, nothing is affected if the functionality was not yet used. However, if the functionality, especially the `Strict` SupplementalGroupsPolicy, was already in use, caution is needed:
686+
- Running containers continue to run with their effective policy as long as they are not recreated.
687+
- However, when the containers in such pods are recreated on the node, the behavior varies with the downgraded version, the downgraded feature gate value, and the CRI runtime's support status (see the matrix below).
688+
689+
The matrix below summarizes what happens for each combination of upgraded/downgraded target version, target feature gate value, and target CRI runtime support status:
690+
691+
| Target<br />kubelet version | Target<br/>Feature Gate | Target<br/>CRI runtime<br /> supports the feature? | Pod's policy | Effective Policy | Rejected By Kubelet? | `.containerStatuses.user` reported? |
692+
| :---------------------------------: | :---------------------: | :-----------------------------------------------: | :-----------------------------------------: | :------------------------------: | :------------------: | :---------------------------------: |
693+
| <1.31<br/>(does not know the field) | N/A | YES/NO | `Strict` | `Merge`<br />(fallback silently) | NO | NO |
694+
| | | | `Merge`<br />/(not set) | `Merge` | NO | NO |
695+
| 1.31 or 1.32<br/>(Alpha) | `True` | YES | `Strict` | `Strict` | NO | YES |
696+
| | | | `Merge`<br />/(not set) | `Merge` | NO | YES |
697+
| | | NO | `Strict` | `Merge`<br />(fallback silently) | NO | NO |
698+
| | | | `Merge`<br />/(not set) | `Merge` | NO | NO |
699+
| | `False` | YES | `Strict`<br />(set when the feature was on) | `Strict` | NO | NO |
700+
| | | | `Merge`<br />/(not set) | `Merge` | NO | NO |
701+
| | | NO | `Strict`<br />(set when the feature was on) | `Merge`<br />(fallback silently) | NO | NO |
702+
| | | | `Merge`<br />/(not set) | `Merge` | NO | NO |
703+
| >=1.33<br />(Beta or above) | `True`<br />(default) | YES | `Strict` | `Strict` | NO | YES |
704+
| | | | `Merge`<br />/(not set) | `Merge` | NO | YES |
705+
| | | NO | `Strict` | - | __REJECTED__(*) | NO |
706+
| | | | `Merge`<br />/(not set) | `Merge` | NO | NO |
707+
| | `False` | YES | `Strict`<br />(set when the feature was on) | `Strict` | NO | NO |
708+
| | | | `Merge`<br />/(not set) | `Merge` | NO | NO |
709+
| | | NO | `Strict`<br />(set when the feature was on) | - | __REJECTED__(*) | NO |
710+
| | | | `Merge`<br />/(not set) | `Merge` | NO | NO |
711+
712+
_(*): See ["What specific metrics should inform a rollback?"](#what-specific-metrics-should-inform-a-rollback) for details_
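The matrix above can also be read as a small decision function. The following is only a sketch of that reading, not the kubelet's implementation: the function name `skew_outcome`, the `Outcome` type, and the stage labels `"pre"`, `"alpha"` and `"beta"` are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    effective_policy: Optional[str]  # None when the pod is rejected
    rejected_by_kubelet: bool
    user_reported: bool  # is .containerStatuses.user reported?


def skew_outcome(stage: str, gate_enabled: bool,
                 runtime_supported: bool, pod_policy: Optional[str]) -> Outcome:
    """Look up one row of the matrix.

    stage is a hypothetical label for the target kubelet version:
    "pre" (<1.31), "alpha" (1.31 or 1.32), "beta" (>=1.33).
    pod_policy is "Strict", "Merge", or None (not set).
    """
    if stage not in ("pre", "alpha", "beta"):
        raise ValueError(f"unknown stage: {stage!r}")
    if stage == "pre":
        # The kubelet does not know the field: Strict silently becomes Merge.
        return Outcome("Merge", False, False)

    if pod_policy != "Strict":
        effective, rejected = "Merge", False
    elif runtime_supported:
        # Strict stays enforced even when the gate is off (set while it was on).
        effective, rejected = "Strict", False
    elif stage == "beta":
        # Beta or above: the kubelet rejects instead of falling back.
        effective, rejected = None, True
    else:
        # Alpha: silent fallback to Merge.
        effective, rejected = "Merge", False

    # The user field is reported only with the gate on and a supporting runtime.
    reported = gate_enabled and runtime_supported
    return Outcome(effective, rejected, reported)
```

For example, `skew_outcome("beta", False, False, "Strict")` corresponds to the second `__REJECTED__` row of the matrix.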
677713

678714
## Production Readiness Review Questionnaire
679715

@@ -749,11 +785,18 @@ feature.
749785
NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
750786
-->
751787

752-
Yes. It can be disabled after enabled. However, users should pay attention that gids of container processes in pods with `Strict` policy would change. It means the action might break the application in permission. We plan to provide a way for users to detect which pods are affected.
788+
Yes. It can be disabled after being enabled.
789+
When disabled, you cannot create pods with the `SupplementalGroupsPolicy` field, and `.status.containerStatuses[*].user` is not reported in pod status.
790+
Please note that for pods already created with `Strict` policy, the policy of the containers in such pods remains enforced even after the feature is disabled.
791+
792+
See ["Version Skew Strategy"](#version-skew-strategy) for more complex cases (including upgrading/downgrading).
753793

754794
###### What happens if we reenable the feature if it was previously rolled back?
755795

756-
Just the policy `Stcict` is reenabled. Users should pay attention that gids of containers in pods with `Stcict` policy would change. It means that the action might break the application in permission. We plan to provide a way for users to detect which pods are affected.
796+
The `SupplementalGroupsPolicy` field in pod spec and `.status.containerStatuses[*].user` in pod status will be available again.
797+
As described in the section above, for pods that were created with `Strict` policy before, the policy of the containers in such pods remains enforced after the feature is re-enabled.
798+
799+
See ["Version Skew Strategy"](#version-skew-strategy) for more complex cases (including upgrading/downgrading).
757800

758801
###### Are there any tests for feature enablement/disablement?
759802

@@ -790,23 +833,10 @@ rollout. Similarly, consider large clusters and how enablement/disablement
790833
will rollout across nodes.
791834
-->
792835

793-
A rollout may fail when at least one of the following components are too old because this KEP introduces the new Kubernetes API field:
794-
795-
| Component | `supplementalGroupsPolicy` value that will cause an error |
796-
|----------------|-----------------------------------------------------------|
797-
| kube-apiserver | not null |
798-
| kubelet | not null |
799-
| CRI runtime | `Strict` |
836+
As long as you do not use the `SupplementalGroupsPolicy` field, rollout and rollback are safe. There is also no impact on already running workloads because the feature is backward compatible.
800837

801-
802-
For example, an error will be returned like this if kube-apiserver is too old:
803-
```console
804-
$ kubectl apply -f supplementalgroupspolicy.yaml
805-
Error from server (BadRequest): error when creating "supplementalgroupspolicy.yaml": Pod in version "v1" cannot be handled as a Pod:
806-
strict decoding error: unknown field "spec.securityContext.supplementalGroupsPolicy"
807-
```
808-
809-
No impact on already running workloads.
838+
However, if pods with the `SupplementalGroupsPolicy` field exist at rollout/rollback time, caution is needed.
839+
Please see the matrix in ["Version Skew Strategy"](#version-skew-strategy) section for details.
810840

811841
###### What specific metrics should inform a rollback?
812842

@@ -815,19 +845,41 @@ What signals should users be paying attention to when the feature is young
815845
that might indicate a serious problem?
816846
-->
817847

818-
Look for an event saying indicating SupplementalGroupsPolicy is not supported by the runtime.
848+
As long as you do not use the `SupplementalGroupsPolicy` field, rollout and rollback are safe, as described in the section above.
849+
850+
However, if pods with the `SupplementalGroupsPolicy` field exist at rollout/rollback time, pod creation may be rejected when
851+
- the feature level of the rolled-out/rolled-back version is beta or above, and
852+
- pods with `Strict` policy (set when the feature gate was previously on) are scheduled to nodes whose CRI runtime does NOT support this feature.
853+
854+
In that case, please look for an event indicating that SupplementalGroupsPolicy is not supported by the node as the rollback signal.
855+
819856
```console
820857
$ kubectl get events -o json -w
821858
...
822859
{
823860
...
824861
"kind": "Event",
825-
"message": "Error: SupplementalGroupsPolicyNotSupported",
862+
"message": "Error: SupplementalGroupsPolicy is not supported in this node.",
826863
...
827864
}
828865
...
829866
```
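As a rough illustration of scanning for this signal, the sketch below filters the JSON printed by `kubectl get events -o json` (without `-w`, so the output is a single List object) for the message shown above. The function name and the substring it matches are assumptions made for this example; the exact event wording may differ between versions.

```python
import json

# Substring of the event message shown above (assumed stable for this sketch).
MARKER = "SupplementalGroupsPolicy is not supported"


def unsupported_policy_events(events_json: str) -> list:
    """Return (involved object name, message) pairs for matching events."""
    doc = json.loads(events_json)
    # A List object carries its events under "items"; a single Event does not.
    items = doc.get("items", [doc])
    hits = []
    for event in items:
        message = event.get("message") or ""
        if MARKER in message:
            name = (event.get("involvedObject") or {}).get("name", "")
            hits.append((name, message))
    return hits
```

The input could come from, for example, `kubectl get events -A -o json` saved to a file and read as a string.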
830867

868+
The following kubelet metrics are also useful to check:
869+
870+
- `kubelet_running_pods`: Shows the actual number of pods running
871+
- `kubelet_desired_pods`: The number of pods the kubelet is trying to run
872+
873+
If these metrics differ, there are desired pods that cannot be brought to running.
874+
If that is the case, checking the pod events to see whether they are failing for SupplementalGroupsPolicy reasons
875+
(like the error shown above) is advised, in which case it is recommended to roll back.
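The comparison of the two metrics can be sketched as follows. This is an illustrative helper, not part of the KEP: it assumes you already have the kubelet's metrics in Prometheus text format as a string, that sample lines carry no trailing timestamp, and that summing over all label sets of a metric is the comparison you want. The function names are hypothetical.

```python
def metric_total(exposition_text: str, metric_name: str) -> float:
    """Sum all samples of metric_name in Prometheus text exposition format."""
    total = 0.0
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Assumes "<name>{<labels>} <value>" with no trailing timestamp.
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        if name == metric_name:
            total += float(value)
    return total


def pods_not_running(exposition_text: str) -> float:
    """Desired pods minus running pods; a positive value warrants a look at events."""
    desired = metric_total(exposition_text, "kubelet_desired_pods")
    running = metric_total(exposition_text, "kubelet_running_pods")
    return desired - running
```

A persistently positive result only says that some desired pods are not running; the pod events still need to be checked to attribute this to SupplementalGroupsPolicy.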
876+
877+
Although this KEP does NOT include kube-scheduler integration that would make the scheduler place pods requiring
878+
the feature (`Strict` policy) on nodes which support this feature, you can use node labels and
879+
pod's `nodeSelector`/`nodeAffinity` to mitigate pod rejection or error events. Please see
880+
["Are there any missing metrics that would be useful to have to improve observability of this feature?"](#are-there-any-missing-metrics-that-would-be-useful-to-have-to-improve-observability-of-this-feature)
881+
section below for details.
882+
831883
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
832884

833885
<!--
@@ -912,6 +964,8 @@ question.
912964
-->
913965

914966
- `supplementalGroupsPolicy=Strict`: 100% of pods were scheduled into a node with the feature supported.
967+
Although this KEP does NOT include scheduler integration, please see
968+
["Are there any missing metrics that would be useful to have to improve observability of this feature?"](#are-there-any-missing-metrics-that-would-be-useful-to-have-to-improve-observability-of-this-feature) section for this.
915969

916970
- `supplementalGroupsPolicy=Merge`: 100% of pods were scheduled into a node with or without the feature supported.
917971

@@ -937,14 +991,23 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
937991
implementation difficulties, etc.).
938992
-->
939993

940-
Potentially, kube-scheduler could be implemented to avoid scheduling a pod with `supplementalGroupsPolicy: Strict`
941-
to a node running CRI runtime which does not supported this feature.
994+
Potentially, kube-scheduler could implement a rule to avoid scheduling a pod with `supplementalGroupsPolicy: Strict`
995+
to a node not supporting this feature.
942996

943-
In this way, the Event metric described above would not happen, and users would instead see `Pending` pods
944-
as an error metric.
997+
However, this is not covered by this KEP, because a more generic mechanism would be preferable in Kubernetes, so that the scheduler can place pods which require node feature X
998+
on the nodes which support node feature X.
945999

946-
However, this is not planned to be implemented in kube-scheduler, as it seems overengineering.
947-
Users may use `nodeSelector`, `nodeAffinity`, etc. to workaround this.
1000+
As of v1.33, although Kubernetes does not offer such a generic mechanism, cluster admins can maintain node labels and use `nodeSelector`/`nodeAffinity` in pods instead.
1001+
1002+
There are several ways to automate this:
1003+
1004+
- By Mutating Webhook:
1005+
  - for nodes, which transforms the `Node.Status.Features.SupplementalGroupsPolicy` field into a node label (say `supplementalgroupspolicy-supported: "true" | "false"`),
1006+
  - for pods, which adds `.spec.nodeSelector: { "supplementalgroupspolicy-supported": "true" }` when the pod specifies `Strict` policy.
1007+
- By Mutating Admission Policy:
1008+
  - although the feature is still alpha as of v1.32, you can write an equivalent policy to do this.
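The mutation logic described in the list above can be sketched in a few lines. This is only an illustration of the transformation on plain dictionaries, not a working admission webhook: the function names are hypothetical, and the label key `supplementalgroupspolicy-supported` is the example name suggested above rather than a well-known Kubernetes label.

```python
import copy

# Example label key from the text above; not a standard Kubernetes label.
LABEL = "supplementalgroupspolicy-supported"


def node_label_value(node: dict) -> str:
    """Derive the label value from .status.features.supplementalGroupsPolicy."""
    features = (node.get("status") or {}).get("features") or {}
    return "true" if features.get("supplementalGroupsPolicy") is True else "false"


def add_strict_node_selector(pod: dict) -> dict:
    """Return a copy of the pod with the nodeSelector added for Strict policy."""
    mutated = copy.deepcopy(pod)  # never modify the caller's object
    spec = mutated.setdefault("spec", {})
    policy = (spec.get("securityContext") or {}).get("supplementalGroupsPolicy")
    if policy == "Strict":
        selector = spec.get("nodeSelector") or {}
        selector[LABEL] = "true"
        spec["nodeSelector"] = selector
    return mutated
```

Pods with `Merge` policy, or with no policy set, are returned unchanged, so they can still be placed on any node.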
1009+
1010+
If you manage the node labels and pods' `nodeSelector`/`nodeAffinity` appropriately, the error events or pod rejections are not expected to happen. Instead, you will need to watch for `Pending` pods to check whether there are a sufficient number of nodes supporting SupplementalGroupsPolicy in the cluster.
9481011

9491012
### Dependencies
9501013

@@ -995,7 +1058,11 @@ This may affect scalability.
9951058

9961059
To evaluate this risk, users may run
9971060
`kubectl get nodes -o json | jq '[.items[].status.features]'`
998-
to see how many nodes support `supplementalGroupsPolicy: true`.
1061+
to see how many nodes support `supplementalGroupsPolicy: true` before using `Strict` policy.
1062+
1063+
To mitigate this risk, you can also manage node labels and pod's `nodeSelector`/`nodeAffinity` to
1064+
ensure that pods with `Strict` policy are scheduled to nodes which support the SupplementalGroupsPolicy feature.
1065+
Please see ["Are there any missing metrics that would be useful to have to improve observability of this feature?"](#are-there-any-missing-metrics-that-would-be-useful-to-have-to-improve-observability-of-this-feature) section.
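As a complement to the `jq` one-liner above, the following sketch splits node names by whether they report the feature, given the output of `kubectl get nodes -o json` as a string. The function name is hypothetical, and a node without a `features` field is treated as not supporting the feature, which is an assumption of this example.

```python
import json


def split_nodes_by_support(nodes_json: str):
    """Return (supported, unsupported) lists of node names."""
    supported, unsupported = [], []
    for node in json.loads(nodes_json).get("items", []):
        name = (node.get("metadata") or {}).get("name", "")
        features = (node.get("status") or {}).get("features") or {}
        # Only an explicit true counts as support in this sketch.
        if features.get("supplementalGroupsPolicy") is True:
            supported.append(name)
        else:
            unsupported.append(name)
    return supported, unsupported
```

If the second list is non-empty, pods with `Strict` policy may land on a node that cannot honor it unless node labels and selectors are in place.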
9991066

10001067
###### Will enabling / using this feature result in any new API calls?
10011068
