Skip to content

Commit eb13a25

Browse files
authored
KEP-3094: Graduate NodeInclusionPolicy to Beta in v1.26 (kubernetes#3539)
* Graduate NodeInclusionPolicy to Beta Signed-off-by: kerthcet <[email protected]> * update PRR files Signed-off-by: kerthcet <[email protected]> * update the stage to beta Signed-off-by: kerthcet <[email protected]> * address comments Signed-off-by: kerthcet <[email protected]> * address comments Signed-off-by: kerthcet <[email protected]> * Make sure feature gate enable/disable tested Signed-off-by: kerthcet <[email protected]> * add feature gate enabled/disabled testcases Signed-off-by: kerthcet <[email protected]> * Add descriptions about feature enable/disable tests Signed-off-by: kerthcet <[email protected]> Signed-off-by: kerthcet <[email protected]>
1 parent 1264d3d commit eb13a25

File tree

3 files changed

+116
-14
lines changed

3 files changed

+116
-14
lines changed
Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
11
kep-number: 3094
22
alpha:
33
approver: "@wojtek-t"
4+
beta:
5+
approver: "@wojtek-t"

keps/sig-scheduling/3094-pod-topology-spread-considering-taints/README.md

Lines changed: 113 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -494,7 +494,6 @@ enhancement:
494494
- Previously configured values will be ignored.
495495

496496
### Version Skew Strategy
497-
N/A
498497

499498
<!--
500499
If applicable, how will the component handle version skew with other
@@ -509,6 +508,8 @@ enhancement:
509508
CRI or CNI may require updating that component before the kubelet.
510509
-->
511510

511+
Kube-scheduler generally has the same version as api-server. So no version skew strategy.
512+
512513
## Production Readiness Review Questionnaire
513514

514515
<!--
@@ -555,7 +556,7 @@ Pick one of these and delete the rest.
555556
Any change of default behavior may be surprising to users or break existing
556557
automations, so be extremely careful here.
557558
-->
558-
No.
559+
No, it's backwards compatible.
559560

560561
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
561562

@@ -565,13 +566,30 @@ feature, can it break the existing applications?).
565566
566567
NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
567568
-->
568-
Yes.
569+
Yes, we can just disable the feature gate.
569570

570571
###### What happens if we reenable the feature if it was previously rolled back?
571572
The policies are respected again.
572573

573574
###### Are there any tests for feature enablement/disablement?
574-
No, appropriate unit tests will be added for Alpha.
575+
576+
In the scheduler, this is in-memory feature, so tests checking both feature enabled or disabled were added:
577+
578+
Unit tests:
579+
580+
- pkg/api/pod/util_test.go#TestDropNodeInclusionPolicyFields
581+
- pkg/scheduler/framework/plugins/podtopologyspread/filtering_test.go#TestPreFilterState
582+
- pkg/scheduler/framework/plugins/podtopologyspread/filtering_test.go#TestSingleConstraint
583+
- pkg/scheduler/framework/plugins/podtopologyspread/filtering_test.go#TestMultipleConstraints
584+
- pkg/scheduler/framework/plugins/podtopologyspread/filtering_test.go#TestPreScoreStateEmptyNodes
585+
- pkg/scheduler/framework/plugins/podtopologyspread/filtering_test.go#TestPodTopologySpreadScore
586+
587+
Integration tests:
588+
589+
- test/integration/scheduler/filters/filters_test.go#TestPodTopologySpreadFilter
590+
- test/integration/scheduler/scoring/priorities_test.go#TestPodTopologySpreadScoring
591+
592+
However, this KEP also introduces API changes, the tests will be added later, refer to the [PR](https://github.com/kubernetes/kubernetes/pull/112805). I'll update the description once the PR is merged.
575593

576594
<!--
577595
The e2e framework does not currently support enabling or disabling feature
@@ -611,14 +629,87 @@ that might indicate a serious problem?
611629
- A spike on failure events with keyword "failed spreadConstraint" in scheduler log.
612630

613631
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
614-
No. This will be tested upon beta graduation.
615632

616633
<!--
617634
Describe manual testing that was done and the outcomes.
618635
Longer term, we may want to require automated upgrade/rollback tests, but we
619636
are missing a bunch of machinery and tooling and can't do that now.
620637
-->
621638

639+
Not yet, but it will be tested manually prior to upgrade following below steps:
640+
641+
1. Install kubernetes v1.24 cluster with two workloads via installation tools like Kind.
642+
2. Let's name these nodes as node1 and node2, both labelled with key `kubernetes.io/hostname`.
643+
3. Add a taint to node1 like `foo=bar:NoSchedule`
644+
4. Apply a deployment like:
645+
646+
```yaml
647+
apiVersion: apps/v1
648+
kind: Deployment
649+
metadata:
650+
name: nginx
651+
spec:
652+
replicas: 2
653+
selector:
654+
matchLabels:
655+
foo: bar
656+
template:
657+
metadata:
658+
labels:
659+
foo: bar
660+
spec:
661+
restartPolicy: Always
662+
containers:
663+
- name: nginx
664+
image: nginx:1.14.2
665+
topologySpreadConstraints:
666+
- maxSkew: 1
667+
topologyKey: kubernetes.io/hostname
668+
whenUnsatisfiable: DoNotSchedule
669+
labelSelector:
670+
matchLabels:
671+
foo: bar
672+
```
673+
674+
5. We'll see one pod pending.
675+
6. Delete the deployment via `kubectl delete -f`.
676+
7. Configure the api-server with feature-gate `NodeInclusionPolicyInPodTopologySpread` enabled.
677+
8. Redeploy the deployment with `NodeTaintsPolicy` honored.
678+
679+
```yaml
680+
apiVersion: apps/v1
681+
kind: Deployment
682+
metadata:
683+
name: nginx
684+
spec:
685+
replicas: 2
686+
selector:
687+
matchLabels:
688+
foo: bar
689+
template:
690+
metadata:
691+
labels:
692+
foo: bar
693+
spec:
694+
restartPolicy: Always
695+
containers:
696+
- name: nginx
697+
image: nginx:1.14.2
698+
topologySpreadConstraints:
699+
- maxSkew: 1
700+
topologyKey: kubernetes.io/hostname
701+
whenUnsatisfiable: DoNotSchedule
702+
NodeTaintsPolicy: Honor
703+
labelSelector:
704+
matchLabels:
705+
foo: bar
706+
```
707+
708+
9. All pods will be allocated successfully.
709+
10. Delete the deployment.
710+
11. Disable the feature gate with api-server restarted.
711+
12. Apply the deployment for the third time, we'll see one pending again.
712+
622713
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
623714
No
624715

@@ -661,7 +752,9 @@ Recall that end users cannot usually observe component logs or access metrics.
661752
- Other field:
662753
- [ ] Other (treat as last resort)
663754
- Details: -->
664-
N/A
755+
756+
- [x] Other (treat as last resort)
757+
- Details: We can only observe the behaviors based on pod scheduling results.
665758

666759
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
667760

@@ -711,7 +804,9 @@ Pick one more of these and delete the rest.
711804
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
712805
implementation difficulties, etc.).
713806
-->
714-
N/A
807+
808+
Yes, we have a plan to improve observability via metrics [here](https://github.com/kubernetes/kubernetes/issues/110643),
809+
but still on the way.
715810

716811
### Dependencies
717812

@@ -748,7 +843,6 @@ For beta, this section is required: reviewers must answer these questions.
748843
For GA, this section is required: approvers should be able to confirm the
749844
previous answers based on experience in the field.
750845
-->
751-
No
752846

753847
###### Will enabling / using this feature result in any new API calls?
754848

@@ -831,7 +925,8 @@ details). For now, we leave it here.
831925
-->
832926

833927
###### How does this feature react if the API server and/or etcd is unavailable?
834-
N/A
928+
929+
It only works in pod scheduling, but if the API server or etcd down, pods will not be scheduled successfully.
835930

836931
###### What are other known failure modes?
837932

@@ -851,7 +946,10 @@ For each of them, fill in the following information by copying the below templat
851946
Configuration errors are logged to stderr.
852947

853948
###### What steps should be taken if SLOs are not being met to determine the problem?
854-
N/A
949+
950+
If we see obviously performance degradation or error rate going up with this feature gate enabled,
951+
we should disable it ASAP, and restart the apiserver. If we have fewer workloads, we can disable the
952+
policy in `PodTopologySpread` one by one for emergency.
855953

856954
## Implementation History
857955

@@ -868,13 +966,15 @@ Major milestones might include:
868966

869967
- 2021.01.12: KEP proposed for review, including motivation, proposal, risks,
870968
test plan and graduation criteria.
969+
- 2022.09.22: Graduate to Beta in v1.26.
871970

872971
## Drawbacks
873972

874973
<!--
875974
Why should this KEP _not_ be implemented?
876975
-->
877-
N/A
976+
977+
None, it's a backward compatible feature, if users don't want it, no need to configure anything.
878978

879979
## Alternatives
880980

@@ -896,4 +996,5 @@ Use this section if you need things from the project/SIG. Examples include a
896996
new subproject, repos requested, or GitHub details. Listing these here allows a
897997
SIG to get the process for these resources started right away.
898998
-->
899-
N/A
999+
1000+
No

keps/sig-scheduling/3094-pod-topology-spread-considering-taints/kep.yaml

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,6 @@ kep-number: 3094
33
authors:
44
- "@kerthcet"
55
owning-sig: sig-scheduling
6-
participating-sigs:
76
status: implementable
87
creation-date: 2021-12-30
98
reviewers:
@@ -18,7 +17,7 @@ see-also:
1817
- "/keps/sig-scheduling/1258-default-pod-topology-spread"
1918

2019
# The target maturity stage in the current dev cycle for this KEP.
21-
stage: alpha
20+
stage: beta
2221

2322
# The most recent milestone for which work toward delivery of this KEP has been
2423
# done. This can be the current (upcoming) milestone, if it is being actively

0 commit comments

Comments
 (0)