However, this KEP also introduces API changes, so the tests will be added later; see the [PR](https://github.com/kubernetes/kubernetes/pull/112805). I'll update the description once the PR is merged.
<!--
The e2e framework does not currently support enabling or disabling feature
- A spike in failure events containing the keyword "failed spreadConstraint" in the scheduler log.
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
<!--
Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->
Not yet, but it will be tested manually before the upgrade, following the steps below:

1. Install a Kubernetes v1.24 cluster with two nodes using an installation tool such as kind.
2. Name these nodes node1 and node2; both are labelled with the key `kubernetes.io/hostname`.
3. Add a taint to node1, e.g. `foo=bar:NoSchedule`.
4. Apply a deployment like:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      foo: bar
  template:
    metadata:
      labels:
        foo: bar
    spec:
      restartPolicy: Always
      containers:
      - name: nginx
        image: nginx:1.14.2
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            foo: bar
```

5. One pod will stay pending: node1 still counts as a topology domain with zero matching pods, so placing the second pod on node2 would exceed `maxSkew`.
6. Delete the deployment via `kubectl delete -f`.
7. Restart kube-apiserver and kube-scheduler with the `NodeInclusionPolicyInPodTopologySpread` feature gate enabled.
8. Redeploy the deployment with `nodeTaintsPolicy: Honor` set:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      foo: bar
  template:
    metadata:
      labels:
        foo: bar
    spec:
      restartPolicy: Always
      containers:
      - name: nginx
        image: nginx:1.14.2
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        nodeTaintsPolicy: Honor
        labelSelector:
          matchLabels:
            foo: bar
```

9. All pods will be scheduled successfully.
10. Delete the deployment.
11. Disable the feature gate and restart the API server.
12. Apply the deployment a third time; one pod will be pending again.
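The outcomes of steps 5, 9, and 12 follow from how skew is computed: unless `nodeTaintsPolicy: Honor` is set, the tainted node1 still counts as a domain with zero matching pods. The Go sketch below is a simplified model under stated assumptions, not the scheduler's code: the `node` type and the `fitsSpread`, `schedule`, and `newCluster` helpers are hypothetical, it assumes one domain per node (`topologyKey: kubernetes.io/hostname`), places pods on the first feasible node, and ignores `nodeAffinityPolicy`, `minDomains`, and scoring.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a cluster node.
type node struct {
	name    string
	tainted bool // has a NoSchedule taint the pod does not tolerate
	pods    int  // matching pods (label foo=bar) already on this node
}

// fitsSpread approximates the PodTopologySpread filter with one topology
// domain per node (topologyKey: kubernetes.io/hostname). Placing a pod on
// target is allowed only if target.pods + 1 - min(pods over counted nodes)
// stays within maxSkew. With honorTaints (nodeTaintsPolicy: Honor), tainted
// nodes are left out of the minimum; otherwise (Ignore) they still count.
func fitsSpread(nodes []node, target node, maxSkew int, honorTaints bool) bool {
	minCount := -1
	for _, n := range nodes {
		if honorTaints && n.tainted {
			continue
		}
		if minCount == -1 || n.pods < minCount {
			minCount = n.pods
		}
	}
	return target.pods+1-minCount <= maxSkew
}

// schedule places replicas one at a time on the first node that passes both
// filters and returns how many replicas stay Pending.
func schedule(nodes []node, replicas, maxSkew int, honorTaints bool) int {
	pending := 0
	for i := 0; i < replicas; i++ {
		placed := false
		for j := range nodes {
			// TaintToleration filter: the pod never lands on a tainted node.
			if nodes[j].tainted {
				continue
			}
			if fitsSpread(nodes, nodes[j], maxSkew, honorTaints) {
				nodes[j].pods++
				placed = true
				break
			}
		}
		if !placed {
			pending++
		}
	}
	return pending
}

// newCluster mirrors steps 1-3: node1 is tainted foo=bar:NoSchedule, node2 is not.
func newCluster() []node {
	return []node{{name: "node1", tainted: true}, {name: "node2"}}
}

func main() {
	// Two replicas with maxSkew 1, as in the test deployment.
	fmt.Println("Ignore (default):", schedule(newCluster(), 2, 1, false), "pending")
	fmt.Println("Honor:", schedule(newCluster(), 2, 1, true), "pending")
}
```

Running it prints `Ignore (default): 1 pending` and then `Honor: 0 pending`. With Ignore, the second replica would give node2 a count of 2 against node1's 0 (skew 2 > 1); with Honor, node1 drops out of the minimum and both replicas fit on node2.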
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No
- Other field:
- [ ] Other (treat as last resort)
- Details: -->
- [x] Other (treat as last resort)
  - Details: The behavior can only be observed through pod scheduling results, i.e. whether pods get scheduled or stay Pending.
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).
-->
Yes. There is a plan to improve observability via metrics, tracked in [kubernetes/kubernetes#110643](https://github.com/kubernetes/kubernetes/issues/110643),
but it is still in progress.
### Dependencies
For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field.
-->
###### Will enabling / using this feature result in any new API calls?
-->
###### How does this feature react if the API server and/or etcd is unavailable?
The feature only takes effect during pod scheduling; if the API server or etcd is down, pods cannot be scheduled anyway.
###### What are other known failure modes?
Configuration errors are logged to stderr.
###### What steps should be taken if SLOs are not being met to determine the problem?
If we observe obvious performance degradation or a rising error rate with this feature gate enabled,
we should disable it as soon as possible and restart the API server. If only a few workloads are affected,
we can instead unset `nodeAffinityPolicy`/`nodeTaintsPolicy` in their topology spread constraints one by one as an emergency measure.
## Implementation History
- 2021.01.12: KEP proposed for review, including motivation, proposal, risks,
test plan and graduation criteria.
- 2022.09.22: Graduate to Beta in v1.26.
## Drawbacks
<!--
Why should this KEP _not_ be implemented?
-->
None. The feature is backward compatible; users who don't want it don't need to configure anything.
## Alternatives
new subproject, repos requested, or GitHub details. Listing these here allows a
SIG to get the process for these resources started right away.