No. The unit tests exercising the `switch` of the feature gate itself will be added.
### Rollout, Upgrade and Rollback Planning
feature flags will be enabled on some API servers and not others during the
rollout. Similarly, consider large clusters and how enablement/disablement
will rollout across nodes.
-->
It won't impact already running workloads because it is an opt-in feature in the scheduler.
But during a rolling upgrade, if some API servers have not enabled the feature, they will not
be able to accept and store the field `matchLabelKeys`, and the pods associated with these
API servers will not be able to use this feature. As a result, pods belonging to the
same deployment may have different scheduling outcomes.
###### What specific metrics should inform a rollback?
<!--
What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->
- If the metric `schedule_attempts_total{result="error|unschedulable"}` increased significantly after pods using this feature are added.
- If the metric `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` increased to higher than 100ms at the 90th percentile after pods using this feature are added.
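As a sketch, these signals could be watched with PromQL queries along the following lines (assuming the scheduler's metrics are scraped by Prometheus; the `scheduler_` metric prefix and the 5m rate window are assumptions here, not part of this KEP):

```promql
# error / unschedulable scheduling attempts
sum(rate(scheduler_schedule_attempts_total{result=~"error|unschedulable"}[5m]))

# 90th-percentile PodTopologySpread plugin execution latency
histogram_quantile(0.9,
  sum(rate(scheduler_plugin_execution_duration_seconds_bucket{plugin="PodTopologySpread"}[5m])) by (le))
```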
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
Describe manual testing that was done and the outcomes.
Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->
Yes, it was tested manually by following the steps below, and it was working as intended.
1. create a Kubernetes cluster v1.26 with 3 nodes where the `MatchLabelKeysInPodTopologySpread` feature is disabled.
2. deploy a deployment with this yaml
   ```yaml
   apiVersion: apps/v1
   kind: Deployment
   metadata:
     name: nginx
   spec:
     replicas: 12
     selector:
       matchLabels:
         foo: bar
     template:
       metadata:
         labels:
           foo: bar
       spec:
         restartPolicy: Always
         containers:
         - name: nginx
           image: nginx:1.14.2
         topologySpreadConstraints:
         - maxSkew: 1
           topologyKey: kubernetes.io/hostname
           whenUnsatisfiable: DoNotSchedule
           labelSelector:
             matchLabels:
               foo: bar
           matchLabelKeys:
           - pod-template-hash
   ```
3. pods spread across nodes as 4/4/4
4. update the deployment nginx image to `nginx:1.15.0`
5. pods spread across nodes as 5/4/3
6. delete deployment nginx
7. upgrade the Kubernetes cluster to v1.27 (at master branch) while `MatchLabelKeysInPodTopologySpread` is enabled.
8. deploy a deployment nginx like step 2
9. pods spread across nodes as 4/4/4
10. update the deployment nginx image to `nginx:1.15.0`
11. pods spread across nodes as 4/4/4
12. delete deployment nginx
13. downgrade the Kubernetes cluster to v1.26 where the `MatchLabelKeysInPodTopologySpread` feature is enabled.
14. deploy a deployment nginx like step 2
15. pods spread across nodes as 4/4/4
16. update the deployment nginx image to `nginx:1.15.0`
17. pods spread across nodes as 4/4/4
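The per-node spread in the steps above can be checked with a command along these lines (a sketch, not part of the original test procedure; it assumes `kubectl get pods -o wide` prints the node name in the seventh column):

```
kubectl get pods -l foo=bar -o wide --no-headers \
  | awk '{count[$7]++} END {for (n in count) print n, count[n]}'
```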
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
<!--
Even if applying deprecation policies, they may still surprise some users.
-->
No.
### Monitoring Requirements
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->
Operators can query for pods that have the `pod.spec.topologySpreadConstraints.matchLabelKeys` field set to determine if the feature is in use by workloads.
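For example, such a query could look like the following (a sketch assuming `kubectl` and `jq` are available; the exact filter is an illustration, not prescribed by this KEP):

```
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[]
           | select(any(.spec.topologySpreadConstraints[]?; .matchLabelKeys != null))
           | "\(.metadata.namespace)/\(.metadata.name)"'
```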
###### How can someone using this feature know that it is working for their instance?
and operation of this feature.
Recall that end users cannot usually observe component logs or access metrics.
-->
- [x] Other (treat as last resort)
  - Details: We can determine if this feature is being used by checking Deployments that have only `MatchLabelKeys` set in `TopologySpreadConstraint` and no `LabelSelector`. These Deployments will strictly adhere to TopologySpread after both deployment and rolling upgrades if the feature is being used.
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
high level (needs more precise definitions) those may be things like:
These goals will help you determine what you need to measure (SLIs) in the next
question.
-->
Metric `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` <= 100ms at the 90th percentile.
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
Yes, there is additional work: the scheduler will use the keys in `matchLabelKeys` to look up label values from the pod, and AND them with `LabelSelector`.
This may result in a very small increase in scheduling latency, which directly contributes to the pod-startup-latency SLO.
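The extra lookup-and-AND step can be sketched as follows. This is a simplified illustration in Go, not the actual PodTopologySpread plugin code: representing the constraint's label selector as a plain map of required labels, and the helper `mergeMatchLabelKeys`, are assumptions made for this sketch.

```go
package main

import "fmt"

// mergeMatchLabelKeys sketches how matchLabelKeys are resolved against the
// incoming pod's labels and ANDed into the constraint's label selector
// (represented here as a plain map of required label key/value pairs).
func mergeMatchLabelKeys(selector map[string]string, matchLabelKeys []string, podLabels map[string]string) map[string]string {
	merged := make(map[string]string, len(selector)+len(matchLabelKeys))
	for k, v := range selector {
		merged[k] = v
	}
	// Look up each key's value on the incoming pod; keys absent from the
	// pod's labels are simply skipped.
	for _, key := range matchLabelKeys {
		if v, ok := podLabels[key]; ok {
			merged[key] = v
		}
	}
	return merged
}

func main() {
	selector := map[string]string{"foo": "bar"}
	podLabels := map[string]string{"foo": "bar", "pod-template-hash": "abc123"}
	fmt.Println(mergeMatchLabelKeys(selector, []string{"pod-template-hash"}, podLabels))
	// prints: map[foo:bar pod-template-hash:abc123]
}
```

Because `pod-template-hash` differs per ReplicaSet, the merged selector only matches pods from the same rollout, which is what keeps spreading correct across rolling upgrades.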
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?