
Commit f2ac64a

Merge pull request #3740 from denkensk/match-label-beta
KEP-3243: Graduate MatchLabelKeys In PodTopologySpread to beta
2 parents ac5dae5 + fe27e28 commit f2ac64a

3 files changed: +104 -21 lines changed
Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 3243
 alpha:
   approver: "@wojtek-t"
+beta:
+  approver: "@wojtek-t"

keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/README.md

Lines changed: 92 additions & 16 deletions
@@ -131,7 +131,7 @@ checklist items _must_ be updated for the enhancement to be released.
 Items marked with (R) are required *prior to targeting to a milestone / release*.

 - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
 - [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
 - [ ] e2e Tests for all Beta API Operations (endpoints)
@@ -142,7 +142,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [ ] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
 - [x] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

 <!--
@@ -640,7 +640,7 @@ feature gate after having objects written with the new field) are also critical.
 You can take a look at one potential example of such test in:
 https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
 -->
-No, unit and integration tests will be added.
+No. Unit tests that exercise switching the feature gate itself will be added.

 ### Rollout, Upgrade and Rollback Planning

@@ -659,13 +659,22 @@ feature flags will be enabled on some API servers and not others during the
 rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->
+It won't impact already running workloads, because it is an opt-in feature in the scheduler.
+However, during a rolling upgrade, API servers that have not yet enabled the feature will not
+accept and store the `MatchLabelKeys` field, so pods created through those API servers will not
+be able to use this feature. As a result, pods belonging to the same deployment may have
+different scheduling outcomes.

 ###### What specific metrics should inform a rollback?

 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->
+- The metric `schedule_attempts_total{result="error|unschedulable"}` increasing significantly after pods using this feature are added.
+- The metric `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` rising above 100ms at the 90th percentile after pods using this feature are added.

 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

@@ -674,12 +683,60 @@ Describe manual testing that was done and the outcomes.
 Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
+Yes, it was tested manually by following the steps below, and it was working as intended.
+1. Create a Kubernetes v1.26 cluster with 3 nodes, with the `MatchLabelKeysInPodTopologySpread` feature disabled.
+2. Deploy a Deployment with this YAML:
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: nginx
+spec:
+  replicas: 12
+  selector:
+    matchLabels:
+      foo: bar
+  template:
+    metadata:
+      labels:
+        foo: bar
+    spec:
+      restartPolicy: Always
+      containers:
+      - name: nginx
+        image: nginx:1.14.2
+      topologySpreadConstraints:
+      - maxSkew: 1
+        topologyKey: kubernetes.io/hostname
+        whenUnsatisfiable: DoNotSchedule
+        labelSelector:
+          matchLabels:
+            foo: bar
+        matchLabelKeys:
+        - pod-template-hash
+```
+3. Pods spread across nodes as 4/4/4.
+4. Update the deployment nginx image to `nginx:1.15.0`.
+5. Pods spread across nodes as 5/4/3.
+6. Delete deployment nginx.
+7. Upgrade the Kubernetes cluster to v1.27 (at the master branch) with `MatchLabelKeysInPodTopologySpread` enabled.
+8. Deploy a Deployment nginx as in step 2.
+9. Pods spread across nodes as 4/4/4.
+10. Update the deployment nginx image to `nginx:1.15.0`.
+11. Pods spread across nodes as 4/4/4.
+12. Delete deployment nginx.
+13. Downgrade the Kubernetes cluster to v1.26 with the `MatchLabelKeysInPodTopologySpread` feature enabled.
+14. Deploy a Deployment nginx as in step 2.
+15. Pods spread across nodes as 4/4/4.
+16. Update the deployment nginx image to `nginx:1.15.0`.
+17. Pods spread across nodes as 4/4/4.
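The spread figures recorded in the steps above can be read through the skew arithmetic that `maxSkew: 1` constrains. A rough Python sketch (illustrative only, not scheduler code):

```python
def skew(pods_per_node):
    """Difference between the most- and least-loaded topology domains,
    counting only pods matched by the effective label selector."""
    return max(pods_per_node) - min(pods_per_node)

# Step 5, feature disabled: old and new ReplicaSet pods share one selector group.
print(skew([5, 4, 3]))  # 2
# Steps 9-17, feature enabled: each ReplicaSet is evaluated on its own.
print(skew([4, 4, 4]))  # 0
```

The 5/4/3 result in step 5 shows skew 2 across the pods left after the rolling update, which is exactly the drift that `matchLabelKeys: [pod-template-hash]` prevents by evaluating each ReplicaSet's pods separately.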

 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->
+No.

 ### Monitoring Requirements

@@ -694,6 +751,7 @@ Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
 checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->
+Operators can query for pods that have the `pod.spec.topologySpreadConstraints.matchLabelKeys` field set to determine whether the feature is in use by workloads.
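As a sketch, that query can be run over the JSON from `kubectl get pods --all-namespaces -o json` (the helper name `uses_match_label_keys` is hypothetical, not an existing API):

```python
def uses_match_label_keys(pod):
    """True if any of the pod's topologySpreadConstraints sets matchLabelKeys."""
    constraints = pod.get("spec", {}).get("topologySpreadConstraints") or []
    return any(c.get("matchLabelKeys") for c in constraints)

# Example pod object, shaped like one item of `kubectl get pods -o json`:
pod = {"spec": {"topologySpreadConstraints": [
    {"maxSkew": 1, "topologyKey": "kubernetes.io/hostname",
     "matchLabelKeys": ["pod-template-hash"]}]}}
print(uses_match_label_keys(pod))  # True
```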

 ###### How can someone using this feature know that it is working for their instance?

@@ -706,13 +764,8 @@ and operation of this feature.
 Recall that end users cannot usually observe component logs or access metrics.
 -->

-- [ ] Events
-  - Event Reason:
-- [ ] API .status
-  - Condition name:
-  - Other field:
-- [ ] Other (treat as last resort)
-  - Details:
+- [x] Other (treat as last resort)
+  - Details: We can determine whether this feature is being used by checking Deployments that set only `MatchLabelKeys` in `TopologySpreadConstraint` and no `LabelSelector`. These Deployments will strictly adhere to topology spread after both initial deployment and rolling upgrades if the feature is being used.

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -730,26 +783,27 @@ high level (needs more precise definitions) those may be things like:
 These goals will help you determine what you need to measure (SLIs) in the next
 question.
 -->
+The metric `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` should remain <= 100ms at the 90th percentile.

###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

 <!--
 Pick one more of these and delete the rest.
 -->

-- [ ] Metrics
-  - Metric name:
-  - [Optional] Aggregation method:
-  - Components exposing the metric:
-- [ ] Other (treat as last resort)
-  - Details:
+- [x] Metrics
+  - Component exposing the metric: kube-scheduler
+  - Metric name: `plugin_execution_duration_seconds{plugin="PodTopologySpread"}`
+  - Metric name: `schedule_attempts_total{result="error|unschedulable"}`

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

 <!--
 Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
 implementation difficulties, etc.).
 -->
+Yes. It would be helpful to have metrics showing which plugins affect the scheduler's decisions in the Filter/Score phases.
+This is tracked in the related issue https://github.com/kubernetes/kubernetes/issues/110643, which is large in scope and still in progress.

### Dependencies

@@ -773,6 +827,7 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
 - Impact of its outage on the feature:
 - Impact of its degraded performance or high-error rates on the feature:
 -->
+No.

### Scalability

@@ -800,6 +855,7 @@ Focusing mostly on:
 - periodic API calls to reconcile state (e.g. periodic fetching state,
   heartbeats, leader election, etc.)
 -->
+No.

###### Will enabling / using this feature result in introducing new API types?

@@ -809,6 +865,7 @@ Describe them, providing:
 - Supported number of objects per cluster
 - Supported number of objects per namespace (for namespace-scoped objects)
 -->
+No.

###### Will enabling / using this feature result in any new calls to the cloud provider?

@@ -817,6 +874,7 @@ Describe them, providing:
 - Which API(s):
 - Estimated increase:
 -->
+No.

###### Will enabling / using this feature result in increasing size or count of the existing API objects?

@@ -826,6 +884,7 @@ Describe them, providing:
 - Estimated increase in size: (e.g., new annotation of size 32B)
 - Estimated amount of new objects: (e.g., new Object X for every existing Pod)
 -->
+No.

###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?

@@ -837,6 +896,8 @@ Think about adding additional work or introducing new steps in between

 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
 -->
+Yes, there is additional work: the scheduler uses the keys in `matchLabelKeys` to look up the corresponding label values on the incoming pod and ANDs the resulting requirements with the constraint's `LabelSelector`.
+This may have a very small impact on scheduling latency, which directly contributes to the pod-startup-latency SLO.

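A minimal Python sketch of that lookup-and-AND step (illustrative only; the real implementation is Go code in the scheduler's PodTopologySpread plugin, and `effective_selector` is a hypothetical name):

```python
def effective_selector(label_selector, match_label_keys, pod_labels):
    """AND the constraint's labelSelector with the incoming pod's own values
    for each key listed in matchLabelKeys; keys missing from the pod's labels
    are ignored."""
    merged = dict(label_selector)
    for key in match_label_keys:
        if key in pod_labels:
            merged[key] = pod_labels[key]
    return merged

pod_labels = {"foo": "bar", "pod-template-hash": "5d4f7b8c9"}
print(effective_selector({"foo": "bar"}, ["pod-template-hash"], pod_labels))
# {'foo': 'bar', 'pod-template-hash': '5d4f7b8c9'}
```

The merge is a constant-time dictionary lookup per listed key, which is why the expected latency impact is very small.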
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?

@@ -849,6 +910,7 @@ This through this both in small and large cases, again with respect to the

 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
 -->
+No.

### Troubleshooting

@@ -861,6 +923,8 @@ details). For now, we leave it here.
 -->

 ###### How does this feature react if the API server and/or etcd is unavailable?
+If the API server and/or etcd is not available, this feature will not be available,
+because the scheduler needs to write scheduling results back to the pod via the API server/etcd.

###### What are other known failure modes?

@@ -876,8 +940,18 @@ For each of them, fill in the following information by copying the below template:
 Not required until feature graduated to beta.
 - Testing: Are there any tests for failure mode? If not, describe why.
 -->
+N/A

 ###### What steps should be taken if SLOs are not being met to determine the problem?
+- Check the metric `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` to determine
+  whether latency has increased. If it has, this feature may have increased scheduling latency;
+  you can disable the `MatchLabelKeysInPodTopologySpread` feature gate to check whether it is the
+  cause of the increased latency.
+- Check the metric `schedule_attempts_total{result="error|unschedulable"}` to determine whether the
+  number of failed attempts has increased. If it has, determine the cause of the failure from the
+  pod's events. If it is caused by the `PodTopologySpread` plugin, you can analyze further by
+  looking at the scheduler log.

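The two checks above can be condensed into a small triage helper (purely illustrative; the `triage` function and its inputs are assumptions, with the 0.1s threshold mirroring the 100ms p90 objective stated earlier):

```python
def triage(p90_plugin_latency_seconds, failed_attempts_before, failed_attempts_after):
    """Map the two SLI metrics named above to suggested next steps."""
    findings = []
    if p90_plugin_latency_seconds > 0.1:  # plugin_execution_duration_seconds p90
        findings.append("latency above 100ms p90: try disabling MatchLabelKeysInPodTopologySpread")
    if failed_attempts_after > failed_attempts_before:  # schedule_attempts_total
        findings.append("more failed attempts: inspect pod events and scheduler logs")
    return findings

print(triage(0.15, 10, 10))
# ['latency above 100ms p90: try disabling MatchLabelKeysInPodTopologySpread']
```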
## Implementation History

@@ -892,6 +966,8 @@ Major milestones might include:
 - when the KEP was retired or superseded
 -->
 - 2022-03-17: Initial KEP
+- 2022-06-08: KEP merged
+- 2023-01-16: Graduate to Beta

## Drawbacks

keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/kep.yaml

Lines changed: 10 additions & 5 deletions
@@ -3,7 +3,7 @@ kep-number: 3243
 authors:
   - "@denkensk"
 owning-sig: sig-scheduling
-status: provisional
+status: implementable
 creation-date: 2022-03-17
 reviewers:
   - "@ahg-g"
@@ -17,18 +17,18 @@ see-also:
   - "/keps/sig-scheduling/3094-pod-topology-spread-considering-taints"

 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.25"
+latest-milestone: "v1.27"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.25"
-  beta: "v1.26"
-  stable: "v1.28"
+  beta: "v1.27"
+  stable: "v1.29"

 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
@@ -39,3 +39,8 @@ feature-gates:
   - kube-scheduler

 disable-supported: true
+
+# The following PRR answers are required at beta release
+metrics:
+  - plugin_execution_duration_seconds{plugin="PodTopologySpread"}
+  - schedule_attempts_total{result="error|unschedulable"}
