
Commit d805cc3

KEP-3243: Graduate MatchLabelKeys In PodTopologySpread to beta
Signed-off-by: Alex Wang <[email protected]>
1 parent 475b18a commit d805cc3

3 files changed (+105 -21 lines)

keps/prod-readiness/sig-scheduling/3243.yaml
Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 3243
 alpha:
   approver: "@wojtek-t"
+beta:
+  approver: "@wojtek-t"

keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/README.md

Lines changed: 93 additions & 16 deletions
@@ -131,7 +131,7 @@ checklist items _must_ be updated for the enhancement to be released.
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
 - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
 - [x] (R) Design details are appropriately documented
 - [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
   - [ ] e2e Tests for all Beta API Operations (endpoints)
@@ -142,7 +142,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [ ] (R) Production readiness review completed
 - [ ] (R) Production readiness review approved
 - [x] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 <!--
@@ -640,7 +640,13 @@ feature gate after having objects written with the new field) are also critical.
 You can take a look at one potential example of such test in:
 https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
 -->
-No, unit and integration tests will be added.
+Yes, there are unit and integration tests for feature enablement/disablement (see the sketch after this list):
+- unit tests:
+  - `pkg/scheduler/framework/plugins/podtopologyspread/filtering_test.go`
+  - `pkg/scheduler/framework/plugins/podtopologyspread/scoring_test.go`
+- integration tests:
+  - `test/integration/scheduler/filters/filters_test.go`
+  - `test/integration/scheduler/scoring/priorities_test.go`
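For illustration, here is a minimal sketch (not one of the actual KEP tests) of the enablement/disablement pattern such unit tests typically follow, assuming the feature-gate test helper from `k8s.io/component-base/featuregate/testing` as it behaved around the v1.26/v1.27 release line:

```go
package podtopologyspread

import (
	"fmt"
	"testing"

	utilfeature "k8s.io/apiserver/pkg/util/feature"
	featuregatetesting "k8s.io/component-base/featuregate/testing"
	"k8s.io/kubernetes/pkg/features"
)

// TestMatchLabelKeysFeatureGate toggles the gate both ways; a real test would
// assert that the plugin honors matchLabelKeys only while the gate is enabled.
func TestMatchLabelKeysFeatureGate(t *testing.T) {
	for _, enabled := range []bool{true, false} {
		t.Run(fmt.Sprintf("gate=%v", enabled), func(t *testing.T) {
			// SetFeatureGateDuringTest returns a restore func in this release line.
			defer featuregatetesting.SetFeatureGateDuringTest(t, utilfeature.DefaultFeatureGate,
				features.MatchLabelKeysInPodTopologySpread, enabled)()

			// Build a pod whose topology spread constraint sets matchLabelKeys,
			// run the plugin's PreFilter/Filter, and assert the label keys are
			// merged into the selector only when `enabled` is true.
		})
	}
}
```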
 
 ### Rollout, Upgrade and Rollback Planning
 
@@ -659,13 +665,18 @@ feature flags will be enabled on some API servers and not others during the
 rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->
+It won't impact already running workloads because it is an opt-in feature.
 
 ###### What specific metrics should inform a rollback?
 
 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->
+- A significant increase in the metric `schedule_attempts_total{result="error|unschedulable"}` after pods using this feature are added.
+- The metric `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` exceeding 100ms at the 90th percentile after pods using this feature are added.
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
@@ -674,12 +685,60 @@ Describe manual testing that was done and the outcomes.
 Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
+Yes, it was tested manually by following the steps below (a verification sketch follows the list), and it worked as intended.
+1. Create a Kubernetes v1.26 cluster with 3 nodes where the `MatchLabelKeysInPodTopologySpread` feature is disabled.
+2. Deploy a Deployment with this YAML:
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: nginx
+spec:
+  replicas: 12
+  selector:
+    matchLabels:
+      foo: bar
+  template:
+    metadata:
+      labels:
+        foo: bar
+    spec:
+      restartPolicy: Always
+      containers:
+      - name: nginx
+        image: nginx:1.14.2
+      topologySpreadConstraints:
+      - maxSkew: 1
+        topologyKey: kubernetes.io/hostname
+        whenUnsatisfiable: DoNotSchedule
+        labelSelector:
+          matchLabels:
+            foo: bar
+        matchLabelKeys:
+        - pod-template-hash
+```
+3. Pods spread across nodes as 4/4/4.
+4. Update the Deployment's nginx image to `nginx:1.15.0`.
+5. Pods spread across nodes as 5/4/3.
+6. Delete the Deployment nginx.
+7. Upgrade the Kubernetes cluster to v1.27 (at the master branch) with `MatchLabelKeysInPodTopologySpread` enabled.
+8. Deploy a Deployment nginx as in step 2.
+9. Pods spread across nodes as 4/4/4.
+10. Update the Deployment's nginx image to `nginx:1.15.0`.
+11. Pods spread across nodes as 4/4/4.
+12. Delete the Deployment nginx.
+13. Downgrade the Kubernetes cluster to v1.26 with the `MatchLabelKeysInPodTopologySpread` feature enabled.
+14. Deploy a Deployment nginx as in step 2.
+15. Pods spread across nodes as 4/4/4.
+16. Update the Deployment's nginx image to `nginx:1.15.0`.
+17. Pods spread across nodes as 4/4/4.
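To make the per-node spread checks in the steps above concrete, here is a minimal client-go sketch (an illustration, not part of the KEP; it assumes the test Deployment runs in the `default` namespace and that a kubeconfig is available at the default location):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the kubeconfig for the test cluster (assumed at ~/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Count the Deployment's pods per node using its label selector.
	pods, err := client.CoreV1().Pods("default").List(context.TODO(),
		metav1.ListOptions{LabelSelector: "foo=bar"})
	if err != nil {
		panic(err)
	}
	perNode := map[string]int{}
	for _, p := range pods.Items {
		perNode[p.Spec.NodeName]++
	}
	fmt.Println(perNode) // expect e.g. map[node-1:4 node-2:4 node-3:4]
}
```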
 
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->
+No.
 
 ### Monitoring Requirements
 
@@ -694,6 +753,7 @@ Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
 checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->
+Operators can query for pods that have the `pod.spec.topologySpreadConstraints.matchLabelKeys` field set to determine whether the feature is in use by workloads (a sketch follows below).
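For example, a small client-go sketch of such a query (illustrative only; a `kubectl` query with a JSONPath filter would work equally well) that counts pods opting into `matchLabelKeys`:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// List pods in all namespaces and count those whose topology spread
	// constraints set matchLabelKeys.
	pods, err := client.CoreV1().Pods(metav1.NamespaceAll).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	using := 0
	for _, pod := range pods.Items {
		for _, c := range pod.Spec.TopologySpreadConstraints {
			if len(c.MatchLabelKeys) > 0 {
				using++
				break
			}
		}
	}
	fmt.Printf("%d pods use matchLabelKeys\n", using)
}
```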

 ###### How can someone using this feature know that it is working for their instance?
 
@@ -706,13 +766,8 @@ and operation of this feature.
 Recall that end users cannot usually observe component logs or access metrics.
 -->
 
-- [ ] Events
-  - Event Reason:
-- [ ] API .status
-  - Condition name:
-  - Other field:
-- [ ] Other (treat as last resort)
-  - Details:
+- [x] Other (treat as last resort)
+  - Details: We can determine if the feature is being used by comparing the expected and actual scheduling results.
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
@@ -730,26 +785,27 @@ high level (needs more precise definitions) those may be things like:
 These goals will help you determine what you need to measure (SLIs) in the next
 question.
 -->
+`plugin_execution_duration_seconds{plugin="PodTopologySpread"}` <= 100ms at the 90th percentile.
 
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
 <!--
 Pick one more of these and delete the rest.
 -->
 
-- [ ] Metrics
-  - Metric name:
-  - [Optional] Aggregation method:
-  - Components exposing the metric:
-- [ ] Other (treat as last resort)
-  - Details:
+- [x] Metrics
+  - Component exposing the metric: kube-scheduler
+  - Metric name: `plugin_execution_duration_seconds{plugin="PodTopologySpread"}`
+  - Metric name: `schedule_attempts_total{result="error|unschedulable"}`
 
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
 <!--
 Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
 implementation difficulties, etc.).
 -->
+Yes. It would be helpful to have metrics showing which plugins affect the scheduler's decisions in the Filter/Score phases.
+There is a related issue: https://github.com/kubernetes/kubernetes/issues/110643 . It is a large effort and still in progress.
 
 ### Dependencies
 
@@ -773,6 +829,7 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
 - Impact of its outage on the feature:
 - Impact of its degraded performance or high-error rates on the feature:
 -->
+No.
 
 ### Scalability
 
@@ -800,6 +857,7 @@ Focusing mostly on:
 - periodic API calls to reconcile state (e.g. periodic fetching state,
   heartbeats, leader election, etc.)
 -->
+No.
 
 ###### Will enabling / using this feature result in introducing new API types?
 
@@ -809,6 +867,7 @@ Describe them, providing:
 - Supported number of objects per cluster
 - Supported number of objects per namespace (for namespace-scoped objects)
 -->
+No.
 
 ###### Will enabling / using this feature result in any new calls to the cloud provider?
 
@@ -817,6 +876,7 @@ Describe them, providing:
 - Which API(s):
 - Estimated increase:
 -->
+No.
 
 ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
 
@@ -826,6 +886,7 @@ Describe them, providing:
 - Estimated increase in size: (e.g., new annotation of size 32B)
 - Estimated amount of new objects: (e.g., new Object X for every existing Pod)
 -->
+No.
 
 ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
 
@@ -837,6 +898,7 @@ Think about adding additional work or introducing new steps in between
 
 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
 -->
+No.
 
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 
@@ -849,6 +911,7 @@ This through this both in small and large cases, again with respect to the
 
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
 -->
+No.
 
 ### Troubleshooting
 
@@ -861,6 +924,8 @@ details). For now, we leave it here.
 -->
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
+If the API server and/or etcd is unavailable, this feature will not work,
+because the scheduler needs to write scheduling results back to pods via the API server/etcd.
 
 ###### What are other known failure modes?
 
@@ -876,8 +941,18 @@ For each of them, fill in the following information by copying the below templat
   Not required until feature graduated to beta.
 - Testing: Are there any tests for failure mode? If not, describe why.
 -->
+N/A
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
+- Check the metric `plugin_execution_duration_seconds{plugin="PodTopologySpread"}` to determine
+  whether latency has increased. If it has, this feature may be increasing scheduling latency;
+  you can disable the `MatchLabelKeysInPodTopologySpread` feature gate (e.g. via the kube-scheduler
+  `--feature-gates` flag) to see whether it is the cause.
+- Check the metric `schedule_attempts_total{result="error|unschedulable"}` to determine whether the
+  number of failed attempts has increased. If it has, determine the cause of the failure from the
+  pod's events. If the `PodTopologySpread` plugin caused it, you can analyze the problem further
+  in the scheduler logs.
 
 ## Implementation History
 
@@ -892,6 +967,8 @@ Major milestones might include:
 - when the KEP was retired or superseded
 -->
 - 2022-03-17: Initial KEP
+- 2022-06-08: KEP merged
+- 2023-01-16: Graduate to Beta
 
 ## Drawbacks
 
keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/kep.yaml

Lines changed: 10 additions & 5 deletions
@@ -3,7 +3,7 @@ kep-number: 3243
 authors:
   - "@denkensk"
 owning-sig: sig-scheduling
-status: provisional
+status: implementable
 creation-date: 2022-03-17
 reviewers:
   - "@ahg-g"
@@ -17,18 +17,18 @@ see-also:
   - "/keps/sig-scheduling/3094-pod-topology-spread-considering-taints"
 
 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta
 
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.25"
+latest-milestone: "v1.27"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
   alpha: "v1.25"
-  beta: "v1.26"
-  stable: "v1.28"
+  beta: "v1.27"
+  stable: "v1.29"
 
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
@@ -39,3 +39,8 @@ feature-gates:
       - kube-scheduler
 
 disable-supported: true
+
+# The following PRR answers are required at beta release
+metrics:
+  - plugin_execution_duration_seconds{plugin="PodTopologySpread"}
+  - schedule_attempts_total{result="error|unschedulable"}