Commit c6e0a76 ("fix1")
1 parent 7a89964

1 file changed: 48 additions, 23 deletions

keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades/README.md
@@ -109,6 +109,8 @@ tags, and then generate with `hack/update-toc.sh`.
 - [Implementation History](#implementation-history)
 - [Drawbacks](#drawbacks)
 - [Alternatives](#alternatives)
+  - [use pod generateName](#use-pod-generatename)
+  - [remove MatchLabelKeys implementation from the scheduler plugin](#remove-matchlabelkeys-implementation-from-the-scheduler-plugin)
 - [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
 <!-- /toc -->
 
@@ -179,10 +181,13 @@ which spreading is applied using a LabelSelector. This means the user should
 know the exact label key and value when defining the pod spec.
 
 This KEP proposes a complementary field to LabelSelector named `MatchLabelKeys` in
-`TopologySpreadConstraint` which represent a set of label keys only. The scheduler
-will use those keys to look up label values from the incoming pod; and those key-value
-labels are ANDed with `LabelSelector` to identify the group of existing pods over
-which the spreading skew will be calculated.
+`TopologySpreadConstraint`, which represents a set of label keys only.
+kube-apiserver will use those keys to look up the label values from the incoming pod,
+and those labels are merged into `LabelSelector`.
+kube-scheduler will also look up the label values from the pod and check whether those
+labels are included in `LabelSelector`. If not, kube-scheduler will AND those labels
+with `LabelSelector` to identify the group of existing pods over which the
+spreading skew will be calculated.
 
 The main case that this new way for identifying pods will enable is constraining
 skew spreading calculation to happen at the revision level in Deployments during
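
To illustrate the proposed semantics, consider a Deployment whose pods carry an `app` label plus the controller-injected `pod-template-hash` label; the constraint below is a sketch (label values are illustrative, not taken from this KEP):

```yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    app: foo
    pod-template-hash: "5d4c9f"    # injected by the Deployment controller; value illustrative
spec:
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    matchLabelKeys:
    - pod-template-hash            # look up this key's value from the incoming pod
    labelSelector:
      matchLabels:
        app: foo
```

With `matchLabelKeys`, kube-apiserver would merge `pod-template-hash: "5d4c9f"` into the `labelSelector`, so skew is calculated only among pods of the same revision rather than across the whole Deployment.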
@@ -308,6 +313,9 @@ proposal will be implemented, this is the place to discuss them.
 -->
 
 A new optional field named `MatchLabelKeys` will be introduced to `TopologySpreadConstraint`.
+Currently, when scheduling a pod, the `LabelSelector` defined in the pod is used
+to identify the group of pods over which spreading will be calculated.
+`MatchLabelKeys` adds another constraint to how this group of pods is identified.
 ```go
 type TopologySpreadConstraint struct {
 	MaxSkew int32
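
For reference, the hunk above truncates the struct; the relevant portion with the new field appended looks roughly like the following (a sketch — some existing fields are omitted; see `k8s.io/api/core/v1` for the authoritative definition):

```go
type TopologySpreadConstraint struct {
	MaxSkew           int32
	TopologyKey       string
	WhenUnsatisfiable UnsatisfiableConstraintAction
	LabelSelector     *metav1.LabelSelector
	// MatchLabelKeys is a set of pod label keys used to select the pods over
	// which spreading will be calculated. The keys are used to look up values
	// from the incoming pod labels; those key-value labels are ANDed with
	// LabelSelector to select the group of existing pods.
	MatchLabelKeys []string
}
```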
@@ -365,18 +373,15 @@ kube-apiserver modifies the `labelSelector` like the following:
   - app
 ```
 
-Currently, scheduler will also be aware of `matchLabelKeys` and
-gracefully handle the same labels.
-This originates from the previous implementation only scheduler
-obtains the pod labels by the keys in `matchLabelKeys` and uses
-them with `labelSelector` to filter and group pods for spreading.
-In the near future, the scheduler implementation related to
-`matchLabelKeys` will be removed.
+kube-scheduler will also be aware of `matchLabelKeys` and gracefully handle the same labels.
+This is needed for Cluster-level default constraints with
+`matchLabelKeys: ["pod-template-hash"]` ([#129198](https://github.com/kubernetes/kubernetes/issues/129198)).
 
 Finally, the feature will be guarded by a new feature flag. If the feature is
-disabled, the field `matchLabelKeys` is preserved if it was already set in the
-persisted Pod object, otherwise it is silently dropped; moreover, kube-scheduler
-will ignore the field and continue to behave as before.
+disabled, the field `matchLabelKeys` and the corresponding `labelSelector` are preserved
+if they were already set in the persisted Pod object; otherwise, creation of a new Pod
+with the field will be rejected by kube-apiserver. Moreover, kube-scheduler will ignore
+the field and continue to behave as before.
 
 ### Test Plan
 
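To make the default-constraints use case referenced above concrete, a cluster admin would set something like the following in the scheduler configuration; this is a sketch assuming `matchLabelKeys` becomes permitted in `defaultConstraints` (the point of #129198):

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: PodTopologySpread
    args:
      defaultingType: List
      defaultConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        matchLabelKeys: ["pod-template-hash"]  # per-revision spreading by default
```

Because kube-apiserver never sees this configuration, only kube-scheduler can resolve `matchLabelKeys` for constraints injected this way, which is why the scheduler-side handling is kept.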
@@ -423,8 +428,9 @@ This can inform certain test coverage improvements that we want to do before
 extending the production code to implement this enhancement.
 -->
 
-- `k8s.io/kubernetes/pkg/scheduler/framework/plugins/podtopologyspread`: `06-07` - `86%`
-- `k8s.io/kubernetes/pkg/scheduler/framework/plugins/podtopologyspread/plugin.go`: `06-07` - `73.1%`
+- `k8s.io/kubernetes/pkg/scheduler/framework/plugins/podtopologyspread`: `2025-01-14 JST (commit ccd2b4e8a719dabe8605b1e6b2e74bb5352696e1)` - `87.5%`
+- `k8s.io/kubernetes/pkg/scheduler/framework/plugins/podtopologyspread/plugin.go`: `2025-01-14 JST (commit ccd2b4e8a719dabe8605b1e6b2e74bb5352696e1)` - `84.8%`
+- `k8s.io/kubernetes/pkg/registry/core/pod/strategy.go`: `2025-01-14 JST (commit ccd2b4e8a719dabe8605b1e6b2e74bb5352696e1)` - `65%`
 
 ##### Integration tests
 
@@ -644,8 +650,11 @@ NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
 The feature can be disabled in Alpha and Beta versions by restarting
 kube-apiserver and kube-scheduler with the feature gate off.
 One caveat is that pods that used the feature will continue to have the
-MatchLabelKeys field set even after disabling the feature gate,
-however kube-scheduler will not take the field into account.
+MatchLabelKeys field set, and the corresponding LabelSelector, even after
+the feature gate is disabled; however, kube-scheduler will not take the
+MatchLabelKeys field into account.
+In Stable versions, users can choose to opt out by not setting
+the matchLabelKeys field.
 
 ###### What happens if we reenable the feature if it was previously rolled back?
 Newly created pods need to follow this policy when scheduling. Old pods will
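
Concretely, the disable step described above would look roughly like this, assuming the feature gate name associated with this KEP, `MatchLabelKeysInPodTopologySpread` (other flags elided):

```sh
# Sketch: restart both components with the feature gate off.
kube-apiserver --feature-gates=MatchLabelKeysInPodTopologySpread=false ...
kube-scheduler --feature-gates=MatchLabelKeysInPodTopologySpread=false ...
```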
@@ -684,7 +693,8 @@ feature flags will be enabled on some API servers and not others during the
 rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->
-It won't impact already running workloads because it is an opt-in feature in apiserver and scheduler.
+It won't impact already running workloads because it is an opt-in feature in kube-apiserver
+and kube-scheduler.
 But during a rolling upgrade, if some apiservers have not enabled the feature, they will not
 be able to accept and store the field "MatchLabelKeys" and the pods associated with these
 apiservers will not be able to use this feature. As a result, pods belonging to the
@@ -921,8 +931,14 @@ Think about adding additional work or introducing new steps in between
 
 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
 -->
-Yes. there is an additional work: the scheduler will use the keys in `matchLabelKeys` to look up label values from the pod and AND with `LabelSelector`.
-Maybe result in a very samll impact in scheduling latency which directly contributes to pod-startup-latency SLO.
+Yes, there is additional work:
+kube-apiserver uses the keys in `matchLabelKeys` to look up the label values from the pod
+and changes `LabelSelector` accordingly.
+kube-scheduler also looks up the label values from the pod and checks whether those labels
+are included in `LabelSelector`. If not, kube-scheduler ANDs those labels
+with `LabelSelector`.
+The impact on the latency of pod creation requests in kube-apiserver and kube-scheduler
+should be negligible.
 
 ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
 
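Returning to the additional-work answer above: the scheduler-side step it describes amounts to roughly the following. This is an illustrative sketch, not the actual plugin code — `mergeMatchLabelKeys` is a hypothetical helper, and it approximates the "included in `LabelSelector`" check via `matchLabels`:

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// mergeMatchLabelKeys returns a copy of a constraint's selector with the
// incoming pod's values for matchLabelKeys ANDed in, skipping keys the
// selector already includes (e.g. because kube-apiserver merged them).
func mergeMatchLabelKeys(sel *metav1.LabelSelector, matchLabelKeys []string, podLabels map[string]string) *metav1.LabelSelector {
	merged := sel.DeepCopy()
	if merged == nil {
		merged = &metav1.LabelSelector{}
	}
	if merged.MatchLabels == nil {
		merged.MatchLabels = map[string]string{}
	}
	for _, key := range matchLabelKeys {
		value, ok := podLabels[key]
		if !ok {
			continue // the incoming pod does not carry this label
		}
		if _, exists := merged.MatchLabels[key]; exists {
			continue // already included in the selector; nothing to AND in
		}
		merged.MatchLabels[key] = value
	}
	return merged
}

func main() {
	sel := &metav1.LabelSelector{MatchLabels: map[string]string{"app": "foo"}}
	pod := map[string]string{"app": "foo", "pod-template-hash": "5d4c9f"}
	fmt.Println(mergeMatchLabelKeys(sel, []string{"pod-template-hash"}, pod))
}
```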
@@ -962,7 +978,7 @@ details). For now, we leave it here.
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 If the API server and/or etcd is not available, this feature will not be available.
-This is because the scheduler needs to update the scheduling results to the pod via the API server/etcd.
+This is because kube-scheduler needs to write the scheduling results to the pod via the API server/etcd.
 
 ###### What are other known failure modes?
 
@@ -988,7 +1004,7 @@ N/A
 - Check the metric `schedule_attempts_total{result="error|unschedulable"}` to determine if the number
 of attempts increased. If it increased, you need to determine the cause of the failure from the events of
 the pod. If it's caused by the plugin `PodTopologySpread`, you can further analyze the problem by looking
-at the scheduler log.
+at the kube-scheduler log.
 
 
 ## Implementation History
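
To make the metric check in the troubleshooting steps above concrete, a Prometheus query along these lines (illustrative) shows whether failed or unschedulable attempts are increasing:

```promql
# Rate of scheduling attempts ending in error or unschedulable, per result.
sum by (result) (rate(schedule_attempts_total{result=~"error|unschedulable"}[5m]))
```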
@@ -1021,11 +1037,20 @@ not need to be as detailed as the proposal, but should include enough
 information to express the idea and why it was not acceptable.
 -->
 
+### use pod generateName
 Use `pod.generateName` to distinguish new/old pods that belong to the
 revisions of the same workload in scheduler plugin. It's decided not to
 support because of the following reason: scheduler needs to ensure universal
 and scheduler plugin shouldn't have special treatment for any labels/fields.
 
+### remove MatchLabelKeys implementation from the scheduler plugin
+Remove the implementation related to `MatchLabelKeys` from the scheduler plugin
+and let only kube-apiserver handle `MatchLabelKeys` and update `LabelSelector`.
+
+However, this idea was rejected because:
+- This approach prevents achieving Cluster-level default constraints with `matchLabelKeys: ["pod-template-hash"]` ([#129198](https://github.com/kubernetes/kubernetes/issues/129198)), because kube-apiserver can't be aware of the kube-scheduler configuration.
+- The current implementation in the scheduler plugin is simple, and the risk of increased maintenance overhead is low.
+
 ## Infrastructure Needed (Optional)
 
 <!--
