- [x] kubernetes/enhancements issue in release milestone, which links to KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
- [x] KEP approvers have set the KEP status to `implementable`
- [x] Design details are appropriately documented
- [x] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
- [x] Graduation criteria is in place
- [x] "Implementation History" section is up-to-date for milestone
- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
- [x] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
### Implementation Details/Notes/Constraints
#### Feature gate
Setting a default for `PodTopologySpread` is guarded by the feature gate
`DefaultPodTopologySpread`.
#### Relationship with "SelectorSpread" plugin
Values are decoded from the `pluginConfig` slice in the kube-scheduler ComponentConfig:

```go
// pkg/scheduler/apis/config/types_pluginargs.go

type PodTopologySpreadArgs struct {
  // DefaultConstraints defines topology spread constraints to be applied to pods
  // that don't define any in `pod.spec.topologySpreadConstraints`. Pod selectors must
  // be empty, as they are deduced from the resources that the pod belongs to
  // (includes services, replication controllers, replica sets and stateful sets).
  // If not specified, the scheduler applies the following default constraints:
  // ...

  // DisableDefaultConstraints allows to disable DefaultConstraints. Defaults to false.
  // When set to true, DefaultConstraints must be empty or nil.
  // +optional
  DisableDefaultConstraints bool
}
```
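For illustration, a cluster operator could also override the defaults entirely by supplying `defaultConstraints` through `pluginConfig`; the constraint values below are examples chosen for this sketch, not the defaults proposed by the KEP:

```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - name: PodTopologySpread
    args:
      # Applied only to pods that define no .spec.topologySpreadConstraints.
      # Selectors must be left empty; they are deduced from the pod's owning
      # Service, ReplicationController, ReplicaSet or StatefulSet.
      defaultConstraints:
      - maxSkew: 1                                 # example value
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
```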

To ensure this feature is rolled out with high quality, the following tests are mandatory:

- **Integration Tests**: One integration test for the default rules and one for custom rules.
- **Benchmark Tests**: A benchmark test that compares the default rules against `SelectorSpreadingPriority`.
  The performance should be as close as possible.
  [Beta] There should not be any significant degradation in scheduler performance in clusterloader benchmarks
  for vanilla workloads.
- **E2E/Conformance Tests**: The test "Multi-AZ Clusters should spread the pods of a {replication controller, service} across zones" should pass.
  This test is currently broken at 5k nodes.
### Graduation Criteria

#### Alpha (v1.19):

- [x] Score extension point implementation. Add support for `maxSkew`.
- [x] Filter extension point implementation.
- [x] Disabling `SelectorSpread` when the feature is enabled.
- [x] Unit and benchmark test cases mentioned in the [Test Plan](#test-plan).

#### Beta (v1.20):

- [ ] Finalize implementation:
  - [ ] Map `SelectorSpreadingPriority` to `PodTopologySpread` when using Policy API.
  - [ ] Provide knob for disabling the k8s default constraints.
- [ ] Integration tests.
- [ ] Verify conformance tests passing.

## Production Readiness Review Questionnaire
### Feature Enablement and Rollback

* **How can this feature be enabled / disabled in a live cluster?**
  - [x] Feature gate (also fill in values in `kep.yaml`)
    - Feature gate name: `DefaultPodTopologySpread`
    - Components depending on the feature gate: `kube-scheduler`
  - [x] Other
    - Describe the mechanism:

      Explicitly disable default spreading constraints for the `PodTopologySpread` plugin in the kube-scheduler config (passed via the `--config` command line flag):

      ```yaml
      apiVersion: kubescheduler.config.k8s.io/v1beta1
      kind: KubeSchedulerConfiguration
      profiles:
      - pluginConfig:
        - name: PodTopologySpread
          args:
            disableDefaultConstraints: true
      ```

  - Will enabling / disabling the feature require downtime of the control plane?

    Only kube-scheduler needs to be restarted.

  - Will enabling / disabling the feature require downtime or reprovisioning of a node? (Do not assume `Dynamic Kubelet Config` feature is enabled).

    No.

* **Does enabling the feature change any default behavior?**

  Yes. Users might experience more spreading of Pods among Nodes and Zones in certain topology distributions.
  In particular, this will be more noticeable in clusters with more than 100 nodes.

  The [default configuration](#default-constraints) was chosen to produce a behavior that closely resembles
  the `SelectorSpread` plugin.
  See [this PR description](https://github.com/kubernetes/kubernetes/pull/91793) for simulation data.

* **Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?**

  Yes. Once disabled, only scheduling of new Pods will be affected.

* **What happens if we reenable the feature if it was previously rolled back?**

  Only scheduling of new Pods is affected.

* **Are there any tests for feature enablement/disablement?**

  There are unit tests in `pkg/scheduler/algorithmprovider/registry_test.go` that validate the list of default plugins
  of `kube-scheduler` with the feature gate enabled and disabled.

### Rollout, Upgrade and Rollback Planning

* **How can a rollout fail? Can it impact already running workloads?**

  Running workloads are not affected by `kube-scheduler`.

* **What specific metrics should inform a rollback?**

  Primarily scheduling latency metrics, such as `framework_extension_point_duration_seconds`, `scheduling_algorithm_duration_seconds`
  and `e2e_scheduling_duration_seconds`, when they have increased significantly.

  Since spreading is affected, node utilization might change.
  Utilization metrics can be queried from the `/metrics/resource` endpoint exposed by the kubelet.

* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**

  N/A.

* **Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?**

  TBD for GA.

### Monitoring Requirements

* **How can an operator determine if the feature is in use by workloads?**

  All Pods are affected, unless they have explicit spreading constraints (`.spec.topologySpreadConstraints`).

* **What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?**

  - Components exposing the metric: `kube-scheduler`.
  - [ ] Other (treat as last resort)
    - Details:

* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**

  For 100 nodes, with a 4-core master (an illustrative alert on these objectives follows the list):

  - Latency for PreScore+Score less than 60ms at the 99th percentile.
  - Latency for PreScore+Score less than 15ms at the 95th percentile.
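
As an illustration only, an operator could watch these objectives with a Prometheus alerting rule; the `PrometheusRule` kind assumes a Prometheus Operator installation, and the rule name, window and severity below are assumptions, not part of this KEP:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: scheduler-spreading-latency          # hypothetical name
spec:
  groups:
  - name: kube-scheduler
    rules:
    - alert: PodSpreadingScoringSlow
      # p99 latency of the PreScore and Score extension points over 5 minutes,
      # compared against the 60ms objective stated above.
      expr: |
        histogram_quantile(0.99,
          sum by (le) (rate(framework_extension_point_duration_seconds_bucket{extension_point=~"PreScore|Score"}[5m]))
        ) > 0.06
      for: 15m
      labels:
        severity: warning
```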

* **Are there any missing metrics that would be useful to have to improve observability of this feature?**

  N/A.

### Dependencies

* **Does this feature depend on any specific services running in the cluster?**

  N/A.

### Scalability

* **Will enabling / using this feature result in any new API calls?**

  No.

* **Will enabling / using this feature result in introducing new API types?**

  No.

* **Will enabling / using this feature result in any new calls to the cloud provider?**

  No.

* **Will enabling / using this feature result in increasing size or count of the existing API objects?**

  No.

* **Will enabling / using this feature result in increasing time taken by any operations covered by [existing SLIs/SLOs]?**

  Scheduling time on clusters with more than 100 nodes might increase. Smaller clusters are unaffected.
  This is because `SelectorSpreading` doesn't take into account all the Nodes in big clusters when calculating skew,
  resulting in partial spreading at this scale.
  In contrast, `PodTopologySpreading` considers all nodes when using topologies bigger than a Node, like a Zone.

  Before graduation, we will ensure that the latency increase is acceptable with SIG Scalability.

* **Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?**

  kube-scheduler might use more CPU to calculate Zone spreading in certain configurations.
  In synthetic benchmarks, the new spreading spends 1.5ms in PreScore/Score when there are 10k Pods in a 1k-Node cluster,
  using 16 threads. This is comparable to `SelectorSpread`.

### Troubleshooting

* **How does this feature react if the API server and/or etcd is unavailable?**

  kube-scheduler won't receive Pods to schedule.
  The effect is no greater than it would be without the feature.

* **What are other known failure modes?**

  - Pod scheduling is slow.
    - Detection: Pod startup time is too high.
    - Diagnostics: Use the `framework_extension_point_duration_seconds` scheduler metric with label `extension_point` values `PreScore` and/or `Score`.
    - Mitigations: Disable the `DefaultPodTopologySpread` feature gate in kube-scheduler.
    - Testing: There are performance dashboards.
  - Pods of a Service/ReplicaSet/ReplicationController/StatefulSet are not properly spread: spread is either too weak or too strong.
    - Detection: Too many pods belonging to the same Service/ReplicaSet/ReplicationController/StatefulSet are scheduled on a few nodes or
      spread across too many nodes.
    - Mitigations: Use [Pod Topology spreading](https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints)
      in your PodSpecs (a sketch follows this list), or modify the [default constraints](https://kubernetes.io/docs/concepts/workloads/pods/pod-topology-spread-constraints/#cluster-level-default-constraints)
      for the `PodTopologySpread` plugin to your preference.
    - Diagnostics: N/A
    - Testing: E2E tests ensure that Pods are evenly spread in a cluster with only one Service.
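
For reference, a minimal sketch of a Pod that declares its own constraint and therefore bypasses the plugin's defaults; the name, labels, image and `maxSkew` value are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spread-example                 # hypothetical name
  labels:
    app: spread-example
spec:
  topologySpreadConstraints:
  # Pods with explicit constraints are not subject to the default constraints.
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: spread-example
  containers:
  - name: app
    image: k8s.gcr.io/pause:3.2        # illustrative image
```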
* **What steps should be taken if SLOs are not being met to determine the problem?**

  If startup latency is in violation, it is possible that this feature is the cause.

  1. Determine if the scheduler is the culprit: check for significant latency in `e2e_scheduling_duration_seconds`.
  1. The feature only affects scheduling algorithms, so check for significant latency in `scheduling_algorithm_duration_seconds`.
  1. To check if this feature is the culprit, look for significant latency in `framework_extension_point_duration_seconds`,
     using label `extension_point` with values `PreScore` and `Score`.
  1. Try disabling the Feature Gate `DefaultPodTopologySpread`.

## Implementation History
- 2019-09-26: Initial KEP sent out for review.
- 2020-01-20: KEP updated to make use of framework's PluginConfig.
- 2020-05-04: Update completed tasks and target alpha for 1.19.
- 2020-09-21: Add Beta graduation criteria and PRR.