Commit 59cdab6

fixup: address comments
1 parent bfe7fbc commit 59cdab6

File tree: 1 file changed (+22 -11 lines changed)

  • keps/sig-scheduling/3521-pod-scheduling-readiness

keps/sig-scheduling/3521-pod-scheduling-readiness/README.md

Lines changed: 22 additions & 11 deletions
@@ -521,10 +521,10 @@ The following scenarios need to be covered in integration tests:
521521
can be moved back to activeQ when `.spec.schedulingGates` is all cleared
522522
- Ensure no significant performance degradation
523523

524-
- `test/integration/scheduler/queue_test.go`: Will add new tests.
525-
- `test/integration/scheduler/plugins/plugins_test.go`: Will add new tests.
526-
- `test/integration/scheduler/enqueue/enqueue_test.go`: Will add new tests.
527-
- `test/integration/scheduler_perf/scheduler_perf_test.go`: https://storage.googleapis.com/k8s-triage/index.html?test=BenchmarkPerfScheduling
524+
- `test/integration/scheduler/queue_test.go`: added in Alpha.
525+
- `test/integration/scheduler/plugins/plugins_test.go`: added in Alpha.
526+
- `test/integration/scheduler/enqueue/enqueue_test.go`: added in Alpha.
527+
- `test/integration/scheduler_perf/scheduler_perf_test.go`: will add in Beta. (https://storage.googleapis.com/k8s-triage/index.html?test=BenchmarkPerfScheduling)
528528

529529
##### e2e tests
530530

@@ -538,10 +538,8 @@ https://storage.googleapis.com/k8s-triage/index.html
538538
We expect no non-infra related flakes in the last month as a GA graduation criteria.
539539
-->
540540

541-
Create a test with the following sequences:
541+
An e2e test was created in Alpha with the following sequences:
542542

543-
- Provision a cluster with feature gate `PodSchedulingReadiness=true` (we may need to setup a testgrid
544-
for when it's alpha)
545543
- Create a Pod with non-nil `.spec.schedulingGates`.
546544
- Wait for 15 seconds to ensure (and then verify) it did not get scheduled.
547545
- Clear the Pod's `.spec.schedulingGates` field.
@@ -790,6 +788,12 @@ rollout. Similarly, consider large clusters and how enablement/disablement
790788
will rollout across nodes.
791789
-->
792790

791+
It shouldn't impact already running workloads. It's an opt-in feature, and users need to set
792+
`.spec.schedulingGates` field to use this feature.
793+
794+
When this feature is disabled by the feature flag, the already created Pod's `.spec.schedulingGates`
795+
field is preserved, however, the newly created Pod's `.spec.schedulingGates` field is silently dropped.
796+
793797
###### What specific metrics should inform a rollback?
794798

795799
<!--
@@ -878,7 +882,7 @@ Recall that end users cannot usually observe component logs or access metrics.
878882
- [x] Events
879883
- Event Type: PodScheduled
880884
- Event Status: False
881-
- Event Reason: WaitingForGates SchedulingGated
885+
- Event Reason: SchedulingGated
882886
- Event Message: Scheduling is blocked due to non-empty scheduling gates
883887

884888
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
@@ -1005,7 +1009,10 @@ Describe them, providing:
10051009
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
10061010
-->
10071011

1008-
No to existing API objects that doesn't use this feature.
1012+
- No to existing API objects that doesn't use this feature.
1013+
- For API objects that use this feature:
1014+
- API type: Pod
1015+
- Estimated increase in size: new field `.spec.schedulingGates` about ~64 bytes (in the case of 2 scheduling gates)
10091016

10101017
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
10111018

@@ -1018,7 +1025,7 @@ Think about adding additional work or introducing new steps in between
10181025
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
10191026
-->
10201027

1021-
No.
1028+
This delay should be negligible.
10221029

10231030
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
10241031

@@ -1049,7 +1056,11 @@ details). For now, we leave it here.
10491056

10501057
###### How does this feature react if the API server and/or etcd is unavailable?
10511058

1052-
Update/Patch requests will be rejected.
1059+
During the downtime of API server and/or etcd:
1060+
1061+
- Running workloads that don't need to remove their scheduling gates function well.
1062+
- Running workloads that need to update their scheduling gates will stay in scheduling gated state
1063+
as API requests will be rejected.
10531064

10541065
###### What are other known failure modes?
10551066
