It won't be done again when moving pods from backoffQ to activeQ.
### Risks and Mitigations
#### A tiny delay on the first scheduling attempts for newly created pods
While the scheduler handles a pod popped directly from backoffQ, another pod that should be scheduled before it may appear in activeQ.
However, in the real world, if the scheduling latency is short enough, there won't be a visible downgrade in throughput.
This will only happen when there are no pods in activeQ, so it can be mitigated by an appropriate rate of pod creation.
#### Backoff won't be working as natural rate limiter in case of errors
In case of API call errors (e.g., network issues), backoffQ limits the number of retries in the short term.
This proposal will take those pods earlier, losing this mechanism.

After merging [kubernetes#128748](https://github.com/kubernetes/kubernetes/pull/128748),
it will be possible to distinguish pods backing off because of errors from those backing off because of an unschedulable attempt.
This information could be used when popping, by filtering only the pods that come from an unschedulable attempt, or even by splitting backoffQ.
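The filtering idea above can be sketched as follows. This is a minimal illustration, not the scheduler's code: the `backoffPod` type and its `errorBackoff` field are hypothetical stand-ins for whatever marker kubernetes#128748 introduces.

```go
// Sketch of the mitigation: once a pod records whether its backoff came from
// an error or from an unschedulable attempt, pop-from-backoffQ can skip the
// error-backoff pods so they keep their natural rate limiting.
// All names here are illustrative, not the real kube-scheduler API.
package main

import "fmt"

type backoffPod struct {
	name         string
	errorBackoff bool // true when backing off due to an API/network error
}

// nextPoppable returns the first pod whose backoff is not error-induced.
func nextPoppable(backoffQ []backoffPod) (backoffPod, bool) {
	for _, p := range backoffQ {
		if !p.errorBackoff {
			return p, true
		}
	}
	return backoffPod{}, false
}

func main() {
	q := []backoffPod{
		{name: "pod-err", errorBackoff: true},
		{name: "pod-unsched", errorBackoff: false},
	}
	p, ok := nextPoppable(q)
	fmt.Println(p.name, ok) // the error-backoff pod is skipped
}
```

Splitting backoffQ into two heaps (error vs. unschedulable) would achieve the same effect without a linear scan.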
This has to be resolved before the beta is released.

#### One pod in backoffQ could starve the others

The head of backoffQ is the pod with the closest backoff expiration,
and the backoff time is calculated based on the number of scheduling failures that the pod has experienced.
If one pod has a smaller attempt counter than others,
could the scheduler keep popping this pod ahead of other pods because its backoff expires faster?
Actually, that won't happen, because the scheduler will increment the attempt counter of pods popped from backoffQ as well,
which makes a pod's backoff time grow after each scheduling attempt,
so the pod that started with a smaller attempt number eventually won't be popped ahead of the others.
## Design Details
165
169
166
170
### Popping from backoffQ in activeQ's pop()
167
171
168
172
To achieve the goal, activeQ's `pop()` method needs to be changed:
1. If activeQ is empty, then instead of waiting for a pod to arrive at activeQ, popping from backoffQ is tried.
2. If backoffQ is empty, then `pop()` waits for a pod as previously.
3. If backoffQ is not empty, then the pod is processed as if it had been taken from activeQ, including incrementing its attempts number.

It is popping from a heap data structure, so it should be fast enough not to cause any performance troubles.
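The three steps above can be sketched as follows. This is a simplified, self-contained illustration of the control flow, assuming slice-backed queues and illustrative type names (`schedulingQueue`, `podInfo`) rather than the real heap-backed scheduler structures.

```go
// Sketch of activeQ's pop() falling back to backoffQ when activeQ is empty.
// Step 1: activeQ empty -> try backoffQ instead of waiting.
// Step 2: backoffQ also empty -> wait on the condition as before.
// Step 3: backoffQ non-empty -> treat the pod as if it came from activeQ,
//         including incrementing its attempts counter.
package main

import (
	"fmt"
	"sync"
)

type podInfo struct {
	name     string
	attempts int // incremented on every pop, from activeQ or backoffQ
}

type schedulingQueue struct {
	lock     sync.Mutex
	cond     *sync.Cond
	activeQ  []*podInfo // stand-in for the real activeQ heap
	backoffQ []*podInfo // stand-in for the real backoffQ heap
	closed   bool
}

func newQueue() *schedulingQueue {
	q := &schedulingQueue{}
	q.cond = sync.NewCond(&q.lock)
	return q
}

func (q *schedulingQueue) pop() (*podInfo, bool) {
	q.lock.Lock()
	defer q.lock.Unlock()
	for len(q.activeQ) == 0 {
		if len(q.backoffQ) > 0 { // step 1: fall back to backoffQ
			p := q.backoffQ[0]
			q.backoffQ = q.backoffQ[1:]
			p.attempts++ // step 3: same accounting as an activeQ pop
			return p, true
		}
		if q.closed {
			return nil, false
		}
		q.cond.Wait() // step 2: both queues empty, wait as before
	}
	p := q.activeQ[0]
	q.activeQ = q.activeQ[1:]
	p.attempts++
	return p, true
}

func main() {
	q := newQueue()
	q.backoffQ = append(q.backoffQ, &podInfo{name: "pod-a"})
	p, _ := q.pop() // activeQ is empty, so pod-a is taken from backoffQ
	fmt.Println(p.name, p.attempts)
}
```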
To support monitoring, when popping from backoffQ,
the `scheduler_queue_incoming_pods_total` metric with an `activeQ` queue label and a new `PopFromBackoffQ` event label will be incremented.
### Notifying activeQ condition when new pod appears in backoffQ
Pods might appear in backoffQ while `pop()` is hanging on point 2.
The whole feature should already be covered by integration tests.
- Gather feedback from users and fix reported bugs.
- Change the feature flag to be enabled by default.
- Make sure [backoff in case of error](#backoff-wont-be-working-as-natural-rate-limiter-in-case-of-errors) is not skipped.
#### GA
**Upgrade**
During the alpha period, users have to enable the feature gate `SchedulerPopFromBackoffQ` to opt into this feature.
This is a purely in-memory feature for kube-scheduler, so no special actions are required outside the scheduler.
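Opting in uses the standard Kubernetes feature-gate mechanism, for example (flag shown directly on the kube-scheduler binary; adjust to however your control plane is deployed, e.g. a static pod manifest):

```shell
# Enable the alpha feature gate on kube-scheduler.
kube-scheduler --feature-gates=SchedulerPopFromBackoffQ=true
```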
**Downgrade**
###### How can this feature be enabled / disabled in a live cluster?
- [x] Feature gate (also fill in values in `kep.yaml`)