Commit e7ebadc

Apply comments
1 parent d2fc2db commit e7ebadc

2 files changed: +27 −17 lines changed

keps/sig-scheduling/5142-pop-backoffq-when-activeq-empty/README.md

Lines changed: 26 additions & 16 deletions
@@ -138,39 +138,46 @@ It won't be done again when moving pods from backoffQ to activeQ.
 
 ### Risks and Mitigations
 
-#### Scheduling throughput might be affected
+#### A tiny delay on the first scheduling attempts for newly created pods
 
-While popping from backoffQ, another pod might appear in activeQ ready to be scheduled.
-If the pop operation is short enough, there won't be a visible downgrade in throughput.
-The only concern might be that less pods from activeQ might be taken in some period of time in favor of backoffQ,
-but that's a user responsibility to create enough amount of pods to be scheduled from activeQ, not to cause this KEP behavior to happen.
+While the scheduler handles a pod popped directly from backoffQ, another pod that should be scheduled before it may appear in activeQ.
+However, in the real world, if the scheduling latency is short enough, there won't be a visible drop in throughput.
+This can only happen when there are no pods in activeQ, so it can be mitigated by an appropriate rate of pod creation.
 
 #### Backoff won't be working as natural rate limiter in case of errors
 
 In case of API call errors (e.g. network issues), backoffQ limits the number of retries in the short term.
 This proposal will take those pods earlier, losing this mechanism.
 
 After merging [kubernetes#128748](github.com/kubernetes/kubernetes/pull/128748),
 it will be possible to distinguish pods backing off because of errors from those backing off because of an unschedulable attempt.
 This information could be used when popping, by filtering only the pods that come from an unschedulable attempt, or even by splitting backoffQ.
 
-#### One pod in backoffQ could starve the others
+This has to be resolved before the beta is released.
 
-If a pod popped from the backoffQ fails its scheduling attempt and come back to the queue, it might be selected again, ahead of other pods.
+#### One pod in backoffQ could starve the others
 
-To prevent this, while popping pod from backoffQ, its attempt counter will be incremented as if it had been taken from the activeQ.
-This will give other pods a chance to be scheduled.
+The head of backoffQ is the pod with the closest backoff expiration,
+and the backoff time is calculated based on the number of scheduling failures the pod has experienced.
+If one pod has a smaller attempt counter than the others,
+could the scheduler keep popping that pod ahead of them because its backoff expires sooner?
+Actually, that won't happen: the scheduler increments the attempt counter of pods popped from backoffQ as well,
+which makes a pod's backoff time grow after every scheduling attempt,
+so the pod that started with a smaller attempt number eventually stops being popped first.
 
 ## Design Details
 
 ### Popping from backoffQ in activeQ's pop()
 
 To achieve the goal, activeQ's `pop()` method needs to be changed:
-1. If activeQ is empty, then instead of waiting on condition, popping from backoffQ is tried.
-2. If backoffQ is empty, then `pop()` is waiting on condition as previously.
+1. If activeQ is empty, then instead of waiting for a pod to arrive in activeQ, popping from backoffQ is tried.
+2. If backoffQ is also empty, then `pop()` waits for a pod as previously.
 3. If backoffQ is not empty, then the pod is processed as if it had been taken from activeQ, including increasing the attempts number.
 It is popping from a heap data structure, so it should be fast enough not to cause any performance trouble.
 
+To support monitoring, when popping from backoffQ,
+the `scheduler_queue_incoming_pods_total` metric with the `activeQ` queue label and a new `PopFromBackoffQ` event label will be incremented.
+
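The three steps can be sketched with a toy two-queue structure. All names here (`queue`, `addActive`, `addBackoff`) are invented for the example; the real activeQ and backoffQ are heaps with richer ordering, and this sketch only shows the pop order and the shared attempt counting.

```go
package main

import (
	"fmt"
	"sync"
)

// queue is a toy stand-in for the scheduler's activeQ/backoffQ pair.
// The real queues are heaps; plain slices are enough to show the order.
type queue struct {
	mu       sync.Mutex
	cond     *sync.Cond
	activeQ  []string
	backoffQ []string
	attempts map[string]int
}

func newQueue() *queue {
	q := &queue{attempts: map[string]int{}}
	q.cond = sync.NewCond(&q.mu)
	return q
}

func (q *queue) addActive(pod string) {
	q.mu.Lock()
	q.activeQ = append(q.activeQ, pod)
	q.mu.Unlock()
	q.cond.Broadcast()
}

func (q *queue) addBackoff(pod string) {
	q.mu.Lock()
	q.backoffQ = append(q.backoffQ, pod)
	q.mu.Unlock()
	q.cond.Broadcast() // waiters must learn about this too (next section)
}

func (q *queue) pop() string {
	q.mu.Lock()
	defer q.mu.Unlock()
	// Step 2: only when both queues are empty does pop() block.
	for len(q.activeQ) == 0 && len(q.backoffQ) == 0 {
		q.cond.Wait()
	}
	var pod string
	if len(q.activeQ) > 0 {
		pod, q.activeQ = q.activeQ[0], q.activeQ[1:]
	} else {
		// Steps 1 and 3: activeQ is empty, so take the backoffQ head.
		pod, q.backoffQ = q.backoffQ[0], q.backoffQ[1:]
	}
	q.attempts[pod]++ // counted identically for both sources
	return pod
}

func main() {
	q := newQueue()
	q.addActive("active-pod")
	q.addBackoff("backoff-pod")
	fmt.Println(q.pop()) // activeQ still wins while it has pods
	fmt.Println(q.pop()) // only then is backoffQ popped
}
```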
 ### Notifying activeQ condition when new pod appears in backoffQ
 
 Pods might appear in backoffQ while `pop()` is hanging on point 2.
@@ -220,6 +227,7 @@ Whole feature should be already covered by integration tests.
 
 - Gather feedback from users and fix reported bugs.
 - Change the feature flag to be enabled by default.
+- Make sure [backoff in case of error](#backoff-wont-be-working-as-natural-rate-limiter-in-case-of-errors) is not skipped.
 
 #### GA
 
@@ -229,7 +237,7 @@ Whole feature should be already covered by integration tests.
 
 **Upgrade**
 
-During the alpha period, users have to enable the feature gate `PopBackoffQWhenEmptyActiveQ` to opt in this feature.
+During the alpha period, users have to enable the feature gate `SchedulerPopFromBackoffQ` to opt in to this feature.
 This is a purely in-memory feature of kube-scheduler, so no special actions are required outside the scheduler.
 
 **Downgrade**
@@ -247,12 +255,12 @@ This is purely in-memory feature for kube-scheduler, and hence no version skew s
 ###### How can this feature be enabled / disabled in a live cluster?
 
 - [x] Feature gate (also fill in values in `kep.yaml`)
-  - Feature gate name: `PopBackoffQWhenEmptyActiveQ`
+  - Feature gate name: `SchedulerPopFromBackoffQ`
   - Components depending on the feature gate: kube-scheduler
 
 ###### Does enabling the feature change any default behavior?
 
-Pods that are backoffQ might be scheduled earlier when activeQ is empty.
+Pods that are backing off might be scheduled earlier when activeQ is empty.
 
 ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
 
@@ -287,6 +295,7 @@ Abnormal values of metrics related to scheduling queue, meaning pods are stuck i
 - `scheduler_schedule_attempts_total` metric with `scheduled` label is almost constant, while there are pending pods that should be schedulable.
   This could mean that pods from backoffQ are taken instead of those from activeQ.
 - `scheduler_pending_pods` metric with `active` label is not decreasing, while with `backoff` it is almost constant.
+- `scheduler_queue_incoming_pods_total` metric with the `PopFromBackoffQ` label is increasing while there are pods in activeQ.
 - `scheduler_pod_scheduling_sli_duration_seconds` metric is visibly higher for schedulable pods.
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
@@ -304,6 +313,7 @@ No
 ###### How can an operator determine if the feature is in use by workloads?
 
 This feature is used during scheduling when activeQ is empty and the feature gate is enabled.
+Also, `scheduler_queue_incoming_pods_total` could be checked, by querying for the new `PopFromBackoffQ` event label.
 
 ###### How can someone using this feature know that it is working for their instance?
 

keps/sig-scheduling/5142-pop-backoffq-when-activeq-empty/kep.yaml

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ milestone:
 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
 feature-gates:
-  - name: PopBackoffQWhenEmptyActiveQ
+  - name: SchedulerPopFromBackoffQ
     components:
       - kube-scheduler
 disable-supported: true

0 commit comments