Commit d0bc09a

fix: address reviews
1 parent eab6081 commit d0bc09a

1 file changed: +9 -4 lines changed

keps/sig-scheduling/4832-async-preemption/README.md

Lines changed: 9 additions & 4 deletions
@@ -209,6 +209,8 @@ to implement this enhancement.
 - `/pkg/scheduler/framework/plugins/defaultpreemption/default_preemption.go`: `2024-09-07` - `85.4`
 - `/pkg/scheduler/framework/preemption/preemption.go`: `2024-09-07` - `27.2`
 
+Because the coverage for preemption.go is pretty low, we have to improve the testing there before the change for this KEP.
+
 ##### Integration tests
 
 We have to add integration tests to make sure the asynchronous preemption is performed appropriately,
@@ -257,7 +259,7 @@ This is purely internal feature for kube-scheduler, and hence no version skew st
 
 - [x] Feature gate (also fill in values in `kep.yaml`)
 - Feature gate name: `SchedulerAsyncPreemption`
-- Components depending on the feature gate:
+- Components depending on the feature gate: kube-scheduler
 - [ ] Other
 - Describe the mechanism:
 - Will enabling / disabling the feature require downtime of the control
@@ -295,15 +297,17 @@ This section must be completed when targeting beta to a release.
 
 The partly failure in the rollout isn't there because the scheduler is only the component to rollout this feature.
 But, if upgrading the scheduler itself fails somehow, new Pods won't be scheduled anymore.
-(while Pods, which are already scheduled, won't be affected in any cases.)
+If there's a bug in the preemption because of this enhancement, and also downgrading the scheduler fails somehow,
+running Pods could be affected, for example, by being deleted by mistake (depending on bugs).
 
 ###### What specific metrics should inform a rollback?
 
 Maybe something goes wrong with the preemption if `goroutines_duration_seconds{operation=preemption}` takes too long time.
+Also, if `preemption_attempts_total` increases too much, then that might also imply some bugs around the preemption.
 
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
-No. This feature is an internal feature of the scheduler,
+No. This feature is an in-memory feature of the scheduler,
 and just upgrading it and upgrade->downgrade->upgrade are both the same.
 
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
@@ -315,7 +319,8 @@ No.
 ###### How can an operator determine if the feature is in use by workloads?
 
 This feature is used during all Pods' preemption if the feature gate is enabled.
-You can find Pods that have experienced the preemption by referring to `.Status.NominatedNodeName`.
+You can find Pods that have triggered the preemption by referring to `.Status.NominatedNodeName`,
+and Pods that have been preempted by referring to their condition with `type: DisruptionTarget` and `reason: PreemptionByScheduler`.
 
 ###### How can someone using this feature know that it is working for their instance?
 
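The two checks described in the last hunk (a non-empty `.Status.NominatedNodeName` for Pods that triggered preemption, and a `DisruptionTarget` condition with reason `PreemptionByScheduler` for Pods that were preempted) could be sketched roughly as below. This is only an illustration: the `Pod`, `PodStatus`, and `PodCondition` types are simplified, hypothetical stand-ins for the real Kubernetes API types, and the helper names `TriggeredPreemption` and `PreemptedByScheduler` are not part of the KEP or of any library.

```go
package main

import "fmt"

// PodCondition is a simplified, hypothetical stand-in for the Kubernetes
// Pod condition type; only the fields named in the KEP text are modeled.
type PodCondition struct {
	Type   string
	Reason string
}

// PodStatus is a simplified stand-in holding the two pieces of status
// the KEP text refers to.
type PodStatus struct {
	NominatedNodeName string
	Conditions        []PodCondition
}

// Pod is a simplified stand-in for a Kubernetes Pod.
type Pod struct {
	Name   string
	Status PodStatus
}

// TriggeredPreemption reports whether the Pod has a nominated node,
// which the KEP uses as the signal that it triggered preemption.
func TriggeredPreemption(p Pod) bool {
	return p.Status.NominatedNodeName != ""
}

// PreemptedByScheduler reports whether the Pod carries a condition with
// type DisruptionTarget and reason PreemptionByScheduler.
func PreemptedByScheduler(p Pod) bool {
	for _, c := range p.Status.Conditions {
		if c.Type == "DisruptionTarget" && c.Reason == "PreemptionByScheduler" {
			return true
		}
	}
	return false
}

func main() {
	// Made-up example Pods, purely for illustration.
	pods := []Pod{
		{Name: "preemptor", Status: PodStatus{NominatedNodeName: "node-a"}},
		{Name: "victim", Status: PodStatus{Conditions: []PodCondition{
			{Type: "DisruptionTarget", Reason: "PreemptionByScheduler"},
		}}},
		{Name: "bystander"},
	}
	for _, p := range pods {
		fmt.Printf("%s: triggered=%t preempted=%t\n",
			p.Name, TriggeredPreemption(p), PreemptedByScheduler(p))
	}
}
```

In a real cluster the same two fields would be read from Pod objects returned by the API server rather than from hand-built structs.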