### Graduation Criteria
#### Alpha (v1.15):

- Support NonPreemptingPriority in PriorityClasses (see the example manifest below)
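For illustration, a minimal sketch of such a non-preempting PriorityClass using the `scheduling.k8s.io/v1` API; the class name, priority value, and description are illustrative assumptions, and `preemptionPolicy: Never` is the field this feature adds.

```yaml
# Sketch: a PriorityClass whose Pods never trigger preemption.
# The name, value, and description are illustrative placeholders.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-nonpreempting
value: 1000000
preemptionPolicy: Never        # Pods using this class will not preempt other Pods
globalDefault: false
description: "High priority that does not preempt already running Pods."
```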
#### Beta (v1.19):

- Add integration test for NonPreemptingPriority.
- Graduate NonPreemptingPriority to Beta.
- Update documents to reflect the changes.

#### Stable (v1.24):

- No negative feedback.
- Enhance the message of the existing event for scheduling failed to include details about preemption.
- Graduate NonPreemptingPriority to GA.
- Update documents to reflect the changes.

## Production Readiness Review Questionnaire

***Can the feature be disabled once it has been enabled (i.e. can we rollback the enablement)?**

Yes. The feature can be disabled by restarting kube-apiserver and kube-scheduler with the feature gate turned off, as sketched below.
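As a hedged sketch of one way to do this on a kubeadm-style control plane, the `--feature-gates` flag can be set in the kube-scheduler static Pod manifest (and likewise for kube-apiserver); the image tag and kubeconfig path below are illustrative assumptions, not prescriptions.

```yaml
# Sketch: kube-scheduler static Pod with the NonPreemptingPriority gate turned off.
# Image tag and file paths are illustrative; only the --feature-gates flag matters here.
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: registry.k8s.io/kube-scheduler:v1.23.17
    command:
    - kube-scheduler
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --feature-gates=NonPreemptingPriority=false   # disable the feature
```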
***What happens if we reenable the feature if it was previously rolled back?**
If we reenable the feature, the preemptionPolicy field is honored again: a high-priority Pod whose PriorityClass sets preemptionPolicy to Never will not preempt lower-priority Pods when cluster resources are tight.

### Rollout, Upgrade and Rollback Planning

***How can a rollout fail? Can it impact already running workloads?**

If a rollout fails, kube-scheduler will keep crashing. Workloads that are already running are not affected.

***What specific metrics should inform a rollback?**

Check the following indicators to determine if there are any exceptions:

***Were upgrade and rollback tested? Was upgrade->downgrade->upgrade path tested?**

Manually tested successfully. The test environment version was v1.23. We tested enabling and disabling this
feature. After each change to the feature gate, three separate PriorityClasses were recreated: one
high-priority class with preemptionPolicy set to Never, another high-priority class with preemptionPolicy
left unset, and one low-priority class with preemptionPolicy left unset (see the sketch below). Multiple Pods
using these three PriorityClasses were then created to verify that the preemption results were as expected.
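A sketch of what the three PriorityClasses described above could look like; the names and priority values are illustrative assumptions, and only the presence or absence of `preemptionPolicy: Never` matters for the test.

```yaml
# Sketch of the three test PriorityClasses; names and values are illustrative.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-never
value: 1000000
preemptionPolicy: Never        # high priority, must not preempt
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority-default
value: 1000000                 # preemptionPolicy unset: defaults to PreemptLowerPriority
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 1000                    # preemptionPolicy unset: defaults to PreemptLowerPriority
```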
***Is the rollout accompanied by any deprecations and/or removals of features?**

N/A.

### Monitoring Requirements

<!--
This section must be completed when targeting beta to a release.
-->
###### How can an operator determine if the feature is in use by workloads?

The operator can determine whether workloads are using the feature by checking whether any PriorityClass has `preemptionPolicy` set to "Never".

<!--
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->
###### How can someone using this feature know that it is working for their instance?

<!--
For instance, if this is a pod-related feature, it should be possible to determine if the feature is functioning properly
for each individual pod.

Pick one more of these and delete the rest.
Please describe all items visible to end users below with sufficient detail so that they can verify correct enablement
and operation of this feature.

Recall that end users cannot usually observe component logs or access metrics.
-->

- [x] Events
  - Event Reason: kube-scheduler sends an event when a Pod preempts other Pods. If the feature is working, a Pod whose PriorityClass has `preemptionPolicy` set to Never will have no preemption-related event.
- [ ] API .status
  - Condition name:
  - Other field:
- [x] Other (treat as last resort)
  - Details: Check that Pods with `preemptionPolicy` set to Never do not preempt other low-priority Pods when their resource requests cannot be met from free cluster capacity (see the example Pod below).
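For example, a sketch of a Pod that should stay Pending rather than preempt on a full cluster; the PriorityClass name, image, and resource request are illustrative assumptions.

```yaml
# Sketch: a Pod using a non-preempting PriorityClass. On a full cluster it should
# remain Pending with no preemption event; a preempting class with the same
# priority value would instead evict lower-priority victims.
apiVersion: v1
kind: Pod
metadata:
  name: important-but-nonpreempting
spec:
  priorityClassName: high-priority-never   # illustrative class with preemptionPolicy: Never
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9       # placeholder workload image
    resources:
      requests:
        cpu: "2"                            # sized so it needs free capacity
```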
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

N/A

<!--
This is your opportunity to define what "normal" quality of service looks like
for a feature.

It's impossible to provide comprehensive guidance, but at the very
high level (needs more precise definitions) those may be things like:
- per-day percentage of API calls finishing with 5XX errors <= 1%
- 99% percentile over day of absolute value from (job creation time minus expected
  job creation time) for cron job <= 10%
- 99.9% of /health requests per day finish with 200 code

These goals will help you determine what you need to measure (SLIs) in the next
question.
-->
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

<!--
Pick one more of these and delete the rest.
-->

- [x] Metrics
  - Metric name: preemption_victims
  - [Optional] Aggregation method:
  - Components exposing the metric: kube-scheduler
- [ ] Other (treat as last resort)
  - Details:
###### Are there any missing metrics that would be useful to have to improve observability of this feature?

We currently only have events that describe a Pod being preempted by another Pod, but we don't
have an event that describes why a preemption attempt was unsuccessful. We can enhance the
message of the existing scheduling-failed event to include details about preemption. This
will help us improve observability for this feature and for other scenarios.

In addition to events, we could add a metric counting how many Pods have declined to preempt
other Pods because of the non-preempting option. However, since this metric would likely see
little use, it was not added.

<!--
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
implementation difficulties, etc.).
-->
### Dependencies

<!--
This section must be completed when targeting beta to a release.
-->

###### Does this feature depend on any specific services running in the cluster?

No.

<!--
Think about both cluster-level services (e.g. metrics-server) as well
as node-level agents (e.g. specific version of CRI). Focus on external or
optional services that are needed. For example, if this feature depends on
a cloud provider API, or upon an external software-defined storage or network
control plane.

For each of these, fill in the following—thinking about running existing user workloads
and creating new ones, as well as about cluster-level services (e.g. DNS):
  - [Dependency name]
    - Usage description:
      - Impact of its outage on the feature:
      - Impact of its degraded performance or high-error rates on the feature:
-->
### Scalability

***Will enabling / using this feature result in any new API calls?**

No

Pod Priority and Preemption are tracked as part of [enhancement#564](https://github.com/kubernetes/enhancements/issues/564).