@@ -208,8 +208,8 @@ Beta (v1.22):
  - Enable LogarithmicScaleDown feature gate by default
  - Enable `sorting_deletion_age_ratio` metric

- Stable (v1.23):
- - Remove LogarithmicScaleDown feature gate
+ Stable (v1.31):
+ - Lock LogarithmicScaleDown feature gate to true
  - Make this behavior standard

  ### Upgrade / Downgrade Strategy
@@ -230,9 +230,7 @@ behavior reduces the risk that it is an expectation from other components.

  ### Feature Enablement and Rollback

- _This section must be completed when targeting alpha to a release._
-
- * **How can this feature be enabled / disabled in a live cluster?**
+ ###### How can this feature be enabled / disabled in a live cluster?
  - [x] Feature gate (also fill in values in `kep.yaml`)
    - Feature gate name: LogarithmicScaleDown
    - Components depending on the feature gate: kube-controller-manager
@@ -243,53 +241,58 @@ _This section must be completed when targeting alpha to a release._
  - Will enabling / disabling the feature require downtime or reprovisioning
    of a node?

- * **Does enabling the feature change any default behavior?**
+ ###### Does enabling the feature change any default behavior?
  Yes, this changes the default assumption that the youngest pod in a replica set
  will always be the one evicted. However, it still groups pods by their age and picks
  from the youngest group.

- * **Can the feature be disabled once it has been enabled (i.e. can we roll back
- the enablement)?**
+ ###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
  Yes. Existing workloads should see no change when disabling this feature.

- * **What happens if we reenable the feature if it was previously rolled back?**
+ ###### What happens if we reenable the feature if it was previously rolled back?
  Assumptions that the newest pod will be deleted first may break.

- * **Are there any tests for feature enablement/disablement?**
+ ###### Are there any tests for feature enablement/disablement?
  Tests for feature disablement shouldn't be necessary, as this is already an assumed
  (but not documented) controller behavior.

  ### Rollout, Upgrade and Rollback Planning

- _This section must be completed when targeting beta graduation to a release._
-
- * **How can a rollout fail? Can it impact already running workloads?**
+ ###### How can a rollout or rollback fail? Can it impact already running workloads?
  This should not affect running workloads, though there is the possibility that the logic
  panics, which would cause kube-controller-manager to crash.

- * **What specific metrics should inform a rollback?**
+ ###### What specific metrics should inform a rollback?
  Increased pod deletions could indicate runaway/hot-loop failures in the scaledown logic.
  Availability of applications may also be affected. Though the intent of this is to provide
  better availability through more distributed victim selection, in cases of desired binpacking
  pods may remain running on undesired nodes.

- * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
- This will be manually tested before the graduation to beta
+ ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+ This is a purely in-memory change for the controller, so upgrade/downgrade doesn't really change anything.

- * **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
- fields of API types, flags, etc.?**
+ ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
  No

  ### Monitoring Requirements

- _This section must be completed when targeting beta graduation to a release._
-
- * **How can an operator determine if the feature is in use by workloads?**
- The scaledown behavior of all replicasets will be affected by this featuregate being
- enabled, so somehow monitoring them will be necessary to determine it
-
- * **What are the SLIs (Service Level Indicators) an operator can use to determine
- the health of the service?**
+ ###### How can an operator determine if the feature is in use by workloads?
+ The feature is global, so it's always going to be used on any downscale.
+
+ ###### How can someone using this feature know that it is working for their instance?
+ - [ ] Events
+   - Event Reason:
+ - [ ] API .status
+   - Condition name:
+   - Other field:
+ - [x] Other (treat as last resort)
+   - Details:
+     In a ReplicaSet with two ready pods whose Pod Cost annotation is not set,
+     if the logarithmic values of the pod ready times are identical,
+     the pod with the smaller UID will be downscaled first rather than
+     the latest ready one.
+
+ ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
  - [x] Metrics
    - Metric name: sorting_deletion_age_ratio
    - [Optional] Aggregation method:
@@ -302,71 +305,52 @@ algorithm falls back to age. (Pod age is the final criteria in the sorting algor
  want to measure this ratio for deletions which don't use this feature, as those may validly fall
  outside the desired range).

- * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
+ ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
  There should be no values `>2` in the above metric when the Pod Cost annotation is unset
  (see https://github.com/kubernetes/enhancements/tree/master/keps/sig-apps/2255-pod-cost) and
  the pod's deletion was based on a timestamp comparison (rather than, for example, pod state).

- * **Are there any missing metrics that would be useful to have to improve observability
- of this feature?**
- Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
- implementation difficulties, etc.).
+ ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+ No, we didn't find any other gaps that could be covered by metrics.

  ### Dependencies

- _This section must be completed when targeting beta graduation to a release._
-
- * **Does this feature depend on any specific services running in the cluster?**
+ ###### Does this feature depend on any specific services running in the cluster?
  No, it is part of the controller-manager

  ### Scalability

- _For alpha, this section is encouraged: reviewers should consider these questions
- and attempt to answer them._
-
- _For beta, this section is required: reviewers must answer these questions._
-
- _For GA, this section is required: approvers should be able to confirm the
- previous answers based on experience in the field._
-
- * **Will enabling / using this feature result in any new API calls?**
+ ###### Will enabling / using this feature result in any new API calls?
  No

- * **Will enabling / using this feature result in introducing new API types?**
+ ###### Will enabling / using this feature result in introducing new API types?
  No

- * **Will enabling / using this feature result in any new calls to the cloud
- provider?**
+ ###### Will enabling / using this feature result in any new calls to the cloud provider?
  No

- * **Will enabling / using this feature result in increasing size or count of
- the existing API objects?**
+ ###### Will enabling / using this feature result in increasing size or count of the existing API objects?
  No

- * **Will enabling / using this feature result in increasing time taken by any
- operations covered by [existing SLIs/SLOs]?**
+ ###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
  No

- * **Will enabling / using this feature result in non-negligible increase of
- resource usage (CPU, RAM, disk, IO, ...) in any components?**
+ ###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
  No, perhaps minimal increase in calculating the buckets for pod age

- ### Troubleshooting
-
- The Troubleshooting section currently serves the `Playbook` role. We may consider
- splitting it into a dedicated `Playbook` document (potentially with some monitoring
- details). For now, we leave it here.
+ ###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+ No

- _This section must be completed when targeting beta graduation to a release._
+ ### Troubleshooting

- * **How does this feature react if the API server and/or etcd is unavailable?**
+ ###### How does this feature react if the API server and/or etcd is unavailable?
  N/a - this is not a feature of running workloads. The main controller will not work and
  be unable to scale up or down if API or etcd are unavailable.

- * **What are other known failure modes?**
+ ###### What are other known failure modes?
  n/a

- * **What steps should be taken if SLOs are not being met to determine the problem?**
+ ###### What steps should be taken if SLOs are not being met to determine the problem?
  n/a

  [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
@@ -376,6 +360,7 @@

  - 2021-01-06: Initial KEP submitted
  - 2021-05-07: Updated KEP for graduation to beta
+ - 2024-05-21: Updated KEP for graduation to GA

  ## Drawbacks
