@@ -102,9 +102,9 @@ checklist items _must_ be updated for the enhancement to be released.

Items marked with (R) are required *prior to targeting to a milestone / release*.

- - [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
- - [ ] (R) KEP approvers have approved the KEP status as `implementable`
- - [ ] (R) Design details are appropriately documented
+ - [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+ - [x] (R) KEP approvers have approved the KEP status as `implementable`
+ - [x] (R) Design details are appropriately documented
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
  - [ ] e2e Tests for all Beta API Operations (endpoints)
  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
@@ -283,7 +283,7 @@ when drafting this test plan.
[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
-->

- [ ] I/we understand the owners of the involved components may require updates to
+ [x] I/we understand the owners of the involved components may require updates to
existing tests to make this code solid enough prior to committing the changes necessary
to implement this enhancement.

@@ -335,7 +335,7 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
https://storage.googleapis.com/k8s-triage/index.html
-->

- - <test>: <link to test coverage>
+ N/A; the feature is covered by unit tests and e2e tests instead.

##### e2e tests

@@ -491,7 +491,8 @@ well as the [existing list] of feature gates.

- [x] Feature gate (also fill in values in `kep.yaml`)
  - Feature gate name: HPAConfigurableTolerance
-   - Components depending on the feature gate: `kube-controller-manager`
+   - Components depending on the feature gate: `kube-controller-manager` and
+     `kube-apiserver`.

###### Does enabling the feature change any default behavior?

@@ -517,7 +518,8 @@ NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.

The feature can be disabled by restarting the `kube-controller-manager` with the feature gate set to `false`.

- Any `tolerance` values set on existing HPAs will be ignored by the `kube-controller-manager` when the feature gate is off.
+ Any `tolerance` values set on existing HPAs will be ignored by the
+ `kube-controller-manager` and `kube-apiserver` when the feature gate is off.

###### What happens if we reenable the feature if it was previously rolled back?

@@ -538,6 +540,9 @@ You can take a look at one potential example of such test in:
https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
-->

+ We will add a unit test verifying that HPAs with and without the new fields are
+ properly validated, both with the feature gate enabled and with it disabled.
+
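The planned unit test would most likely follow the usual table-driven Go style. The sketch below assumes the common Kubernetes convention for feature-gated fields: when the gate is off, the API server clears the field on create and update unless the stored object already uses it. `HPASpec`, `dropDisabledFields`, `ptr` and the `*float64` type used for `tolerance` are hypothetical simplifications, not the real API types or registry strategy, and the sketch does not model the field's actual validation rules.

```go
package main

import "fmt"

// HPASpec is a hypothetical, simplified stand-in for the HPA spec. The real
// type and location of the tolerance field are defined by the KEP's API changes.
type HPASpec struct {
	Tolerance *float64 // nil means the user did not set the field
}

// dropDisabledFields sketches the usual convention for gated fields: with the
// gate off, the field is cleared on write unless the stored (old) object
// already uses it, so data written while the gate was on survives a rollback.
func dropDisabledFields(newSpec, oldSpec *HPASpec, gateEnabled bool) {
	if gateEnabled {
		return
	}
	if oldSpec != nil && oldSpec.Tolerance != nil {
		return
	}
	newSpec.Tolerance = nil
}

// ptr returns a pointer to a copy of v (hypothetical test helper).
func ptr(v float64) *float64 { return &v }

func main() {
	// Table-driven cases: with and without the field, gate on and off.
	cases := []struct {
		name        string
		gateEnabled bool
		oldSpec     *HPASpec // nil for a create
		newSpec     *HPASpec
		wantKept    bool
	}{
		{"gate on, create with field", true, nil, &HPASpec{Tolerance: ptr(0.05)}, true},
		{"gate on, create without field", true, nil, &HPASpec{}, false},
		{"gate off, create with field", false, nil, &HPASpec{Tolerance: ptr(0.05)}, false},
		{"gate off, create without field", false, nil, &HPASpec{}, false},
		{"gate off, update of HPA already using field", false,
			&HPASpec{Tolerance: ptr(0.05)}, &HPASpec{Tolerance: ptr(0.02)}, true},
	}

	for _, tc := range cases {
		dropDisabledFields(tc.newSpec, tc.oldSpec, tc.gateEnabled)
		kept := tc.newSpec.Tolerance != nil
		result := "ok"
		if kept != tc.wantKept {
			result = "FAIL"
		}
		fmt.Printf("%-45s kept=%-5v %s\n", tc.name, kept, result)
	}
}
```

Covering the "update of an HPA already using the field" case matters because it is what keeps existing `tolerance` values intact across a disable/re-enable cycle.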
### Rollout, Upgrade and Rollback Planning

<!--
@@ -594,6 +599,9 @@ checking if there are objects with field X set) may be a last resort. Avoid
logs or events for this purpose.
-->

+ The presence of the new `tolerance` HPA field indicates that the feature is
+ used.
+
###### How can someone using this feature know that it is working for their instance?

<!--
@@ -605,13 +613,18 @@ and operation of this feature.
Recall that end users cannot usually observe component logs or access metrics.
-->

- - [ ] Events
-   - Event Reason:
- - [ ] API .status
-   - Condition name:
-   - Other field:
- - [ ] Other (treat as last resort)
-   - Details:
+ - [x] Events
+   - Event Reason: `SuccessfulRescale`
+
+ The tolerance is applied to the ratio between the _current_ and _desired_ metric
+ values. Users can get both values using
+ [`kubectl describe`](https://github.com/kubernetes/kubernetes/blob/1b7a0591871772fbbc0fda430b3b73bc24c0e738/staging/src/k8s.io/kubectl/pkg/describe/describe.go#L4109)
+ and use them to verify that scaling events are triggered when their ratio is out
+ of tolerance.
+
+ We will update the controller-manager logs to help users understand the behavior
+ of the autoscaler. The data added to the logs will include the tolerance used
+ for each scaling decision.
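The tolerance check described above can be illustrated with a short Go sketch. It assumes the documented HPA formula `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, a single tolerance applied in both scaling directions, and a `*float64` field in place of the real API type. `effectiveTolerance` and `desiredReplicas` are hypothetical helpers, not the controller's actual code. When the gate is off, the configured value is ignored and the hard-coded 10% default applies.

```go
package main

import (
	"fmt"
	"math"
)

// defaultTolerance mirrors the hard-coded 10% tolerance used today.
const defaultTolerance = 0.1

// effectiveTolerance (hypothetical) returns the per-HPA tolerance when the
// feature gate is enabled and the field is set; otherwise the default.
func effectiveTolerance(configured *float64, gateEnabled bool) float64 {
	if gateEnabled && configured != nil {
		return *configured
	}
	return defaultTolerance
}

// desiredReplicas (hypothetical) skips rescaling while the ratio between the
// current and desired metric values stays within tolerance of 1.0.
// desiredMetric is assumed to be positive.
func desiredReplicas(currentReplicas int32, currentMetric, desiredMetric, tolerance float64) int32 {
	ratio := currentMetric / desiredMetric
	if math.Abs(1.0-ratio) <= tolerance {
		return currentReplicas // within tolerance: no scaling event
	}
	return int32(math.Ceil(float64(currentReplicas) * ratio))
}

func main() {
	five := 0.05

	// Ratio 108/100 = 1.08 is within the default 10%: stays at 10 replicas.
	fmt.Println(desiredReplicas(10, 108, 100, effectiveTolerance(nil, true))) // 10

	// Same metrics with a 5% tolerance: out of tolerance, ceil(10 * 1.08) = 11.
	fmt.Println(desiredReplicas(10, 108, 100, effectiveTolerance(&five, true))) // 11

	// Gate disabled: the configured 5% is ignored and the default applies.
	fmt.Println(desiredReplicas(10, 108, 100, effectiveTolerance(&five, false))) // 10
}
```

The sketch treats a ratio exactly at the boundary as within tolerance; comparing the `kubectl describe` values against the configured tolerance in the same way shows whether a `SuccessfulRescale` event should be expected.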

###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -630,18 +643,21 @@ These goals will help you determine what you need to measure (SLIs) in the next
question.
-->

+ N/A.
+
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

<!--
Pick one more of these and delete the rest.
-->

- - [ ] Metrics
-   - Metric name:
-   - [Optional] Aggregation method:
-   - Components exposing the metric:
- - [ ] Other (treat as last resort)
-   - Details:
+ This KEP is not expected to have any impact on SLIs/SLOs because it doesn't
+ introduce new HPA behavior; it merely allows users to easily change the value of
+ a parameter that's otherwise difficult to update.
+
+ Standard HPA metrics (e.g.,
+ `horizontal_pod_autoscaler_controller_metric_computation_duration_seconds`) can
+ be used to verify the health of the HPA controller.

###### Are there any missing metrics that would be useful to have to improve observability of this feature?

@@ -650,6 +666,12 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
implementation difficulties, etc.).
-->

+ Users may want to see a signal that autoscaling isn't happening because of the
+ tolerance, but this is not directly related to this KEP (this problem already
+ exists today with the hard-coded 10% tolerance), and taking this KEP as an
+ opportunity to improve the situation is difficult (see
+ [this thread](https://github.com/kubernetes/enhancements/pull/4954#discussion_r1857098884)).
+
### Dependencies

<!--
@@ -775,6 +797,8 @@ Are there any tests that were run/should be run to understand performance charac
and validate the declared limits?
-->

+ No.
+
### Troubleshooting

<!--
@@ -820,6 +844,8 @@ Major milestones might include:
- when the KEP was retired or superseded
-->

+ 2025-01-21: KEP PR merged.
+
## Drawbacks

<!--