- Workload scales successfully because the metric ratio is out of tolerance.
-- Autoscaling uses the default when no tolerances are set.
+The new [e2e autoscaling tests] covering this feature are:
+
+- [Test with large configurable tolerance](https://github.com/kubernetes/kubernetes/blob/07142400ecd02126602ffaa6f91712cd3f1e170c/test/e2e/autoscaling/horizontal_pod_autoscaling_behavior.go#L509): [SIG autoscaling](https://testgrid.k8s.io/sig-autoscaling-hpa#gci-gce-autoscaling-hpa-cpu-alpha-beta-pull&include-filter-by-regex=HPAConfigurableTolerance.*large%20configurable%20tolerance), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=HPAConfigurableTolerance.*large%20configurable%20tolerance)
+
+Before the graduation to beta, we will add an integration test verifying the autoscaling
+behavior when tolerances smaller and larger than the default are set on an HPA.
-[Unit tests have been added](https://github.com/kubernetes/kubernetes/pull/130797/commits/a41284d9fa3a3d5a5e8760db6e9fd4f7e5e6fca6#diff-98f8520444a477d01c5cc2e56f92939d5fb07893a234b8fee5b67c7c147a20e0) to verify that HPAs with and without the new fields are
+[Unit tests have been added](https://github.com/kubernetes/kubernetes/blob/07142400ecd02126602ffaa6f91712cd3f1e170c/pkg/apis/autoscaling/validation/validation_test.go#L1648) to verify that HPAs with and without the new fields are
 properly validated, both when the feature gate is enabled or not.

 ### Rollout, Upgrade and Rollback Planning
@@ -594,9 +596,96 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->

-I have manually tested a cluster upgrade, and this feature is in alpha without
-(to the best of our knowledge) any user reporting an issue. GKE has automated
-upgrade/downgrade tests that did not report any issue.
+The upgrade→downgrade→upgrade testing was done manually using a 1.33 cluster with the following steps:
+4. Simulate upgrade by re-enabling the feature for the API server and control-plane. Follow the procedure described
+in step 1, and observe that the HPA description mentions `ScalingLimited: False`, which demonstrates that the feature
+is working again.

 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

@@ -646,7 +735,7 @@ values. Users can get both values using
 and use them to verify that scaling events are triggered when their ratio is out
 of tolerance.

-The [controller-manager logs have been updated](https://github.com/kubernetes/kubernetes/pull/130797/commits/2dd9eda47ffd5556ff90446e91d22ddbecc05d2c#diff-f1c5a31aa8fb8e3fd64b6aa13d3358b504e6e25030f249f1652e244c105eafc7R846)
+The [controller-manager logs have been updated](https://github.com/kubernetes/kubernetes/blob/07142400ecd02126602ffaa6f91712cd3f1e170c/pkg/controller/podautoscaler/horizontal.go#L846)
 to help users understand the behavior of the autoscaler. The data added to the
 logs includes the tolerance used for each scaling decision.

@@ -667,7 +756,9 @@ These goals will help you determine what you need to measure (SLIs) in the next
 question.
 -->

-N/A.
+Although the absolute value of the `horizontal_pod_autoscaler_controller_metric_computation_duration_seconds`
+metric depends on the HPA configuration, it should be unaffected by this feature. This metric should not vary
+by more than 5%.

 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

@@ -679,8 +770,7 @@ This KEP is not expected to have any impact on SLIs/SLOs as it doesn't introduce
 a new HPA behavior, but merely allows users to easily change the value of a
 parameter that's otherwise difficult to update.

-Standard HPA metrics (e.g.
-`horizontal_pod_autoscaler_controller_metric_computation_duration_seconds`) can
+The standard HPA metric `horizontal_pod_autoscaler_controller_metric_computation_duration_seconds` can
 be used to verify the HPA controller health.

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
@@ -857,13 +947,19 @@ For each of them, fill in the following information by copying the below templat
 - Testing: Are there any tests for failure mode? If not, describe why.
 -->

-We do not expect any new failure mode. (While setting inappropriate `tolerance`
-values may cause HPAs to react too slowly or too fast, the feature is working as
-intended.)
+We do not expect any new failure mode. (While setting `tolerance` below 10% can cause HPAs
+to scale up and down as frequently as every 30s, and higher values might stop scaling altogether
+if the metric remains within the tolerance band, the feature is still working as intended.
+To make HPAs respond faster, decrease the tolerance value. Conversely, to make them respond
+more slowly, increase the tolerance value.)
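
As a rough illustration of the tolerance band described above, the following Python sketch mimics the check: no scaling is proposed while the ratio of the current to the desired metric value stays inside the band. The function names, the separate scale-up and scale-down parameters, and the 0.1 default are assumptions made for this illustration, not the controller's actual code.

```python
import math

# Assumed default tolerance of 10%; the constant name is hypothetical.
DEFAULT_TOLERANCE = 0.1


def within_tolerance(current, desired, scale_up_tol=DEFAULT_TOLERANCE,
                     scale_down_tol=DEFAULT_TOLERANCE):
    """Return True when the current/desired metric ratio is inside the band."""
    ratio = current / desired
    return (1.0 - scale_down_tol) <= ratio <= (1.0 + scale_up_tol)


def proposed_replicas(replicas, current, desired,
                      scale_up_tol=DEFAULT_TOLERANCE,
                      scale_down_tol=DEFAULT_TOLERANCE):
    """Keep the replica count inside the band, otherwise scale by the ratio."""
    if within_tolerance(current, desired, scale_up_tol, scale_down_tol):
        return replicas
    return math.ceil(replicas * current / desired)
```

With the assumed default band, a ratio of 1.05 leaves the replica count unchanged, while lowering the scale-up tolerance to 0.01 makes the same ratio trigger a scale-up.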

 ###### What steps should be taken if SLOs are not being met to determine the problem?

-N/A.
+If possible, increase the log level for kube-controller-manager and check controller logs:
+1. Search for "Proposing desired replicas", verify that the tolerance is set as expected,
+and check (using `kubectl describe hpa`) if the ratio between the _current_ and _desired_
+metric values is in tolerance.
+2. Look for warnings and errors which might point to where the problem lies.
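
The log search in step 1 can be scripted. The sketch below filters saved kube-controller-manager log lines for the "Proposing desired replicas" message. The function name and the optional HPA-name filter are hypothetical, and nothing is assumed about the other fields in each log line.

```python
# Message text quoted in the troubleshooting steps above.
MARKER = "Proposing desired replicas"


def find_proposals(log_lines, hpa_name=None):
    """Return log lines containing the marker, optionally for one HPA name."""
    matches = []
    for line in log_lines:
        if MARKER not in line:
            continue
        # Plain substring match on the HPA name; the log format is not parsed.
        if hpa_name is not None and hpa_name not in line:
            continue
        matches.append(line.rstrip("\n"))
    return matches
```

The lines can come from any saved copy of the controller logs, for example `find_proposals(open("kcm.log"), "web")`, where `kcm.log` and `web` are hypothetical names.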

 ## Implementation History

@@ -881,6 +977,7 @@ Major milestones might include: