keps/sig-autoscaling/1610-container-resource-autoscaling/README.md
Lines changed: 21 additions & 4 deletions
@@ -577,6 +577,7 @@ in back-to-back releases.
 - The feature gate is enabled by default.
 - No negative feedback during alpha for a long-enough time.
 - No bug issues reported during alpha.
+- Implementing/exposing metrics in HPA so that users can monitor the HPA controller for this feature.

 #### GA

@@ -766,7 +767,12 @@ What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->

-- Many HPAs are in `ScalingActive: false` condition with `FailedGetContainerResourceMetric` reason.
+- The container resource metric takes much longer to compute than other metrics.
+  This can be monitored via the first metric described in the [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service) section.
+- The overall processing time of the HPA controller increases.
+  This can be monitored via the second metric described in the [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service) section.
+- Many errors occur on the container resource metric.
+  This can be monitored via the third metric described in the [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service) section.

 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

@@ -854,7 +860,14 @@ Pick one more of these and delete the rest.
 - Details:
 -->

-N/A
+The HPA controller has no metrics in it now.
+The following metrics will be implemented by beta. ([issue](https://github.com/kubernetes/kubernetes/issues/115639))
+1. How long each metric type takes to compute the ideal replica number.
+   - so that users can confirm the container resource metric doesn't take a long time compared to other metrics.
+2. How long the HPA controller takes to reconcile one HPA object.
+   - so that users can confirm the container resource metric doesn't increase the overall scaling time.
+3. How many errors occur for each metric.
+   - so that users can confirm that few errors occur on the container resource metric.

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

@@ -863,7 +876,7 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
 implementation difficulties, etc.).
 -->

-N/A
+Yes. We're planning to implement the metrics described in the [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service) section.

 ### Dependencies

### Dependencies
869
882
@@ -888,7 +901,11 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
 - Impact of its degraded performance or high-error rates on the feature:
 -->

-No.
+Yes.
+The HPA requires the `metrics.k8s.io` APIs to be available in the cluster to operate. This API is served by the Metrics Server;
+without Metrics Server, autoscaling on container resource metrics will not work.
+If there are multiple metrics defined and one is not available, scale up will
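The three metrics proposed for beta in the SLI section (per-metric-type computation time, per-HPA reconcile time, and per-metric error counts) can be pictured with a small in-memory sketch. This is only an illustration in Python, not the HPA controller's implementation: the class name `HPAMetricsSketch`, its method names, and the metric type labels are all hypothetical, and a real controller would export these through its metrics library rather than keep them in lists.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class HPAMetricsSketch:
    """Hypothetical in-memory stand-in for the three proposed metric families."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the sketch can be exercised deterministically
        self.computation_seconds = defaultdict(list)  # metric type -> durations (metric 1)
        self.reconcile_seconds = []                   # one duration per HPA reconcile (metric 2)
        self.errors = defaultdict(int)                # metric type -> error count (metric 3)

    @contextmanager
    def observe_computation(self, metric_type):
        """Time one replica computation and count it as an error if it raises."""
        start = self._clock()
        try:
            yield
        except Exception:
            self.errors[metric_type] += 1
            raise
        finally:
            self.computation_seconds[metric_type].append(self._clock() - start)

    @contextmanager
    def observe_reconcile(self):
        """Time one full reconcile of a single HPA object."""
        start = self._clock()
        try:
            yield
        finally:
            self.reconcile_seconds.append(self._clock() - start)

    def mean_computation_seconds(self, metric_type):
        """Average computation time for a metric type, or 0.0 if never observed."""
        samples = self.computation_seconds.get(metric_type, [])
        return sum(samples) / len(samples) if samples else 0.0
```

Comparing `mean_computation_seconds("ContainerResource")` against the value for other metric types corresponds to the first signal listed under "serious problem" indicators, while `reconcile_seconds` and `errors` correspond to the second and third.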