Commit 3bc0db9

fixed based on suggestion
1 parent a323c55 commit 3bc0db9

1 file changed: +21 -4 lines changed
keps/sig-autoscaling/1610-container-resource-autoscaling/README.md

Lines changed: 21 additions & 4 deletions
@@ -577,6 +577,7 @@ in back-to-back releases.
 - The feature gate is enabled by default.
 - No negative feedback during alpha for a long-enough time.
 - No bug issues reported during alpha.
+- Implementing/exposing metrics in HPA so that users can monitor the HPA controller for this feature.

 #### GA

@@ -766,7 +767,12 @@ What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->

-- Many HPAs are in `ScalingActive: false` condition with `FailedGetContainerResourceMetric` reason.
+- The container resource metric takes much longer to compute than other metrics,
+  which can be monitored via the 1st metric described in the [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service) section.
+- The overall reconciliation time of the HPA controller increases,
+  which can be monitored via the 2nd metric described in the [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service) section.
+- Errors occur frequently on the container resource metric,
+  which can be monitored via the 3rd metric described in the [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service) section.

 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

@@ -854,7 +860,14 @@ Pick one more of these and delete the rest.
 - Details:
 -->

-N/A
+The HPA controller has no metrics of its own at present.
+The following metrics will be implemented by beta. ([issue](https://github.com/kubernetes/kubernetes/issues/115639))
+1. How long each metric type takes to compute the ideal replica count.
+   - so that users can confirm the container resource metric doesn't take a long time compared to other metrics.
+2. How long the HPA controller takes to reconcile one HPA object.
+   - so that users can confirm the container resource metric doesn't increase the overall scaling time.
+3. How often errors occur for each metric.
+   - so that users can confirm that errors are not frequent on the container resource metric.

 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?

@@ -863,7 +876,7 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
 implementation difficulties, etc.).
 -->

-N/A
+Yes. We're planning to implement the metrics described in the [What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?](#what-are-the-slis-service-level-indicators-an-operator-can-use-to-determine-the-health-of-the-service) section.

 ### Dependencies

@@ -888,7 +901,11 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
 - Impact of its degraded performance or high-error rates on the feature:
 -->

-No.
+Yes.
+The HPA requires the `metrics.k8s.io` APIs to be available in the cluster to operate. This API is served by the Metrics Server;
+without the Metrics Server, autoscaling on container resource metrics will not work.
+If there are multiple metrics defined and one is not available, scale up will
+continue but scale down will not (for safety).

 ### Scalability
