@@ -699,13 +702,20 @@ NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
699
702
700
703
The feature can be disabled in Alpha and Beta versions
701
704
by restarting kube-apiserver and kube-controller-manager with the feature-gate off.
705
+
706
+
As described in [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy),
707
+
while the feature gate is off, all existing `ContainerResource` metrics are ignored by the HPA controller.
708
+
702
709
In Stable versions, users can opt out by not setting the
703
710
`ContainerResource` type metric in their HPA.
704
711
705
712
###### What happens if we reenable the feature if it was previously rolled back?
706
713
707
714
HPAs with a `ContainerResource` type metric can be created and are handled by the HPA controller.
708
715
716
+
If HPAs with the `ContainerResource` type metric were created before the rollback,
717
+
those `ContainerResource` metrics will be handled by the HPA controller again.
718
+
709
719
###### Are there any tests for feature enablement/disablement?
710
720
711
721
<!--
@@ -755,7 +765,7 @@ What signals should users be paying attention to when the feature is young
755
765
that might indicate a serious problem?
756
766
-->
757
767
758
-
N/A
768
+
- Many HPAs are in the `ScalingActive: false` condition with the `FailedGetContainerResourceMetric` reason.
759
769
760
770
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
761
771
@@ -807,8 +817,8 @@ Recall that end users cannot usually observe component logs or access metrics.
807
817
-->
808
818
809
819
- [x] Events
810
-
- `SuccessfulRescale`event with `memory resource utilization (percentage of request) above target`
811
-
- Note that we cannot know if this reason is due to the `Resource` metric or `ContainerResource` in the current implementation. It'll be fixed to be able to distinguish.
820
+
- `SuccessfulRescale` event with `memory/cpu/etc resource utilization (percentage of request) above/below target`
821
+
- Note that we cannot know if this reason is due to the `Resource` metric or `ContainerResource` in the current implementation. We'll change this reason for `ContainerResource` to `memory/cpu/etc container resource utilization (percentage of request) above/below target` so that we can distinguish.
812
822
- [x] API .status
813
823
- When something is wrong with the container metrics, the `ScalingActive` condition will be false with the `FailedGetContainerResourceMetric` reason.
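The event wording discussed above could be consumed roughly as follows. This is a minimal sketch that assumes the proposed `container resource utilization` wording for `ContainerResource` has been adopted. The function name `metric_source` is hypothetical and not part of any Kubernetes API.

```python
from typing import Optional


def metric_source(reason: str) -> Optional[str]:
    """Guess which metric type produced a SuccessfulRescale reason string.

    Assumes the wording proposed in this KEP:
      Resource:          "<name> resource utilization (percentage of request) above/below target"
      ContainerResource: "<name> container resource utilization (percentage of request) above/below target"
    """
    # Check the longer phrase first, because it contains the shorter one.
    if " container resource utilization" in reason:
        return "ContainerResource"
    if " resource utilization" in reason:
        return "Resource"
    # Any other reason (for example, one from a different metric type) is not classified.
    return None
```

With the current implementation, both metric types share the `Resource` wording, so this sketch would report `Resource` for either of them.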
814
824
@@ -1000,7 +1010,11 @@ For each of them, fill in the following information by copying the below templat
1000
1010
- Testing: Are there any tests for failure mode? If not, describe why.
1001
1011
-->
1002
1012
1003
-
N/A
1013
+
- Failed to get container resource metric.
1014
+
- Detection: `ScalingActive: false` condition with `FailedGetContainerResourceMetric` reason.
1015
+
- Mitigations: Remove the failing `ContainerResource` metrics from the affected HPAs.
1016
+
- Diagnostics: Related errors should be reported in the message of the `ScalingActive: false` condition.
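The detection step above can be sketched as a small filter over HPA objects. This is an illustrative sketch, not part of the KEP. It assumes the objects are plain dicts shaped like the `items` of `kubectl get hpa -A -o json` for `autoscaling/v2`. The function name and the sample data are made up for the example.

```python
from typing import Any

REASON = "FailedGetContainerResourceMetric"


def failing_container_resource_hpas(
    items: list[dict[str, Any]],
) -> list[tuple[str, str, str]]:
    """Return (namespace, name, message) for every HPA whose ScalingActive
    condition is False with the FailedGetContainerResourceMetric reason."""
    failing = []
    for hpa in items:
        meta = hpa.get("metadata", {})
        for cond in hpa.get("status", {}).get("conditions", []):
            if (
                cond.get("type") == "ScalingActive"
                and cond.get("status") == "False"
                and cond.get("reason") == REASON
            ):
                failing.append(
                    (meta.get("namespace", ""), meta.get("name", ""), cond.get("message", ""))
                )
    return failing


# Made-up sample data: one failing HPA and one healthy HPA.
sample = [
    {
        "metadata": {"namespace": "default", "name": "web"},
        "status": {
            "conditions": [
                {
                    "type": "ScalingActive",
                    "status": "False",
                    "reason": REASON,
                    "message": "example error message",
                }
            ]
        },
    },
    {
        "metadata": {"namespace": "default", "name": "api"},
        "status": {
            "conditions": [
                {"type": "ScalingActive", "status": "True", "reason": "ValidMetricFound"}
            ]
        },
    },
]

for namespace, name, message in failing_container_resource_hpas(sample):
    print(f"{namespace}/{name}: {message}")
```

The message returned for each HPA is the place to look for the diagnostics mentioned above.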
###### What steps should be taken if SLOs are not being met to determine the problem?
1006
1020
@@ -1023,6 +1037,11 @@ not need to be as detailed as the proposal, but should include enough
1023
1037
information to express the idea and why it was not acceptable.
1024
1038
-->
1025
1039
1040
+
There is an alternative way to scale on container-level metrics without introducing `ContainerResource` metrics.
1041
+
1042
+
Users can export resource consumption metrics from containers on their own to an external metrics source and then configure HPA based on this external metric.
1043
+
However, this is cumbersome and delays scaling decisions, because the external metrics path typically adds latency compared to the in-cluster resource metrics path.
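To make the comparison concrete, the two approaches differ mainly in the metric entry of the HPA spec. The sketch below builds both variants as plain dicts following the `autoscaling/v2` field layout. The container name, external metric name, target values, replica bounds, and the `build_hpa` helper are all hypothetical and chosen only for illustration.

```python
import json
from typing import Any

# In-cluster path: a ContainerResource metric scoped to a single container.
container_resource_metric = {
    "type": "ContainerResource",
    "containerResource": {
        "name": "cpu",
        "container": "application",  # hypothetical container name
        "target": {"type": "Utilization", "averageUtilization": 60},
    },
}

# Alternative path: an External metric that the user has to export themselves.
external_metric = {
    "type": "External",
    "external": {
        "metric": {"name": "app_container_cpu_usage"},  # hypothetical metric name
        "target": {"type": "AverageValue", "averageValue": "500m"},
    },
}


def build_hpa(name: str, deployment: str, metric: dict[str, Any]) -> dict[str, Any]:
    """Build a minimal autoscaling/v2 HPA manifest around one metric entry."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": 1,   # illustrative bounds
            "maxReplicas": 10,
            "metrics": [metric],
        },
    }


print(json.dumps(build_hpa("web", "web", external_metric), indent=2))
```

The external variant also requires a metrics pipeline and an adapter outside the HPA object, which is where the extra effort and latency described above come from.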