
Commit 4e9f691

fix based on reviews
1 parent 1710e52 commit 4e9f691

File tree

1 file changed: +24 -5 lines

  • keps/sig-autoscaling/1610-container-resource-autoscaling

keps/sig-autoscaling/1610-container-resource-autoscaling/README.md

Lines changed: 24 additions & 5 deletions
@@ -492,13 +492,16 @@ https://storage.googleapis.com/k8s-triage/index.html
We expect no non-infra related flakes in the last month as a GA graduation criteria.
-->

-- `/test/e2e/autoscaling/horizontal_pod_autoscaling.go`: https://storage.googleapis.com/k8s-triage/index.html?sig=autoscaling&job=ci-kubernetes-e2e-gci-gce-autoscaling&test=%5C(Container%20Resource%5C)
+**k8s-triage**
+
+- https://storage.googleapis.com/k8s-triage/index.html?sig=autoscaling&job=ci-kubernetes-e2e-gci-gce-autoscaling&test=Container%20Resource

**tests**

- https://github.com/kubernetes/kubernetes/blob/d4750857760ae55802f69989dc2451feeb9a29e5/test/e2e/autoscaling/horizontal_pod_autoscaling.go#L61
- https://github.com/kubernetes/kubernetes/blob/d4750857760ae55802f69989dc2451feeb9a29e5/test/e2e/autoscaling/horizontal_pod_autoscaling.go#L163
- https://github.com/kubernetes/kubernetes/blob/d4750857760ae55802f69989dc2451feeb9a29e5/test/e2e/autoscaling/horizontal_pod_autoscaling.go#L120
+- https://github.com/kubernetes/kubernetes/blob/d4750857760ae55802f69989dc2451feeb9a29e5/test/e2e/autoscaling/custom_metrics_stackdriver_autoscaling.go#L323

### Graduation Criteria

@@ -699,13 +702,20 @@ NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.

The feature can be disabled in Alpha and Beta versions
by restarting kube-apiserver and kube-controller-manager with the feature-gate off.
+
+As described in [Upgrade / Downgrade Strategy](#upgrade-/-downgrade-strategy),
+while the feature gate is off, all existing `ContainerResource` metrics are ignored by the HPA controller.
+
In terms of Stable versions, users can choose to opt-out by not setting the
`ContainerResource` type metric in their HPA.

###### What happens if we reenable the feature if it was previously rolled back?

HPA with `ContainerResource` type metric can be created and can be handled by HPA controller.

+If there have been HPAs with the `ContainerResource` type metric created before the rollback,
+those `ContainerResource` metrics will start being handled by the HPA controller again.
+
###### Are there any tests for feature enablement/disablement?

<!--
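For context on what "setting the `ContainerResource` type metric" means in practice, here is a minimal sketch of an HPA manifest using the `ContainerResource` metric source this KEP adds (illustrative, not taken from the diff; the names `web-app` and `application` and the target value are placeholders):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app                 # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # Scale on the CPU utilization of one specific container instead of the
  # sum of usage across all containers in the pod.
  - type: ContainerResource
    containerResource:
      name: cpu
      container: application    # the container whose usage drives scaling
      target:
        type: Utilization
        averageUtilization: 60
```

While the feature gate is off, the HPA controller ignores the `ContainerResource` entry above; for Stable versions, removing it from `spec.metrics` is the opt-out mentioned above.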
@@ -755,7 +765,7 @@ What signals should users be paying attention to when the feature is young
that might indicate a serious problem?
-->

-N/A
+- A large number of HPAs stuck in the `ScalingActive: false` condition with the `FailedGetContainerResourceMetric` reason.

###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

@@ -807,8 +817,8 @@ Recall that end users cannot usually observe component logs or access metrics.
-->

- [x] Events
-  - `SuccessfulRescale` event with `memory resource utilization (percentage of request) above target`
-  - Note that we cannot know if this reason is due to the `Resource` metric or `ContainerResource` in the current implementation. It'll be fixed to be able to distinguish.
+  - `SuccessfulRescale` event with `memory/cpu/etc resource utilization (percentage of request) above/below target`
+  - Note that we cannot know whether this reason comes from a `Resource` metric or a `ContainerResource` metric in the current implementation. We'll change the reason for `ContainerResource` to `memory/cpu/etc container resource utilization (percentage of request) above/below target` so that the two can be distinguished.
- [x] API .status
  - When something is wrong with the container metrics, the `ScalingActive` condition will be false with the `FailedGetContainerResourceMetric` reason.
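
As an illustration of the `.status` signal above, the condition on an affected HPA looks roughly like the following `autoscaling/v2` status fragment (a sketch; the message text is illustrative, not the exact controller wording):

```yaml
# Fragment of an HPA object's status, e.g. from `kubectl get hpa <name> -o yaml`.
status:
  conditions:
  - type: ScalingActive
    status: "False"
    reason: FailedGetContainerResourceMetric
    # Illustrative message; the controller reports the underlying error here.
    message: 'the HPA was unable to compute the replica count: failed to get cpu container metric'
```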

@@ -1000,7 +1010,11 @@ For each of them, fill in the following information by copying the below templat
- Testing: Are there any tests for failure mode? If not, describe why.
-->

-N/A
+- Failed to get the container resource metric.
+  - Detection: the `ScalingActive: false` condition with the `FailedGetContainerResourceMetric` reason.
+  - Mitigations: remove the failing `ContainerResource` metrics from the affected HPAs.
+  - Diagnostics: related errors are reported in the message of the `ScalingActive: false` condition.
+  - Testing: https://github.com/kubernetes/kubernetes/blob/0e3818e02760afa8ed0bea74c6973f605ca4683c/pkg/controller/podautoscaler/replica_calculator_test.go#L451

###### What steps should be taken if SLOs are not being met to determine the problem?

@@ -1023,6 +1037,11 @@ not need to be as detailed as the proposal, but should include enough
information to express the idea and why it was not acceptable.
-->

+There's an alternative way to scale on container-level resource metrics without introducing the `ContainerResource` metric type.
+
+Users can export per-container resource consumption metrics on their own to an external metrics source and then configure the HPA based on that external metric.
+However, this is cumbersome and results in delayed scaling decisions, since the external metrics path typically adds latency compared to the in-cluster resource metrics path.
+

## Infrastructure Needed (Optional)

<!--
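For comparison, a sketch of that rejected external-metrics approach, assuming the user already runs their own exporter plus an external metrics adapter serving a hypothetical `container_cpu_usage` metric (all names and values here are illustrative):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app                  # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  # The metric name and labels are hypothetical; the user must export them
  # and serve them through an external metrics adapter themselves.
  - type: External
    external:
      metric:
        name: container_cpu_usage
        selector:
          matchLabels:
            container: application
      target:
        type: AverageValue
        averageValue: "500m"
```

Every scaling decision then goes through the exporter and adapter hop, which is the extra latency and operational burden that makes this alternative less attractive than the in-cluster `ContainerResource` path.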
