Commit 1c6f86b

Added KEP for HPA scaling based on container resources
Signed-off-by: Arjun Naik <[email protected]>
1 parent 0d2e19b commit 1c6f86b

1 file changed: 263 additions, 0 deletions

---
title: Container Resource based Autoscaling
authors:
  - "@arjunrn"
owning-sig: sig-autoscaling
reviewers:
  - "@josephburnett"
  - "@mwielgus"
approvers:
  - "@josephburnett"
creation-date: 2020-02-18
last-updated: 2020-02-18
status: provisional
---

# Container Resource based Autoscaling

## Table of Contents

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Multiple containers with different scaling thresholds](#multiple-containers-with-different-scaling-thresholds)
    - [Multiple containers but only scaling for one.](#multiple-containers-but-only-scaling-for-one)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
  - [Upgrade/Downgrade Strategy](#upgradedowngrade-strategy)
- [Implementation History](#implementation-history)
<!-- /toc -->

## Summary

The Horizontal Pod Autoscaler supports scaling of targets based on the resource usage
of the pods in the target. The resource usage of a pod is calculated as the sum of the
individual container usage values of that pod. This is unsuitable for workloads where
the usage of the containers is not strongly correlated or does not change in lockstep.
This KEP proposes that, when scaling based on resource usage, the HPA also provide an
option to consider the usage of individual containers when making scaling decisions.

## Motivation

An HPA is used to ensure that a scaling target is scaled up or down in such a way that the
specified metric values are always maintained at a certain level. Resource based
autoscaling is the most basic approach to autoscaling and has been present in the HPA spec since `v1`.
In this mode the HPA controller fetches the current resource metrics for all the pods of a scaling
target and then computes, based on the current usage, how many pods should be added or removed to
achieve the target average usage.

For performance critical applications where the resource usage of individual containers needs to
be configured individually, the default behavior of the HPA controller may be unsuitable. When
there are multiple containers in a pod, their individual resource usages may not have a direct
correlation or may grow at different rates as the load changes. There are several reasons for this:

- A sidecar container only provides an auxiliary service such as log shipping. If the
  application does not log very frequently or does not produce logs in its hot path, then the usage
  of the log shipper will not grow.
- A sidecar container provides authentication. Due to heavy caching its usage will only
  increase slightly when the load on the main container increases. In the current blended usage
  calculation approach this usually results in the HPA not scaling up the deployment because
  the blended usage is still low.
- A sidecar may be injected without resource requests set, which prevents scaling based on
  utilization. In the current logic the HPA controller can only scale on the absolute resource
  usage of the pod when the resource requests are not set.

The optimum usage of the containers may also be at different levels. Hence the HPA should offer
a way to specify the target usage in a more fine-grained manner.

### Goals

- Make the HPA scale based on individual container resource usage.
- Alias the resource metric source to the pod resource metric source.

### Non-Goals

- Configurable aggregation for container resources in pods.
- Optimization of the calls to the `metrics-server`.

## Proposal

Currently the HPA accepts multiple metric sources to calculate the number of replicas in the target,
one of which is called `Resource`. The `Resource` type represents the resource usage of the
pods in the scaling target. The resource metric source has the following structure:

```go
type ResourceMetricSource struct {
    Name   v1.ResourceName
    Target MetricTarget
}
```

Here `Name` is the name of the resource. Currently only `cpu` and `memory` are supported
for this field. The other field, `Target`, specifies the level at which the HPA should maintain
the resource usage by adding or removing pods. For instance, if the target is _60%_ CPU utilization
and the current average CPU utilization across all the pods of the target is _70%_, then
the HPA will add pods to reduce the CPU utilization. If it's less than _60%_ then the HPA will
remove pods to increase utilization.
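
To make the add/remove decision concrete, the documented HPA replica calculation is
`desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)`. The following
is a minimal, self-contained sketch of that arithmetic; the replica and utilization numbers are
chosen only for illustration and are not taken from this KEP:

```go
package main

import (
	"fmt"
	"math"
)

func main() {
	// Illustration of the HPA replica calculation:
	//   desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)
	currentReplicas := 5.0
	currentUtilization := 70.0 // percent, averaged over all pods of the target
	targetUtilization := 60.0  // percent, from the metric target

	desired := math.Ceil(currentReplicas * currentUtilization / targetUtilization)
	fmt.Printf("desired replicas: %.0f\n", desired) // 6: scale up to bring utilization back toward 60%
	// With a current utilization below 60%, the same formula yields fewer replicas, i.e. a scale-down.
}
```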

It should be noted here that when a pod has multiple containers the HPA gets the resource
usage of all the containers and sums them to get the total usage. This is then divided
by the total requested resources to get the average utilization. For instance, if there is
a pod with 2 containers, `application` and `log-shipper`, requesting `250m` and `250m` of
CPU resources, then the total requested resources of the pod as calculated by the HPA is `500m`.
If the first container is currently using `200m` and the second only `50m`, then
the usage of the pod is `250m`, which is a utilization of _50%_, although individually
the utilizations of the containers are _80%_ and _20%_. In such a situation the performance
of the `application` container might be affected significantly. There is no way to specify
in the HPA that the utilization of the first container should be kept below a certain threshold.
This also affects `memory` resource based autoscaling.
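
To make the arithmetic of this example concrete, here is a minimal, self-contained sketch that
reproduces the blended and per-container figures (the `utilization` helper is ours, not part of the
HPA controller):

```go
package main

import "fmt"

// utilization returns usage as a percentage of the requested amount.
// Values are CPU millicores, as in the example above.
func utilization(usageMilli, requestMilli int64) float64 {
	return float64(usageMilli) / float64(requestMilli) * 100
}

func main() {
	// Requests and current usage of the two containers in the example pod.
	appRequest, appUsage := int64(250), int64(200)         // `application`
	shipperRequest, shipperUsage := int64(250), int64(50)  // `log-shipper`

	// Current HPA behavior: sum usage and requests across the whole pod.
	podUtil := utilization(appUsage+shipperUsage, appRequest+shipperRequest)
	fmt.Printf("pod (blended): %.0f%%\n", podUtil) // 50%

	// Per-container view that this KEP makes targetable.
	fmt.Printf("application:   %.0f%%\n", utilization(appUsage, appRequest))         // 80%
	fmt.Printf("log-shipper:   %.0f%%\n", utilization(shipperUsage, shipperRequest)) // 20%
}
```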

We propose that the following changes be made to the metric sources to address this problem:

1. A new metric source called `ContainerResourceMetricSource` be introduced with the following
   structure:

   ```go
   type ContainerResourceMetricSource struct {
       Container string
       Name      v1.ResourceName
       Target    MetricTarget
   }
   ```

   The only new field is `Container`, which is the name of the container for which the resource
   usage should be tracked.

2. The `ResourceMetricSource` should be aliased to `PodResourceMetricSource`. It will work
   exactly as the original. The aliasing is done for the sake of consistency. Correspondingly,
   the `type` field for the metric source should be extended to support both `ContainerResource`
   and `PodResource` as values (see the sketch after this list).
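
For illustration only, the sketch below shows how the metric spec could carry the new source
alongside the aliased pod-level source. The field and type names here are assumptions for the
purpose of this sketch, not the final API:

```go
// Sketch only: how the HPA metric spec might be extended. Names are
// illustrative assumptions, not the final API.
type MetricSourceType string

type MetricSpec struct {
	// Type selects the metric source, extended with the values
	// "ContainerResource" and "PodResource" in addition to the existing ones.
	Type MetricSourceType

	// New source scoped to a single named container.
	ContainerResource *ContainerResourceMetricSource

	// Alias of the existing pod-level resource source.
	PodResource *PodResourceMetricSource

	// Existing sources (Object, Pods, External) are omitted for brevity.
}
```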

### User Stories

#### Multiple containers with different scaling thresholds

Assume the user has a deployment with multiple pods, each of which has multiple containers: a main
container called `application` and 2 others called `log-shipping` and `authnz-proxy`. Two of the
containers, `application` and `authnz-proxy`, are critical to providing the application
functionality. The user would like to prevent _OOMKill_ of these containers and also keep
their CPU utilization low to ensure the highest performance. The other container,
`log-shipping`, is less critical and can tolerate failures and restarts. In this case the
user would create an HPA with the following configuration:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: mission-critical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mission-critical
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: ContainerResource
    resource:
      name: cpu
      container: application
      target:
        type: Utilization
        averageUtilization: 30
  - type: ContainerResource
    resource:
      name: memory
      container: application
      target:
        type: Utilization
        averageUtilization: 80
  - type: ContainerResource
    resource:
      name: cpu
      container: authnz-proxy
      target:
        type: Utilization
        averageUtilization: 30
  - type: ContainerResource
    resource:
      name: memory
      container: authnz-proxy
      target:
        type: Utilization
        averageUtilization: 80
  - type: ContainerResource
    resource:
      name: cpu
      container: log-shipping
      target:
        type: Utilization
        averageUtilization: 80
```

The HPA specifies that the HPA controller should maintain the CPU utilization of the containers
`application` and `authnz-proxy` at _30%_ and their memory utilization at _80%_. The `log-shipping`
container is scaled to keep its CPU utilization at _80%_ and is not scaled on memory.

#### Multiple containers but only scaling for one.

Assume the user has a deployment where the pod spec has multiple containers but scaling should
be performed based only on the utilization of one of the containers. There could be several reasons
for such a strategy: disruptions due to scaling of sidecars may be expensive and should be avoided,
or the resource usage of the sidecars could be erratic because they have different work
characteristics from the main container.

In such a case the user creates an HPA as follows:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: mission-critical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mission-critical
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: ContainerResource
    resource:
      name: cpu
      container: application
      target:
        type: Utilization
        averageUtilization: 30
```

The HPA controller will then completely ignore the resource usage in the other containers.

### Risks and Mitigations

In order to keep backward compatibility with the existing API, both `ResourceMetricSource` and
`PodResourceMetricSource` will be supported. Existing HPAs will continue functioning as before.
There will be no deprecation warning and no internal migration from `ResourceMetricSource` to
`PodResourceMetricSource`.

## Design Details

### Test Plan

TBD

### Graduation Criteria

Since the feature is being added to the `v2beta2` version of the HPA API, no further graduation
criteria are required: the feature will graduate when the original API graduates to `stable`.

### Upgrade/Downgrade Strategy

For cluster upgrades the HPAs from the previous version will continue working as before. There
is no change in behavior, and no flags have to be enabled or disabled.

For clusters which have HPAs that use `ContainerResourceMetricSource` or `PodResourceMetricSource`,
a downgrade is possible only after the HPAs which use these new sources have been modified to use
`ResourceMetricSource` instead, as illustrated below.
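
As an illustration of that rewrite (a sketch only: the container-scoped entry follows the format
from the user stories above, and the pod-level target of _50%_ is an assumed value that would have
to be re-derived from the blended usage of the pod):

```yaml
# Before downgrade: container-scoped metric using the source proposed in this KEP.
metrics:
- type: ContainerResource
  resource:
    name: cpu
    container: application
    target:
      type: Utilization
      averageUtilization: 30
```

```yaml
# Rewritten for downgrade: pod-level resource metric supported by the existing API.
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50  # assumed target for the blended pod usage
```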

## Implementation History
