---
title: Container Resource based Autoscaling
authors:
  - "@arjunrn"
owning-sig: sig-autoscaling
reviewers:
  - "@josephburnett"
  - "@mwielgus"
approvers:
  - "@josephburnett"
creation-date: 2020-02-18
last-updated: 2020-02-18
status: provisional
---

# Container Resource based Autoscaling

## Table of Contents

<!-- toc -->
- [Summary](#summary)
- [Motivation](#motivation)
  - [Goals](#goals)
  - [Non-Goals](#non-goals)
- [Proposal](#proposal)
  - [User Stories](#user-stories)
    - [Multiple containers with different scaling thresholds](#multiple-containers-with-different-scaling-thresholds)
    - [Multiple containers but only scaling for one.](#multiple-containers-but-only-scaling-for-one)
    - [Add container metrics to existing pod resource metric.](#add-container-metrics-to-existing-pod-resource-metric)
  - [Risks and Mitigations](#risks-and-mitigations)
- [Design Details](#design-details)
  - [Test Plan](#test-plan)
  - [Graduation Criteria](#graduation-criteria)
  - [Upgrade/Downgrade Strategy](#upgradedowngrade-strategy)
- [Implementation History](#implementation-history)
<!-- /toc -->

## Summary

The Horizontal Pod Autoscaler supports scaling of targets based on the resource usage
of the pods in the target. The resource usage of a pod is calculated as the sum
of the individual container usage values of the pod. This is unsuitable for workloads where
the usages of the containers are not strongly correlated or do not change in lockstep. This KEP
suggests that, when scaling based on resource usage, the HPA also provide an option
to consider the usage of individual containers when making scaling decisions.

## Motivation

An HPA is used to ensure that a scaling target is scaled up or down in such a way that the
specified metric values are always maintained at the desired level. Resource based
autoscaling is the most basic approach to autoscaling and has been present in the HPA spec since `v1`.
In this mode the HPA controller fetches the current resource metrics for all the pods of a scaling
target and then computes how many pods should be added or removed based on the current usage to
achieve the target average usage.

For performance critical applications, where the resource usage of individual containers needs to
be configured individually, the default behavior of the HPA controller may be unsuitable. When
there are multiple containers in the pod, their individual resource usages may not have a direct
correlation or may grow at different rates as the load changes. There are several reasons for this:

- A sidecar container is only providing an auxiliary service such as log shipping. If the
  application does not log very frequently or does not produce logs in its hot path, then the usage
  of the log shipper will not grow.
- A sidecar container provides authentication. Due to heavy caching, its usage will only
  increase slightly when the load on the main container increases. In the current blended usage
  calculation approach this usually results in the HPA not scaling up the deployment because
  the blended usage is still low.
- A sidecar may be injected without resource requests set, which prevents scaling based on
  utilization. In the current logic the HPA controller can only scale on the absolute resource
  usage of the pod when the resource requests are not set.

The optimal usage levels of the individual containers may also differ. Hence the HPA should offer
a way to specify the target usage in a more fine-grained manner.

### Goals

- Make the HPA scale based on individual container resource usage.
- Alias the resource metric source to the pod resource metric source.

### Non-Goals

- Configurable aggregation of container resources in pods.
- Optimization of the calls to the `metrics-server`.

## Proposal

Currently the HPA accepts multiple metric sources to calculate the number of replicas in the target,
one of which is called `Resource`. The `Resource` type represents the resource usage of the
pods in the scaling target. The resource metric source has the following structure:

```go
type ResourceMetricSource struct {
    Name   v1.ResourceName
    Target MetricTarget
}
```

Here `Name` is the name of the resource. Currently only `cpu` and `memory` are supported
for this field. The `Target` field specifies the level at which the HPA should maintain
the resource usage by adding or removing pods. For instance, if the target is _60%_ CPU utilization
and the current average CPU utilization across all the pods of the target is _70%_, then
the HPA will add pods to reduce the CPU utilization. If it's less than _60%_ then the HPA will
remove pods to increase utilization.
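
For reference, this behavior follows the documented HPA scaling rule,
`desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization)`. A minimal Go
sketch of that rule, ignoring the tolerance band, pod readiness handling, and stabilization that
the real controller also applies:

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the core HPA scaling rule:
// desired = ceil(currentReplicas * currentUtilization / targetUtilization).
func desiredReplicas(currentReplicas int32, currentUtilization, targetUtilization float64) int32 {
	return int32(math.Ceil(float64(currentReplicas) * currentUtilization / targetUtilization))
}

func main() {
	// 5 pods averaging 70% CPU against a 60% target scale up to 6 replicas.
	fmt.Println(desiredReplicas(5, 70, 60)) // 6
}
```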

It should be noted here that when a pod has multiple containers the HPA gets the resource
usage of all the containers and sums them to get the total usage. This is then divided
by the total requested resources to get the average utilization. For instance, if there is
a pod with two containers, `application` and `log-shipper`, each requesting `250m` of
CPU, then the total requested resources of the pod as calculated by the HPA is `500m`.
If the first container is currently using `200m` and the second only `50m`, then
the usage of the pod is `250m`, which is a utilization of _50%_, although individually
the utilizations of the containers are _80%_ and _20%_. In such a situation the performance
of the `application` container might be affected significantly. There is no way to tell
the HPA to keep the utilization of the first container below a certain threshold. This also
affects `memory` resource based autoscaling.
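
To make the arithmetic in this example explicit, here is a small self-contained Go sketch
(illustrative only, not controller code) that computes both the blended pod utilization and the
per-container utilizations:

```go
package main

import "fmt"

// container holds a CPU request and current usage, both in millicores.
type container struct {
	name           string
	request, usage int64
}

func main() {
	pod := []container{
		{"application", 250, 200},
		{"log-shipper", 250, 50},
	}
	var totalRequest, totalUsage int64
	for _, c := range pod {
		totalRequest += c.request
		totalUsage += c.usage
		fmt.Printf("%s: %d%%\n", c.name, 100*c.usage/c.request) // 80% and 20%
	}
	// Blended pod utilization: 250m used of 500m requested, i.e. 50%.
	fmt.Printf("pod: %d%%\n", 100*totalUsage/totalRequest)
}
```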

We propose that the following changes be made to the metric sources to address this problem:

1. A new metric source called `ContainerResourceMetricSource` be introduced with the following
structure:

```go
type ContainerResourceMetricSource struct {
    Container string
    Name      v1.ResourceName
    Target    MetricTarget
}
```

The only new field is `Container`, which is the name of the container for which the resource
usage should be tracked.

2. The `ResourceMetricSource` should be aliased to `PodResourceMetricSource`. It will work
exactly like the original. The aliasing is done for the sake of consistency. Correspondingly,
the `type` field of the metric source should be extended to support both `ContainerResource`
and `PodResource` as values.
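
To illustrate how this `type` extension might look in the API types, here is a sketch of the
extended metric-source union in the same style as the structs above. The field names and layout
are illustrative assumptions, not the final API, and the existing sources are elided:

```go
// Proposed additions to the MetricSourceType values.
const (
    ContainerResourceMetricSourceType MetricSourceType = "ContainerResource"
    PodResourceMetricSourceType       MetricSourceType = "PodResource"
)

// MetricSpec would gain one pointer field per new source, mirroring the
// existing union layout; exactly one field matching Type is set.
type MetricSpec struct {
    Type MetricSourceType
    // ... existing sources such as Object, Pods, Resource, External ...
    PodResource       *PodResourceMetricSource
    ContainerResource *ContainerResourceMetricSource
}
```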

### User Stories

#### Multiple containers with different scaling thresholds

Assume the user has a deployment with multiple pods, each of which has multiple containers: a main
container called `application` and two others called `log-shipping` and `authnz-proxy`. Two
of the containers, `application` and `authnz-proxy`, are critical to providing the application's
functionality. The user would like to prevent _OOMKill_ of these containers and also keep
their CPU utilization low to ensure the highest performance. The other container,
`log-shipping`, is less critical and can tolerate failures and restarts. In this case the
user would create an HPA with the following configuration:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: mission-critical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mission-critical
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: ContainerResource
    resource:
      name: cpu
      container: application
      target:
        type: Utilization
        averageUtilization: 30
  - type: ContainerResource
    resource:
      name: memory
      container: application
      target:
        type: Utilization
        averageUtilization: 80
  - type: ContainerResource
    resource:
      name: cpu
      container: authnz-proxy
      target:
        type: Utilization
        averageUtilization: 30
  - type: ContainerResource
    resource:
      name: memory
      container: authnz-proxy
      target:
        type: Utilization
        averageUtilization: 80
  - type: ContainerResource
    resource:
      name: cpu
      container: log-shipping
      target:
        type: Utilization
        averageUtilization: 80
```

The HPA specifies that the HPA controller should maintain the CPU utilization of the containers
`application` and `authnz-proxy` at _30%_ and their memory utilization at _80%_. The `log-shipping`
container is scaled to keep its CPU utilization at _80%_ and is not scaled on memory.
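
When several metric sources are configured like this, the existing HPA semantics apply unchanged:
the controller computes a replica proposal for each metric and scales to the largest of them. A
minimal sketch of that selection step, assuming the per-metric proposals have already been
computed:

```go
// maxProposal picks the replica count the HPA acts on when several metric
// sources each produce their own proposal. For example, if the CPU metric of
// `application` proposes 6 replicas and its memory metric proposes 4, the
// target is scaled to 6.
func maxProposal(proposals []int32) int32 {
	max := proposals[0]
	for _, p := range proposals[1:] {
		if p > max {
			max = p
		}
	}
	return max
}
```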

#### Multiple containers but only scaling for one.

Assume the user has a deployment where the pod spec has multiple containers but scaling should
be performed based only on the utilization of one of the containers. There could be several reasons
for such a strategy: disruptions due to scaling of sidecars may be expensive and should be avoided,
or the resource usage of the sidecars could be erratic because they have different work
characteristics from the main container.

In such a case the user creates an HPA as follows:
```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: mission-critical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mission-critical
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: ContainerResource
    resource:
      name: cpu
      container: application
      target:
        type: Utilization
        averageUtilization: 30
```

The HPA controller will then completely ignore the resource usage of the other containers.

#### Add container metrics to existing pod resource metric.

A user who is already using an HPA to scale their application can add the container metric source to the HPA
in addition to the existing pod metric source. If there is a single container in the pod then the behavior
will be exactly the same as before. If there are multiple containers in the application pods then the deployment
might scale out more than before. This happens when the resource usage of the specified container is higher
than the blended usage as calculated by the pod metric source. In the unlikely case that the usage of
all the containers in the pod changes in tandem by the same amount, the behavior will remain as before.

For example, consider the following HPA object, which targets a _Deployment_ with pods that have two
containers, `application` and `log-shipper`:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: mission-critical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mission-critical
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: ContainerResource
    resource:
      name: cpu
      container: application
      target:
        type: Utilization
        averageUtilization: 50
  - type: PodResource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
```

If the resource usage of the `application` container increases, then the target will be scaled out even if
the usage of the `log-shipper` container does not increase much. If the resource usage of the `log-shipper`
container increases, then the deployment will only be scaled out if the combined resource usage of both
containers rises above the target.
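
Reusing the earlier numbers (both containers request `250m`, with `application` using `200m` and
`log-shipper` using `50m`) and assuming a hypothetical current size of 5 replicas, a quick Go
sketch shows how the two sources in this HPA diverge:

```go
package main

import (
	"fmt"
	"math"
)

// proposal applies the standard HPA rule: ceil(replicas * current / target).
func proposal(replicas int32, current, target float64) int32 {
	return int32(math.Ceil(float64(replicas) * current / target))
}

func main() {
	const replicas = 5                         // assumed current Deployment size
	containerSrc := proposal(replicas, 80, 50) // application: 200m/250m = 80%
	podSrc := proposal(replicas, 50, 50)       // blended: 250m/500m = 50%
	// The controller acts on the larger proposal, so the container source
	// drives the scale-out here.
	fmt.Println(containerSrc, podSrc) // 8 5
}
```
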
### Risks and Mitigations

In order to keep backward compatibility with the existing API, both `ResourceMetricSource` and
`PodResourceMetricSource` will be supported. Existing HPAs will continue functioning as before.
There will be no deprecation warning or internal migration from `ResourceMetricSource` to
`PodResourceMetricSource`.


## Design Details

### Test Plan

TBD

### Graduation Criteria

Since the feature is being added to HPA version `v2beta2`, no further graduation
criteria are required because it will graduate when the original API graduates to `stable`.

### Upgrade/Downgrade Strategy

For cluster upgrades, HPAs from the previous version will continue working as before. There
is no change in behavior, and no flags have to be enabled or disabled.

For clusters which have HPAs that use `ContainerResourceMetricSource` or `PodResourceMetricSource`,
a downgrade is possible only after the HPAs which use these new sources have been modified to use
`ResourceMetricSource` instead.

## Implementation History