---
title: Autoscaling Workloads
description: >-
  With autoscaling, you can automatically update your workloads as resource demand changes.
  This allows your cluster to react to those changes more elastically and efficiently.
content_type: concept
weight: 40
---

<!-- overview -->

In Kubernetes, you can _scale_ a workload depending on the current demand for resources.
This allows your cluster to react to changes in resource demand more elastically and efficiently.

When you scale a workload, you can either increase or decrease the number of replicas managed by
the workload, or adjust the resources available to the replicas in-place.

The first approach is referred to as _horizontal scaling_, while the second is referred to as
_vertical scaling_.

There are manual and automatic ways to scale your workloads, depending on your use case.

<!-- body -->

## Scaling workloads manually

Kubernetes supports _manual scaling_ of workloads. Horizontal scaling can be done
using the `kubectl` CLI.
For vertical scaling, you need to _patch_ the resource definition of your workload.

See below for examples of both strategies, followed by a short command sketch.

- **Horizontal scaling**: [Running multiple instances of your app](/docs/tutorials/kubernetes-basics/scale/scale-intro/)
- **Vertical scaling**: [Resizing CPU and memory resources assigned to containers](/docs/tasks/configure-pod-container/resize-container-resources)
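
For illustration, here is a minimal `kubectl` sketch of both strategies, assuming a hypothetical
Deployment named `my-app` with a container named `app` (neither is defined elsewhere on this page):

```shell
# Horizontal scaling: set the number of replicas directly.
kubectl scale deployment/my-app --replicas=5

# Vertical scaling: patch the resource requests in the Pod template
# (this triggers a rollout, replacing the existing Pods).
kubectl patch deployment my-app --patch \
  '{"spec": {"template": {"spec": {"containers": [{"name": "app", "resources": {"requests": {"cpu": "500m", "memory": "256Mi"}}}]}}}}'
```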

## Scaling workloads automatically

Kubernetes also supports _automatic scaling_ of workloads, which is the focus of this page.

The concept of _Autoscaling_ in Kubernetes refers to the ability to automatically update an
object that manages a set of Pods (for example a
{{< glossary_tooltip text="Deployment" term_id="deployment" >}}).

### Scaling workloads horizontally

In Kubernetes, you can automatically scale a workload horizontally using a _HorizontalPodAutoscaler_ (HPA).

It is implemented as a Kubernetes API resource and a {{< glossary_tooltip text="controller" term_id="controller" >}},
which periodically adjusts the number of {{< glossary_tooltip text="replicas" term_id="replica" >}}
in a workload to match observed resource utilization such as CPU or memory usage.

There is a [walkthrough tutorial](/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough) for configuring a HorizontalPodAutoscaler for a Deployment.
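
As a minimal sketch, the following HorizontalPodAutoscaler targets a hypothetical Deployment
named `my-app`; the replica bounds and the 50% CPU target are assumptions for illustration only:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app               # hypothetical workload to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50 # add replicas when average CPU usage exceeds 50%
```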

### Scaling workloads vertically

{{< feature-state for_k8s_version="v1.25" state="stable" >}}

You can automatically scale a workload vertically using a _VerticalPodAutoscaler_ (VPA).
Unlike the HPA, the VPA doesn't come with Kubernetes by default, but is a separate project
that can be found [on GitHub](https://github.com/kubernetes/autoscaler/tree/9f87b78df0f1d6e142234bb32e8acbd71295585a/vertical-pod-autoscaler).

Once installed, it allows you to create {{< glossary_tooltip text="CustomResourceDefinitions" term_id="customresourcedefinition" >}}
(CRDs) for your workloads which define _how_ and _when_ to scale the resources of the managed replicas.

{{< note >}}
You will need to have the [Metrics Server](https://github.com/kubernetes-sigs/metrics-server)
installed in your cluster for the VPA (and likewise the HPA) to work.
{{< /note >}}

At the moment, the VPA can operate in four different modes:

{{< table caption="Different modes of the VPA" >}}
Mode | Description
:----|:-----------
`Auto` | Currently equivalent to `Recreate`; might change to in-place updates in the future.
`Recreate` | The VPA assigns resource requests on pod creation and also updates existing pods by evicting them when the requested resources differ significantly from the new recommendation.
`Initial` | The VPA only assigns resource requests on pod creation and never changes them later.
`Off` | The VPA does not automatically change the resource requirements of the pods. The recommendations are calculated and can be inspected in the VPA object.
{{< /table >}}
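
As a rough sketch, a VerticalPodAutoscaler object (provided by the VPA project's CRDs, not by
core Kubernetes) might look like the following, targeting a hypothetical Deployment named `my-app`:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app         # hypothetical workload to manage
  updatePolicy:
    updateMode: "Auto"   # one of the modes from the table above
```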

#### Requirements for in-place resizing

{{< feature-state for_k8s_version="v1.27" state="alpha" >}}

Resizing a workload in-place **without** restarting the {{< glossary_tooltip text="Pods" term_id="pod" >}}
or their {{< glossary_tooltip text="Containers" term_id="container" >}} requires Kubernetes version 1.27 or later.<br />
Additionally, the `InPlacePodVerticalScaling` feature gate needs to be enabled.

{{< feature-gate-description name="InPlacePodVerticalScaling" >}}
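
How you enable the feature gate depends on how you run your cluster. As one hedged example,
assuming a local [kind](https://kind.sigs.k8s.io/) cluster, you could enable it in the cluster
configuration; other distributions configure feature gates differently:

```yaml
# kind cluster configuration enabling the feature gate on all components.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  InPlacePodVerticalScaling: true
```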

### Autoscaling based on cluster size

For workloads that need to be scaled based on the size of the cluster (for example
`cluster-dns` or other system components), you can use the
[_Cluster Proportional Autoscaler_](https://github.com/kubernetes-sigs/cluster-proportional-autoscaler).<br />
Just like the VPA, it is not part of the Kubernetes core, but hosted as its
own project on GitHub.

The Cluster Proportional Autoscaler watches the number of schedulable {{< glossary_tooltip text="nodes" term_id="node" >}}
and cores and scales the number of replicas of the target workload accordingly.
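
Its scaling behavior is driven by parameters such as `coresPerReplica` and `nodesPerReplica`,
read from a ConfigMap. As a hedged sketch (the ConfigMap name and namespace below are assumptions;
see the project's README for the exact parameters), a `linear` configuration could look like this:

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: dns-autoscaler      # hypothetical name, referenced when deploying the autoscaler
  namespace: kube-system
data:
  linear: |-
    {
      "coresPerReplica": 256,
      "nodesPerReplica": 16,
      "min": 1,
      "preventSinglePointFailure": true
    }
```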

If the number of replicas should stay the same, you can scale your workloads vertically according to the cluster size using
the [_Cluster Proportional Vertical Autoscaler_](https://github.com/kubernetes-sigs/cluster-proportional-vertical-autoscaler).
The project is **currently in beta** and can be found on GitHub.

While the Cluster Proportional Autoscaler scales the number of replicas of a workload, the Cluster Proportional Vertical Autoscaler
adjusts the resource requests for a workload (for example a Deployment or DaemonSet) based on the number of nodes and/or cores
in the cluster.

### Event-driven autoscaling

It is also possible to scale workloads based on events, for example using the
[_Kubernetes Event Driven Autoscaler_ (**KEDA**)](https://keda.sh/).

KEDA is a CNCF graduated project that enables you to scale your workloads based on the number
of events to be processed, for example the number of messages in a queue. There is
a wide range of adapters for different event sources to choose from.
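
As a hedged sketch, a KEDA `ScaledObject` tying a hypothetical Deployment named `my-consumer`
to the length of a RabbitMQ queue might look roughly like this (the trigger metadata depends on
the scaler you choose, and connection settings are omitted; see the KEDA documentation for the
exact parameters):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-consumer-scaler
spec:
  scaleTargetRef:
    name: my-consumer        # hypothetical Deployment to scale
  minReplicaCount: 0         # KEDA can scale event-driven workloads down to zero
  maxReplicaCount: 20
  triggers:
  - type: rabbitmq
    metadata:
      queueName: orders      # hypothetical queue to watch
      mode: QueueLength
      value: "20"            # target number of messages per replica
```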

### Autoscaling based on schedules

Another strategy for scaling your workloads is to **schedule** the scaling operations, for example in order to
reduce resource consumption during off-peak hours.

Similar to event-driven autoscaling, such behavior can be achieved using KEDA in conjunction with
its [`Cron` scaler](https://keda.sh/docs/2.13/scalers/cron/). The `Cron` scaler allows you to define schedules
(and time zones) for scaling your workloads in or out.
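
For example, a hedged sketch of a `ScaledObject` using the `Cron` scaler, which keeps a hypothetical
Deployment at ten replicas during business hours and lets it fall back to the minimum outside that
window, could look like this:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: business-hours-scaler
spec:
  scaleTargetRef:
    name: my-app               # hypothetical Deployment to scale
  minReplicaCount: 1
  triggers:
  - type: cron
    metadata:
      timezone: Europe/Berlin  # IANA time zone name
      start: 0 8 * * 1-5       # scale out at 08:00, Monday to Friday
      end: 0 18 * * 1-5        # scale back in at 18:00, Monday to Friday
      desiredReplicas: "10"
```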

## Scaling cluster infrastructure

If scaling workloads isn't enough to meet your needs, you can also scale your cluster infrastructure itself.

Scaling the cluster infrastructure normally means adding or removing {{< glossary_tooltip text="nodes" term_id="node" >}}.
This can be done using one of two available autoscalers:

- [**Cluster Autoscaler**](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
- [**Karpenter**](https://github.com/kubernetes-sigs/karpenter?tab=readme-ov-file)

Both scalers work by watching for Pods marked as _unschedulable_ and for nodes that are _underutilized_,
and then adding or removing nodes as needed.
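
The details differ between the two projects and between cloud providers. As one hedged sketch,
the Cluster Autoscaler is typically deployed with flags telling it which node groups it may grow
or shrink; the provider and node group name below are assumptions for illustration:

```yaml
# Fragment of a Cluster Autoscaler container spec, not a complete Deployment.
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws          # assumption: use the provider of your cluster
  - --nodes=1:10:my-node-group    # min:max:name of a hypothetical node group
  - --scale-down-enabled=true     # allow removal of underutilized nodes
```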

## {{% heading "whatsnext" %}}

- Learn more about scaling horizontally
  - [Scale a StatefulSet](/docs/tasks/run-application/scale-stateful-set/)
  - [HorizontalPodAutoscaler Walkthrough](/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/)
- [Resize Container Resources In-Place](/docs/tasks/configure-pod-container/resize-container-resources/)
- [Autoscale the DNS Service in a Cluster](/docs/tasks/administer-cluster/dns-horizontal-autoscaling/)