
Commit 3a82e57

Merge pull request #53345 from omerap12/vpa-doc
Introduce concept page for Vertical Pod Autoscaling (VPA).
2 parents 1194953 + 8815893 commit 3a82e57

1 file changed: 214 additions & 0 deletions
@@ -0,0 +1,214 @@
---
reviewers:
- adrianmoisey
- omerap12
title: Vertical Pod Autoscaling
feature:
  title: Vertical scaling
  description: >
    Automatically adjust resource requests and limits based on actual usage patterns.
content_type: concept
weight: 90
math: true
---
14+
15+
<!-- overview -->
16+
17+
In Kubernetes, a _VerticalPodAutoscaler_ automatically updates a workload resource (such as
a {{< glossary_tooltip text="Deployment" term_id="deployment" >}} or
{{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}), with the
aim of matching resource requests and limits to actual usage.
21+
22+
Vertical scaling means that the response to increased resource demand is to assign more resources (for example: memory or CPU)
23+
to the {{< glossary_tooltip text="Pods" term_id="pod" >}} that are already running for the workload.
24+
This is also known as "rightsizing" or "autopilot".
25+
This is different from horizontal scaling, which for Kubernetes would mean deploying more Pods to distribute the load.
26+
27+
If resource usage decreases and the Pod resource requests are above optimal levels,
the VerticalPodAutoscaler lowers the resource requests for the Pods of the workload resource
(the Deployment, StatefulSet, or other similar resource), preventing resource waste.
30+
31+
The VerticalPodAutoscaler is implemented as a Kubernetes API resource and a
32+
{{< glossary_tooltip text="controller" term_id="controller" >}}.
33+
The resource determines the behavior of the controller.
34+
The vertical pod autoscaling controller, running within the Kubernetes data plane,
35+
periodically adjusts the resource requests and limits of its target (for example, a Deployment)
36+
based on analysis of historical resource utilization,
37+
the amount of resources available in the cluster, and real-time events such as out-of-memory (OOM) conditions.
38+
39+
<!-- body -->
40+
41+
## API object
42+
43+
The VerticalPodAutoscaler is defined as a {{< glossary_tooltip text="Custom Resource Definition" term_id="customresourcedefinition" >}} (CRD) in Kubernetes. Unlike HorizontalPodAutoscaler, which is built into the Kubernetes API, VPA must be installed separately in your cluster.
44+
45+
The current stable API version is `autoscaling.k8s.io/v1`. More details about the VPA installation and API can be found in the [VPA GitHub repository](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler).
46+
47+
## How does a VerticalPodAutoscaler work?
48+
49+
{{< mermaid >}}
graph BT
    metrics[Metrics Server]
    api[API Server]
    admission[VPA Admission Controller]

    vpa_cr[VerticalPodAutoscaler CRD]
    recommender[VPA Recommender]
    updater[VPA Updater]

    metrics --> recommender
    recommender -->|Stores Recommendations| vpa_cr

    subgraph Application Workload
        controller[Deployment / RC / StatefulSet]
        pod[Pod / Container]
    end

    vpa_cr -->|Checks for changes| updater
    updater -->|Evicts Pod or Updates in place| controller
    controller -->|Requests new Pod| api

    api -->|New Pod Creation| admission
    admission -->|Retrieves latest recommendation| vpa_cr
    admission -->|Injects new resource values| api

    api -->|Creates Pod| controller
    controller -->|New Pod with Optimal Resources| pod

    classDef vpa fill:#9FC5E8,stroke:#1E1E1D,stroke-width:1px,color:#1E1E1D;
    classDef crd fill:#D5A6BD,stroke:#1E1E1D,stroke-width:1px,color:#1E1E1D;
    classDef metrics fill:#FFD966,stroke:#1E1E1D,stroke-width:1px,color:#1E1E1D;
    classDef app fill:#B6D7A8,stroke:#1E1E1D,stroke-width:1px,color:#1E1E1D;

    class recommender,updater,admission vpa;
    class vpa_cr crd;
    class metrics metrics;
    class controller,pod app;
{{< /mermaid >}}
88+
89+
Figure 1. VerticalPodAutoscaler controls the resource requests and limits of Pods in a Deployment
90+
91+
Kubernetes implements vertical pod autoscaling through multiple cooperating components that run intermittently (it is not a continuous process). The VPA consists of three main components:

- The Recommender, which analyzes resource usage and provides recommendations.
- The Updater, which updates Pod resource requests either by evicting Pods or modifying them in place.
- The Admission Controller, which applies recommendations to new or recreated Pods.
95+
96+
Once during each period, the Recommender queries the resource utilization for Pods targeted by each VerticalPodAutoscaler definition. The Recommender finds the target resource defined by the `targetRef`, then selects the Pods based on the target resource's `.spec.selector` labels, and obtains the metrics from the resource metrics API to analyze actual CPU and memory consumption.
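
As a sketch of how these pieces line up, the manifests below pair a Deployment with a VerticalPodAutoscaler whose `targetRef` names it. The names, labels, and image (`my-app`, `app: my-app`, `registry.example/my-app:1.0`) are hypothetical placeholders. The Recommender would follow `targetRef` to the Deployment and use its `.spec.selector` to find the Pods to analyze.

```yaml
# Hypothetical workload: names, labels, and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app        # Pods carrying this label are the ones the VPA analyzes
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: application
        image: registry.example/my-app:1.0
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:             # resolved to the Deployment above
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
```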
97+
98+
The Recommender analyzes both current and historical resource usage data (CPU and memory) for each Pod targeted by the VerticalPodAutoscaler. It examines:
99+
- Historical consumption patterns over time to identify trends
100+
- Peak usage and variance to ensure sufficient headroom
101+
- Current resource requests compared to actual usage
102+
- Out-of-memory (OOM) events and other resource-related incidents
103+
104+
Based on this analysis, the Recommender calculates three types of recommendations:
105+
- Target recommendation (optimal resources for typical usage)
106+
- Lower bound (minimum viable resources)
107+
- Upper bound (maximum reasonable resources)

These recommendations are stored in the VerticalPodAutoscaler resource's `.status.recommendation` field.
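
To make the shape of that field concrete, here is an illustrative status fragment. The quantities are made up, and the container name `application` is a placeholder. In the `autoscaling.k8s.io/v1` API the per-container values are nested under `containerRecommendations`; a real object may carry additional fields.

```yaml
# Illustrative status only; the quantities are invented for this sketch.
status:
  recommendation:
    containerRecommendations:
    - containerName: application
      lowerBound:          # minimum viable resources
        cpu: 100m
        memory: 256Mi
      target:              # recommended resources for typical usage
        cpu: 250m
        memory: 512Mi
      upperBound:          # maximum reasonable resources
        cpu: "1"
        memory: 1Gi
```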
109+
110+
111+
The Updater component monitors the VerticalPodAutoscaler resources and compares current Pod resource requests with the recommendations. When the difference exceeds configured thresholds and the update policy allows it, the Updater can either:
112+
- Evict Pods, triggering their recreation with new resource requests (traditional approach)
113+
- Update Pod resources in place without eviction, when the cluster supports in-place Pod resource updates
114+
115+
The chosen method depends on the configured update mode, cluster capabilities, and the type of resource change needed. In-place updates, when available, avoid Pod disruption but may have limitations on which resources can be modified. The Updater respects PodDisruptionBudgets to minimize service impact.
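
As an example of the kind of budget the Updater would respect, the following is a minimal PodDisruptionBudget sketch. The name and the `app: my-app` label are hypothetical and would need to match the labels on your workload's Pods.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb        # hypothetical name
spec:
  minAvailable: 1         # keep at least one Pod running during evictions
  selector:
    matchLabels:
      app: my-app         # hypothetical label; must match the workload's Pods
```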
116+
117+
The Admission Controller operates as a mutating webhook that intercepts Pod creation requests. It checks if the Pod is targeted by a VerticalPodAutoscaler and, if so, applies the recommended resource requests and limits before the Pod is created. This ensures new Pods start with appropriately sized resource allocations, whether they're created during initial deployment, after an eviction by the Updater, or due to scaling operations.
118+
119+
The VerticalPodAutoscaler requires the Metrics Server to be installed in the cluster. The VPA components fetch metrics from the `metrics.k8s.io` API. The Metrics Server needs to be launched separately as it is not deployed by default in most clusters. For more information about resource metrics, see [Metrics Server](/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-server).
120+
121+
## Update modes
122+
123+
The VerticalPodAutoscaler supports different update modes that control how and when
124+
resource recommendations are applied to your Pods. You configure the update mode using
125+
the `updateMode` field in the VPA spec under `updatePolicy`:
126+
127+
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Recreate"  # Off, Initial, Recreate, InPlaceOrRecreate
```
140+
141+
### Off
142+
143+
In `Off` mode, the VPA Recommender still analyzes resource usage and generates recommendations, but these recommendations are not automatically applied to Pods. The recommendations are only stored in the VPA object's status field.
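
A recommendation-only setup might look like the following sketch, which reuses the placeholder names `my-app` and `my-app-vpa` from the example above:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # recommend only; Pods are not evicted or resized
```

You could then review the values in the object's status and decide whether to apply them to the workload yourself.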
144+
145+
### Initial
146+
147+
In `Initial` mode, VPA only sets resource requests when Pods are first created. It does not update resources for already running Pods, even if recommendations change over time.
148+
149+
### Recreate
150+
151+
In `Recreate` mode, VPA actively manages Pod resources by evicting Pods when their current resource requests differ significantly from recommendations. When a Pod is evicted, the workload controller (Deployment, StatefulSet, etc.) creates a replacement Pod, and the VPA Admission Controller applies the updated resource requests to the new Pod.
152+
153+
### InPlaceOrRecreate
154+
155+
In `InPlaceOrRecreate` mode, VPA attempts to update Pod resource requests and limits without restarting the Pod when possible. However, if in-place updates cannot be performed for a particular resource change, VPA falls back to evicting the Pod (similar to `Recreate` mode) and allowing the workload controller to create a replacement Pod with updated resources.
157+
158+
### Auto
159+
160+
{{< note >}}
161+
The `Auto` update mode is **deprecated since VPA version 1.4.0**. Use `Recreate` for
162+
eviction-based updates, or `InPlaceOrRecreate` for in-place updates with eviction fallback.
163+
{{< /note >}}
164+
165+
`Auto` mode is currently an alias for `Recreate` mode and behaves identically. It was introduced to allow for future expansion of automatic update strategies.
166+
167+
## Resource policies
168+
169+
Resource policies allow you to fine-tune how the VerticalPodAutoscaler generates recommendations and applies updates.
170+
You can set boundaries for resource recommendations, specify which resources to manage, and configure different policies for individual containers within a Pod.
171+
172+
You define resource policies in the `resourcePolicy` field of the VPA spec:
173+
174+
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Recreate"
  resourcePolicy:
    containerPolicies:
    - containerName: "application"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 2
        memory: 2Gi
      controlledResources:
      - cpu
      - memory
      controlledValues: RequestsAndLimits
```
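
Because `containerPolicies` is a list, different containers in the same Pod can get different policies. The fragment below is a sketch under that assumption: the sidecar name `log-shipper` is hypothetical, and the per-container `mode: "Off"` setting is used here to exclude that container from autoscaling.

```yaml
# Fragment of a VPA spec; container names are hypothetical.
resourcePolicy:
  containerPolicies:
  - containerName: "application"
    minAllowed:
      cpu: 100m
      memory: 128Mi
  - containerName: "log-shipper"
    mode: "Off"          # autoscaling is disabled for this container
```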
200+
201+
### minAllowed and maxAllowed

These fields set boundaries for VPA recommendations. The VPA will never recommend resources below `minAllowed` or above `maxAllowed`, even if the actual usage data suggests different values.

### controlledResources

The `controlledResources` field specifies which resource types VPA should manage for a container. If not specified, VPA manages both CPU and memory by default. You can limit VPA to manage only specific resources.
Valid resource names include `cpu` and `memory`.
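
For instance, a policy that restricts VPA to memory could be sketched as follows (the container name is a placeholder):

```yaml
# Fragment of a VPA spec.
resourcePolicy:
  containerPolicies:
  - containerName: "application"
    controlledResources:
    - memory             # CPU is left as set in the Pod template
```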
209+
210+
### controlledValues
211+
212+
The `controlledValues` field determines whether VPA controls only resource requests or both requests and limits:
213+
- `RequestsAndLimits` (default): VPA sets both requests and limits. The limit is scaled proportionally to the request.
214+
- `RequestsOnly`: VPA only sets requests, leaving limits unchanged. Limits are respected and can still trigger throttling or OOMKills if usage exceeds them.
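
The difference between the two values can be illustrated with a small worked example. All quantities below are invented, and the CPU target of 250m is an assumed recommendation rather than real output.

```yaml
# Container resources as written in the Pod template (illustrative values):
resources:
  requests:
    cpu: 100m
  limits:
    cpu: 400m            # the limit is 4x the request
---
# With RequestsAndLimits and an assumed CPU target of 250m,
# the limit is scaled to keep the same 4x ratio:
resources:
  requests:
    cpu: 250m
  limits:
    cpu: "1"             # 250m x 4 = 1000m
---
# With RequestsOnly and the same target, only the request changes:
resources:
  requests:
    cpu: 250m
  limits:
    cpu: 400m            # left as written in the Pod template
```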
