
Commit 47003eb

Merge pull request #53354 from lmktfy/20251120_update_vertical_pod_autoscaling_docs
Update VerticalPodAutoscaler concept page
2 parents 8897b17 + 05e7239 commit 47003eb

1 file changed: +59 −27 lines changed


content/en/docs/concepts/workloads/autoscaling/vertical-pod-autoscale.md

Lines changed: 59 additions & 27 deletions
@@ -14,14 +14,15 @@ math: true

 <!-- overview -->

-In Kubernetes, a _VerticalPodAutoscaler_ automatically updates a workload resource (such as
+In Kubernetes, a _VerticalPodAutoscaler_ automatically updates a workload management {{< glossary_tooltip text="resource" term_id="api-resource" >}} (such as
 a {{< glossary_tooltip text="Deployment" term_id="deployment" >}} or
 {{< glossary_tooltip text="StatefulSet" term_id="statefulset" >}}), with the
-aim of automatically adjusting resource requests and limits to match actual usage.
+aim of automatically adjusting infrastructure {{< glossary_tooltip text="resource" term_id="infrastructure-resource" >}}
+[requests and limits](/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) to match actual usage.

 Vertical scaling means that the response to increased resource demand is to assign more resources (for example: memory or CPU)
 to the {{< glossary_tooltip text="Pods" term_id="pod" >}} that are already running for the workload.
-This is also known as "rightsizing" or "autopilot".
+This is also known as _rightsizing_, or sometimes _autopilot_.
 This is different from horizontal scaling, which for Kubernetes would mean deploying more Pods to distribute the load.

 If the resource usage decreases, and the Pod resource requests are above optimal levels,
@@ -53,8 +54,8 @@ graph BT
 admission[VPA Admission Controller]

 vpa_cr[VerticalPodAutoscaler CRD]
-recommender[VPA Recommender]
-updater[VPA Updater]
+recommender[VPA recommender]
+updater[VPA updater]

 metrics --> recommender
 recommender -->|Stores Recommendations| vpa_cr
@@ -89,9 +90,10 @@ graph BT
 Figure 1. VerticalPodAutoscaler controls the resource requests and limits of Pods in a Deployment

 Kubernetes implements vertical pod autoscaling through multiple cooperating components that run intermittently (it is not a continuous process). The VPA consists of three main components:
-The Recommender, which analyzes resource usage and provides recommendations.
-The Updater, which updates Pod resource requests either by evicting Pods or modifying them in place.
-And the Admission Controller, which applies recommendations to new or recreated Pods.
+
+* The _recommender_, which analyzes resource usage and provides recommendations.
+* The _updater_, which updates Pod resource requests either by evicting Pods or modifying them in place.
+* The VPA _admission controller_ webhook, which applies resource recommendations to new or recreated Pods.

 Once during each period, the Recommender queries the resource utilization for Pods targeted by each VerticalPodAutoscaler definition. The Recommender finds the target resource defined by the `targetRef`, then selects the pods based on the target resource's `.spec.selector` labels, and obtains the metrics from the resource metrics API to analyze actual CPU and memory consumption.
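
For orientation, here is a minimal sketch of the `targetRef` wiring that the paragraph above describes; the VPA name `my-app-vpa` and the Deployment name `my-app` are hypothetical placeholders.

```yaml
# Minimal sketch (hypothetical names): a VerticalPodAutoscaler that points at a
# Deployment through spec.targetRef. The recommender resolves this reference and
# then selects the Deployment's Pods via the target's .spec.selector labels.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
```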

@@ -108,23 +110,31 @@ Based on this analysis, the Recommender calculates three types of recommendation
 These recommendations are stored in the VerticalPodAutoscaler resource's `.status.recommendation` field.


-The Updater component monitors the VerticalPodAutoscaler resources and compares current Pod resource requests with the recommendations. When the difference exceeds configured thresholds and the update policy allows it, the Updater can either:
+The _updater_ component monitors the VerticalPodAutoscaler resources and compares current Pod resource requests with the recommendations. When the difference exceeds configured thresholds and the update policy allows it, the updater can either:
+
 - Evict Pods, triggering their recreation with new resource requests (traditional approach)
 - Update Pod resources in place without eviction, when the cluster supports in-place Pod resource updates

-The chosen method depends on the configured update mode, cluster capabilities, and the type of resource change needed. In-place updates, when available, avoid Pod disruption but may have limitations on which resources can be modified. The Updater respects PodDisruptionBudgets to minimize service impact.
+The chosen method depends on the configured update mode, cluster capabilities, and the type of resource change needed. In-place updates, when available, avoid Pod disruption but may have limitations on which resources can be modified. The updater respects PodDisruptionBudgets to minimize service impact.

-The Admission Controller operates as a mutating webhook that intercepts Pod creation requests. It checks if the Pod is targeted by a VerticalPodAutoscaler and, if so, applies the recommended resource requests and limits before the Pod is created. This ensures new Pods start with appropriately sized resource allocations, whether they're created during initial deployment, after an eviction by the Updater, or due to scaling operations.
+The _admission controller_ operates as a mutating webhook that intercepts Pod creation requests. It
+checks if the Pod is targeted by a VerticalPodAutoscaler and, if so, applies the recommended
+resource requests and limits before the Pod is created. This ensures new Pods start with
+appropriately sized resource allocations, whether they're created during initial deployment,
+after an eviction by the updater, or due to scaling operations.

-The VerticalPodAutoscaler requires the Metrics Server to be installed in the cluster. The VPA components fetch metrics from the `metrics.k8s.io` API. The Metrics Server needs to be launched separately as it is not deployed by default in most clusters. For more information about resource metrics, see [Metrics Server](/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-server).
+The VerticalPodAutoscaler requires a metrics source, such as Kubernetes' Metrics Server {{< glossary_tooltip text="add-on" term_id="addons" >}},
+to be installed in the cluster.
+The VPA components fetch metrics from the `metrics.k8s.io` API. The Metrics Server needs to be launched separately as it is not deployed by default in most clusters. For more information about resource metrics, see [Metrics Server](/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#metrics-server).

 ## Update modes

-The VerticalPodAutoscaler supports different update modes that control how and when
+A VerticalPodAutoscaler supports different _update modes_ that control how and when
 resource recommendations are applied to your Pods. You configure the update mode using
 the `updateMode` field in the VPA spec under `updatePolicy`:

 ```yaml
+---
 apiVersion: autoscaling.k8s.io/v1
 kind: VerticalPodAutoscaler
 metadata:
@@ -138,24 +148,31 @@ spec:
 updateMode: "Recreate" # Off, Initial, Recreate, InPlaceOrRecreate
 ```

-### Off
+### Off {#updateMode-Off}

-In `Off` mode, the VPA Recommender still analyzes resource usage and generates recommendations, but these recommendations are not automatically applied to Pods. The recommendations are only stored in the VPA object's status field.
+In the _Off_ update mode, the VPA recommender still analyzes resource usage and generates
+recommendations, but these recommendations are not automatically applied to Pods.
+The recommendations are only stored in the VPA object's `.status` field.

-### Initial
+You can use a tool such as `kubectl` to view the `.status` and the recommendations in it.

-In `Initial` mode, VPA only sets resource requests when Pods are first created. It does not update resources for already running Pods, even if recommendations change over time.
+### Initial {#updateMode-Initial}

-### Recreate
+In _Initial_ mode, VPA only sets resource requests when Pods are first created. It does not update resources for already running Pods, even if recommendations change over time.

-In `Recreate` mode, VPA actively manages Pod resources by evicting Pods when their current resource requests differ significantly from recommendations. When a Pod is evicted, the workload controller (Deployment, StatefulSet, etc.) creates a replacement Pod, and the VPA Admission Controller applies the updated resource requests to the new Pod.
+### Recreate {#updateMode-Recreate}

-### InPlaceOrRecreate
+In _Recreate_ mode, VPA actively manages Pod resources by evicting Pods when their current
+resource requests differ significantly from recommendations. When a Pod is evicted, the workload
+controller (managing a Deployment, StatefulSet, etc.) creates a replacement Pod, and the VPA admission
+controller applies the updated resource requests to the new Pod.
+
+### InPlaceOrRecreate {#updateMode-InPlaceOrRecreate}

 In `InPlaceOrRecreate` mode, VPA attempts to update Pod resource requests and limits without restarting the Pod when possible. However, if in-place updates cannot be performed for a particular resource change, VPA falls back to evicting the Pod
 (similar to `Recreate` mode) and allowing the workload controller to create a replacement Pod with updated resources.

-### Auto
+### Auto (deprecated) {#updateMode-Auto}

 {{< note >}}
 The `Auto` update mode is **deprecated since VPA version 1.4.0**. Use `Recreate` for
@@ -200,15 +217,30 @@ spec:

 #### minAllowed and maxAllowed

-These fields set boundaries for VPA recommendations. The VPA will never recommend resources below minAllowed or above maxAllowed, even if the actual usage data suggests different values.
+These fields set boundaries for VPA recommendations.
+The VPA will never recommend resources below `minAllowed` or above `maxAllowed`, even if the actual usage data suggests different values.

 #### controlledResources

-The controlledResources field specifies which resource types VPA should manage for a container. If not specified, VPA manages both CPU and memory by default. You can limit VPA to manage only specific resources.
-Valid resource names include cpu and memory.
+The `controlledResources` field specifies which resource types VPA should manage for a container in a Pod.
+If not specified, VPA manages both CPU and memory by default. You can restrict VPA to manage only specific resources.
+Valid resource names include `cpu` and `memory`.

 ### controlledValues

-The controlledValues field determines whether VPA controls resource requests, limits, or both:
-- `RequestsAndLimits` (default): VPA sets both requests and limits. The limit is scaled proportionally to the request.
-- `RequestsOnly`: VPA only sets requests, leaving limits unchanged. Limits are respected and can still trigger throttling or OOMKills if usage exceeds them.
+The `controlledValues` field determines whether VPA controls resource requests, limits, or both:
+
+RequestsAndLimits
+: VPA sets both requests and limits. The limit is scaled proportionally to the request. This is the default mode.
+
+RequestsOnly
+: VPA only sets requests, leaving limits unchanged. Limits are respected and can still trigger throttling or out-of-memory kills if usage exceeds them.
+
+See [requests and limits](/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) to learn more about those two concepts.
+
+## {{% heading "whatsnext" %}}
+
+If you configure autoscaling in your cluster, you may also want to consider using
+[node autoscaling](/docs/concepts/cluster-administration/node-autoscaling/)
+to ensure you are running the right number of nodes.
+You can also read more about [_horizontal_ Pod autoscaling](/docs/tasks/run-application/horizontal-pod-autoscale/).
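
To tie together the `minAllowed`/`maxAllowed`, `controlledResources`, and `controlledValues` fields discussed above, here is an illustrative sketch of a `resourcePolicy`; all names and values are hypothetical placeholders.

```yaml
# Illustrative sketch (hypothetical names and values): bound the recommendations
# for one container, manage both CPU and memory, and change only the requests
# while leaving the limits untouched.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Recreate"
  resourcePolicy:
    containerPolicies:
      - containerName: my-app
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: "2"
          memory: 2Gi
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsOnly
```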
