Commit c55f522

Add initial AEP for supporting custom request-to-limit ratio at VPA object level
1 parent 2e528f9 commit c55f522

1 file changed: 216 additions, 0 deletions
  • vertical-pod-autoscaler/enhancements/support-custom-request-to-limit-ratio

@@ -0,0 +1,216 @@
1+
# AEP-XXX: Add support for setting a custom request-to-limit ratio at the VPA object level
2+
3+
<!-- toc -->
4+
- [Summary](#summary)
- [Goals](#goals)
- [Non-Goals](#non-goals)
- [Proposal](#proposal)
- [Design Details](#design-details)
  - [API Changes](#api-changes)
  - [Behavior](#behavior)
    - [Current behaviour of VPA 1.4.2](#current-behaviour-of-vpa-142)
    - [Proposed feature behavior](#proposed-feature-behavior)
  - [Validation](#validation)
    - [Static Validation via CRD Rules](#static-validation-via-crd-rules)
    - [Dynamic Validation via Admission Controller](#dynamic-validation-via-admission-controller)
  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
    - [Enabling or Disabling the Feature in a Live Cluster](#enabling-or-disabling-the-feature-in-a-live-cluster)
    - [When Enabled](#when-enabled)
    - [When Disabled](#when-disabled)
  - [Kubernetes Version Compatibility](#kubernetes-version-compatibility)
  - [Test Plan](#test-plan)
  - [Examples](#examples)
- [Implementation History](#implementation-history)
24+
<!-- /toc -->
25+
26+
## Summary
27+
28+
Currently, when VPA is configured to set both requests and limits automatically (i.e., `controlledValues` is set to `RequestsAndLimits` in the VerticalPodAutoscaler CRD), the limit is adjusted proportionally based on the initial request-to-limit ratio defined in workload API objects such as Deployments or StatefulSets - [Ref](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/docs/examples.md#keeping-limit-proportional-to-request).
29+
30+
If the request-to-limit ratio needs to be updated (for example, because the application's resource usage has changed), users must modify the `resources.requests` or `resources.limits` fields in the workload's API object. Because this changes the Pod template, the workload controller rolls out the change by terminating and recreating the existing Pods.
31+
32+
This proposal introduces a new mechanism that allows VPA users to adjust the request-to-limit ratio directly at the VPA CRD level for an already running workload. This avoids the need to manually update the workload's resource requests and limits, and prevents unnecessary Pod restarts.
33+
34+
The feature is controlled by a new alpha feature gate, `RequestToLimitRatio`, which is disabled by default.
35+
36+
## Goals
37+
38+
* Provide a feature gate to enable or disable the feature (`RequestToLimitRatio`).
39+
* Allow VPA to update the request-to-limit ratio of a Pod's containers during Pod recreation or in-place updates.
40+
* Introduce a new `RequestToLimitRatio` block that enables users to adjust the request-to-limit ratio in the following ways:
  * **Factor**: Multiplies the recommended request by a specified value, and the result is set as the new limit.
    * Example: if the `Factor` value is `2`, the limit will be set to twice the recommended request.
  * **Quantity**: Adds a buffer on top of the resource request. This can be expressed either:
    * As a **percentage** (`QuantityPercentage`), or
    * As an **absolute value with units** (`QuantityValue`).
46+
47+
## Non-Goals
48+
49+
* This proposal does not change the core VPA algorithm or its decision-making process for when to apply the recommended values or set limits proportionally.
50+
51+
## Proposal
52+
53+
* Extend [`ContainerResourcePolicy`](https://github.com/kubernetes/autoscaler/blob/vertical-pod-autoscaler-1.4.2/vertical-pod-autoscaler/pkg/apis/autoscaling.k8s.io/v1/types.go#L197) to allow updating the request-to-limit ratio for individual containers in a Pod targeted by a VPA object. Furthermore, to enable updating the ratio globally, a single wildcard entry with `containerName = '*'` can be used. This applies to all containers in the targeted Pod that do not have individual policies.
54+
55+
56+
Some examples of the VPA CRD using the new `RequestToLimitRatio` field are provided in a later [section](#examples).
57+
58+
## Design Details
59+
60+
### API Changes
61+
62+
A new `RequestToLimitRatio` field will be added, with the following sub-fields:
63+
64+
* `RequestToLimitRatio.CPU.Type` or `RequestToLimitRatio.Memory.Type` (type: `string`, required): Specifies how the limit is derived from the request. `Type` can have the following values:
  * `Factor` (type: `integer`): Interpreted as a multiplier for the recommended request.
    * Example: a value of `2` sets the limit to twice the recommended request.
  * `QuantityValue` (type: `string`): Adds an absolute value on top of the request to determine the new limit.
    * Example: for memory, a value of `100Mi` means the new limit will be the calculated memory request + `100Mi`.
  * `QuantityPercentage` (type: `integer`): Sets the limit to the request plus the specified percentage of the request.
    * Example: if the request is 1000m CPU and the percentage is 20, the limit will be 1000m + (20% of 1000m) = 1200m.

* `RequestToLimitRatio.CPU.Value` (type: `string`, required): Specifies the magnitude of the ratio between request and limit, interpreted according to `RequestToLimitRatio.CPU.Type`:
  * If `Type` is `Factor`: a value of `3` sets the CPU limit to three times the CPU request.
  * If `Type` is `QuantityValue`: a value of `200m` sets the CPU limit to the CPU request plus 200 millicores.
  * If `Type` is `QuantityPercentage`: a value of `20` sets the CPU limit to the calculated request plus 20% of that request.

* `RequestToLimitRatio.Memory.Value` (type: `string`): Similar to `CPU.Value`, except that for `QuantityValue` the units are memory-based (e.g., `Mi`, `Gi`) rather than CPU millicores (`m`).
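
As a rough illustration of the three `Type` values described above, the following Python sketch computes a limit from a recommended request. It is not VPA code: `RatioSpec` and `compute_limit` are hypothetical names, quantities are assumed to be already converted to integer base units (millicores for CPU, bytes for memory), and rounding percentages down is an assumption that the proposal does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RatioSpec:
    """Hypothetical stand-in for one RequestToLimitRatio entry (CPU or memory)."""
    type: str   # "Factor", "QuantityValue", or "QuantityPercentage"
    value: int  # multiplier, absolute amount in base units, or percentage


def compute_limit(request: int, spec: RatioSpec) -> int:
    """Return the limit derived from a recommended request, in the same base unit."""
    if spec.type == "Factor":
        return request * spec.value
    if spec.type == "QuantityValue":
        return request + spec.value
    if spec.type == "QuantityPercentage":
        # Integer division rounds down; the proposal does not define rounding.
        return request + (request * spec.value) // 100
    raise ValueError(f"unknown RequestToLimitRatio type: {spec.type!r}")


# 1000m CPU with a 20% buffer gives a 1200m limit, matching the example above.
print(compute_limit(1000, RatioSpec("QuantityPercentage", 20)))
```

Working in integer base units keeps the sketch free of quantity parsing; a real implementation would presumably operate on Kubernetes resource quantities instead.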
78+
79+
### Behavior
80+
81+
VPA enforces the current request-to-limit ratio while respecting namespace-level constraints such as a `LimitRange`, even if this requires lowering the resource request so that the limit fits within the allowed maximum.
82+
83+
For example, suppose VPA calculates a new recommended CPU request of `200m`, the request-to-limit ratio is set to `1:4`, and a `LimitRange` enforces that a container cannot have more than `600m` CPU. The proportional limit would be `800m`, which exceeds the `600m` maximum, so VPA will set the CPU limit to `600m` and the request to `150m` in order to maintain the `1:4` ratio. This existing behavior is not affected by the new feature.
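
The arithmetic in this example can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not the actual VPA implementation: `fit_to_max_limit` is a hypothetical helper, values are in millicores, and the ratio is modeled as an integer limit-per-request multiplier (4 for a `1:4` ratio).

```python
from typing import Tuple


def fit_to_max_limit(request_m: int, limit_per_request: int, max_limit_m: int) -> Tuple[int, int]:
    """Return (request, limit) in millicores, keeping the ratio under a maximum limit."""
    limit_m = request_m * limit_per_request
    if limit_m <= max_limit_m:
        return request_m, limit_m
    # Cap the limit at the maximum and scale the request down to keep the ratio.
    return max_limit_m // limit_per_request, max_limit_m


print(fit_to_max_limit(200, 4, 600))  # (150, 600)
```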
84+
85+
#### Current behaviour of VPA 1.4.2
86+
87+
1. The user sets the initial resource requests and limits at the workload API level, such as in a Kubernetes Deployment.
88+
2. When VPA applies new recommended resource request values, it maintains the initially set request-to-limit ratio.
89+
90+
For example, if the original resource request is `1` and the original limit is `2`, then after VPA calculates a new resource request of `10`, the new limit will be updated to `20`. In this version of VPA, the 1:2 ratio is preserved at all times.
91+
92+
If the user wants to modify the request-to-limit ratio, they must update the Deployment object directly. Because changing `resources.requests` or `resources.limits` modifies the Pod template, the Deployment rolls out the change by terminating and recreating the existing Pods.
93+
94+
#### Proposed feature behavior
95+
96+
The values specified under `RequestToLimitRatio` in the VPA object will take precedence over the request-to-limit ratio initially set at the workload API level. For example, if the CPU ratio is initially set to `1:2` at the workload API level, but the VPA object sets the CPU request-to-limit ratio to `1:10` using the new `RequestToLimitRatio` field, VPA will use the ratio from the `RequestToLimitRatio` field (`1:10`) when applying new recommended values.
97+
98+
The behavior after implementing this feature is as follows:
99+
100+
1. The user defines a VPA object with the `controlledValues` field set to `RequestsAndLimits` and configures the request-to-limit ratio using the new `RequestToLimitRatio` sub-fields. Based on VPA's mode, the following occurs:
101+
   * **Recreate mode**: A new request-to-limit ratio takes effect only when a Pod is created. For a running Pod, the limits change only after the Pod is evicted, either by the Updater (which evicts the Pod when the current `resources.requests` differ significantly from the new recommendation) or manually (e.g. via `kubectl delete pod`).
   * **InPlaceOrRecreate mode** (alpha in v1.4.0): When a new request-to-limit ratio is set, the VPA Updater will attempt an in-place update, using the `/resize` subresource to modify `Pod.Spec.Containers[i].Resources.limits` or `Pod.Spec.Containers[i].Resources.requests` in certain situations. If the in-place update fails, the Updater falls back to evicting the Pod so that it is recreated. For more details, see the [In-Place Updates documentation](https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/docs/features.md#in-place-updates-inplaceorrecreate).
   * **Initial mode**: VPA updates the request-to-limit ratio only during Pod creation and does not change it later.
104+
2. If the `RequestToLimitRatio` feature gate is disabled, the request-to-limit ratio already set in the workload API (e.g. Deployment API) is used.
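
The precedence rule described in this section can be sketched as follows. This is a minimal illustration, not VPA code: `effective_ratio` is a hypothetical helper, and it models only the `Factor` type, expressed as a limit-per-request multiplier.

```python
from fractions import Fraction
from typing import Optional


def effective_ratio(workload_request: int, workload_limit: int,
                    vpa_factor: Optional[int], gate_enabled: bool) -> Fraction:
    """Return the limit-per-request multiplier that would be applied."""
    if gate_enabled and vpa_factor is not None:
        # The ratio configured on the VPA object takes precedence.
        return Fraction(vpa_factor)
    # Otherwise the ratio initially set on the workload is preserved.
    return Fraction(workload_limit, workload_request)


new_request = 300  # millicores, a hypothetical recommendation
print(new_request * effective_ratio(100, 200, vpa_factor=10, gate_enabled=True))   # 3000
print(new_request * effective_ratio(100, 200, vpa_factor=10, gate_enabled=False))  # 600
```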
105+
106+
> [!IMPORTANT]
107+
> This new feature can be used together with other features in development, such as the [fixed memory-per-CPU ratio feature](https://github.com/kubernetes/autoscaler/pull/8459).
108+
109+
### Validation
110+
111+
* To use this functionality, the `RequestToLimitRatio` feature gate must be enabled.
112+
113+
#### Static Validation via CRD Rules
114+
115+
* The `RequestToLimitRatio` configuration will be validated when VPA objects are created or updated. For example:
  * If `Type` is `Factor`, the value must be greater than or equal to 1 (enforced via CRD validation rules).
  * If `Type` is `QuantityPercentage`, the value must be greater than or equal to 1 (enforced via CRD validation rules).
118+
119+
#### Dynamic Validation via Admission Controller
120+
121+
* When using the new `RequestToLimitRatio` field, the `controlledValues` field must be set to `RequestsAndLimits`. It does not make sense to specify `RequestToLimitRatio` if VPA is not allowed to update limits. This requirement is enforced by the admission controller.
122+
* If `Type` is set to `QuantityValue`, then its `Value` will be validated.
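
The validation rules above could look roughly like the following Python sketch. It is illustrative only: `validate_ratio` is a hypothetical function that folds the CRD rules and the admission-controller checks into one place, and the quantity pattern is a simplified assumption rather than the full Kubernetes quantity grammar, since the proposal does not say how a `QuantityValue` is validated.

```python
import re
from typing import List

# Simplified pattern for Kubernetes-style quantities such as "200m" or "100Mi".
# This is an assumption for illustration, not the full quantity grammar.
QUANTITY_RE = re.compile(r"^\d+(\.\d+)?(m|k|M|G|T|P|E|Ki|Mi|Gi|Ti|Pi|Ei)?$")


def validate_ratio(ratio_type: str, value, controlled_values: str) -> List[str]:
    """Return a list of validation errors; an empty list means the entry is valid."""
    errors = []
    if controlled_values != "RequestsAndLimits":
        errors.append("controlledValues must be RequestsAndLimits")
    if ratio_type in ("Factor", "QuantityPercentage"):
        try:
            number = int(value)
        except (TypeError, ValueError):
            errors.append(f"{ratio_type} value must be an integer")
        else:
            if number < 1:
                errors.append(f"{ratio_type} value must be >= 1")
    elif ratio_type == "QuantityValue":
        if not QUANTITY_RE.match(str(value)):
            errors.append("QuantityValue value must be a resource quantity")
    else:
        errors.append(f"unknown type: {ratio_type}")
    return errors


# Two errors are reported: wrong controlledValues and a factor below 1.
print(validate_ratio("Factor", 0, "RequestsOnly"))
```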
123+
124+
### Feature Enablement and Rollback
125+
126+
#### Enabling or Disabling the Feature in a Live Cluster
127+
128+
* Enable the feature by setting the `RequestToLimitRatio` feature gate.
129+
* Components affected by this feature gate:
  * admission-controller
  * updater
132+
133+
#### When Enabled
134+
135+
* The admission controller will **accept** new VPA objects that include a configured `RequestToLimitRatio`.
136+
* For containers targeted by a VPA object using `RequestToLimitRatio`, the admission controller and/or the updater will enforce the configured ratio.
137+
138+
#### When Disabled
139+
140+
* The admission controller will **reject** new VPA objects that include a configured `RequestToLimitRatio`.
  * A descriptive error message should be returned to the user, indicating that the feature is feature-gated.
142+
* The admission controller and updater will behave as before, according to the behavior described [here](#current-behaviour-of-vpa-142).
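
The enabled and disabled cases can be summarized with a small Python sketch. It is a hedged illustration only: `admit_vpa` is a hypothetical function, and the error text is a placeholder rather than the message the admission controller will actually return.

```python
from typing import Tuple

FEATURE_GATE = "RequestToLimitRatio"


def admit_vpa(uses_request_to_limit_ratio: bool, gate_enabled: bool) -> Tuple[bool, str]:
    """Return (allowed, message) for a new VPA object."""
    if uses_request_to_limit_ratio and not gate_enabled:
        return False, (
            f"RequestToLimitRatio is set, but the {FEATURE_GATE} feature gate is disabled"
        )
    return True, ""


print(admit_vpa(uses_request_to_limit_ratio=True, gate_enabled=False))
```

VPA objects that do not use the new field are admitted regardless of the gate, which matches the statement that existing behavior is unchanged when the feature is disabled.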
143+
144+
### Kubernetes Version Compatibility
145+
146+
* Kubernetes version 1.33 or higher is required to use this feature with the VPA mode [`InPlaceOrRecreate`](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler/enhancements/4016-in-place-updates-support#kubernetes-version-compatibility).
147+
148+
149+
### Test Plan
150+
151+
* Implement comprehensive unit tests to cover all new functionality.
152+
* e2e tests: TODO
153+
154+
155+
### Examples
156+
157+
Here are some example VPA manifests that use the new `RequestToLimitRatio` field in different scenarios.
158+
159+
The following is a sample VPA manifest that targets a specific container named `app` in a Pod. In this manifest:
160+
* The CPU limit is set to twice the calculated CPU request.
161+
* The memory limit is set to the calculated memory request plus `200Mi`.
162+
163+
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: InPlaceOrRecreate
  resourcePolicy:
    containerPolicies:
      - containerName: app
        controlledResources: ["cpu", "memory"]
        controlledValues: RequestsAndLimits
        RequestToLimitRatio:
          cpu:
            Type: Factor
            Value: 2
          memory:
            Type: QuantityValue
            Value: 200Mi
```
188+
189+
In the manifest below, we configure VPA to control only the CPU resource's requests and limits for the container named `app`. The CPU limit is calculated by increasing the recommended CPU request by 30%.
190+
191+
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: InPlaceOrRecreate
  resourcePolicy:
    containerPolicies:
      - containerName: app
        controlledResources: ["cpu"]
        controlledValues: RequestsAndLimits
        RequestToLimitRatio:
          cpu:
            Type: QuantityPercentage
            Value: 30
```
213+
214+
## Implementation History
215+
216+
* 2025-09-10: Initial proposal created.
