AEP-7571: Pod-level resources support in VPA #8586
Conversation
Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: iamzili

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
Hi @iamzili. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
## Summary

Starting with Kubernetes version 1.34, it is now possible to specify CPU and memory `resources` for Pods at the pod level, in addition to the existing container-level `resources` specifications. For example:
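A minimal sketch of such a manifest (illustrative name, image, and values; `spec.resources` is the pod-level stanza):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-resources-demo   # hypothetical example name
spec:
  resources:              # pod-level requests/limits, shared by all containers
    requests:
      cpu: "1"
      memory: 100Mi
    limits:
      cpu: "2"
      memory: 200Mi
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9   # placeholder image
      # no container-level resources: the pod-level budget applies
```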
It may be worth linking the KEP here
I'm linking the KEP and the official blog post a little further down: here
This section describes how VPA reacts based on where resources are defined (pod level, container level, or both).
Before this proposal, the recommender computed recommendations only at the container level, and VPA applied changes only to container-level fields. With this proposal, the recommender also computes pod-level recommendations in addition to container-level ones. Pod-level recommendations are derived from per-container usage and recommendations, typically by aggregating container recommendations. Container-level policy still influences pod-level output: setting `mode: Off` in `spec.resourcePolicy.containerPolicies` excludes a container from recommendations, and `minAllowed`/`maxAllowed` bounds continue to apply.
Just want to sanity check this a little.

> typically by aggregating container recommendations

From what I can tell, the metrics that metrics-server provides are per-container.
So the idea is to leave the recommender as is, making per-container recommendations based on its per-container metrics, and let the updater/admission-controller use an aggregated value for the Pod resources.
Is my understanding here right?
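If so, the pod-level target could indeed be derived by summing the per-container targets. A minimal sketch under that assumption (hypothetical helper, not existing recommender code; containers excluded via `mode: Off` are assumed to be filtered out by the caller):

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// sumContainerTargets derives a pod-level recommendation by summing the
// per-container recommended requests.
func sumContainerTargets(targets []corev1.ResourceList) corev1.ResourceList {
	total := corev1.ResourceList{}
	for _, t := range targets {
		for name, q := range t {
			sum := total[name] // zero-valued Quantity if absent
			sum.Add(q)
			total[name] = sum
		}
	}
	return total
}

func main() {
	podTarget := sumContainerTargets([]corev1.ResourceList{
		{corev1.ResourceCPU: resource.MustParse("250m"), corev1.ResourceMemory: resource.MustParse("100Mi")},
		{corev1.ResourceCPU: resource.MustParse("750m"), corev1.ResourceMemory: resource.MustParse("300Mi")},
	})
	fmt.Println(podTarget.Cpu(), podTarget.Memory()) // 1 400Mi
}
```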
- Extend the VPA object:
  1. Add a new `spec.resourcePolicy.podPolicies` stanza. This stanza is user-modifiable and allows setting constraints for pod-level recommendations:
     - `controlledResources`: Specifies which resource types are recommended (and possibly applied). Valid values are `cpu`, `memory`, or both. If not specified, both resource types are controlled by VPA.
     - `controlledValues`: Specifies which resource values are controlled. Valid values are `RequestsAndLimits` and `RequestsOnly`. The default is `RequestsAndLimits`.
     - `minAllowed`: Specifies the minimum resources that will be recommended for the Pod. The default is no minimum.
     - `maxAllowed`: Specifies the maximum resources that will be recommended for the Pod. The default is no maximum. To ensure per-container recommendations do not exceed the Pod's defined maximum, apply the formula proposed by @omerap12 to adjust the container recommendations (see [discussion](https://github.com/kubernetes/autoscaler/issues/7147#issuecomment-2515296024)). This field takes precedence over the global Pod maximum set by the new flags (see "Global Pod maximums").
  2. Add a new `status.recommendation.podRecommendation` stanza. This field is not user-modifiable; it is populated by the VPA recommender and stores the Pod-level recommendations. The updater and admission controller use this stanza to read Pod-level recommendations. The updater may evict Pods to apply the recommendation; the admission controller applies the recommendation when the Pod is recreated.
Would it be possible to have an example Go Type here?
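Something along these lines, perhaps; a rough sketch assuming the new stanzas mirror the existing `ContainerResourcePolicy` and `RecommendedContainerResources` types (all names are hypothetical until the API is settled):

```go
package v1

import corev1 "k8s.io/api/core/v1"

// ContainerControlledValues already exists in the VPA API
// (RequestsAndLimits | RequestsOnly); repeated here so the sketch is
// self-contained.
type ContainerControlledValues string

// PodResourcePolicy is a hypothetical shape for the proposed
// spec.resourcePolicy.podPolicies stanza.
type PodResourcePolicy struct {
	// ControlledResources lists the resource types VPA recommends (and
	// possibly applies); if nil, both cpu and memory are controlled.
	ControlledResources *[]corev1.ResourceName `json:"controlledResources,omitempty"`
	// ControlledValues selects RequestsAndLimits (default) or RequestsOnly.
	ControlledValues *ContainerControlledValues `json:"controlledValues,omitempty"`
	// MinAllowed and MaxAllowed bound the pod-level recommendation.
	MinAllowed corev1.ResourceList `json:"minAllowed,omitempty"`
	MaxAllowed corev1.ResourceList `json:"maxAllowed,omitempty"`
}

// RecommendedPodResources is a hypothetical shape for the proposed
// status.recommendation.podRecommendation stanza, written by the
// recommender and read by the updater and admission controller.
type RecommendedPodResources struct {
	Target     corev1.ResourceList `json:"target,omitempty"`
	LowerBound corev1.ResourceList `json:"lowerBound,omitempty"`
	UpperBound corev1.ResourceList `json:"upperBound,omitempty"`
}
```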
## Proposal

- Add a new feature flag named `PodLevelResources`. Because this proposal introduces new code paths across all three VPA components, this flag will be added to each component.
Is this a feature flag to assist with GAing the feature, or is it a flag to enable/disable the feature?
My intention is to use the flag to enable or disable the feature. In other words, the feature should be disabled by default at first, and once the feature matures, it can be enabled by default starting from a specific VPA version.
Could you please clarify what you mean by using the flag for GAing the feature?
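Given the enable/disable intent described above (off by default at first), the wiring might look roughly like this, assuming VPA's existing `component-base` feature-gate mechanism is reused; only the gate name comes from this AEP, the rest is illustrative:

```go
package features

import (
	"k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/component-base/featuregate"
)

// PodLevelResources gates the new pod-level code paths in the recommender,
// updater, and admission controller.
const PodLevelResources featuregate.Feature = "PodLevelResources"

// MutableFeatureGate backs the component's flag wiring
// (e.g. --feature-gates=PodLevelResources=true).
var MutableFeatureGate featuregate.MutableFeatureGate = featuregate.NewFeatureGate()

func init() {
	runtime.Must(MutableFeatureGate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		// Disabled by default until the feature matures.
		PodLevelResources: {Default: false, PreRelease: featuregate.Alpha},
	}))
}
```

New code paths would then be guarded with `MutableFeatureGate.Enabled(PodLevelResources)` in each component.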
For workloads that define only pod-level resources, VPA will control resources at the pod level. At the time of writing, [in-place pod-level resource resizing](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize) is not available for pod-level fields, so applying pod-level recommendations requires evicting Pods.
When [in-place pod-level resource resizing](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize) becomes available, VPA should attempt to apply pod-level recommendations in place first and fall back to eviction if in-place updates fail, mirroring the current `InPlaceOrRecreate` behavior used for container-level updates.
Because this AEP has a dependency on the functionality described in https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize, can we restate the language as if that KEP were already implemented, and then add a note that we won't approve this AEP until post-1.35 (when in-place resizing of pod-level resources has been implemented)?
I thought that this AEP isn't dependent on that feature; it's calling out that we can't do in-place resizing until that KEP is ready.
Let's remove this section; there is no connection between the current AEP and the in-place feature.
This AEP should focus on pod-level resources only.
Really, thanks for the hard work here, Erik!
In my opinion, we should choose option 2 (control both pod-level and initially set container-level resources) as the default here.
I left a couple of notes throughout the proposal.
Can we please remove the in-place feature from this AEP?
This AEP should focus only on pod-level resources, so cons like "Applying both pod-level and container-level recommendations requires eviction because in-place pod-level resizing is not yet available" are redundant.
**Cons**:
- Applying both pod-level and container-level recommendations requires eviction because [in-place pod-level resource resizing](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/5419-pod-level-resources-in-place-resize) is not yet available.
- This option adds complexity: VPA must track which container-level resources are under its control by default and avoid mutating others.
VPA should track all containers (like it currently does by default), so what is the added complexity? (We are just checking, for each container, whether `resources` are specified - seems pretty simple to me, but maybe I'm wrong here.)
Yeah, it should be simple to implement. I used the wrong wording, so I removed the part that suggested it adds extra complexity.
At the same time, I still think we should keep this part under the "Cons" section, because even though it's simple to implement, it still adds extra logic to the VPA.
Fair enough.
- `maxAllowed`: Specifies the maximum resources that will be recommended for the Pod. The default is no maximum. To ensure per-container recommendations do not exceed the Pod's defined maximum, apply the formula proposed by @omerap12 to adjust the container recommendations (see [discussion](https://github.com/kubernetes/autoscaler/issues/7147#issuecomment-2515296024)). This field takes precedence over the global Pod maximum set by the new flags (see "Global Pod maximums").
Thanks for catching that! (I forgot I wrote that, TBH) :)
My formula should be correct, but what happens if, after normalization of the container[i] resources, we get a value that is smaller/bigger than minAllowed/maxAllowed?
I thought we could do something like this:
- If adjusted[i] < container.minAllowed[i]: set it to minAllowed[i]
- If adjusted[i] > container.maxAllowed[i]: set it to maxAllowed[i]
And then we need to re-check pod limits after the container policy adjustments (since the sum might be bigger).
If we are still exceeding pod limits - what do we want to do here?
cc @adrianmoisey
Sorry if I wasn't clear enough :)
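A rough sketch of that scale-then-clamp flow in integer milli-units (hypothetical helper; real code would operate on `resource.Quantity`, and the "still exceeding pod limits" branch is exactly the open question above):

```go
package main

import "fmt"

// capToPodMax proportionally scales per-container recommendations so their
// sum fits podMax, then clamps each container to its own min/max policy.
// A max of 0 means "no per-container maximum". Returns the adjusted values
// and whether the sum still fits podMax after clamping.
func capToPodMax(recs, mins, maxs []int64, podMax int64) ([]int64, bool) {
	var sum int64
	for _, r := range recs {
		sum += r
	}
	adjusted := make([]int64, len(recs))
	copy(adjusted, recs)
	if sum > podMax && sum > 0 {
		for i := range adjusted {
			// Proportional scale-down so the total lands at (about) podMax.
			adjusted[i] = recs[i] * podMax / sum
		}
	}
	// Clamping to container policy can push the total back above podMax.
	sum = 0
	for i := range adjusted {
		if adjusted[i] < mins[i] {
			adjusted[i] = mins[i]
		}
		if maxs[i] > 0 && adjusted[i] > maxs[i] {
			adjusted[i] = maxs[i]
		}
		sum += adjusted[i]
	}
	return adjusted, sum <= podMax // false == the open "still exceeding" case
}

func main() {
	adjusted, ok := capToPodMax(
		[]int64{600, 600}, // recommendations (mCPU)
		[]int64{500, 100}, // per-container minAllowed
		[]int64{0, 0},     // no per-container maxAllowed
		1000,              // pod-level maxAllowed
	)
	fmt.Println(adjusted, ok) // [500 500] true
}
```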
### Test Plan

TODO
In order for this AEP to be merged, this has to be filled in (I know it's a WIP, but just a reminder) :)
Co-authored-by: Adrian Moisey <[email protected]>
I would also prefer option 2 (control both pod-level and initially set container-level resources). BTW, when do you think a decision will be made to go with this option? We will need to update the AEP to reflect the chosen approach. Once the decision is final, I also plan to add more details. Furthermore, why are you suggesting that the parts related to in-place resizing of pod-level resources should be removed from this AEP? Since this AEP focuses on the pod-level resources stanza, how it can be mutated (or not) seems relevant from the VPA's perspective.
What type of PR is this?
/kind documentation
/kind feature
/area vertical-pod-autoscaler
What this PR does / why we need it:
Autoscaling Enhancement Proposal (AEP) for pod-level resources support in VPA.
Related ticket from which this AEP originated: Issue
More details about pod-level resources can be found here:
I'd love to hear your thoughts on this feature.