
Conversation

@tommasopozzetti

What this PR does / why we need it:

This PR adds flags to optionally customize CPU shares and reservations for cloned VMs as part of the VSphereMachine spec.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Aug 26, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot
Contributor

Welcome @tommasopozzetti!

It looks like this is your first PR to kubernetes-sigs/cluster-api-provider-vsphere 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/cluster-api-provider-vsphere has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 26, 2025
@k8s-ci-robot
Contributor

Hi @tommasopozzetti. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 26, 2025
@smcallister-bc

This seems incredibly useful, thank you for doing this! Any chance you could do something similar for memory reservations as well?

@tommasopozzetti
Author

I'd be happy to add similar logic to the PR for memory reservations and shares, as well as potentially memory pinning.

I was hoping to first get a glance from a maintainer to see if this approach is reasonable, given it's my first contribution to this project!

Comment on lines 170 to 179
// CPUReservationMhz is the amount of CPU in MHz that is guaranteed available to the virtual machine.
// Defaults to the eponymous property value in the template from which the
// virtual machine is cloned.
// +optional
CPUReservationMhz int64 `json:"cpuReservationMhz,omitempty"`
// CPUShares are a relative priority to other virtual machines used in case of resource contention.
// Defaults to the eponymous property value in the template from which the
// virtual machine is cloned.
// +optional
CPUShares int32 `json:"cpuShares,omitempty"`
Member

@chrischdi chrischdi Sep 9, 2025


Does it make sense to have something like:

resources:
  reservation:
    cpu: ...
    memory: ...
  shares:
    ...

Or maybe this should be modelled on the Kubernetes wording, which has limits and requests? (That might not match the things this PR currently sets.)

This is e.g. done by the vm-operator APIs. However, vm-operator does not use shares.

Could someone research what the benefits of setting shares are? And should we also consider allowing a CPU limit to be set?

Author

@tommasopozzetti tommasopozzetti Sep 9, 2025


@chrischdi thanks for the review!

In terms of shares vs reservations: shares are a relative measure of prioritization, while reservations are an absolute one. A VM with a 2 GHz reservation is guaranteed that, even under host contention, it will always have 2 GHz of CPU power available to it. The vSphere admission controller will prevent a VM from powering on if the total sum of reservations for VMs on a given host exceeds the total available CPU power of that host (no overprovisioning).
Shares, on the other hand, are only meaningful relative to the shares of other VMs on that host. The host can be overprovisioned, and if it comes under CPU contention, VMs are prioritized for CPU time relative to each other depending on their shares. So if a host has 3 VMs, one with 6000 shares, one with 3000 and one with 1000, and the host comes under CPU contention, the first will get 60% of CPU time, the second 30% and the third 10%. Each VM normally gets shares assigned by default proportional to its number of vCPUs, but it is very useful to be able to tune that at will.

In terms of using CPU/memory limits, I have never had to implement these, but essentially, once the VM reaches the limit, they artificially cause the same effects as if the underlying host were under resource contention, even when it is not. More detailed info here. I'd be happy to add the limit to this PR as an optional configurable as well, if desired.

Finally, in terms of the syntax, I'm open to suggestions! I personally feel that using the same syntax as standard Kubernetes containers might be misleading, since the practical effect of reservations, shares and limits on VMs is very different from memory and CPU requests and limits for k8s pods.
I was going for a flatter mapping, similar to the other properties, that matches the VM options and would look like

cpuReservationMhz: xxx
cpuShares: xxx
cpuLimitMhz: xxx
memoryReservationMB: xxx
memoryShares: xxx
memoryLimitMB: xxx
reserveAllMemory: false

but, if preferred, we could also go for something nested like

resourceManagement:
  cpu:
    reservationMhz:
    shares:
    limitMhz:
  memory:
    reservationMB:
    shares:
    limitMB:
    reserveAll:

or similar

Member

@chrischdi chrischdi Oct 17, 2025


I'd prefer to take over the definition as it is in vm-operator, to have a similar API.

Which comes down to:

resources:
  requests: # --> reservations
    cpu: ... # in mhz, documented on the godoc
    memory: ...
  limits: # --> limits
    cpu: ...
    memory: ...

And also use the types that are common in Kubernetes.

https://github.com/vmware-tanzu/vm-operator/blob/main/api/v1alpha5/virtualmachineclass_types.go#L82-L86

The fields should have proper descriptions of what they map to in the end.

For shares: see my other comment.
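
For illustration, a minimal sketch of what such a type could look like, borrowing the shape of the linked vm-operator definition (field and type names here are assumptions, not the final CAPV API):

// Sketch only: names are illustrative, not the merged CAPV API.
package v1beta1

import "k8s.io/apimachinery/pkg/api/resource"

// VirtualMachineResourceSpec holds one quantity per resource and is
// reused for both requests and limits, as in vm-operator.
type VirtualMachineResourceSpec struct {
	// Cpu is interpreted in Hz when mapped to vSphere (e.g. "2G" = 2 GHz).
	// +optional
	Cpu resource.Quantity `json:"cpu,omitempty"`

	// Memory is interpreted in bytes (e.g. "4Gi").
	// +optional
	Memory resource.Quantity `json:"memory,omitempty"`
}

// VirtualMachineResources groups requests (vSphere reservations)
// and limits (vSphere limits).
type VirtualMachineResources struct {
	// +optional
	Requests VirtualMachineResourceSpec `json:"requests,omitempty"`

	// +optional
	Limits VirtualMachineResourceSpec `json:"limits,omitempty"`
}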

Member


Note: having the unit in the name does not make sense to me. At least not for memory, and I think also not for CPU.
If we have proper godoc that explains that this field is in Hz, 2 GHz should map to e.g. "2Gi".

@tommasopozzetti
Author

Hi @chrischdi, following up on your comment, do you have any thoughts on my reply?
Thanks!

@tommasopozzetti
Author

@chrischdi or @sbueringer do you have any opinions on the above discussion? I'd love to do any edits that make sense to push this forward! This feature is one that we really need
Thank you!

Member

@chrischdi chrischdi left a comment


First of all: sorry for the long wait on this, and thanks for reminding me.

I'm fine with adding an API and implementation that properly set reservations (= requests in Kubernetes) and limits for CPU and memory.

I'd like to think again about the use-case of setting shares for a VM. I want to understand the use-case so we don't add an API that won't really get used at a later stage.

VM-Operator does not have them configurable, and I think there's a reason.

If I get it right, setting shares only takes effect if there is a lack of resources affecting all the VMs of the vSphere host / cluster / resource pool (depending on setup).
So I think there's no way, or it is pretty hard, to set shares to a value that makes sense across the whole infrastructure.
The real solution here should be adding more physical capacity.

Shares can be used to prioritize resource availability for VMs at the time of contention.

[0]

@tommasopozzetti: I'd like to better understand the use-case you have for setting shares.

Maybe @akutz has some thoughts here.

@vr4manta maybe you have some thoughts here too?

@tommasopozzetti
Author

@chrischdi thank you for your review!

I will make the changes to the PR to implement the syntax you suggested to match vm-operator.
I would like to include shares in the design as well, though. I'm personally not familiar with vm-operator, but I can share some of the thinking behind their use.

First and foremost, shares are in use all the time, out of the box, even if you do not set them. vSphere automatically assigns shares to every VM, proportional to the chosen priority of the VM (or the default one for the resource pool, if not defined) and the number of cores of the VM. So shares will be used regardless of our implementation here. My proposed addition just gives more optional control over them, mapping the corresponding available API so that a custom shares value can be configured if desired.

We use this heavily. Shares are the only way to properly set relative priorities among VMs to distinguish more and less critical workloads, and in our case we have more "classes" of workloads than the three default priority levels, so the ability to customize them is imperative.

We have actually been recommended to use this mechanism by VMware engineers, so I assume shares are used and valuable.

While I definitely agree that, as you run out of capacity, adding capacity is always the best solution, it is not always readily available, and when running overprovisioned (which we do significantly, to make efficient use of resources), spikes will at times cause temporary contention. That contention must somehow be resolved, and the way vSphere does that is by giving relative CPU time based on shares. Only using reservations and limits does not address this properly: it just sets a guaranteed minimum and a ceiling maximum for the resource, but it gives the system no direction on how to prioritize VMs between that min and max.
A concrete example: if I have 10 GHz available and 2 VMs, both requesting 4 GHz, with a limit of 8 GHz, and both spiking in a moment of high load to try and use 6 GHz each, how does the system allocate the available 10? What if one of those VMs is a dev workload and one a QA one? How do I tell the system to prioritize QA? I can give twice as many shares to QA, and in that scenario it will get twice as much of the remaining capacity as the dev one.
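
To put rough numbers on that scenario (an illustration only, assuming shares divide just the capacity left over once both reservations are satisfied):

contended capacity = 10 GHz - (4 GHz + 4 GHz reserved) = 2 GHz
qa  (2x shares): 4 GHz + (2/3) * 2 GHz ≈ 5.33 GHz
dev (1x shares): 4 GHz + (1/3) * 2 GHz ≈ 4.67 GHz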

I can start working on the edits to adopt the following syntax and ensure the types and comments match vm-operator (with the shares addition), but please do let me know in the meantime if you have further thoughts on the shares discussion!

resources:
  requests: # --> reservations
    cpu: ... # in mhz, documented on the godoc
    memory: ...
  limits: # --> limits
    cpu: ...
    memory: ...
  shares:
    cpu: ...
    memory: ...
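
For reference, a rough sketch of how a custom shares value from the spec above could map onto govmomi (the helper name and its placement are assumptions; the vim25 types are govmomi's real API):

package services // illustrative placement

import "github.com/vmware/govmomi/vim25/types"

// sharesInfoFor maps a user-provided shares value to vSphere's
// SharesInfo. A custom value requires SharesLevelCustom; when nothing
// is set, returning nil keeps vSphere's default behavior (shares
// proportional to vCPU count).
func sharesInfoFor(shares int32) *types.SharesInfo {
	if shares == 0 {
		return nil
	}
	return &types.SharesInfo{
		Level:  types.SharesLevelCustom,
		Shares: shares,
	}
}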

@scott-grimes

👍 would love this - both the reservations/limits and also shares

@chrischdi
Member

Sounds reasonable, thanks for explaining!

Also, the example looks good to me 👍

Let me know once you've updated the PR.

@tommasopozzetti tommasopozzetti force-pushed the feature/cpu-custom-allocation branch from 3d5a4db to 47c75b7 Compare October 21, 2025 17:50
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Oct 21, 2025
@tommasopozzetti tommasopozzetti changed the title ✨Add flags to allow customization of CPU shares and reservations ✨Add flags to allow customization of CPU and memory shares, reservations and limits Oct 21, 2025
@tommasopozzetti
Author

@chrischdi I have updated the PR to follow the proposed syntax!
Let me know your thoughts!
Thank you

@sbueringer
Member

I'll also try to take a look soon

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 22, 2025
@tommasopozzetti tommasopozzetti force-pushed the feature/cpu-custom-allocation branch 2 times, most recently from ab91902 to 64df85a Compare October 22, 2025 14:58
@tommasopozzetti
Author

tommasopozzetti commented Oct 22, 2025

Thanks @sbueringer!
I fixed all the linting issues. There is still one check failing, but I'm not sure it is related to any change here. Any advice would be great!

// Set CPU reservations, limits and shares if specified
cpuAllocation := types.ResourceAllocationInfo{}
if !vmCtx.VSphereVM.Spec.Resources.Requests.Cpu.IsZero() {
	cpuReservationMhz := int64(math.Ceil(float64(vmCtx.VSphereVM.Spec.Resources.Requests.Cpu.Value()) / float64(1000000)))
Member


Can we add helper funcs for convertQuantityToMhz and convertQuantityToMB?
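
Something like this, perhaps (a sketch: the names follow the suggestion above, while the signatures, placement and round-up behavior are assumptions):

package util // illustrative placement

import (
	"math"

	"k8s.io/apimachinery/pkg/api/resource"
)

// convertQuantityToMhz converts a CPU quantity expressed in Hz
// (e.g. "2G" = 2 GHz) to whole MHz, rounding up.
func convertQuantityToMhz(q resource.Quantity) int64 {
	return int64(math.Ceil(float64(q.Value()) / 1e6))
}

// convertQuantityToMB converts a memory quantity expressed in bytes
// (e.g. "4Gi") to whole MiB, rounding up.
func convertQuantityToMB(q resource.Quantity) int64 {
	return int64(math.Ceil(float64(q.Value()) / (1 << 20)))
}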

	}
	cpuAllocation.Shares = ptr.To(cpuShares)
}
spec.Config.CpuAllocation = ptr.To(cpuAllocation)
Member


Are there any effects if this is set to an empty struct vs nil (which it was before)?

I'd prefer not to change the existing behavior.

So if no requests, limits or shares are set, we should not set CpuAllocation to an empty struct. Same below.
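
One way to keep the old nil behavior (a sketch against the quoted hunk; res is a hypothetical local for the spec's resources field, and convertQuantityToMhz is the helper suggested above):

// Only build CpuAllocation when at least one field is requested, so an
// untouched spec keeps the previous behavior of leaving it nil.
var cpuAllocation *types.ResourceAllocationInfo
if !res.Requests.Cpu.IsZero() {
	cpuAllocation = &types.ResourceAllocationInfo{}
	cpuAllocation.Reservation = ptr.To(convertQuantityToMhz(res.Requests.Cpu))
}
if !res.Limits.Cpu.IsZero() {
	if cpuAllocation == nil {
		cpuAllocation = &types.ResourceAllocationInfo{}
	}
	cpuAllocation.Limit = ptr.To(convertQuantityToMhz(res.Limits.Cpu))
}
spec.Config.CpuAllocation = cpuAllocation // stays nil if nothing was set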

@chrischdi
Member

Conversion also needs to be fixed to be round-trippable.

See e.g. what we do for AdditionalDisksGiB: https://github.com/openshift/cluster-api-provider-vsphere/blob/393a16983d02d5a2b254e4182002f70329f9dd8f/apis/v1alpha4/vspheremachine_conversion.go#L39
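
For reference, the usual restore pattern from that example looks roughly like this (a sketch; Resources stands in for this PR's new hub-only field, and the helpers come from sigs.k8s.io/cluster-api/util/conversion):

func (src *VSphereMachine) ConvertTo(dstRaw conversion.Hub) error {
	dst := dstRaw.(*infrav1.VSphereMachine)
	if err := Convert_v1alpha4_VSphereMachine_To_v1beta1_VSphereMachine(src, dst, nil); err != nil {
		return err
	}

	// Restore hub-only fields from the annotation written by the last
	// ConvertFrom, so the v1beta1 -> v1alpha4 -> v1beta1 round-trip
	// is lossless.
	restored := &infrav1.VSphereMachine{}
	if ok, err := utilconversion.UnmarshalData(src, restored); err != nil || !ok {
		return err
	}
	dst.Spec.Resources = restored.Spec.Resources

	return nil
}

func (dst *VSphereMachine) ConvertFrom(srcRaw conversion.Hub) error {
	src := srcRaw.(*infrav1.VSphereMachine)
	if err := Convert_v1beta1_VSphereMachine_To_v1alpha4_VSphereMachine(src, dst, nil); err != nil {
		return err
	}

	// Preserve hub-only fields in an annotation for the next ConvertTo.
	return utilconversion.MarshalData(src, dst)
}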

@tommasopozzetti tommasopozzetti force-pushed the feature/cpu-custom-allocation branch 3 times, most recently from 4c59fe7 to 6933d96 Compare October 24, 2025 18:38
@tommasopozzetti
Author

@chrischdi thanks for the input!
I think I addressed all the concerns.

The only comment I have is on the quantities and the validation. Rather than hard-failing if the quantity is not a precise multiple of MHz or MiB, which would be atypical behavior for resource specifications on a k8s resource, I propose (and implemented) automatically rounding up to the nearest MHz or MiB. The behavior has also been documented in the field descriptions.

There looks to still be a check failing, and it is unclear to me whether it's related to the conversion webhooks or unrelated. The logs don't make it very easy to understand what those checks are doing and why they are failing. I see several errors related to sessions to vCenter and to parsing of the APIVersion and Kind, neither of which has anything to do with the contents of this PR. Any further guidance here is definitely appreciated!
Thank you!

@chrischdi
Member

The only comment I have is on the quantities and the validation. Rather than hard-failing if the quantity is not a precise multiple of MHz or MiB, which would be atypical behavior for resource specifications on a k8s resource, I propose (and implemented) automatically rounding up to the nearest MHz or MiB. The behavior has also been documented in the field descriptions.

That's fine for me.

The failing tests are fuzz tests. They try to ensure that the conversions are round-trippable. So the failing tests here are:

v1beta1.VSphereVM -> v1alpha(3/4).VSphereVM -> v1beta1.VSphereVM

And that causes a diff.

Same for VSphereMachineTemplate.

We also need to add the conversion change for VSphereVM and VSphereMachineTemplate, because they also get the additional fields.

@tommasopozzetti tommasopozzetti force-pushed the feature/cpu-custom-allocation branch from 6933d96 to 89944f7 Compare October 28, 2025 16:12
@tommasopozzetti
Author

@chrischdi thanks for the pointers!
Looks like those are solved now!
One more test is failing, but it seems like it's failing while setting up the test environment?
Appreciate any pointers there as well, and thanks for your continued guidance!

@tommasopozzetti
Author

@chrischdi circling back here to see if there is anything else I can do!
Thank you!

@chrischdi
Member

/retest

@chrischdi chrischdi changed the title ✨Add flags to allow customization of CPU and memory shares, reservations and limits ✨ Add flags to allow customization of CPU and memory shares, reservations and limits Nov 5, 2025
@tommasopozzetti
Author

@chrischdi yay! Looks like all tests passed!

Would it be possible to get a final review and potentially get this in to be included in the next release? We are very much looking forward to using this feature!
Thank you!

Member

@sbueringer sbueringer left a comment


Thx, just one nit from my side

@tommasopozzetti tommasopozzetti force-pushed the feature/cpu-custom-allocation branch from 89944f7 to 8d3bbd4 Compare November 20, 2025 19:45
@tommasopozzetti
Author

/retest

@tommasopozzetti
Author

@sbueringer Thank you! I corrected the CPU field as requested!

@sbueringer
Member

/test pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-ci-latest-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-upgrade-1-34-1-35-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-ci-latest-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-upgrade-1-34-1-35-main
/test pull-cluster-api-provider-vsphere-e2e-vcsim-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-vcsim-supervisor-main

@kubernetes-sigs kubernetes-sigs deleted a comment from k8s-ci-robot Nov 21, 2025
@sbueringer
Member

sbueringer commented Nov 21, 2025

@sbueringer Thank you! I corrected the CPU field as requested!

Thank you very much!

Would it be possible to get a final review and potentially get this in to be included in the next release? We are very much looking forward to using this feature!
Thank you!

Yup, let's get this merged early next week, so it's part of the next release.
Thx for the patience, and sorry for the long delays in review; it was a way-too-busy release cycle in core CAPI :)

/assign @chrischdi

for a final review
