
Conversation

@zhanggbj (Contributor) commented Oct 28, 2025

What this PR does / why we need it:
Support Node Auto Placement and Node AF/AAF

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

zhanggbj and others added 11 commits October 22, 2025 14:19
- Bump VMOP including Node AF/AAF support
- Add NodeAutoPlacement Feature Gate

(cherry picked from commit 700c8ae)
Removes the extra cases for VMG creation, such that VMG is created for:
1. Multiple zones, multiple MDs with no failureDomain
2. Multiple zones, multiple MDs with failureDomain
3. Single zone, existing cluster with no failureDomain MDs

Signed-off-by: Sagar Muchhal <[email protected]>
- Updates VMOP API dependency

Misc VMG fixes
- Use namingStrategy to calculate VM names
- Use MachineDeployment names for VMG placement label
- Includes all machinedeployments to generate node-pool -> zone
  mapping

Fixes VMG webhook validation error
- Adds cluster-name label to AF/AAF spec
- Re-adds zone topology key back to anti-affinity spec

Signed-off-by: Sagar Muchhal <[email protected]>
…gs#71)

* Refine VMG controller when generating per-MD zone labels

- Skip legacy already-placed VMs which do not have placement info
- Skip VMs which do not have zone info

* Apply suggestions from code review

---------

Co-authored-by: Sagar Muchhal <[email protected]>
- Sync VSphereMachines during day-2 operations in VMG controller
- Only wait for all intended VSphereMachines during initial Cluster creation
- Use annotations in VMG for per-md-zone info

Signed-off-by: Gong Zhang <[email protected]>
- Add VMG reconciler unit test
- Bump VMOP due to API change
- Filter out VSphereMachine events except create/delete events

Signed-off-by: Gong Zhang <[email protected]>
@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 28, 2025
@k8s-ci-robot k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Oct 28, 2025
@zhanggbj zhanggbj changed the title [WIP][DoNotReview]Support Node Auto Placement and Node AF/AAF [WIP][DoNotReview]✨ Support Node Auto Placement and Node AF/AAF Oct 28, 2025
@zhanggbj zhanggbj force-pushed the node_auto_placement branch 3 times, most recently from 1160ce9 to 1cd61f9 Compare October 28, 2025 10:31
@zhanggbj (Contributor Author)

/test ?

@k8s-ci-robot (Contributor)

@zhanggbj: The following commands are available to trigger required jobs:

/test pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-ci-latest-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-upgrade-1-34-1-35-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-ci-latest-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-upgrade-1-34-1-35-main
/test pull-cluster-api-provider-vsphere-e2e-vcsim-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-vcsim-supervisor-main
/test pull-cluster-api-provider-vsphere-test-main
/test pull-cluster-api-provider-vsphere-verify-main

The following commands are available to trigger optional jobs:

/test pull-cluster-api-provider-vsphere-apidiff-main
/test pull-cluster-api-provider-vsphere-janitor-main

Use /test all to run the following jobs that were automatically triggered:

pull-cluster-api-provider-vsphere-apidiff-main
pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main
pull-cluster-api-provider-vsphere-e2e-vcsim-govmomi-main
pull-cluster-api-provider-vsphere-e2e-vcsim-supervisor-main
pull-cluster-api-provider-vsphere-test-main
pull-cluster-api-provider-vsphere-verify-main

In response to this:

/test ?

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@zhanggbj (Contributor Author)

/test pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-ci-latest-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-conformance-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-govmomi-upgrade-1-34-1-35-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-blocking-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-ci-latest-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-conformance-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-main
/test pull-cluster-api-provider-vsphere-e2e-supervisor-upgrade-1-34-1-35-main
/test pull-cluster-api-provider-vsphere-e2e-vcsim-govmomi-main
/test pull-cluster-api-provider-vsphere-e2e-vcsim-supervisor-main
/test pull-cluster-api-provider-vsphere-test-main
/test pull-cluster-api-provider-vsphere-verify-main
/test pull-cluster-api-provider-vsphere-apidiff-main
/test pull-cluster-api-provider-vsphere-janitor-main

@zhanggbj zhanggbj force-pushed the node_auto_placement branch 7 times, most recently from 22604bf to 7c7c1a7 Compare November 3, 2025 08:55
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from fabriziopandini. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zhanggbj zhanggbj force-pushed the node_auto_placement branch 8 times, most recently from afd77d2 to 233cf64 Compare November 12, 2025 16:13
- Update to watch Cluster as primary
- Decouple functions
- Update to accurately match when checking if a VM is a member of VMG
- Update UT
- Refine godoc
- Miscellaneous

Signed-off-by: Gong Zhang <[email protected]>
@zhanggbj zhanggbj force-pushed the node_auto_placement branch from 233cf64 to e3078c9 Compare November 13, 2025 03:35
@DanielXiao (Contributor) left a comment

A question which may be picky, you don't have to solve it.

- apiGroups:
  - vmoperator.vmware.com
  resources:
  - virtualmachinegroups/status
Member:

Let's just add all verbs on the marker for virtualmachineservices/status so it will be grouped in the RBAC rule above

(not sure how much sense this permission even makes, nobody does a get status :), but let's align to what we do for the other vmop resources)
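
For illustration, the grouped marker might look like this (a sketch; the verb list should mirror whatever the neighboring vmop markers use):

    // +kubebuilder:rbac:groups=vmoperator.vmware.com,resources=virtualmachineservices/status,verbs=get;list;watch;create;update;patch;delete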

Contributor Author:

Updated virtualmachineservices/status with all markers so it could be grouped.

Question: It seems CAPV has permissions it doesn't actually use. Should we keep the permissions minimal, matching what the controller actually uses?

Member:

Feel free to open a follow-up issue

Contributor Author:

Filed an issue for tracking: #3681

return err
}

if feature.Gates.Enabled(feature.NamespaceScopedZones) && feature.Gates.Enabled(feature.NodeAutoPlacement) {
Member:

I'm okay with not adding additional validation here

(I think the validation here would mean we would need an additional validating webhook on the Cluster object)

Named("virtualmachinegroup").
Watches(
&vmoprv1.VirtualMachineGroup{},
handler.EnqueueRequestForOwner(mgr.GetScheme(), reconciler.Client.RESTMapper(), &clusterv1.Cluster{}),
Member:

Suggested change:
-  handler.EnqueueRequestForOwner(mgr.GetScheme(), reconciler.Client.RESTMapper(), &clusterv1.Cluster{}),
+  handler.EnqueueRequestForOwner(mgr.GetScheme(), reconciler.Client.RESTMapper(), &clusterv1.Cluster{}),
+  ctrlbldr.WithPredicates(predicates.ResourceIsChanged(mgr.GetScheme(), predicateLog)),

We have not been using this in CAPV up until now, but let's use it here to already improve this new reconciler (I have a task to follow-up for others soon'ish).

By default the reconciler will get resyncs from all informers, i.e. every 10m everything gets reconciled because of the Cluster resync, and additionally every 10m everything gets reconciled because of the VirtualMachineGroup resync.

In core CAPI we drop events from resyncs of secondary watches to avoid this issue

(it's not needed for VSphereMachine below as we already drop all updates)
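
For the VSphereMachine watch mentioned above, dropping updates could look roughly like this (a sketch, assuming controller-runtime's predicate and event packages):

    import (
        "sigs.k8s.io/controller-runtime/pkg/event"
        "sigs.k8s.io/controller-runtime/pkg/predicate"
    )

    // Resyncs surface as Update events; returning false here keeps the
    // periodic resync of the secondary watch from triggering reconciles.
    var createDeleteOnly = predicate.Funcs{
        UpdateFunc:  func(event.UpdateEvent) bool { return false },
        GenericFunc: func(event.GenericEvent) bool { return false },
    }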

Contributor Author:

Good catch. This helps to reduce the resync noise.

Comment on lines 96 to 101
vmg := &vmoprv1.VirtualMachineGroup{}
err := r.Client.Get(ctx, apitypes.NamespacedName{Namespace: vSphereMachine.Namespace, Name: clusterName}, vmg)

if err != nil {
return nil
}
Member:

I think we should not have this kind of logic in a mapper.

Let's deal with this only in the controller.

It's much more obvious there and the cache here can be out of sync.

Let's then use vSphereMachine.Namespace and clusterName for the request below

(please also update the godoc of this func)

Contributor Author:

Makes sense. Fixed it, and also added a predicate to the VSphereMachine watch so that only VSphereMachines which have the MachineDeployment label are handled.
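
A minimal sketch of the simplified mapper, per the suggestion above (the function name and surrounding types are assumptions):

    // vSphereMachineToVMG enqueues a request for the VirtualMachineGroup named
    // after the cluster; the controller, not the mapper, decides whether that
    // VirtualMachineGroup actually exists (the cache here can be stale).
    func vSphereMachineToVMG(ctx context.Context, o client.Object) []reconcile.Request {
        vSphereMachine, ok := o.(*vmwarev1.VSphereMachine)
        if !ok {
            return nil
        }
        clusterName, ok := vSphereMachine.Labels[clusterv1.ClusterNameLabel]
        if !ok {
            return nil
        }
        return []reconcile.Request{{
            NamespacedName: apitypes.NamespacedName{
                Namespace: vSphereMachine.Namespace,
                Name:      clusterName,
            },
        }}
    }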

// reconciliationDelay is the delay time for requeueAfter.
reconciliationDelay = 10 * time.Second
// ZoneAnnotationPrefix is the prefix used for placement decision annotations which will be set on VirtualMachineGroup.
ZoneAnnotationPrefix = "zone.cluster.x-k8s.io"
Member:

I think we have to namespace this to CAPV as it's not a Cluster API wide standard annotation

(TBD with @fabriziopandini what exactly we should use instead)

Member:

Discussed, let's use zone.vmware.infrastructure.cluster.x-k8s.io

Contributor Author:

Fixed.
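
For reference, a resulting per-MachineDeployment annotation on the VirtualMachineGroup would then look something like this (names illustrative):

    zone.vmware.infrastructure.cluster.x-k8s.io/md-0: zone-a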

@fabriziopandini (Member) left a comment

Partial review only.

A few themes are emerging, and it would be nice if they were addressed across the changeset:

  • we should add godoc keeping track of why we are doing things in a specific way, because this will help future us when doing maintenance
  • we should consistently use machineDeployment instead of nodePool, and failureDomain instead of zone (I know there is tech debt, but we don't have to add on top of it; possibly improve)
  • logs should align to guidelines

vmOperatorVM = typedModified
}

if affinityInfo != nil && affinityInfo.affinitySpec != nil {
Member:

Let's try to document why we are doing stuff, because this is the background that will be most valuable for future maintenance, e.g.:

Suggested change:
-  if affinityInfo != nil && affinityInfo.affinitySpec != nil {
+  // Set Affinity and Group info on a VM only on creation,
+  // because changing those settings after placement won't have any effect.
+  if affinityInfo != nil && affinityInfo.affinitySpec != nil {

I would also probably revert the if, to check "if not set" (set on creation) first

@zhanggbj (Contributor Author) commented Nov 14, 2025

Added comments to clarify that affinity only impacts VM locations during day-1.

I would also probably revert the if, to check "if not set" (set on creation) first

Could you help clarify this one?

Member:

I tried to improve your comment to prevent confusion between the comment talking about when the anti-affinity rules have effect and the code always setting those rules.
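
A sketch of the inverted check, assuming the VM spec exposes an Affinity field as in recent vm-operator APIs:

    // Affinity and group info only influence initial placement, so set
    // them only when not already set, i.e. on creation.
    if vmOperatorVM.Spec.Affinity == nil && affinityInfo != nil && affinityInfo.affinitySpec != nil {
        vmOperatorVM.Spec.Affinity = affinityInfo.affinitySpec
    }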

type affinityInfo struct {
affinitySpec *vmoprv1.AffinitySpec
vmGroupName string
failureDomain *string
Member:

Are we sure we need a pointer here? (If I'm not wrong, there is no difference between this being nil or empty.)

Contributor Author:

I think using failureDomain string works here.
CAPV is using failureDomain *string (https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/main/apis/v1beta1/vspheremachine_types.go#L117), but CAPI is using failureDomain string. How about we keep it consistent within CAPV? But I don't have a strong opinion on this.

Member:

Let's use string here; it is a private struct and there are no direct relations with the API types (also, API types cannot be changed, as that would be a breaking change).


// affinityInfo is an internal to store VM affinity information.
type affinityInfo struct {
affinitySpec *vmoprv1.AffinitySpec
Member:

Are we sure we need a pointer here? Looking at usage, whenever affinityInfo is set, affinitySpec is also set.

Contributor Author:

Oh right, whenever we set affinityInfo, we'll always set affinitySpec. Seems only checking affinityInfo != nil is fine.

However, AffinitySpec contains lots of rules and arrays of MachineDeployments. Is it better to keep it as *vmoprv1.AffinitySpec for lower memory usage?

Member:

I would prefer readability to micro optimization (the struct is used only locally)
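
Putting the two threads together, the struct would end up with value fields, along these lines:

    // affinityInfo is an internal struct to store VM affinity information.
    type affinityInfo struct {
        affinitySpec  vmoprv1.AffinitySpec
        vmGroupName   string
        failureDomain string
    }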


// Calculate expected Machines of all MachineDeployments.
// CAPV retrieves placement decisions from the VirtualMachineGroup to guide
// day-2 VM placement. At least one VM is required for each MachineDeployment.
Member:

At least one VM is required for each MachineDeployment.

I'm wondering if this assumption holds when using autoscaler from zero... 🤔

cc @sbueringer

Contributor Author:

Since this check only happens when the VMG is not created yet, day-2 scale-from-zero won't go into this case.
In addition, for day-1 we'll block creating a Cluster with MD.replicas == 0.

Contributor Author:

For day-2, since we'll get the VMG successfully in the line referenced above, we won't run into this line.

Member:

Does it mean that a user can't create a cluster with automatic node placement and a MachineDeployment configured with autoscale-from-zero enabled?
If this is the case, please add a comment.
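
A possible shape for that comment (wording illustrative, based on the explanation above):

    // Note: during initial Cluster creation every MachineDeployment must have
    // at least one replica, so node auto placement does not support
    // autoscale-from-zero MachineDeployments at day-1; scaling from zero at
    // day-2 is unaffected because the VirtualMachineGroup already exists.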

// Wait for all intended VSphereMachines corresponding to MachineDeployment to exist only during initial Cluster creation.
current := int32(len(current))
if current < expected {
log.Info("current VSphereMachines do not match expected", "Expected:", expected,
Member:

Should this message start with Waiting for...
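
A sketch of the reworded message (key names are assumptions):

    log.Info("Waiting for all intended VSphereMachines to exist",
        "expected", expected, "current", current)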

@zhanggbj (Contributor Author)

@sbueringer @fabriziopandini
I've addressed the above comments; please take another look, thanks!

BTW, I'm still improving logging and godoc incrementally; feel free to point out anything that's missing.

Name: cluster.Name,
}

if err := r.Client.Get(ctx, *key, desiredVMG); err != nil {
Member:

Suggested change:
-  if err := r.Client.Get(ctx, *key, desiredVMG); err != nil {
+  // If the VM group does not exist yet
+  if err := r.Client.Get(ctx, *key, desiredVMG); err != nil {

Contributor Author:

Added comments.

}
mdNames := []string{}
for _, machineDeployment := range machineDeployments.Items {
if machineDeployment.Spec.Template.Spec.FailureDomain == "" && machineDeployment.Name != mdName {
Member:

Is it correct to not consider MachineDeployments without a FailureDomain?

As a user, I would like anti-affinity to consider all the MachineDeployments, not only a subset (the only exception should be the MachineDeployment the machine belongs to).


Contributor Author:

This is because, with the current VMOP design, MachineDeployments with a failureDomain specified skip the placement process. So I think this is fine.

client.MatchingLabels{clusterv1.ClusterNameLabel: supervisorMachineCtx.Cluster.Name}); err != nil {
return false, err
}
mdNames := []string{}
Member:

Maybe:

Suggested change:
-  mdNames := []string{}
+  otherMDNames := []string{}

Contributor Author:

Fixed.

Comment on lines 234 to 235
// Set the zone label using the annotation of the per-md zone mapping from VMG.
// This is for new VMs created during day-2 operations in VC 9.1.
Member:

This comment seems not yet addressed

return true, nil
}

affInfo = affinityInfo{
Member:

This comment seems not yet addressed

}

// If ControlPlane haven't initialized, requeue it since CAPV will only start to reconcile VSphereMachines of
// MachineDeployment until ControlPlane is initialized.
Member:

Maybe:

Suggested change:
-  // MachineDeployment until ControlPlane is initialized.
+  // MachineDeployment after ControlPlane is initialized.

Contributor Author:

Updated.


// Wait for all intended VSphereMachines corresponding to MachineDeployment to exist only during initial Cluster creation.
current := int32(len(current))
if current < expected {
Member:

This comment seems not yet addressed

// Set the BootOrder spec as the
desiredVMG.Spec.BootOrder = []vmoprv1.VirtualMachineGroupBootOrderGroup{
{
Members: members,
Member:

This comment seems not yet addressed


// Check if this VM belongs to any of our target MachineDeployments.
// Annotation format is "zone.cluster.x-k8s.io/{machine-deployment-name}".
for _, md := range machineDeployments {
Member:

This comment seems not yet addressed


// GenerateVirtualMachineName generates the name of a VirtualMachine based on the naming strategy.
// Duplicated this logic from pkg/services/vmoperator/vmopmachine.go.
func GenerateVirtualMachineName(machineName string, namingStrategy *vmwarev1.VirtualMachineNamingStrategy) (string, error) {
Member:

This comment seems not yet addressed

@zhanggbj zhanggbj force-pushed the node_auto_placement branch 4 times, most recently from 3b5ec17 to ea432ed Compare November 21, 2025 09:55
- Refine VMG controller watch
- Handle race conditions in VMG controller by gating member update
- Refine data struct for VM Affinity config
- Refine UT
- Refine naming, logging, godoc
- Miscellaneous

Signed-off-by: Gong Zhang <[email protected]>
@zhanggbj zhanggbj force-pushed the node_auto_placement branch from ea432ed to dc441f6 Compare November 21, 2025 13:29
@k8s-ci-robot (Contributor)

@zhanggbj: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: pull-cluster-api-provider-vsphere-test-main
Commit: dc441f6
Required: true
Rerun command: /test pull-cluster-api-provider-vsphere-test-main

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
