✨ Support Node Auto Placement and Node AF/AAF #3655
base: main
Conversation
- Bump VMOP including Node AF/AAF support - Add NodeAutoPlacement Feature Gate (cherry picked from commit 700c8ae)
(cherry picked from commit cfeb862)
Removes the extra cases for VMG creation, such that VMG is created for:
1. Multiple zones, multiple MDs with no failureDomain
2. Multiple zones, multiple MDs with failureDomain
3. Single zone, existing cluster with no failureDomain MDs
Signed-off-by: Sagar Muchhal <[email protected]>
- Updates VMOP API dependency
- Misc VMG fixes: use namingStrategy to calculate VM names; use MachineDeployment names for the VMG placement label; include all MachineDeployments to generate the node-pool -> zone mapping
- Fixes VMG webhook validation error: adds cluster-name label to AF/AAF spec; re-adds the zone topology key to the anti-affinity spec
Signed-off-by: Sagar Muchhal <[email protected]>
Signed-off-by: Sagar Muchhal <[email protected]>
Signed-off-by: Sagar Muchhal <[email protected]>
…gs#71)
* Refine the VMG controller when generating per-MD zone labels
  - Skip legacy already-placed VMs which do not have placement info
  - Skip VMs which do not have zone info
* Apply suggestions from code review
Co-authored-by: Sagar Muchhal <[email protected]>
- Sync VSphereMachines during day-2 operations in VMG controller
- Only wait for all intended VSphereMachines during initial Cluster creation
- Use annotations in VMG for per-md-zone info
Signed-off-by: Gong Zhang <[email protected]>
- Add VMG reconciler unit test
- Bump VMOP due to API change
- Filter out VSphereMachine events except create/delete events
Signed-off-by: Gong Zhang <[email protected]>
Signed-off-by: Gong Zhang <[email protected]>
Signed-off-by: Gong Zhang <[email protected]>
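The first commit above adds the NodeAutoPlacement feature gate. A minimal sketch of how such a gate is typically declared with component-base featuregate; the MutableGates variable name and the alpha/off-by-default spec are assumptions for illustration, not confirmed from this PR:

    package feature

    import (
        utilruntime "k8s.io/apimachinery/pkg/util/runtime"
        "k8s.io/component-base/featuregate"
    )

    const (
        // NodeAutoPlacement enables delegating node (VM) placement decisions
        // to VM Operator via VirtualMachineGroups.
        NodeAutoPlacement featuregate.Feature = "NodeAutoPlacement"
    )

    func init() {
        // Assumed default: alpha and disabled; the actual default in CAPV may differ.
        utilruntime.Must(MutableGates.Add(map[featuregate.Feature]featuregate.FeatureSpec{
            NodeAutoPlacement: {Default: false, PreRelease: featuregate.Alpha},
        }))
    }

Callers then guard new code paths with feature.Gates.Enabled(feature.NodeAutoPlacement), as in the snippet reviewed further below.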
Force-pushed from 1160ce9 to 1cd61f9
/test ?
@zhanggbj: The following commands are available to trigger required jobs: […] The following commands are available to trigger optional jobs: […]
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/test pull-cluster-api-provider-vsphere-e2e-govmomi-blocking-main
Force-pushed from 22604bf to 7c7c1a7
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: […] The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Force-pushed from afd77d2 to 233cf64
- Update to watch Cluster as primary
- Decouple functions
- Use an exact match when checking if a VM is a member of the VMG
- Update UT
- Refine godoc
- Miscellaneous
Signed-off-by: Gong Zhang <[email protected]>
Force-pushed from 233cf64 to e3078c9
DanielXiao left a comment
A question which may be picky; you don't have to solve it.
config/rbac/role.yaml (Outdated)
- apiGroups:
  - vmoperator.vmware.com
  resources:
  - virtualmachinegroups/status
Let's just add all verbs on the marker for virtualmachineservices/status so it will be grouped in the RBAC rule above
(not sure how much sense this permission even makes, nobody does a get status :), but let's align to what we do for the other vmop resources)
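A minimal sketch of what the grouped kubebuilder marker could look like; the verb list is illustrative and should simply mirror what is already declared for the other vmoperator resources:

    // Hypothetical grouping: keep virtualmachineservices and virtualmachineservices/status
    // on one marker so controller-gen emits a single RBAC rule covering both.
    // +kubebuilder:rbac:groups=vmoperator.vmware.com,resources=virtualmachineservices;virtualmachineservices/status,verbs=get;list;watch;create;update;patch;delete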
Updated virtualmachineservices/status with all verbs on the marker so it could be grouped.
Question: It seems CAPV has permissions it doesn't truly use. Should we keep the permissions minimal, matching what the controller actually uses?
Feel free to open a follow-up issue
Filed an issue for tracking: #3681
    return err
}

if feature.Gates.Enabled(feature.NamespaceScopedZones) && feature.Gates.Enabled(feature.NodeAutoPlacement) {
I'm okay with not adding additional validation here
(I think the validation here would mean we would need an additional validating webhook on the Cluster object)
Named("virtualmachinegroup").
Watches(
    &vmoprv1.VirtualMachineGroup{},
    handler.EnqueueRequestForOwner(mgr.GetScheme(), reconciler.Client.RESTMapper(), &clusterv1.Cluster{}),
Suggested change:
handler.EnqueueRequestForOwner(mgr.GetScheme(), reconciler.Client.RESTMapper(), &clusterv1.Cluster{}),
ctrlbldr.WithPredicates(predicates.ResourceIsChanged(mgr.GetScheme(), predicateLog)),
We have not been using this in CAPV up until now, but let's use it here to already improve this new reconciler (I have a task to follow-up for others soon'ish).
By default the reconciler will get resyncs from all informers, i.e. every 10m everything gets reconciled because of the Cluster resync, and additionally every 10m everything gets reconciled because of the VirtualMachineGroup resync.
In core CAPI we drop events from resyncs of secondary watches to avoid this issue
(it's not needed for VSphereMachine below as we already drop all updates)
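A sketch of the resulting watch setup per the suggestion above; the ctrlbldr alias, predicateLog variable, and Cluster API's predicates.ResourceIsChanged helper are taken from the snippet and assumed to be defined nearby:

    // Sketch only: the secondary VirtualMachineGroup watch drops resync-driven
    // (unchanged-object) events so only real changes re-trigger the reconciler.
    if err := ctrl.NewControllerManagedBy(mgr).
        For(&clusterv1.Cluster{}).
        Named("virtualmachinegroup").
        Watches(
            &vmoprv1.VirtualMachineGroup{},
            handler.EnqueueRequestForOwner(mgr.GetScheme(), reconciler.Client.RESTMapper(), &clusterv1.Cluster{}),
            ctrlbldr.WithPredicates(predicates.ResourceIsChanged(mgr.GetScheme(), predicateLog)),
        ).
        Complete(reconciler); err != nil {
        return err
    }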
Good catch. This helps to reduce the resync noise.
vmg := &vmoprv1.VirtualMachineGroup{}
err := r.Client.Get(ctx, apitypes.NamespacedName{Namespace: vSphereMachine.Namespace, Name: clusterName}, vmg)

if err != nil {
    return nil
}
I think we should not have this kind of logic in a mapper.
Let's deal with this only in the controller.
It's much more obvious there and the cache here can be out of sync.
Let's then use vSphereMachine.Namespace and clusterName for the request below
(please also update the godoc of this func)
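A sketch of the simplified mapper following this suggestion; the function name is illustrative, and the logic only maps a cluster-labeled VSphereMachine to the cluster-named VirtualMachineGroup request without reading from the cache:

    // vSphereMachineToVMGRequests (hypothetical name) maps a VSphereMachine to the
    // VirtualMachineGroup reconcile request; no client Get, the controller decides
    // whether the VMG actually exists.
    func vSphereMachineToVMGRequests(_ context.Context, o client.Object) []reconcile.Request {
        vSphereMachine, ok := o.(*vmwarev1.VSphereMachine)
        if !ok {
            return nil
        }
        clusterName, ok := vSphereMachine.Labels[clusterv1.ClusterNameLabel]
        if !ok || clusterName == "" {
            return nil
        }
        // The VirtualMachineGroup shares the Cluster's name and namespace.
        return []reconcile.Request{{
            NamespacedName: apitypes.NamespacedName{
                Namespace: vSphereMachine.Namespace,
                Name:      clusterName,
            },
        }}
    }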
Makes sense. Fixed it and also added a predicate to the VSphereMachine watch so it only handles VSphereMachines which have the MachineDeployment label.
// reconciliationDelay is the delay time for requeueAfter.
reconciliationDelay = 10 * time.Second
// ZoneAnnotationPrefix is the prefix used for placement decision annotations which will be set on VirtualMachineGroup.
ZoneAnnotationPrefix = "zone.cluster.x-k8s.io"
I think we have to namespace this to CAPV as it's not a Cluster API wide standard annotation
(TBD with @fabriziopandini what exactly we should use instead)
Discussed, let's use zone.vmware.infrastructure.cluster.x-k8s.io
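For reference, a sketch of the constant after the agreed rename; the example key suffix is an assumption based on the per-MachineDeployment annotation format discussed later in this review:

    // ZoneAnnotationPrefix is the prefix used for placement decision annotations
    // which will be set on the VirtualMachineGroup, e.g.
    // "zone.vmware.infrastructure.cluster.x-k8s.io/<machine-deployment-name>".
    ZoneAnnotationPrefix = "zone.vmware.infrastructure.cluster.x-k8s.io"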
Fixed.
Partial review only.
A few themes are emerging and it would be nice if they are addressed across the changeset:
- we should add godoc keeping track of why we are doing things in a specific way, because this will help future us when doing maintenance
- we should consistently use machineDeployment instead of nodePool, and failureDomain instead of zone (I know there is tech debt, but we don't have to add on top; possibly improve)
- logs should align to guidelines
    vmOperatorVM = typedModified
}

if affinityInfo != nil && affinityInfo.affinitySpec != nil {
Let's try to document why we are doing stuff, because this is the background that will be most valuable for future maintenance, eg.
Suggested change:
// Set Affinity and Group info on a VM only on creation,
// because changing those info after placement won't have any effect.
if affinityInfo != nil && affinityInfo.affinitySpec != nil {
I would also probably revert the if, to check "if not set" (set on creation) first
Added comments to clarify that affinity only impacts VM placement during day-1.
I would also probably revert the if, to check "if not set" (set on creation) first
Could you help clarify this one?
I tried to improve your comment to prevent confusion between the comment talking about when the anti-affinity rules have effect and the code always setting those rules.
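A sketch of the final shape discussed in this thread, with the assignment body elided since the exact VM Operator field names are not shown here:

    // Affinity and VM group information only influence placement when the VM is
    // first created; VM Operator does not re-place an already-placed VM when these
    // fields change, but we keep writing them so the desired state stays consistent.
    if affinityInfo != nil && affinityInfo.affinitySpec != nil {
        // ... set the affinity spec and group name on the VM Operator VirtualMachine ...
    }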
type affinityInfo struct {
    affinitySpec  *vmoprv1.AffinitySpec
    vmGroupName   string
    failureDomain *string
Are we sure we need a pointer here? (If I'm not wrong there is no difference between this being nil or empty.)
I think using failureDomain string works here.
CAPV is using failureDomain *string (https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/blob/main/apis/v1beta1/vspheremachine_types.go#L117) but CAPI is using failureDomain string. How about we keep it consistent in CAPV? I don't have a strong opinion on this.
Let's use string here; it is a private struct and there is no direct relation with the API types (also, the API types cannot be changed, that would be a breaking change).
// affinityInfo is an internal struct to store VM affinity information.
type affinityInfo struct {
    affinitySpec *vmoprv1.AffinitySpec
Are we sure we need a pointer here? Looking at usage, whenever affinityInfo is set, affinitySpec is also set.
Oh right, whenever we set affinityInfo, we'll always set affinitySpec, so only checking affinityInfo != nil is fine.
However, since AffinitySpec contains lots of rules and arrays of MachineDeployments, is it better to keep it as *vmoprv1.AffinitySpec for lower memory usage?
I would prefer readability to micro optimization (the struct is used only locally)
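A sketch of the struct after both points in this thread: a plain string for failureDomain (private struct, decoupled from the API types) and a value-typed affinitySpec, favouring readability over micro-optimization:

    // affinityInfo is an internal struct to store VM affinity information.
    type affinityInfo struct {
        affinitySpec  vmoprv1.AffinitySpec
        vmGroupName   string
        failureDomain string
    }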
// Calculate expected Machines of all MachineDeployments.
// CAPV retrieves placement decisions from the VirtualMachineGroup to guide
// day-2 VM placement. At least one VM is required for each MachineDeployment.
At least one VM is required for each MachineDeployment.
I'm wondering if this assumption holds when using autoscaler from zero... 🤔
cc @sbueringer
Since this check only happens when the VMG is not created yet, day-2 scale from zero won't go into this case.
In addition, for day-1 we'll block creating a Cluster with MD.replicas == 0.
For day-2, since we'll get the vmg successfully earlier, we won't run into this line.
Does it mean that a user can't create a cluster with automatic node placement and a MachineDeployment configured with autoscaler-from-zero enabled?
If this is the case, please add a comment
// Wait for all intended VSphereMachines corresponding to MachineDeployment to exist only during initial Cluster creation.
current := int32(len(current))
if current < expected {
    log.Info("current VSphereMachines do not match expected", "Expected:", expected,
Should this message start with Waiting for...
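A sketch of the reworded message; the log key names are illustrative and should follow the logging guidelines (e.g. no trailing colons in key names):

    log.Info("Waiting for all intended VSphereMachines of the MachineDeployments to exist",
        "expected", expected, "current", current)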
Force-pushed from bab7415 to 1524d87
@sbueringer @fabriziopandini BTW, I'm still improving logging and godoc incrementally; feel free to point out if anything is missing.
    Name: cluster.Name,
}

if err := r.Client.Get(ctx, *key, desiredVMG); err != nil {
Suggested change:
// If the VM group does not exist yet
if err := r.Client.Get(ctx, *key, desiredVMG); err != nil {
Added comments.
}
mdNames := []string{}
for _, machineDeployment := range machineDeployments.Items {
    if machineDeployment.Spec.Template.Spec.FailureDomain == "" && machineDeployment.Name != mdName {
Is it correct to not consider MachineDeployments that have a failureDomain set?
As a user, I would like anti-affinity to consider all the MachineDeployments, not only a subset (the only exception should be the MachineDeployment the machine belongs to).
cc @sbueringer
This is because, with the current VMOP design, a MachineDeployment with failureDomain specified will skip the placement process. So I think this is fine.
    client.MatchingLabels{clusterv1.ClusterNameLabel: supervisorMachineCtx.Cluster.Name}); err != nil {
    return false, err
}
mdNames := []string{}
Maybe:
mdNames := []string{}  →  otherMDNames := []string{}
Fixed.
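A sketch combining both points in this thread: MachineDeployments with an explicit failureDomain skip VM Operator placement, so only the remaining ones, minus the machine's own MachineDeployment (mdName), feed the anti-affinity configuration:

    otherMDNames := []string{}
    for _, machineDeployment := range machineDeployments.Items {
        // MDs with an explicit failureDomain are placed directly and skip VM Operator
        // placement, so they are intentionally excluded here.
        if machineDeployment.Spec.Template.Spec.FailureDomain == "" && machineDeployment.Name != mdName {
            otherMDNames = append(otherMDNames, machineDeployment.Name)
        }
    }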
// Set the zone label using the annotation of the per-md zone mapping from VMG.
// This is for new VMs created during day-2 operations in VC 9.1.
This comment seems not yet addressed
    return true, nil
}

affInfo = affinityInfo{
This comment seems not yet addressed
}

// If ControlPlane haven't initialized, requeue it since CAPV will only start to reconcile VSphereMachines of
// MachineDeployment until ControlPlane is initialized.
Maybe:
// MachineDeployment until ControlPlane is initialized.  →  // MachineDeployment after ControlPlane is initialized.
Updated.
// Wait for all intended VSphereMachines corresponding to MachineDeployment to exist only during initial Cluster creation.
current := int32(len(current))
if current < expected {
This comment seems not yet addressed
// Set the BootOrder spec as the
desiredVMG.Spec.BootOrder = []vmoprv1.VirtualMachineGroupBootOrderGroup{
    {
        Members: members,
This comment seems not yet addressed
// Check if this VM belongs to any of our target MachineDeployments.
// Annotation format is "zone.cluster.x-k8s.io/{machine-deployment-name}".
for _, md := range machineDeployments {
This comment seems not yet addressed
// GenerateVirtualMachineName generates the name of a VirtualMachine based on the naming strategy.
// Duplicated this logic from pkg/services/vmoperator/vmopmachine.go.
func GenerateVirtualMachineName(machineName string, namingStrategy *vmwarev1.VirtualMachineNamingStrategy) (string, error) {
This comment seems not yet addressed
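For context on the helper quoted above, a hedged sketch of how an optional Go-template naming strategy is generally applied; the real implementation lives in pkg/services/vmoperator/vmopmachine.go, and the template data shape ("machine.name") plus the Template field access are assumptions, not confirmed here:

    // generateVMName (hypothetical name) resolves the VM name from an optional template.
    func generateVMName(machineName string, namingStrategy *vmwarev1.VirtualMachineNamingStrategy) (string, error) {
        if namingStrategy == nil || namingStrategy.Template == nil {
            // Default: the VM simply reuses the Machine name.
            return machineName, nil
        }
        tmpl, err := template.New("vm-name").Parse(*namingStrategy.Template)
        if err != nil {
            return "", err
        }
        var buf bytes.Buffer
        if err := tmpl.Execute(&buf, map[string]interface{}{
            "machine": map[string]interface{}{"name": machineName},
        }); err != nil {
            return "", err
        }
        return buf.String(), nil
    }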
Force-pushed from 3b5ec17 to ea432ed
- Refine VMG controller watch
- Handle race conditions in VMG controller by gating member update
- Refine data struct for VM Affinity config
- Refine UT
- Refine naming, logging, godoc
- Miscellaneous
Signed-off-by: Gong Zhang <[email protected]>
Force-pushed from ea432ed to dc441f6
@zhanggbj: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:
Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
What this PR does / why we need it:
Support Node Auto Placement and Node AF/AAF
Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged): Fixes #