
Conversation

@akutz
Collaborator

@akutz akutz commented Oct 24, 2025

What does this PR do, and why is it needed?

This patch stops the async watcher from monitoring guest.net, which churns on VKS nodes in the same way that made guest.ipStack so busy. This is safe because summary.guest still triggers a reconcile when a VM gets its primary IP.
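A minimal sketch of the filtering idea, assuming the watcher works from a list of property paths. The names `ignoredPrefixes`, `shouldWatch`, and `filterWatched` are hypothetical and are not taken from this PR's code:

```go
package main

import "strings"

// ignoredPrefixes lists property paths assumed to churn too often to be
// worth watching. The two entries come from the PR description; the
// variable itself is hypothetical.
var ignoredPrefixes = []string{"guest.ipStack", "guest.net"}

// shouldWatch reports whether a property path is outside every ignored
// subtree. A path is ignored when it equals an ignored prefix or is a
// child of it (for example "guest.net[0].ipAddress").
func shouldWatch(path string) bool {
	for _, p := range ignoredPrefixes {
		if path == p ||
			strings.HasPrefix(path, p+".") ||
			strings.HasPrefix(path, p+"[") {
			return false
		}
	}
	return true
}

// filterWatched returns only the paths that should still be watched.
func filterWatched(paths []string) []string {
	out := make([]string, 0, len(paths))
	for _, p := range paths {
		if shouldWatch(p) {
			out = append(out, p)
		}
	}
	return out
}
```

With this sketch, `summary.guest` passes the filter while `guest.net` and anything beneath it is dropped.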

This patch also introduces multiple priority queues for a VM's lifecycle. When a VM is about to be reconciled, it now falls into one of the following buckets:

- priorityCreating            int = 100
- priorityPowerStateChange    int = 90
- priorityWaitingForIP        int = 90
- priorityDeleting            int = 80
- priorityWaitingForDiskPromo int = 70
- priorityLow                 int = handler.LowPriority // -100

Thus, it no longer matters if the async signal is sending thousands of VMs a minute, because VMs that are being created or deleted, or that are waiting on a power state change, an IP, or disk promotion, are all moved to the head of the line.
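The bucket selection could be sketched roughly as follows. The constants mirror the list above, except that `priorityLow` is hard-coded to the -100 seen in the log output below rather than imported from `handler.LowPriority`. The `vmState` and `item` types, `priorityFor`, `drainOrder`, and the rule of taking the highest applicable priority are all assumptions made for illustration, not the PR's actual logic:

```go
package main

import "sort"

const (
	priorityCreating            = 100
	priorityPowerStateChange    = 90
	priorityWaitingForIP        = 90
	priorityDeleting            = 80
	priorityWaitingForDiskPromo = 70
	priorityLow                 = -100 // stand-in for handler.LowPriority
)

// vmState is a hypothetical summary of where a VM is in its lifecycle.
type vmState struct {
	Creating            bool
	PowerStateChanging  bool
	WaitingForIP        bool
	Deleting            bool
	WaitingForDiskPromo bool
}

// priorityFor returns the highest priority among the conditions that
// apply, falling back to priorityLow when none do (an assumed rule).
func priorityFor(s vmState) int {
	p := priorityLow
	raise := func(cond bool, v int) {
		if cond && v > p {
			p = v
		}
	}
	raise(s.Creating, priorityCreating)
	raise(s.PowerStateChanging, priorityPowerStateChange)
	raise(s.WaitingForIP, priorityWaitingForIP)
	raise(s.Deleting, priorityDeleting)
	raise(s.WaitingForDiskPromo, priorityWaitingForDiskPromo)
	return p
}

// item is a hypothetical queue entry.
type item struct {
	Name     string
	Priority int
}

// drainOrder returns item names in the order a priority queue would hand
// them out: highest priority first, arrival order preserved among equals.
func drainOrder(items []item) []string {
	sorted := append([]item(nil), items...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority > sorted[j].Priority
	})
	names := make([]string, len(sorted))
	for i, it := range sorted {
		names[i] = it.Name
	}
	return names
}
```

In this sketch a VM being created is drained ahead of any number of low-priority async updates that were queued before it.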

Finally, the VM watcher has been updated to cache VM properties for an hour and will not send async signal updates if a VM's properties have not changed. The local property cache is flushed once an hour or when the pod restarts.

Which issue(s) is/are addressed by this PR? (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes NA

Are there any special notes for your reviewer:

This has been validated on a real test environment.

$ kubectl -n vmware-system-vmop get pods --no-headers | \
  grep manager | \
  awk '{print $1}' | \
  xargs kubectl -n vmware-system-vmop logs -cmanager -f | \
  grep 'vmqueue'
I1024 15:45:09.108681       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns112/vksns112-c112-np2-worker-qjjvh-z7pfp-c8lx7"
I1024 15:45:11.199167       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns190/vksns190-c190-hh8jj-x9x76"
I1024 15:45:20.711724       1 priority.go:175] "Adding to priority queue for event" logger="vmqueue" eventType="create" priority=100 item="vksns1/akutz-vm-1"
I1024 15:45:27.288599       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=100 item="vksns1/akutz-vm-1"
I1024 15:45:27.518249       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=100 item="vksns1/akutz-vm-1"
I1024 15:45:36.400663       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=90 item="vksns1/akutz-vm-1"
I1024 15:45:37.697839       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns416/vksns416-c416-q5n8m-twv9m"
I1024 15:45:40.390469       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns135/vksns135-c135-np3-worker-57hbb-jrlcn-nwggp"
I1024 15:45:41.004367       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns254/vksns254-c254-np1-worker-xx2w5-nvljh-kzldd"
I1024 15:45:43.686486       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns30/vksns30-c30-np1-worker-9djs4-8bwgr-md2bb"
I1024 15:45:49.487937       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=90 item="vksns1/akutz-vm-1"
I1024 15:45:50.087089       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=90 item="vksns1/akutz-vm-1"
I1024 15:45:51.687522       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=90 item="vksns1/akutz-vm-1"
I1024 15:45:52.987837       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=90 item="vksns1/akutz-vm-1"
I1024 15:45:58.088454       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns167/vksns167-c167-np1-worker-cxx6f-bzgzv-x9cg8"
I1024 15:46:09.191742       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns194/vksns194-c194-fl8zd-jcj6s"
I1024 15:46:16.733268       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns308/vksns308-c308-jh4nb-hh68m"
I1024 15:46:16.874061       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns384/vksns384-c384-np2-worker-zvpdz-jdtlm-zrqtn"
I1024 15:46:17.692015       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns311/vksns311-c311-np3-worker-tf9zc-hr454-zk5hr"
I1024 15:46:17.889285       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns184/vksns184-c184-np1-worker-9hf7w-497mv-z7l8k"
I1024 15:46:20.447447       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns135/vksns135-c135-np3-worker-57hbb-jrlcn-nwggp"
I1024 15:46:29.187892       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns194/vksns194-c194-fl8zd-jcj6s"
I1024 15:46:37.818929       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns355/vksns355-c355-np3-worker-8s7pp-psnrk-x4fsk"
I1024 15:46:37.886824       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns202/vksns202-c202-np3-worker-f24lg-ttmlf-ztdq7"
I1024 15:46:39.590668       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns405/vksns405-c405-bf5tp-f4k4m"
I1024 15:46:46.387561       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=90 item="vksns1/akutz-vm-1"
I1024 15:46:50.391330       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns417/vksns417-c417-fcnbb-pwctl"
I1024 15:47:14.692755       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns149/vksns149-c149-np2-worker-frmtp-bsdfm-pz2ds"
I1024 15:47:25.290903       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-100 item="vksns412/vksns412-c412-np3-worker-2xn2p-h8c26-7b2qz"
I1024 15:47:26.591722       1 priority.go:232] "Adding to priority queue for event" logger="asyncvmqueue" eventType="generic" priority=-50 item="vksns1/akutz-vm-1"

Please add a release note if necessary:

Multiple priority queues for VM controller.

📚 Documentation preview 📚: https://vm-operator--1268.org.readthedocs.build/en/1268/

@github-actions github-actions bot added the size/XL Denotes a PR that changes 500-999 lines. label Oct 24, 2025
@akutz akutz force-pushed the feature/do-not-watch-guest-net branch 4 times, most recently from b17d640 to dbfba96 Compare October 24, 2025 18:17
@github-actions github-actions bot added size/XXL Denotes a PR that changes 1000+ lines. and removed size/XL Denotes a PR that changes 500-999 lines. labels Oct 24, 2025
@akutz akutz force-pushed the feature/do-not-watch-guest-net branch 8 times, most recently from d23a9c1 to 341e7b9 Compare October 24, 2025 19:12
Contributor

@bryanv bryanv left a comment

🎊

@akutz akutz force-pushed the feature/do-not-watch-guest-net branch from 341e7b9 to a576c92 Compare October 24, 2025 19:23
@fabriziopandini

Nice!

@akutz akutz force-pushed the feature/do-not-watch-guest-net branch 6 times, most recently from 70dbdcf to 02b0ebe Compare October 24, 2025 20:18
@akutz akutz changed the title ✨ Do not watch guest.net / multi priority queues ✨ Cached vm props / multi priority queues Oct 24, 2025
@akutz akutz force-pushed the feature/do-not-watch-guest-net branch 4 times, most recently from 523a6b1 to f8e1a8b Compare October 24, 2025 20:37
@akutz akutz force-pushed the feature/do-not-watch-guest-net branch 8 times, most recently from 499c741 to f902e16 Compare October 24, 2025 23:14
@akutz akutz force-pushed the feature/do-not-watch-guest-net branch 5 times, most recently from 06f758d to bb4d9ef Compare October 27, 2025 14:59
@akutz akutz force-pushed the feature/do-not-watch-guest-net branch from bb4d9ef to 02f5a16 Compare October 27, 2025 15:13
@github-actions

Code Coverage

Package Line Rate Health
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/clustercontentlibraryitem 67%
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/contentlibraryitem 67%
github.com/vmware-tanzu/vm-operator/controllers/contentlibrary/utils 46%
github.com/vmware-tanzu/vm-operator/controllers/infra/capability/configmap 92%
github.com/vmware-tanzu/vm-operator/controllers/infra/capability/crd 100%
github.com/vmware-tanzu/vm-operator/controllers/infra/configmap 75%
github.com/vmware-tanzu/vm-operator/controllers/infra/node 77%
github.com/vmware-tanzu/vm-operator/controllers/infra/secret 76%
github.com/vmware-tanzu/vm-operator/controllers/infra/validatingwebhookconfiguration 87%
github.com/vmware-tanzu/vm-operator/controllers/infra/zone 73%
github.com/vmware-tanzu/vm-operator/controllers/storageclass 95%
github.com/vmware-tanzu/vm-operator/controllers/storagepolicyquota 98%
github.com/vmware-tanzu/vm-operator/controllers/util/encoding 73%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachine/storagepolicyusage 96%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachine/virtualmachine 67%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachine/volume 87%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachine/volumebatch 81%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineclass 73%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinegroup 89%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinegrouppublishrequest 88%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineimagecache 88%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinepublishrequest 82%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinereplicaset 67%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineservice 83%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachineservice/providers 92%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinesetresourcepolicy 81%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinesnapshot 95%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest 72%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest/v1alpha1 72%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest/v1alpha1/conditions 88%
github.com/vmware-tanzu/vm-operator/controllers/virtualmachinewebconsolerequest/v1alpha1/patch 78%
github.com/vmware-tanzu/vm-operator/controllers/vspherepolicy/policyevaluation 93%
github.com/vmware-tanzu/vm-operator/pkg/bitmask 100%
github.com/vmware-tanzu/vm-operator/pkg/builder 94%
github.com/vmware-tanzu/vm-operator/pkg/conditions 90%
github.com/vmware-tanzu/vm-operator/pkg/config 100%
github.com/vmware-tanzu/vm-operator/pkg/config/capabilities 98%
github.com/vmware-tanzu/vm-operator/pkg/config/env 100%
github.com/vmware-tanzu/vm-operator/pkg/context 25%
github.com/vmware-tanzu/vm-operator/pkg/context/generic 100%
github.com/vmware-tanzu/vm-operator/pkg/context/operation 100%
github.com/vmware-tanzu/vm-operator/pkg/crd 75%
github.com/vmware-tanzu/vm-operator/pkg/errors 75%
github.com/vmware-tanzu/vm-operator/pkg/exit 100%
github.com/vmware-tanzu/vm-operator/pkg/log 100%
github.com/vmware-tanzu/vm-operator/pkg/mem 100%
github.com/vmware-tanzu/vm-operator/pkg/patch 78%
github.com/vmware-tanzu/vm-operator/pkg/prober 89%
github.com/vmware-tanzu/vm-operator/pkg/prober/probe 90%
github.com/vmware-tanzu/vm-operator/pkg/prober/worker 77%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere 74%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/clustermodules 73%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/config 88%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/contentlibrary 75%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/credentials 100%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/network 81%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/placement 74%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/session 50%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/storage 44%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/upgrade/virtualmachine 96%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/vcenter 85%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/virtualmachine 86%
github.com/vmware-tanzu/vm-operator/pkg/providers/vsphere/vmlifecycle 71%
github.com/vmware-tanzu/vm-operator/pkg/record 87%
github.com/vmware-tanzu/vm-operator/pkg/topology 91%
github.com/vmware-tanzu/vm-operator/pkg/util 78%
github.com/vmware-tanzu/vm-operator/pkg/util/cloudinit 89%
github.com/vmware-tanzu/vm-operator/pkg/util/cloudinit/validate 91%
github.com/vmware-tanzu/vm-operator/pkg/util/image 100%
github.com/vmware-tanzu/vm-operator/pkg/util/kube 94%
github.com/vmware-tanzu/vm-operator/pkg/util/kube/cource 100%
github.com/vmware-tanzu/vm-operator/pkg/util/kube/internal 100%
github.com/vmware-tanzu/vm-operator/pkg/util/kube/proxyaddr 73%
github.com/vmware-tanzu/vm-operator/pkg/util/kube/spq 99%
github.com/vmware-tanzu/vm-operator/pkg/util/linuxprep 97%
github.com/vmware-tanzu/vm-operator/pkg/util/netplan 100%
github.com/vmware-tanzu/vm-operator/pkg/util/nil 100%
github.com/vmware-tanzu/vm-operator/pkg/util/ovfcache 75%
github.com/vmware-tanzu/vm-operator/pkg/util/ovfcache/internal 100%
github.com/vmware-tanzu/vm-operator/pkg/util/paused 100%
github.com/vmware-tanzu/vm-operator/pkg/util/ptr 100%
github.com/vmware-tanzu/vm-operator/pkg/util/resize 98%
github.com/vmware-tanzu/vm-operator/pkg/util/sysprep 98%
github.com/vmware-tanzu/vm-operator/pkg/util/vmopv1 88%
github.com/vmware-tanzu/vm-operator/pkg/util/vsphere/client 66%
github.com/vmware-tanzu/vm-operator/pkg/util/vsphere/library 96%
github.com/vmware-tanzu/vm-operator/pkg/util/vsphere/vm 79%
github.com/vmware-tanzu/vm-operator/pkg/util/vsphere/watcher 85%
github.com/vmware-tanzu/vm-operator/pkg/vmconfig 95%
github.com/vmware-tanzu/vm-operator/pkg/vmconfig/anno2extraconfig 100%
github.com/vmware-tanzu/vm-operator/pkg/vmconfig/bootoptions 97%
github.com/vmware-tanzu/vm-operator/pkg/vmconfig/crypto 91%
github.com/vmware-tanzu/vm-operator/pkg/vmconfig/diskpromo 100%
github.com/vmware-tanzu/vm-operator/pkg/vmconfig/policy 96%
github.com/vmware-tanzu/vm-operator/pkg/vmconfig/virtualcontroller 85%
github.com/vmware-tanzu/vm-operator/pkg/webconsolevalidation 100%
github.com/vmware-tanzu/vm-operator/services/vm-watcher 85%
github.com/vmware-tanzu/vm-operator/webhooks/common 98%
github.com/vmware-tanzu/vm-operator/webhooks/persistentvolumeclaim/validation 95%
github.com/vmware-tanzu/vm-operator/webhooks/unifiedstoragequota/validation 88%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachine/mutation 84%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachine/validation 95%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineclass/mutation 62%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineclass/validation 89%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinegroup/mutation 87%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinegroup/validation 92%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinegrouppublishrequest/mutation 86%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinegrouppublishrequest/validation 88%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinepublishrequest/validation 93%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinereplicaset/validation 90%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineservice/mutation 67%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachineservice/validation 92%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinesetresourcepolicy/validation 89%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinesnapshot/mutation 83%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinesnapshot/validation 89%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinewebconsolerequest/v1alpha1/validation 92%
github.com/vmware-tanzu/vm-operator/webhooks/virtualmachinewebconsolerequest/validation 92%
Summary 82% (16111 / 19659)

Minimum allowed line rate is 79%

@akutz akutz merged commit aaf8b0e into vmware-tanzu:main Oct 27, 2025
11 checks passed
Labels

size/XXL Denotes a PR that changes 1000+ lines.

4 participants