@PhilippMatthes PhilippMatthes commented Jan 8, 2026

With this change, we reconcile the hypervisor resource whenever a libvirt domain lifecycle event occurs. This keeps the hypervisor allocation up to date and simplifies how we manage the domain list.

We can only subscribe to libvirt events once, so the subscription logic has to move out of the runMigrationListener function in libvirt_events.go. We provide a new WatchDomainChanges interface function for subscribing to libvirt events and reuse it for the existing migration watching and lifecycle logging. A new event loop distributes libvirt events to the listeners. We also separate out the logic unrelated to "migration watching", which improves the overall structure and clarity of the code.
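To illustrate the single-subscription, many-listeners idea described above, here is a minimal sketch using only the Go standard library. It is not the code from this PR: the `Dispatcher` and `DomainEvent` types, the handler IDs, and the `WatchDomainChanges` signature shown here are assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// DomainEvent is a hypothetical, simplified stand-in for a libvirt domain event.
type DomainEvent struct {
	Domain string
	Kind   string // e.g. "lifecycle" or "migration"
}

// Dispatcher owns the one event stream and fans events out to registered handlers.
type Dispatcher struct {
	mu       sync.RWMutex
	handlers map[string]func(DomainEvent)
}

func NewDispatcher() *Dispatcher {
	return &Dispatcher{handlers: make(map[string]func(DomainEvent))}
}

// WatchDomainChanges registers a handler under an ID (assumed signature).
// The mutex makes registration safe while the event loop is running.
func (d *Dispatcher) WatchDomainChanges(id string, h func(DomainEvent)) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.handlers[id] = h
}

// Run blocks until an event arrives, the channel is closed, or ctx is cancelled.
func (d *Dispatcher) Run(ctx context.Context, events <-chan DomainEvent) {
	for {
		select {
		case <-ctx.Done():
			return
		case ev, ok := <-events:
			if !ok {
				return
			}
			d.dispatch(ev)
		}
	}
}

// dispatch snapshots the handlers under the read lock, then calls them unlocked,
// so a handler may itself register further handlers without deadlocking.
func (d *Dispatcher) dispatch(ev DomainEvent) {
	d.mu.RLock()
	hs := make([]func(DomainEvent), 0, len(d.handlers))
	for _, h := range d.handlers {
		hs = append(hs, h)
	}
	d.mu.RUnlock()
	for _, h := range hs {
		h(ev)
	}
}

// demo sends n events through one stream and counts what two listeners receive.
func demo(n int) (migration, reconcile int) {
	d := NewDispatcher()
	d.WatchDomainChanges("migration-listener", func(DomainEvent) { migration++ })
	d.WatchDomainChanges("reconcile-trigger", func(DomainEvent) { reconcile++ })
	events := make(chan DomainEvent, n)
	for i := 0; i < n; i++ {
		events <- DomainEvent{Domain: fmt.Sprintf("vm-%d", i), Kind: "lifecycle"}
	}
	close(events)
	d.Run(context.Background(), events)
	return
}

func main() {
	m, r := demo(3)
	fmt.Println(m, r) // prints "3 3"
}
```

Both listeners see every event even though there is only one upstream channel, which is the property the refactoring relies on.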

Finally, the hypervisor controller now reconciles as soon as a new domain lifecycle event arrives. This uses the mechanism that the controller-runtime library provides for this purpose: raw channel sources. The libvirt event subscription is wired to the controller-runtime event channel when the manager starts.
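The wiring between the event subscription and the reconcile channel could look roughly like the following stdlib-only sketch. `GenericEvent` is a local stand-in for the event type a channel source would consume, and `NewReconcileTrigger` is a hypothetical helper. The non-blocking send that coalesces bursts is an assumption about a reasonable design, not something this PR is confirmed to do.

```go
package main

import "fmt"

// GenericEvent is a local stand-in for the event a channel source would consume.
type GenericEvent struct {
	Object string // name of the hypervisor resource to reconcile (hypothetical)
}

// NewReconcileTrigger returns the channel the controller would read from and a
// notify function that a domain-event handler can call on every lifecycle event.
func NewReconcileTrigger(hypervisor string) (<-chan GenericEvent, func()) {
	ch := make(chan GenericEvent, 1) // buffer of one pending reconcile request
	notify := func() {
		select {
		case ch <- GenericEvent{Object: hypervisor}:
		default:
			// A reconcile is already pending; drop the duplicate so the
			// libvirt event loop never blocks on a slow reconciler.
		}
	}
	return ch, notify
}

// pendingAfter calls notify n times and reports how many requests are queued.
func pendingAfter(n int) int {
	ch, notify := NewReconcileTrigger("node-1")
	for i := 0; i < n; i++ {
		notify()
	}
	return len(ch)
}

func main() {
	fmt.Println(pendingAfter(5)) // prints 1
}
```

Dropping duplicates is only safe if a reconcile reads the full current domain state rather than relying on the content of each individual event.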

@PhilippMatthes PhilippMatthes force-pushed the reconcile-event-driven branch 2 times, most recently from 6a647bd to 7d886e3 January 8, 2026 10:55
@PhilippMatthes PhilippMatthes changed the title from "Refactor libvirt event handling" to "Refactor libvirt event handling + reconcile when domains change" Jan 8, 2026
@PhilippMatthes PhilippMatthes force-pushed the reconcile-event-driven branch from 61d380a to 04e1c00 January 8, 2026 12:37
@PhilippMatthes PhilippMatthes marked this pull request as ready for review January 8, 2026 12:39
@notandy notandy requested a review from mchristianl January 9, 2026 15:50
@mchristianl mchristianl left a comment

I've marked

  • a data race on shared libvirt connection
  • race condition on event handler map
  • reconnect is logged but not (yet) implemented?
  • busy loop in runEventLoop
  • maybe unbuffered channels are not intended here

Again, I think the concurrency and shared-access handling is spread across several locations in the source code. I would therefore suggest testing this in a simpler setting first, but let's discuss.

Also, as we've already discussed, I don't trust the libvirt socket, and

  • currently we won't get lifecycle events on migration (the qemu driver has implemented them), so pure event-based state tracking is blocked by this feature
  • at least restarting the libvirt socket should be supported (I may simply have overlooked the logic if it is already present)
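As a discussion aid for the reconnect point, here is a minimal retry-with-backoff sketch using only the standard library. The `dial` callback stands in for whatever re-establishes the libvirt connection. `connectWithRetry`, `flakyDialer`, and `retryDemo` are hypothetical names and do not exist in this repository.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// connectWithRetry calls dial until it succeeds or attempts are exhausted,
// doubling the wait between tries up to maxWait. It returns the number of tries.
func connectWithRetry(dial func() error, attempts int, wait, maxWait time.Duration) (int, error) {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for try := 1; try <= attempts; try++ {
		if err = dial(); err == nil {
			return try, nil
		}
		if try == attempts {
			break // no point sleeping after the final failure
		}
		time.Sleep(wait)
		wait *= 2
		if wait > maxWait {
			wait = maxWait
		}
	}
	return attempts, fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// flakyDialer simulates a socket that is unavailable for the first few dials.
func flakyDialer(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return errors.New("socket unavailable")
		}
		return nil
	}
}

// retryDemo reports how many tries were made and whether a connection succeeded.
func retryDemo(failures, attempts int) (int, bool) {
	tries, err := connectWithRetry(flakyDialer(failures), attempts, time.Millisecond, 4*time.Millisecond)
	return tries, err == nil
}

func main() {
	fmt.Println(retryDemo(2, 5)) // prints "3 true"
}
```

A real implementation would presumably also have to re-register the event subscription after a successful reconnect, since subscriptions are tied to the connection.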

@PhilippMatthes
Member Author

^ Added a goroutine which triggers a reconcile every minute.
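A periodic trigger of this kind might look like the sketch below. It is an illustration with the standard library only, not the code added in this PR. `runPeriodic` and `countTriggers` are hypothetical names, and the demo uses a millisecond interval instead of one minute so that it finishes quickly.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// runPeriodic calls trigger once per interval until ctx is cancelled.
// In an agent it would typically be started with `go runPeriodic(...)`.
func runPeriodic(ctx context.Context, interval time.Duration, trigger func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			// If a tick and the cancellation are both ready, select may pick
			// either one, so check again before triggering.
			if ctx.Err() != nil {
				return
			}
			trigger()
		}
	}
}

// countTriggers runs the loop until trigger has fired n times, then cancels it.
func countTriggers(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	count := 0
	runPeriodic(ctx, time.Millisecond, func() {
		count++
		if count == n {
			cancel()
		}
	})
	return count
}

func main() {
	fmt.Println(countTriggers(3)) // prints 3
}
```

The periodic trigger acts as a safety net: if an event is missed or the socket drops, the state is still corrected within one interval.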

@github-actions

Merging this branch changes the coverage (1 decrease, 2 increase)

| Impacted Packages | Coverage Δ | 🤖 |
|---|---|---|
| github.com/cobaltcore-dev/kvm-node-agent/internal/controller | 39.76% (+8.94%) | 👍 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt | 39.43% (+4.65%) | 👍 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo | 19.23% (-2.51%) | 👎 |

Coverage by file

Changed files (no unit tests)

| Changed File | Coverage Δ | Total | Covered | Missed | 🤖 |
|---|---|---|---|---|---|
| github.com/cobaltcore-dev/kvm-node-agent/internal/controller/hypervisor_controller.go | 57.89% (+10.02%) | 114 (+20) | 66 (+21) | 48 (-1) | 🎉 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/client.go | 19.23% (-2.51%) | 26 (+3) | 5 | 21 (+3) | 👎 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/interface.go | 0.00% (ø) | 0 | 0 | 0 | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/interface_mock.go | 37.50% (+4.17%) | 48 (+12) | 18 (+6) | 30 (+6) | 👍 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/libvirt.go | 70.92% (-5.63%) | 196 (+51) | 139 (+28) | 57 (+23) | 👎 |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/libvirt_events.go | 0.00% (ø) | 149 (-36) | 0 | 149 (-36) | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/libvirt_status_thread.go | 0.00% (ø) | 0 (-19) | 0 | 0 (-19) | |
| github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/utils.go | 32.14% (-40.27%) | 28 (-1) | 9 (-12) | 19 (+11) | 💀 💀 💀 💀 |

Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code.

Changed unit test files

  • github.com/cobaltcore-dev/kvm-node-agent/internal/controller/hypervisor_controller_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/dominfo/client_test.go
  • github.com/cobaltcore-dev/kvm-node-agent/internal/libvirt/libvirt_test.go

@PhilippMatthes PhilippMatthes merged commit 4c17091 into main Jan 13, 2026
6 checks passed