-
Notifications
You must be signed in to change notification settings - Fork 2.2k
refactor: Eliminate double call to Vmm::stop() AND fix(vcpu): Don't treat KVM_EXIT_{SHUTDOWN,HLT} as successful termination #5611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5611 +/- ##
==========================================
- Coverage 83.23% 83.23% -0.01%
==========================================
Files 277 277
Lines 29262 29258 -4
==========================================
- Hits 24357 24353 -4
Misses 4905 4905
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Commit 79c5f79 ("Shutdown VCPU threads so they can thread::join()") introduced a Vmm::stop() function that should run in the main VMM thread when a vCPU thread writes to the vcpus_exit_evt eventfd. Vmm::stop() sends a termination event to the vCPU threads and joins them. That commit claims that there was a cyclical dependency where vCPU objects referenced the Vmm objects and therefore Vmm::stop() could not be called from Vmm:drop(). Later, commit 1d45fcc ("vmm: fix drop logic for Vmm") introduced a second call to Vmm::stop(), this time from Vmm::drop(). As a result, when killing the VMM today (by issuing a reboot in the guest) Vmm::stop() is called twice and the "Vmm is stopping" message appears twice in the firecracker logs. Additionally, the documentation in Vmm:stop() is incorrect. Not all teardown paths call vcpu.exit(). In fact, the most common teardown path which is the guest asking for a CPU reset via the i8042 controller writes to the corresponding eventfd directly and vcpu.exit() is never called. Today, the vCPU threads do not hold references to the Vmm object and therefore it's fine to join the vCPU threads in Vmm::drop(). Eliminate the double call to Vmm::stop() (and the duplicate log message) and move the vCPU thread termination and join logic to Vmm::drop(). Vmm::stop() now simply sets Vmm::shutdown_exit_code which causes the VMM to break out of its main event loop and start the termination process. Additionally, remove part of the documentation that was already incorrect or becomes incorrect after this change. Signed-off-by: Ilias Stamatis <[email protected]>
This is almost a pure revert of commit 3a9a1ac ("exit with success code on certain KVM_EXIT events") which added code that treats KVM_EXIT_SHUTDOWN and KVM_EXIT_HLT as successful VM terminations. KVM_EXIT_SHUTDOWN is an exit code that KVM uses when an x86 CPU triple faults. KVM_EXIT_HLT is the exit code that KVM uses when the guest executes a HALT x86 instruction and KVM doesn't emulate the irqchip. Since we're using the in-kernel irqchip we should never see a KVM_EXIT_HLT exit, as HALT instructions are emulated by KVM and do not cause userspace exits. Do not return Ok(VcpuEmulation::Stopped) for these exit types since that ends up propagating an FcExitCode::Ok code to the main thread, even though these are abnormal terminations (especially the triple-fault one). Remove special handling for these x86-specific exit reasons and treat them as any other unexpected exit reason. Also, replace an incorrect comment that says that vCPUs exit with KVM_EXIT_SHUTDOWN or KVM_EXIT_HLT when the guest issues a reboot. On x86 the guest asks for a CPU reset via the i8042 controller which Firecracker intercepts and kills the VM. On ARM KVM exits to userspace with the reason KVM_EXIT_SYSTEM_EVENT (which we already handle correctly). Since we're in the neighbourhood get rid of a stale (8 year old) TODO comment questioning whether we should kill the VM when we get an unexpected KVM exit reason. Terminating the VM as we have been doing is completely reasonable. Fixes: 3a9a1ac ("exit with success code on certain KVM_EXIT events") Signed-off-by: Ilias Stamatis <[email protected]>
| assert_eq!( | ||
| format!("{:?}", res.unwrap_err()), | ||
| format!("{:?}", EmulationError::UnhandledKvmExit("Hlt".to_string())) | ||
| ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can replace this by:
assert!(matches!(
res,
Err(EmulationError::UnhandledKvmExit(s)) if s == "Hlt",
));
to avoid nasty conversion to strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, can do. Even though everywhere else in the file to_string() is used..
Changes
These patches address the following in the VM shutdown path:
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.PR Checklist
tools/devtool checkbuild --allto verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyleto verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md.Runbook for Firecracker API changes.
integration tests.
TODO.rust-vmm.