@ilstam ilstam commented Jan 7, 2026

Changes

These patches address the following in the VM shutdown path:

  • Duplicate calls to terminate and join the vCPU threads and duplicate messages in the logs
  • KVM_EXIT_SHUTDOWN and KVM_EXIT_HLT being treated as successful terminations. The first is an error condition that signals a triple fault on x86, and the second should never occur because we use the in-kernel LAPIC
  • Code comments that explain the shutdown path incorrectly

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

codecov bot commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.23%. Comparing base (89702a7) to head (2cec293).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
src/vmm/src/lib.rs 83.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5611      +/-   ##
==========================================
- Coverage   83.23%   83.23%   -0.01%     
==========================================
  Files         277      277              
  Lines       29262    29258       -4     
==========================================
- Hits        24357    24353       -4     
  Misses       4905     4905              
Flag Coverage Δ
5.10-m5n.metal 83.56% <83.33%> (-0.01%) ⬇️
5.10-m6a.metal 82.91% <83.33%> (+<0.01%) ⬆️
5.10-m6g.metal 80.18% <83.33%> (-0.01%) ⬇️
5.10-m6i.metal 83.57% <83.33%> (-0.01%) ⬇️
5.10-m7a.metal-48xl 82.90% <83.33%> (+<0.01%) ⬆️
5.10-m7g.metal 80.18% <83.33%> (-0.01%) ⬇️
5.10-m7i.metal-24xl 83.54% <83.33%> (+<0.01%) ⬆️
5.10-m7i.metal-48xl 83.54% <83.33%> (-0.01%) ⬇️
5.10-m8g.metal-24xl 80.18% <83.33%> (-0.01%) ⬇️
5.10-m8g.metal-48xl 80.18% <83.33%> (-0.01%) ⬇️
6.1-m5n.metal 83.59% <83.33%> (-0.01%) ⬇️
6.1-m6a.metal 82.93% <83.33%> (-0.01%) ⬇️
6.1-m6g.metal 80.17% <83.33%> (-0.02%) ⬇️
6.1-m6i.metal 83.59% <83.33%> (-0.01%) ⬇️
6.1-m7a.metal-48xl 82.92% <83.33%> (-0.01%) ⬇️
6.1-m7g.metal 80.18% <83.33%> (-0.01%) ⬇️
6.1-m7i.metal-24xl 83.61% <83.33%> (+<0.01%) ⬆️
6.1-m7i.metal-48xl 83.61% <83.33%> (-0.01%) ⬇️
6.1-m8g.metal-24xl 80.17% <83.33%> (-0.01%) ⬇️
6.1-m8g.metal-48xl 80.18% <83.33%> (-0.01%) ⬇️


Commit 79c5f79 ("Shutdown VCPU threads so they can
thread::join()") introduced a Vmm::stop() function that should run in
the main VMM thread when a vCPU thread writes to the vcpus_exit_evt
eventfd. Vmm::stop() sends a termination event to the vCPU threads and
joins them. That commit claims that there was a cyclical dependency
where vCPU objects referenced the Vmm object and therefore Vmm::stop()
could not be called from Vmm::drop().

Later, commit 1d45fcc ("vmm: fix drop logic for Vmm") introduced a
second call to Vmm::stop(), this time from Vmm::drop().

As a result, when killing the VMM today (by issuing a reboot in the
guest) Vmm::stop() is called twice and the "Vmm is stopping" message
appears twice in the firecracker logs.

Additionally, the documentation in Vmm::stop() is incorrect. Not all
teardown paths call vcpu.exit(). In fact, the most common teardown path,
in which the guest asks for a CPU reset via the i8042 controller,
writes to the corresponding eventfd directly and never calls
vcpu.exit().

Today, the vCPU threads do not hold references to the Vmm object and
therefore it's fine to join the vCPU threads in Vmm::drop().

Eliminate the double call to Vmm::stop() (and the duplicate log message)
and move the vCPU thread termination and join logic to Vmm::drop().
Vmm::stop() now simply sets Vmm::shutdown_exit_code which causes the VMM
to break out of its main event loop and start the termination process.
Additionally, remove part of the documentation that was already
incorrect or becomes incorrect after this change.

Signed-off-by: Ilias Stamatis <[email protected]>
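
For readers skimming the thread, the split of responsibilities described in this commit message can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Vmm` struct below is a hypothetical stand-in, the `terminate` flag replaces whatever termination event the real code sends, and `run_demo` is an invented driver. None of it is Firecracker's actual code. The point is that `stop()` only records an exit code, while signalling and joining the vCPU threads happens exactly once, in `Drop`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the VMM; not Firecracker's real `Vmm` type.
struct Vmm {
    /// Set by `stop()`; the main event loop exits once this is `Some`.
    shutdown_exit_code: Option<i32>,
    /// Assumed termination signal shared with the vCPU threads.
    terminate: Arc<AtomicBool>,
    vcpu_threads: Vec<JoinHandle<()>>,
}

impl Vmm {
    fn new(vcpu_count: usize) -> Self {
        let terminate = Arc::new(AtomicBool::new(false));
        let vcpu_threads = (0..vcpu_count)
            .map(|_| {
                let terminate = Arc::clone(&terminate);
                // Each fake vCPU thread spins until it is told to terminate.
                thread::spawn(move || {
                    while !terminate.load(Ordering::Acquire) {
                        thread::sleep(Duration::from_millis(1));
                    }
                })
            })
            .collect();
        Vmm { shutdown_exit_code: None, terminate, vcpu_threads }
    }

    /// Only records the exit code; no thread handling happens here.
    fn stop(&mut self, exit_code: i32) {
        self.shutdown_exit_code = Some(exit_code);
    }
}

impl Drop for Vmm {
    /// The single place where vCPU threads are signalled and joined.
    fn drop(&mut self) {
        self.terminate.store(true, Ordering::Release);
        for handle in self.vcpu_threads.drain(..) {
            let _ = handle.join();
        }
    }
}

/// Invented driver: loops until `stop()` has set the exit code, then
/// returns it. The vCPU threads are joined when `vmm` goes out of scope.
fn run_demo() -> Option<i32> {
    let mut vmm = Vmm::new(2);
    while vmm.shutdown_exit_code.is_none() {
        vmm.stop(0);
    }
    vmm.shutdown_exit_code
}
```

Calling `run_demo()` returns the recorded exit code only after the loop has observed it, and the threads are joined once on drop regardless of how many times `stop()` was called.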
JackThomson2 added the "Status: Awaiting review" label Jan 8, 2026

This is almost a pure revert of commit 3a9a1ac ("exit with success
code on certain KVM_EXIT events") which added code that treats
KVM_EXIT_SHUTDOWN and KVM_EXIT_HLT as successful VM terminations.

KVM_EXIT_SHUTDOWN is an exit code that KVM uses when an x86 CPU triple
faults.

KVM_EXIT_HLT is the exit code that KVM uses when the guest executes an
x86 HLT instruction and KVM doesn't emulate the irqchip. Since we're
using the in-kernel irqchip we should never see a KVM_EXIT_HLT exit, as
HLT instructions are emulated by KVM and do not cause userspace exits.

Do not return Ok(VcpuEmulation::Stopped) for these exit types since that
ends up propagating an FcExitCode::Ok code to the main thread, even
though these are abnormal terminations (especially the triple-fault
one).

Remove special handling for these x86-specific exit reasons and treat
them as any other unexpected exit reason.

Also, replace an incorrect comment that says that vCPUs exit with
KVM_EXIT_SHUTDOWN or KVM_EXIT_HLT when the guest issues a reboot. On x86
the guest asks for a CPU reset via the i8042 controller which
Firecracker intercepts and kills the VM. On ARM KVM exits to userspace
with the reason KVM_EXIT_SYSTEM_EVENT (which we already handle
correctly).

Since we're in the neighbourhood, get rid of a stale (8-year-old) TODO
comment questioning whether we should kill the VM when we get an
unexpected KVM exit reason. Terminating the VM, as we have been doing, is
completely reasonable.

Fixes: 3a9a1ac ("exit with success code on certain KVM_EXIT events")
Signed-off-by: Ilias Stamatis <[email protected]>
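
As a rough illustration of the behaviour this commit message describes, the sketch below lets Hlt and Shutdown fall through to the same catch-all arm as any other unexpected exit reason. The names `VcpuEmulation`, `EmulationError::UnhandledKvmExit` and the "Hlt" string come from this thread, but the type definitions, the `VcpuExit` enum, its `IoOut` variant and `handle_exit` are simplified, hypothetical stand-ins rather than the real Firecracker or kvm-ioctls types.

```rust
/// Hypothetical, trimmed-down set of exit reasons.
#[derive(Debug)]
enum VcpuExit {
    /// Stand-in for an exit that the VMM knows how to handle.
    IoOut,
    Hlt,
    Shutdown,
}

#[derive(Debug, PartialEq)]
enum VcpuEmulation {
    Handled,
}

/// `PartialEq` is derived here only to keep the sketch easy to check.
#[derive(Debug, PartialEq)]
enum EmulationError {
    UnhandledKvmExit(String),
}

/// No special arms for `Hlt` or `Shutdown`: they are reported as
/// unhandled exits instead of being mapped to a successful stop.
fn handle_exit(exit: VcpuExit) -> Result<VcpuEmulation, EmulationError> {
    match exit {
        VcpuExit::IoOut => Ok(VcpuEmulation::Handled),
        unexpected => Err(EmulationError::UnhandledKvmExit(format!("{:?}", unexpected))),
    }
}
```

With this shape, a triple fault surfaces as an error carrying the exit name, so the caller can propagate a failure exit code rather than a success one.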
Comment on lines +691 to +694
        assert_eq!(
            format!("{:?}", res.unwrap_err()),
            format!("{:?}", EmulationError::UnhandledKvmExit("Hlt".to_string()))
        );
Contributor

You can replace this by:

        assert!(matches!(
            res,
            Err(EmulationError::UnhandledKvmExit(s)) if s == "Hlt",
        ));

to avoid nasty conversion to strings
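
To make the suggestion concrete, here is a small self-contained sketch of the pattern-based check. The `EmulationError` enum below is a hypothetical reduction that deliberately derives only `Debug`, which is one situation where comparing Debug strings becomes tempting. The helpers `unhandled_hlt` and `is_unhandled_hlt` are invented for illustration.

```rust
/// Hypothetical reduction of the error type; only `Debug` is derived,
/// so two values cannot be compared with `==` directly.
#[derive(Debug)]
enum EmulationError {
    UnhandledKvmExit(String),
}

/// Invented helper that produces the error under test.
fn unhandled_hlt() -> Result<(), EmulationError> {
    Err(EmulationError::UnhandledKvmExit("Hlt".to_string()))
}

/// Matches on the variant and inspects its payload with a guard,
/// without formatting either side into a string.
fn is_unhandled_hlt(res: &Result<(), EmulationError>) -> bool {
    matches!(res, Err(EmulationError::UnhandledKvmExit(s)) if s == "Hlt")
}
```

In a test this would be used as `assert!(is_unhandled_hlt(&unhandled_hlt()));`. Unlike the string comparison, it does not depend on the exact Debug formatting of the error type.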

Contributor Author

Sure, can do. Even though everywhere else in the file to_string() is used..
