
Conversation

roypat
Contributor

@roypat roypat commented Mar 31, 2025

Changes

Move the GuestMemoryMmap that keeps track of a Firecracker VM's memory regions into struct Vm, from struct Vmm, and then update some related functionality (such as various snapshot helper functions) to operate on struct Vm instead of struct Vmm.

Reason

So far, Firecracker has always dealt with a single collection of homogeneous guest memory regions. However, #5108 introduces a second type of memory region, which makes the set of memory regions heterogeneous, and other features, such as virtio-pmem support, will come with similar "special" memory regions. By moving guest memory management into Vm, we encapsulate this heterogeneity and allow the guest memory configuration to be built incrementally.

Admittedly, nothing right now benefits from this refactor (it doesn't even reduce LoC 😢), but it will most certainly make life on the feature branch easier, since we won't have to continuously rebase this 😅
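As a purely illustrative sketch of the encapsulation this enables (all names hypothetical, not Firecracker's actual types), `Vm` can own a mixed set of region kinds while callers stay oblivious to the heterogeneity:

```rust
/// Hypothetical region kinds; #5108 and virtio-pmem would add variants here.
#[derive(PartialEq)]
pub enum RegionKind {
    Normal,
    Swiotlb,
}

pub struct MemRegion {
    pub start: u64,
    pub len: usize,
    pub kind: RegionKind,
}

pub struct Vm {
    regions: Vec<MemRegion>,
}

impl Vm {
    pub fn new(regions: Vec<MemRegion>) -> Self {
        Vm { regions }
    }

    /// Callers iterate "plain" guest memory without needing to know
    /// which other kinds of regions the Vm is tracking.
    pub fn normal_regions(&self) -> impl Iterator<Item = &MemRegion> + '_ {
        self.regions.iter().filter(|r| r.kind == RegionKind::Normal)
    }
}
```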

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

@roypat roypat force-pushed the guest-mem-in-vm branch 3 times, most recently from d9de063 to 7933d66 Compare March 31, 2025 11:06
@roypat roypat marked this pull request as ready for review March 31, 2025 11:18
roypat added 14 commits March 31, 2025 13:17
The explicit calls to GuestMemoryMmap::create ended up being touched in about 5 of the following commits, so let's reduce the churn by putting them into a function.

Signed-off-by: Patrick Roy <[email protected]>
Have all of them operate on some sort of list of (GuestAddress, usize)
tuples, for consistency.

Signed-off-by: Patrick Roy <[email protected]>
This struct will hold all the fields that are common between the x86_64 and aarch64 definitions of `struct ArchVm`. For now, this is only the `VmFd`, but this will grow.

Signed-off-by: Patrick Roy <[email protected]>
This is to prepare for memory regions being registered to a VM
incrementally.

While we're at it, add a unit test for it. It completes in roughly 50s
on my machine, and will become more useful later once we actually allow
building mmap regions incrementally.

Signed-off-by: Patrick Roy <[email protected]>
Semantically, this is where it belongs - the memory is associated with the virtual machine, not the virtual machine manager.

But independently of this, it is needed for borrowchk reasons: In the
future, each Vm will have two guest memory regions associated with it.
One for traditional guest memory, and another that is dedicated to I/O
(swiotlb region).  Storing these two GuestMemoryMmap in `struct Vm`
allows for a function as follows:

fn io_memory(&self) -> &GuestMemoryMmap {
    self.io_memory.as_ref().unwrap_or(&self.normal_memory)
}

Such a function cannot live in `struct Vmm`, as that would mean borrowing the guest memory implies borrowing the entire `Vmm` object (rustc is not intelligent enough to project split borrows past function calls - e.g. it doesn't know that the reference returned by io_memory() really only borrows the io_memory and normal_memory fields, not everything else). This means it could not be used at callsites where such splitting of borrows is required (e.g. everywhere both a guest memory reference _and_ another Vmm field are passed by reference to a function).
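The borrow-splitting limitation can be reproduced with a minimal sketch (types simplified, not Firecracker's actual structs):

```rust
struct Vmm {
    guest_memory: Vec<u8>,
    dirty_pages: u64,
}

impl Vmm {
    /// Borrowing through a method borrows *all* of `self` as far as
    /// rustc is concerned, not just the field being returned.
    fn memory(&self) -> &Vec<u8> {
        &self.guest_memory
    }
}

fn demo() -> (usize, u64) {
    let mut vmm = Vmm { guest_memory: vec![0; 16], dirty_pages: 0 };

    // Direct field borrows can be split: rustc sees disjoint fields.
    let mem = &vmm.guest_memory;
    vmm.dirty_pages += 1; // OK: `dirty_pages` is disjoint from `guest_memory`
    let len = mem.len();  // `mem` is still live across the mutation

    // The same pattern through the accessor is rejected:
    //     let mem = vmm.memory();
    //     vmm.dirty_pages += 1; // ERROR: cannot borrow `vmm` as mutable
    //     let _ = mem.len();    //        while it is also borrowed as immutable
    assert_eq!(vmm.memory().len(), len); // the accessor alone is fine

    (len, vmm.dirty_pages)
}
```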

The `Option`-ness of the guest_memory field is transitory.

Signed-off-by: Patrick Roy <[email protected]>
This is in preparation of guest memory itself only getting set up after
the call to create_vmm_and_vcpus().

Signed-off-by: Patrick Roy <[email protected]>
Add `Vm::register_memory_region[s]`, which allow adding new memory
regions to a Vm incrementally (e.g. without overwriting the previously
registered regions).

To use this, have the functions in the `memory` module hand out Vec<GuestMemoryRegion> instead of GuestMemoryMmap, because vm_memory does not offer APIs for extracting the regions of a GuestMemoryMmap again.
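A sketch of what incremental registration might look like (slot bookkeeping heavily simplified; the real implementation registers each region with KVM via the KVM_SET_USER_MEMORY_REGION ioctl):

```rust
/// Simplified stand-in for a guest memory region: (guest address, length).
type Region = (u64, usize);

#[derive(Default)]
struct Vm {
    regions: Vec<Region>,
}

impl Vm {
    /// Registers one region without discarding previously registered ones,
    /// returning the kvm-style slot number it was assigned.
    fn register_memory_region(&mut self, region: Region) -> u32 {
        let slot = self.regions.len() as u32;
        // Real code would issue the KVM_SET_USER_MEMORY_REGION ioctl here.
        self.regions.push(region);
        slot
    }

    /// Convenience wrapper over register_memory_region for a batch.
    fn register_memory_regions(&mut self, regions: Vec<Region>) {
        for r in regions {
            self.register_memory_region(r);
        }
    }
}
```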

Signed-off-by: Patrick Roy <[email protected]>
Since the guest memory is now stored in `struct Vm`, we can more easily
store the guest memory state inside the VmState.

Signed-off-by: Patrick Roy <[email protected]>
Now that guest memory is tracked inside struct Vm instead of struct Vmm,
the dirty bitmap tracking functions can also start living in struct Vm.

While we're touching it anyway, update the types used for kvm memslot numbers to the native u32. This saves us a few ugly usize->u32 casts.

Also change the error type so as not to drag VmmError into the vm module, where it really doesn't belong.

Signed-off-by: Patrick Roy <[email protected]>
Do this in create_snapshot() instead of snapshot_memory_to_file(), as it was the only operation inside snapshot_memory_to_file() that required access to a struct Vmm, opening the gates for moving the rest of snapshot_memory_to_file() into struct Vm.

Signed-off-by: Patrick Roy <[email protected]>
struct Vm encapsulates the memory, so having this function live there
makes sense. It also prepares us for the future where multiple
GuestMemoryMmap objects need to be managed.

Signed-off-by: Patrick Roy <[email protected]>
Avoid some .map_err() calls using #[from] and the auto-From::from
calling of the question mark operator.
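The pattern being described, sketched with a hand-written `From` impl (error and function names hypothetical; in the actual code, thiserror's `#[from]` attribute generates an equivalent impl):

```rust
use std::num::ParseIntError;

// Without a From impl, callers need `.map_err(ConfigError::Parse)?`;
// with one, a plain `?` performs the conversion automatically.
#[derive(Debug)]
enum ConfigError {
    Parse(ParseIntError),
}

impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::Parse(e)
    }
}

/// Parses a memory size in MiB and returns it in bytes.
fn parse_mem_size(s: &str) -> Result<u64, ConfigError> {
    let mib: u64 = s.parse()?; // `?` calls From::from on the error for us
    Ok(mib << 20)
}
```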

Signed-off-by: Patrick Roy <[email protected]>
We can just immediately propagate errors to the caller using `?` syntax,
which saves us the "if (...).is_ok()" check.
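A before/after sketch of that simplification (function names hypothetical):

```rust
fn restore_step() -> Result<u32, String> {
    Ok(42)
}

// Before: explicit is_ok() check plus a hand-rolled error path
// that discards the original error.
fn restore_verbose() -> Result<u32, String> {
    let res = restore_step();
    if res.is_ok() {
        Ok(res.unwrap())
    } else {
        Err("restore failed".to_string())
    }
}

// After: `?` propagates the original error directly to the caller.
fn restore() -> Result<u32, String> {
    Ok(restore_step()?)
}
```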

Signed-off-by: Patrick Roy <[email protected]>
Otherwise it's impossible to figure out what precisely went wrong during
queue restoration.

Signed-off-by: Patrick Roy <[email protected]>

codecov bot commented Mar 31, 2025

Codecov Report

Attention: Patch coverage is 85.94249% with 44 lines in your changes missing coverage. Please review.

Project coverage is 83.03%. Comparing base (1f1c5c0) to head (8055e0a).
Report is 18 commits behind head on main.

Files with missing lines Patch % Lines
src/vmm/src/vstate/memory.rs 77.02% 17 Missing ⚠️
src/vmm/src/persist.rs 58.33% 10 Missing ⚠️
src/vmm/src/arch/x86_64/mod.rs 75.00% 6 Missing ⚠️
src/vmm/src/arch/aarch64/mod.rs 84.00% 4 Missing ⚠️
src/vmm/src/vstate/vm.rs 97.32% 3 Missing ⚠️
src/vmm/src/resources.rs 66.66% 2 Missing ⚠️
src/vmm/src/builder.rs 94.11% 1 Missing ⚠️
src/vmm/src/lib.rs 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5125      +/-   ##
==========================================
- Coverage   83.07%   83.03%   -0.05%     
==========================================
  Files         250      250              
  Lines       26901    26911      +10     
==========================================
- Hits        22349    22345       -4     
- Misses       4552     4566      +14     
Flag Coverage Δ
5.10-c5n.metal 83.55% <85.76%> (+0.01%) ⬆️
5.10-m5n.metal 83.55% <85.76%> (+0.01%) ⬆️
5.10-m6a.metal 82.71% <85.76%> (+<0.01%) ⬆️
5.10-m6g.metal 79.37% <85.92%> (-0.05%) ⬇️
5.10-m6i.metal 83.54% <85.76%> (+0.01%) ⬆️
5.10-m7a.metal-48xl 82.71% <85.76%> (?)
5.10-m7g.metal 79.37% <85.92%> (-0.05%) ⬇️
6.1-c5n.metal 83.60% <85.76%> (+0.01%) ⬆️
6.1-m5n.metal 83.59% <85.76%> (+0.01%) ⬆️
6.1-m6a.metal 82.76% <85.76%> (+0.01%) ⬆️
6.1-m6g.metal 79.37% <85.92%> (-0.05%) ⬇️
6.1-m6i.metal 83.59% <85.76%> (+0.01%) ⬆️
6.1-m7a.metal-48xl 82.76% <85.76%> (?)
6.1-m7g.metal 79.36% <85.92%> (-0.06%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.

@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Mar 31, 2025
@kalyazin kalyazin requested a review from Copilot April 1, 2025 13:04

@Copilot Copilot AI left a comment


Pull Request Overview

This pull request refactors the guest memory management by moving GuestMemoryMmap from Vmm into Vm and updating all corresponding usages in production and test code. Key changes include:

  • Replacing vmm.guest_memory with vmm.vm.guest_memory() in various modules.
  • Updating test helpers and signatures in builder, architecture, and device modules to accommodate the new guest memory API.
  • Minor improvements to error message formatting in snapshot persistence.

Reviewed Changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/vmm/src/gdb/* Updated memory read/write calls to use vm.guest_memory()
src/vmm/src/devices/virtio/* Adjusted guest memory calls and test module visibility
src/vmm/src/builder.rs Updated memory initialization and snapshot restore
src/vmm/src/arch/x86_64/* Updated arch_memory_regions and guest memory handling adjustments
src/vmm/src/arch/aarch64/* Updated guest memory calls and VM state save/restore
src/vmm/benches/memory_access.rs Minor improvement in accessing the first memory region
Comments suppressed due to low confidence (3)

src/vmm/src/devices/virtio/vhost_user.rs:462

  • [nitpick] The test module's visibility has been changed from private to pub(crate). If this change is intentional for test accessibility purposes, consider adding a comment to document the rationale behind exposing the tests.
pub(crate) mod tests {

src/vmm/src/devices/virtio/persist.rs:25

  • [nitpick] The error message for QueueConstruction now includes a placeholder. Please verify that this formatting meets your intended error reporting style and matches all related formatting conventions.
/// Could not restore queue: {0}

src/vmm/benches/memory_access.rs:16

  • [nitpick] Switching from iter().next() to first() improves clarity; please ensure that this change preserves the intended ordering and behavior of the memory regions.
let ptr = memory.first().unwrap().as_ptr();

kalyazin
kalyazin previously approved these changes Apr 1, 2025
@roypat roypat requested review from JackThomson2 and bchalios April 1, 2025 13:59
JackThomson2
JackThomson2 previously approved these changes Apr 1, 2025
@bchalios
Copy link
Contributor

bchalios commented Apr 2, 2025

All in all, this is a great change I think. To answer your initial question, it definitely helps massively in case we want to build virtio-pmem at some point.

@roypat roypat dismissed stale reviews from JackThomson2 and kalyazin via d9474b4 April 2, 2025 12:44
@roypat roypat force-pushed the guest-mem-in-vm branch 4 times, most recently from 2c66a1b to 66f9d7f Compare April 2, 2025 12:56
roypat added 4 commits April 2, 2025 15:59
This allows having regions that start somewhere other than guest
physical address 0. Useful in case not all regions are allocated at
once, and each batch needs to be placed after all already allocated
regions.

Since the resulting code has some intricacies (and I actually got it
wrong the first time around), write a kani harness to validate that
we're computing the regions correctly.

On ARM, add a warning in case we hit some questionable logic around
truncation of memory sizes if they exceed architectural bounds (although
if anyone is hitting this, they're already doing something wrong anyway,
because it'd require a VM with more than 1022GiB of memory).
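The placement logic being described can be sketched as follows (gap bounds illustrative; Firecracker's real x86_64 constants and the kani-verified implementation differ in detail). A hypothetical `arch_memory_regions` places `size` bytes starting at `offset`, splitting around a 32-bit MMIO gap so that later batches can start after already-allocated regions instead of at guest physical address 0:

```rust
/// Illustrative 32-bit MMIO gap: [3 GiB, 4 GiB).
const GAP_START: u64 = 3 << 30;
const GAP_END: u64 = 4 << 30;

/// Computes (start, len) regions for `size` bytes placed at `offset`,
/// skipping the MMIO gap.
fn arch_memory_regions(offset: u64, size: u64) -> Vec<(u64, u64)> {
    let mut regions = Vec::new();
    let mut remaining = size;
    let mut start = offset;

    // If the batch would begin inside the gap, hop over it first.
    if (GAP_START..GAP_END).contains(&start) {
        start = GAP_END;
    }
    // The part that fits below the gap.
    if start < GAP_START && remaining > 0 {
        let below = remaining.min(GAP_START - start);
        regions.push((start, below));
        remaining -= below;
        start = GAP_END;
    }
    // Everything else goes above the gap.
    if remaining > 0 {
        regions.push((start, remaining));
    }
    regions
}
```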

Signed-off-by: Patrick Roy <[email protected]>
This way the non-snapshot path doesn't have to deal with passing in `None`.

Signed-off-by: Patrick Roy <[email protected]>
Only enabling it when a balloon device is present doesn't really gain us anything, because the only time this feature is checked in the kernel is as part of madvise handling - but without a balloon device we won't ever call madvise, so it makes no difference whether we enable the feature or not (it has been unconditionally available since Linux 4.10).

Signed-off-by: Patrick Roy <[email protected]>
Dropping the VM object closes the VM FD, which can cause the test to
hang.

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat force-pushed the guest-mem-in-vm branch from fb880e7 to 1fc262b Compare April 2, 2025 15:01
@roypat roypat enabled auto-merge (rebase) April 2, 2025 15:22
@roypat roypat merged commit 45fc21b into firecracker-microvm:main Apr 2, 2025
6 of 7 checks passed

Labels

Status: Awaiting review Indicates that a pull request is ready to be reviewed


4 participants