@Manciukic
Contributor

Changes

...

Reason

...

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkbuild --all to verify that the PR passes
    build checks on all supported architectures.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

Add missing pmem field from vm config response.

Signed-off-by: Riccardo Mancini <[email protected]>
We mistakenly added it as `output_path` rather than `serial_out_path`.

Signed-off-by: Riccardo Mancini <[email protected]>
The custom CPU templates were just a plain object with no definition of
their fields. This patch adds specific definitions for all of their fields.

Signed-off-by: Riccardo Mancini <[email protected]>
To implement strict API validation, let's annotate where extra fields
are actually expected.

Signed-off-by: Riccardo Mancini <[email protected]>
Validate that all requests we make to the API conform to the
swagger, and that all responses conform as well.
This check is strict, meaning no extra fields are allowed to catch
problems where there is a typo in the swagger.
We only check successful requests as we don't want to fail when we try
to send a bad request on purpose. If the request is successful, then it
means the schema should have been valid.
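A minimal sketch of what "strict" means here, assuming a simplified schema shape. The `validate_strict` helper and `SERIAL_SCHEMA` are hypothetical illustrations, not the validator or the Swagger definition used by the test framework:

```python
def validate_strict(obj, schema):
    """Return a list of problems found when checking obj against schema.

    `schema` is a hypothetical, simplified stand-in for a Swagger object
    definition: {"properties": {name: python_type}, "required": [names]}.
    """
    problems = []
    properties = schema.get("properties", {})
    # Strict mode: any field the schema does not declare is an error.
    for name in obj:
        if name not in properties:
            problems.append(f"unexpected field: {name}")
    # Required fields must be present.
    for name in schema.get("required", []):
        if name not in obj:
            problems.append(f"missing required field: {name}")
    # Declared fields that are present must have the declared type.
    for name, expected_type in properties.items():
        if name in obj and not isinstance(obj[name], expected_type):
            problems.append(f"wrong type for field: {name}")
    return problems


# Hypothetical schema, not the real Firecracker definition.
SERIAL_SCHEMA = {"properties": {"serial_out_path": str}, "required": []}

ok = validate_strict({"serial_out_path": "/tmp/serial.log"}, SERIAL_SCHEMA)
typo = validate_strict({"output_path": "/tmp/serial.log"}, SERIAL_SCHEMA)
```

With a non-strict check the misnamed field would be silently ignored; rejecting undeclared fields is what surfaces a mismatch between the schema and the actual payload.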

Signed-off-by: Riccardo Mancini <[email protected]>
Automatically generate bindings for virtio-mem.

Signed-off-by: Riccardo Mancini <[email protected]>
Create the new module for the virtio-mem device.

Signed-off-by: Riccardo Mancini <[email protected]>
Allow configuring the virtio-mem device from the VmmConfig and through
the PUT API to /hotplug/memory.

Signed-off-by: Riccardo Mancini <[email protected]>
Test the freshly added PUT API to /hotplug/memory.

Signed-off-by: Riccardo Mancini <[email protected]>
Add a dummy virtio-mem device that is detected by the guest driver.
The device is configured with total_size, block_size, and slot_size, and
uses an address range allocated after the MMIO64 memory zone.

Signed-off-by: Riccardo Mancini <[email protected]>
Check that the driver correctly detects the virtio-mem device, with the
correct parameters.

Signed-off-by: Riccardo Mancini <[email protected]>
Add support for GET /hotplug/memory that returns the current status of
the virtio-mem device.

This API can only be called after boot.

Signed-off-by: Riccardo Mancini <[email protected]>
Add new API to swagger and device-api.md

Signed-off-by: Riccardo Mancini <[email protected]>
Avoid multiple conversions back and forth from MiB to bytes by just
storing as MiB.

Signed-off-by: Riccardo Mancini <[email protected]>
We should match the exact reason for the RuntimeError to ensure we're
failing for what we're expecting.

Signed-off-by: Riccardo Mancini <[email protected]>
It's better to be explicit about the conversion we're doing, as u64 to
usize is always safe on our platforms.

Signed-off-by: Riccardo Mancini <[email protected]>
This type annotation is redundant, so we can remove it.

Signed-off-by: Riccardo Mancini <[email protected]>
Implements basic snapshot/restore functionality for the dummy virtio-mem
device.

Signed-off-by: Riccardo Mancini <[email protected]>
Wire support for virtio-mem metrics, adding a few basic metrics: queue
events, queue event fails, activation fails.

Signed-off-by: Riccardo Mancini <[email protected]>
Allocate the memory that will be used for hotplugging. Initially, this
memory will be registered with KVM, but that will change later when we
add dynamic slot support.

Signed-off-by: Riccardo Mancini <[email protected]>
Wire up PATCH requests with the virtio-mem device.
All the validation is performed in the device, but the actual operation
is not yet implemented.

Signed-off-by: Riccardo Mancini <[email protected]>
Add entry for the patch API in Swagger and in the docs.

Signed-off-by: Riccardo Mancini <[email protected]>
Test that the new PATCH API behaves as expected.
Also updates expected metrics and fixes memory monitor to account for
hotplugging.

Signed-off-by: Riccardo Mancini <[email protected]>
Parse virtio requests over the queue and always ack them. Following
commits will add the state management inside the device.

Signed-off-by: Riccardo Mancini <[email protected]>
This commit adds block state management and implements the virtio
requests for the virtio-mem device.

Block state is tracked using a BitVec, each bit representing a single
block.

Plug/Unplug requests are validated before being executed to verify the
range is valid (aligned and within range), and that all blocks in range
are unplugged/plugged, as per the virtio spec.

UnplugAll is the only request where usable_region_size can be lowered.

This commit is missing the dynamic KVM slot management which will be
added later.
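The validation rules above can be sketched as follows. This is an illustrative Python model only, not the device code: the `BlockState` class, its method names, and the use of region-relative offsets are assumptions made for the example.

```python
class BlockState:
    """Tracks plugged/unplugged state, one boolean per block."""

    def __init__(self, total_size, block_size):
        assert total_size % block_size == 0
        self.block_size = block_size
        self.plugged = [False] * (total_size // block_size)

    def _block_range(self, offset, nb_blocks):
        """Validate alignment and bounds, return the block indexes covered."""
        if offset % self.block_size != 0:
            raise ValueError("offset is not aligned to the block size")
        first = offset // self.block_size
        if nb_blocks <= 0 or first + nb_blocks > len(self.plugged):
            raise ValueError("range is outside the device region")
        return range(first, first + nb_blocks)

    def plug(self, offset, nb_blocks):
        blocks = self._block_range(offset, nb_blocks)
        # Every block in the range must currently be unplugged.
        if any(self.plugged[i] for i in blocks):
            raise ValueError("some blocks are already plugged")
        for i in blocks:
            self.plugged[i] = True

    def unplug(self, offset, nb_blocks):
        blocks = self._block_range(offset, nb_blocks)
        # Every block in the range must currently be plugged.
        if not all(self.plugged[i] for i in blocks):
            raise ValueError("some blocks are already unplugged")
        for i in blocks:
            self.plugged[i] = False

    def unplug_all(self):
        # Unconditionally reset every block to unplugged.
        self.plugged = [False] * len(self.plugged)
```

Validating the whole range before mutating any block keeps a rejected request from leaving the state partially updated.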

Signed-off-by: Riccardo Mancini <[email protected]>
Adds unit tests using VirtioTestHelper to verify correct functioning of
the new device.

Signed-off-by: Riccardo Mancini <[email protected]>
Add the virtio-mem device metrics to the integ test validation.

Signed-off-by: Riccardo Mancini <[email protected]>
If the handler receives a UFFD remove event, it currently stores the PFN
and will reply with a zero page whenever it receives a pagefault event
for that page.
This works well with 4k pages, but zeropage is not supported on
hugepages. In order to support hugepages, let's just unregister from
UFFD whenever we get a remove event. By doing so, the handler won't
receive a notification for the removed page, and the VM will get a new
zero page from the kernel.

Signed-off-by: Riccardo Mancini <[email protected]>
This moves the logic to measure RSS to framework.utils and adds logic
to also include huge pages in the measurement.

Furthermore, this also adds caching for the firecracker_pid, as well as
a new property to get the corresponding psutil.Process.

Signed-off-by: Riccardo Mancini <[email protected]>
Move the logic to get the MemAvailable from /proc/meminfo inside the
guest to a new guest_stats module in the test framework. This provides a
new class MeminfoGuest that can be used to retrieve this information
(and more!).
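As a rough illustration of the kind of parsing such a helper needs, here is a minimal sketch. The `parse_meminfo` function and the sample text are hypothetical and are not the MeminfoGuest implementation:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Sized fields are reported by the kernel in kB; counters such as
    HugePages_Total carry no unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


# Made-up sample in the /proc/meminfo layout.
SAMPLE = """MemTotal:        1014200 kB
MemFree:          800000 kB
MemAvailable:     850000 kB
HugePages_Total:       0
"""

mem_available_kib = parse_meminfo(SAMPLE)["MemAvailable"]
```

In a test, the text would come from running `cat /proc/meminfo` inside the guest rather than from a constant.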

Signed-off-by: Riccardo Mancini <[email protected]>
Add integration tests for the new device:
 - check that the device is detected
 - check that hotplugging and unplugging works
 - check that memory can be used after hotplugging
 - check that memory is freed on hotunplug
 - check different config combinations
 - check different uvm types
 - check that contents are preserved across snapshot-restore

Signed-off-by: Riccardo Mancini <[email protected]>
Since these tests need to be run on an ag=1 host, move them under the
"performance" folder.

Signed-off-by: Riccardo Mancini <[email protected]>
These tests add unit test coverage to the builder.rs and vm.rs files
which were previously untested in the memory hotplug case.

Signed-off-by: Riccardo Mancini <[email protected]>
Add a slot_cnt parameter to next_kvm_slot. This will be used to allocate
multiple slots for a slotted hotpluggable region.

Signed-off-by: Riccardo Mancini <[email protected]>
To avoid boilerplate code in multiple places, let's just define it once
and use it everywhere.

Signed-off-by: Riccardo Mancini <[email protected]>
In preparation for adding support for multiple memory slots, refactor the
mincore_bitmap function to accept a pointer and length rather than an
entire memory region object.

Signed-off-by: Riccardo Mancini <[email protected]>
A single GuestMemoryRegion can be split into multiple KVM slots. This is
used for Hotplug type regions where we can dynamically remove access to
the region from the guest.

Signed-off-by: Riccardo Mancini <[email protected]>
Dynamically plug/unplug KVM slots when any/all of the blocks are
plugged/unplugged.

This prevents the guest from accessing unplugged memory.
However, this doesn't yet prevent the device emulation from accessing
the slots.
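The any/all rule can be sketched like this: a slot stays plugged while any of its blocks is plugged, and is removed only once all of them are unplugged. The `slot_plug_states` helper is a hypothetical illustration, not the VMM code:

```python
def slot_plug_states(block_plugged, blocks_per_slot):
    """Return one boolean per slot: True if any block in the slot is plugged."""
    if blocks_per_slot <= 0 or len(block_plugged) % blocks_per_slot != 0:
        raise ValueError("blocks must divide evenly into slots")
    return [
        any(block_plugged[start:start + blocks_per_slot])
        for start in range(0, len(block_plugged), blocks_per_slot)
    ]


# Eight blocks grouped into two slots of four blocks each.
blocks = [False, True, False, False, False, False, False, False]
states = slot_plug_states(blocks, 4)
```

Here the first slot must be visible to the guest because one of its blocks is plugged, while the second slot can be removed entirely.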

Signed-off-by: Riccardo Mancini <[email protected]>
Add a performance test that measures the latency to hot(un)plug
different amounts of memory.

Signed-off-by: Riccardo Mancini <[email protected]>
Add the memory hotplug tests to buildkite.

Signed-off-by: Riccardo Mancini <[email protected]>
This prevents the device emulation from being tricked into accessing
unplugged memory ranges. If a malicious driver tries to do so, the VMM
will crash with a memory error.

Signed-off-by: Riccardo Mancini <[email protected]>
These two functions were doing a similar thing. Let's extend the
build_from_snapshot to support uffd and drop the newly introduced
clone_vm.

Signed-off-by: Riccardo Mancini <[email protected]>
Extend the tests to also check that everything works with diff snapshots
(dirty page tracking and mincore).

Signed-off-by: Riccardo Mancini <[email protected]>
Add a test that verifies that incremental snapshots work as expected.
Each generation will plug more memory and verify that the contents in
memory from previous generations are persisted.

Signed-off-by: Riccardo Mancini <[email protected]>
When the fuzzer generates an invalid bitvec state, the serde code
crashes with an assertion error.
To avoid these crashes, just store it as a plain Vec<bool> as we don't
expect these vecs to be too big (for a 10GB hotpluggable area with 2MiB
blocks it would require 5kB for the Vec<bool> compared to 640B with
BitVec).
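The size estimate can be checked with a few lines of arithmetic, assuming "10GB" means 10 GiB and one byte per `bool` element versus one bit per block for a packed bit vector:

```python
GIB = 1 << 30
MIB = 1 << 20

hotplug_size = 10 * GIB  # assumption: "10GB" read as 10 GiB
block_size = 2 * MIB

blocks = hotplug_size // block_size  # number of tracked blocks
vec_bool_bytes = blocks              # one byte per bool element
bitvec_bytes = (blocks + 7) // 8     # one bit per block, rounded up to bytes
```

This gives 5120 blocks, so about 5 KiB for the plain vector and 640 bytes for the packed form, matching the figures quoted above.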

Signed-off-by: Riccardo Mancini <[email protected]>
Return the error up the stack instead of unwrapping it.

Note that on PCI this error is unwrapped for every device so there's no
change there.

Signed-off-by: Riccardo Mancini <[email protected]>
When the region state is invalid or corrupted (like when generated by
the fuzzer), it is possible that a DRAM slot is unplugged, leading to
segfaults when accessing guest memory (e.g. from the vmgenid device).
To avoid these crashes, validate the region state and allow the DRAM
region (not hot-pluggable) to only contain one plugged slot.

Signed-off-by: Riccardo Mancini <[email protected]>
Sometimes on m6a the uffd test fails to write 1.2GB within 10s. Bump the
timeout to 30s.

Signed-off-by: Riccardo Mancini <[email protected]>
The previously set 256MB were not enough for hotplugging 16GB of memory.
This is because the kernel needs 64B for every 4kB page, meaning 262MB
for 16GB.
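A quick check of the per-page overhead, assuming binary units (16 GiB of hotplugged memory, 4 KiB pages) and the 64 bytes per page stated above:

```python
GIB = 1 << 30
MIB = 1 << 20
KIB = 1 << 10

hotplugged = 16 * GIB  # assumption: "16GB" read as 16 GiB
page_size = 4 * KIB
metadata_per_page = 64  # bytes of kernel metadata per page, as stated above

pages = hotplugged // page_size
overhead_bytes = pages * metadata_per_page
overhead_mib = overhead_bytes // MIB
overhead_kib = overhead_bytes // KIB
```

Under these assumptions the metadata alone comes to 256 MiB (262,144 KiB), which already consumes a 256 MiB budget before anything else is accounted for.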

Signed-off-by: Riccardo Mancini <[email protected]>
@codecov

codecov bot commented Nov 21, 2025

Codecov Report

❌ Patch coverage is 90.66116% with 113 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.36%. Comparing base (2b05435) to head (3699d5b).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/vmm/src/devices/virtio/mem/device.rs | 94.41% | 25 Missing ⚠️ |
| src/vmm/src/rpc_interface.rs | 0.00% | 23 Missing ⚠️ |
| src/vmm/src/lib.rs | 0.00% | 13 Missing ⚠️ |
| src/vmm/src/devices/virtio/mem/event_handler.rs | 81.48% | 10 Missing ⚠️ |
| src/vmm/src/vstate/memory.rs | 95.09% | 10 Missing ⚠️ |
| src/vmm/src/builder.rs | 85.71% | 9 Missing ⚠️ |
| src/vmm/src/resources.rs | 59.09% | 9 Missing ⚠️ |
| src/firecracker/src/api_server/parsed_request.rs | 0.00% | 7 Missing ⚠️ |
| src/vmm/src/devices/virtio/mem/request.rs | 96.29% | 4 Missing ⚠️ |
| src/vmm/src/devices/virtio/mem/persist.rs | 95.34% | 2 Missing ⚠️ |

... and 1 more

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5529      +/-   ##
==========================================
+ Coverage   82.88%   83.36%   +0.48%     
==========================================
  Files         270      277       +7     
  Lines       27784    28889    +1105     
==========================================
+ Hits        23028    24084    +1056     
- Misses       4756     4805      +49     

| Flag | Coverage Δ |
|---|---|
| 5.10-m5n.metal | 83.59% <90.66%> (+0.52%) ⬆️ |
| 5.10-m6a.metal | 82.92% <90.66%> (+0.58%) ⬆️ |
| 5.10-m6g.metal | 80.29% <90.00%> (+0.65%) ⬆️ |
| 5.10-m6i.metal | 83.60% <90.66%> (+0.54%) ⬆️ |
| 5.10-m7a.metal-48xl | 82.92% <90.66%> (+0.59%) ⬆️ |
| 5.10-m7g.metal | 80.29% <90.00%> (+0.65%) ⬆️ |
| 5.10-m7i.metal-24xl | 83.56% <90.66%> (+0.52%) ⬆️ |
| 5.10-m7i.metal-48xl | 83.57% <90.66%> (+0.53%) ⬆️ |
| 5.10-m8g.metal-24xl | 80.29% <90.00%> (+0.64%) ⬆️ |
| 5.10-m8g.metal-48xl | 80.29% <90.00%> (+0.64%) ⬆️ |
| 6.1-m5n.metal | 83.62% <90.66%> (+0.53%) ⬆️ |
| 6.1-m6a.metal | 82.95% <90.66%> (+0.58%) ⬆️ |
| 6.1-m6g.metal | 80.29% <90.00%> (+0.65%) ⬆️ |
| 6.1-m6i.metal | 83.62% <90.66%> (+0.53%) ⬆️ |
| 6.1-m7a.metal-48xl | 82.94% <90.66%> (+0.58%) ⬆️ |
| 6.1-m7g.metal | 80.29% <90.00%> (+0.64%) ⬆️ |
| 6.1-m7i.metal-24xl | 83.63% <90.66%> (+0.53%) ⬆️ |
| 6.1-m7i.metal-48xl | 83.63% <90.66%> (+0.51%) ⬆️ |
| 6.1-m8g.metal-24xl | 80.28% <90.00%> (+0.64%) ⬆️ |
| 6.1-m8g.metal-48xl | 80.29% <90.00%> (+0.64%) ⬆️ |
