[WIP] rebase of feature/secret-hiding onto main #5379

roypat · 2025-08-13T15:56:14Z

Changes

...

Reason

...

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

This functionality cannot be added in rust-vmm.

codecov · 2025-08-14T09:31:56Z

Codecov Report

❌ Patch coverage is 48.02632% with 474 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.44%. Comparing base (d974044) to head (7bd20d2).

Files with missing lines	Patch %	Lines
src/vmm/src/builder.rs	45.94%	100 Missing ⚠️
src/vmm/src/lib.rs	2.02%	97 Missing ⚠️
src/vmm/src/resources.rs	37.23%	59 Missing ⚠️
src/vmm/src/vstate/vcpu.rs	18.96%	47 Missing ⚠️
src/vmm/src/vstate/vm.rs	69.91%	37 Missing ⚠️
src/vmm/src/devices/virtio/net/device.rs	39.62%	32 Missing ⚠️
src/vmm/src/vstate/memory.rs	75.78%	31 Missing ⚠️
src/vmm/src/persist.rs	37.77%	28 Missing ⚠️
src/vmm/src/devices/virtio/block/device.rs	0.00%	10 Missing ⚠️
src/vmm/src/devices/virtio/block/virtio/device.rs	36.36%	7 Missing ⚠️
... and 9 more

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5379      +/-   ##
==========================================
- Coverage   82.31%   81.44%   -0.87%     
==========================================
  Files         265      265              
  Lines       30645    31354     +709     
==========================================
+ Hits        25224    25535     +311     
- Misses       5421     5819     +398

Flag	Coverage Δ
5.10-c5n.metal	`81.21% <45.15%> (-1.12%)`	⬇️
5.10-m5n.metal	`81.21% <45.15%> (-1.11%)`	⬇️
5.10-m6a.metal	`80.48% <45.15%> (-1.13%)`	⬇️
5.10-m6g.metal	`77.91% <45.09%> (-1.02%)`	⬇️
5.10-m6i.metal	`81.21% <45.15%> (-1.12%)`	⬇️
5.10-m7a.metal-48xl	`80.46% <45.15%> (?)`
5.10-m7g.metal	`77.91% <45.09%> (-1.02%)`	⬇️
5.10-m7i.metal-24xl	`81.22% <45.15%> (?)`
5.10-m7i.metal-48xl	`81.18% <45.15%> (?)`
5.10-m8g.metal-24xl	`77.91% <45.09%> (?)`
5.10-m8g.metal-48xl	`77.91% <45.09%> (?)`
6.1-c5n.metal	`81.26% <45.15%> (-1.11%)`	⬇️
6.1-m5n.metal	`81.29% <45.15%> (-1.08%)`	⬇️
6.1-m6a.metal	`80.52% <45.15%> (-1.14%)`	⬇️
6.1-m6g.metal	`77.91% <45.09%> (-1.02%)`	⬇️
6.1-m6i.metal	`81.25% <45.15%> (-1.11%)`	⬇️
6.1-m7a.metal-48xl	`80.51% <45.15%> (?)`
6.1-m7g.metal	`77.91% <45.09%> (-1.02%)`	⬇️
6.1-m7i.metal-24xl	`81.27% <45.15%> (?)`
6.1-m7i.metal-48xl	`81.27% <45.15%> (?)`
6.1-m8g.metal-24xl	`77.91% <45.09%> (?)`
6.1-m8g.metal-48xl	`77.91% <45.09%> (?)`
6.16-c5n.metal	`81.29% <46.25%> (?)`
6.16-m5n.metal	`81.29% <46.25%> (?)`
6.16-m6a.metal	`80.59% <46.25%> (?)`
6.16-m6g.metal	`77.95% <46.19%> (?)`
6.16-m6i.metal	`81.28% <46.25%> (?)`
6.16-m7a.metal-48xl	`80.54% <46.25%> (?)`
6.16-m7g.metal	`77.95% <46.19%> (?)`
6.16-m7i.metal-24xl	`81.30% <46.25%> (?)`
6.16-m7i.metal-48xl	`81.30% <46.25%> (?)`
6.16-m8g.metal-24xl	`77.95% <46.19%> (?)`
6.16-m8g.metal-48xl	`77.94% <46.19%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

src/vmm/src/vstate/memory.rs

CHANGELOG.md

kalyazin · 2025-08-14T11:20:42Z

src/firecracker/examples/uffd/fault_all_handler.rs

-                .read_event()
-                .expect("Failed to read uffd_msg")
-                .expect("uffd_msg not ready");
+            let Some(event) = uffd_handler.read_event().expect("Failed to read uffd_msg") else {


do we still need this?

this one I have no idea

Creating a script to build and install a modified kernel with patches applied. Signed-off-by: Jack Thomson <[email protected]>

Adding a new integration test to assert that the kernel build script will succeed. Signed-off-by: Jack Thomson <[email protected]>

Adding the secret hiding kernel as a default for the buildkite pipeline, this will mean that PR's made against the branch will now be run with the new secret hiding enabled amis. Some tests have been marked to skip as they are kernel dependent so while we are compiling our kernel in CI these could change again. Signed-off-by: Jack Thomson <[email protected]>

To make it easier to track the upstream kernels which may change as we rebase, let's mark kernels newer than 6.12 as next for now to make dashboarding easier. Signed-off-by: Jack Thomson <[email protected]>

Addressing a comment to move away from dir stacks in our install scripts. We now store the start directly before we move the build directory and cd back to that explicitly. Signed-off-by: Jack Thomson <[email protected]>

Run the kernel build as part of our nightly tests so we can monitor it's success. Signed-off-by: Jack Thomson <[email protected]>

Add all required linux host kernel patches required for secret hiding. These are: - v17 of guest_memfd direct map support [1] - Direct map removal patches - make kvm_clock work with direct map removed guest_memfd - v2 of KVM_USERFAULT patches [2] - support for UFFDIO_CONTINUE in guest_memfd VMAs - support for write(2) syscall for guest_memfd [1]: https://lore.kernel.org/kvm/[email protected]/T/#m729a1f14fbbccdd66ed5fe434ff3a9d46b055dec [2]: https://lore.kernel.org/kvm/[email protected]/ emoval series to apply on top of v17. Rebase all the other series. Signed-off-by: Patrick Roy <[email protected]>

The patches are in the `patches` subdirectory of `hiding_ci`, so if only patches were added, then the check of "any files with parent directory `hiding_ci`" would be false, and the CI step for testing the build of patches wouldn't actually run. Fix this by updating the check to be "any files where any parent directory is `hiding_ci`", which will also catch patches. Reported-by: Jack Thomson <[email protected]> Signed-off-by: Patrick Roy <[email protected]>

Update the build script to allow us to install the secret hidden kernels onto Amazon Linux 2023 instances. We have to as part of this include a script to download and install ena drivers for the instance to allow us to boot. Signed-off-by: Jack Thomson <[email protected]>

The output from the build in x86 is archived so updated the script to support installing either output type from the build Signed-off-by: Jack Thomson <[email protected]>

Add an 'apt update' before `apt install`. Otherwise, we might hold an old view of the package versions and installation might fail. Signed-off-by: Babis Chalios <[email protected]>

This lint forbids using `..Default::default()` in struct initializers after all fields have already been initialized, but this is a useful pattern if you know you want to add more fields to a struct in a future PR without needing to touch a ton of initializers in unittests again (_heavy foreshadowing_). So silence the paperclip. Signed-off-by: Patrick Roy <[email protected]>

There's no need to test this through VmResources when it can be tested in isolation. Also, everytime I touch MachineConfig I get confsued by where the hell the tests are, cuz not only are they in a different module, they're also one directory level away. So move the tests into machine_config.rs, where it makes sense to have them. Signed-off-by: Patrick Roy <[email protected]>

With secret freedom, direct accesses to guest memory from the context of the host kernel are no longer possible. This particularly means that we cannot pass pointers to guest memory to the host kernel anymore (at least if the host kernel tries to GUP them). For these scenarios, introduce a utility decorator struct `MaybeBounce` that can optionally do indirect read and write syscalls on guest memory by first memcpy-ing to firecracker userspace, and passing a pointer to firecracker heap memory into the kernel instead. Signed-off-by: Patrick Roy <[email protected]>

This is particularly useful for virtio devices, where on-demand allocation of bounce buffers leads to sever performance impacts (~80%) in synthetic throughput tests. Additionally, for virtio devices we can know approximately what the optimal size of a statically allocated bounce buffer is. Allocate bounce buffers on the heap, as trying to even temporarily place a 65k bounce buffer on the stack can lead to stack overflow errors. Signed-off-by: Patrick Roy <[email protected]>

Add support to our virtio devices to allow userspace bounce buffering of virtio buffers. This is an alternative to swiotlb. Don't implement it for vhost-user-blk and for virtio-block with async engine, because I have no idea how that would even work. Signed-off-by: Patrick Roy <[email protected]>

If the CI artifacts dont contain old firecracker releases, still succeed at setting them up after downloading them. Signed-off-by: Patrick Roy <[email protected]>

Add a test that we can boot VMs and initrds with secret freedom enabled. Signed-off-by: Patrick Roy <[email protected]>

Since we load the kernel using bounce buffers now, it will give us false-positives. Signed-off-by: Patrick Roy <[email protected]>

Move from Ubuntu to AL2023 for the secret hiding testing to bring it inline with the other kernels We had to add some more kernel config overrides. The amazon linux default kernel didn't have CRYPTO_HW enabled, this is required as a dependency for AMD_SEV. Signed-off-by: Jack Thomson <[email protected]>

The install script on amazon linux isn't storing the .config in our boot directory by default. This is causing our spectre checker script which relies on the config. Updated our script to move this if it has't been done so already. Signed-off-by: Jack Thomson <[email protected]>

We are not using the .lore/.mbox options, and I dont see us doing so again in the future either. Signed-off-by: Patrick Roy <[email protected]>

This is needed because if guest_memfd is used to back guest memory, vCPU fault notifications are delivered via the UFFD UDS socket. Signed-off-by: Nikita Kalyazin <[email protected]>

It is used by Secret-Free-enabled UFFD handlers to disable vCPU fault notifications from the kernel. Signed-off-by: Nikita Kalyazin <[email protected]>

Accept receiving 3 fds instead of 1, where fds[1] is guest_memfd and fds[2] is userfault bitmap memfd. Also handle the FaultRequest message over the UDS socket by calling a new callback in the Runtime and sending a FaultReply. Co-authored-by: Patrick Roy <[email protected]> Signed-off-by: Patrick Roy <[email protected]> Signed-off-by: Nikita Kalyazin <[email protected]>

There are two ways a UFFD handler receives a fault notification if Secret Fredom is enabled (which is inferred from 3 fds sent by Firecracker instead of 1): - a VMM- or KVM-triggered fault is delivered via a minor UFFD fault event. The handler is supposed to respond to it via memcpying the content of the page (if the page hasn't already been populated) followed by a UFFDIO_CONTINUE call. - a vCPU-triggered fault is delievered via a FaultRequest message on the UDS socket. The handler is supposed to reply with a pwrite64 call on the guest_memfd to populate the page followed by a FaultReply message on the UDS socket. In both cases, the handler also needs to clear the bit in the userfault bitmap at the corresponding offset in order to stop further fault notifications for the same page. UFFD handlers use the userfault bitmap for two purposes: - communicate to the kernel whether a fault at the corresponding guest_memfd offset will cause a VM exit - keep track of pages that have already been populated in order to avoid overwriting the content of the page that is already initialised. Signed-off-by: Nikita Kalyazin <[email protected]>

These are used for communication of page faults between Firecracker and a UFFD handler. Signed-off-by: Nikita Kalyazin <[email protected]>

If configured, userfault bitmap is registered with KVM and controls whether KVM will exit to userspace on a fault of the corresponding page. We are going to allocate the bitmap in a memfd in Firecracker, set bits for all pages to request notifications for vCPU faults and send it to the UFFD handler to delegate clearing the bits as pages get populated. Since the KVM userfault patches are still in review, set_user_memory_region2 is not aware of the userfault flag and the userfault bitmap address in its input structure. Define it in Firecracker code temporarily. Signed-off-by: Nikita Kalyazin <[email protected]>

This is needed to instruct the kernel to exit to userspace when a vCPU fault occurs and the corresponding bit in the userfault bitmap is set. The userfault bitmap is allocated in a memfd by Firecracker and sent to the UFFD handler. This also sends 3 fds to the UFFD handler in the handshake: - UFFD (original) - guest_memfd: for the handler to be able to populate guest memory - userfault bitmap memfd: for the handler to be able to disable exits to userspace for the pages that have already been populated Signed-off-by: Nikita Kalyazin <[email protected]>

This is because vCPUs reason in GPAs while the secret-free UFFD protocol is guest_memfd-offset-based. Note that offset_to_gpa is not used yet, but will likely be needed to support async PF to pass the GPA to a new ioctl when notifying KVM of a fault resolution. Signed-off-by: Nikita Kalyazin <[email protected]>

It contains two parts: - external: between the VMM thread and the UFFD handler - internal: between vCPUs and the VMM thread An outline of the workflow: - When a vCPU fault occurs, vCPU exits to userspace - The vCPU thread sends sends the exit syndrome in the vCPU to VMM channel and writes to the eventfd - The VMM thread forwards the syndrome to the UFFD handler via the UDS socket - The UFFD handler populates the page, clears the corresponding bit in the userfault bitmap and sends a reply to Firecracker - The VMM thread receives the reply and updates a vCPU condvar to notify the vCPU that the fault has been resolved - The vCPU resumes execution Note that as a result of this change, an ability to exit the VM gracefully is lost (at least on x86). In the existing implementation, the VMM thread initiated an exit if an event was read from the eventfd, but no VcpuResponse::Exited responses were read for unknown reason. Since the exit_evt eventfd is now also used by vCPUs to notify the VMM thread of the VM exits caused by pagefaults, this situation (an eventfd event, but response in the channel) can occur also because we have read all VcpuResponse::Userfault in response to the previous eventfd event. Signed-off-by: Nikita Kalyazin <[email protected]>

In a regular VM, we mmap the memory snapshot file and supply the address in the KVM memory slot. In Secret Free VMs, we provide guest_memfd in the memory slot instead. There is no way we can restore a Secret Free VM from a file, unless we prepopulate the guest_memfd with the file content, which is inefficient and is not practically useful. Signed-off-by: Nikita Kalyazin <[email protected]>

This includes both functional and performance tests. Signed-off-by: Nikita Kalyazin <[email protected]>

Do not add a balloon device to a Secret Free VM as it is not currently supported. Signed-off-by: Nikita Kalyazin <[email protected]>

This is because the error type has changed due the implementation of snapshot restore support for Secret Free VMs. Signed-off-by: Nikita Kalyazin <[email protected]>

Graceful shutdown is currently broken on x86_64. Signed-off-by: Nikita Kalyazin <[email protected]>

Writing to the noturbo sysfs immediately locks up the entire instance, so stop doing this for now. Signed-off-by: Patrick Roy <[email protected]>

Without this, the script will ask for user input and get stuck if run unattended. Signed-off-by: Patrick Roy <[email protected]>

Started seeing the below failure in test_population_latency: thread 'main' panicked at .../uffd/fault_all_handler.rs:41:18: uffd_msg not ready note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace I am not entierly sure how this can happen, because the read from the uffd is supposed to be blocking, but maybe it's a weird interaction with the fault-all behavior (e.g. there was a uffd event queues, but because we faulted everything it got cancelled again?), so let's just try going back to read(2) if we dont read anything. Signed-off-by: Patrick Roy <[email protected]>

Currently, we often get stuck with the problem where something in the host kernel breaks that causes functional tests to fail, but we cannot update the patch series from which the host kernel gets built, because functional tests are failing. Break this cyclic dependency by simply not running functional tests when updating only the patch series (as they dont test the updated kernel anyway. Signed-off-by: Patrick Roy <[email protected]>

Return errors up the stack instead of panicking. Signed-off-by: Nikita Kalyazin <[email protected]>

Return None if file_offset() is None instead. Signed-off-by: Nikita Kalyazin <[email protected]>

This is to make sure that we always write the entire FaultRequest message even if the syscall was interrupted. Signed-off-by: Nikita Kalyazin <[email protected]>

Make sure we continue reading the FaultReply if the syscall was interrupted. Signed-off-by: Nikita Kalyazin <[email protected]>

Get rid of the expect by using indexing. Signed-off-by: Nikita Kalyazin <[email protected]>

roypat force-pushed the sh_rebased branch 6 times, most recently from a8a29ab to 27d21c8 Compare August 14, 2025 09:26

roypat force-pushed the sh_rebased branch 2 times, most recently from b023ede to 5f73808 Compare August 14, 2025 10:51

kalyazin reviewed Aug 14, 2025

View reviewed changes

roypat force-pushed the sh_rebased branch 3 times, most recently from e70f9c5 to b0acfa6 Compare August 14, 2025 16:25

JackThomson2 and others added 17 commits August 14, 2025 17:32

ci: Create script for installing custom kernel

cfc3c33

Creating a script to build and install a modified kernel with patches applied. Signed-off-by: Jack Thomson <[email protected]>

test: Add test for kernel build

b23626e

Adding a new integration test to assert that the kernel build script will succeed. Signed-off-by: Jack Thomson <[email protected]>

tests: Mark kernels newer than 6.12 as next

8bf8133

To make it easier to track the upstream kernels which may change as we rebase, let's mark kernels newer than 6.12 as next for now to make dashboarding easier. Signed-off-by: Jack Thomson <[email protected]>

ci: Move away from using dir stacks

8664797

Addressing a comment to move away from dir stacks in our install scripts. We now store the start directly before we move the build directory and cd back to that explicitly. Signed-off-by: Jack Thomson <[email protected]>

tests(bk): Run the kernel build in our nightly PR

ab60b20

Run the kernel build as part of our nightly tests so we can monitor it's success. Signed-off-by: Jack Thomson <[email protected]>

ci: Update the script to support x86 on AL23

dd40575

The output from the build in x86 is archived so updated the script to support installing either output type from the build Signed-off-by: Jack Thomson <[email protected]>

fix: test_hiding_kernel.py

dc52e58

Add an 'apt update' before `apt install`. Otherwise, we might hold an old view of the package versions and installation might fail. Signed-off-by: Babis Chalios <[email protected]>

ci: dont fail downloading artifacts if no firecracker binaries exist

bda5e0b

If the CI artifacts dont contain old firecracker releases, still succeed at setting them up after downloading them. Signed-off-by: Patrick Roy <[email protected]>

roypat and others added 28 commits August 14, 2025 17:32

test: add functional tests for booting secret free VMs

047e588

Add a test that we can boot VMs and initrds with secret freedom enabled. Signed-off-by: Patrick Roy <[email protected]>

test: disable memory monitor in boottime tests

70bd4c6

Since we load the kernel using bounce buffers now, it will give us false-positives. Signed-off-by: Patrick Roy <[email protected]>

hiding_ci: remove support for everything but .patch files

acd6076

We are not using the .lore/.mbox options, and I dont see us doing so again in the future either. Signed-off-by: Patrick Roy <[email protected]>

test(uffd_utils): add protocol definitions for secret freedom

3ec1f81

This is needed because if guest_memfd is used to back guest memory, vCPU fault notifications are delivered via the UFFD UDS socket. Signed-off-by: Nikita Kalyazin <[email protected]>

test(uffd_utils): add UserfaultBitmap

d6ad5a0

It is used by Secret-Free-enabled UFFD handlers to disable vCPU fault notifications from the kernel. Signed-off-by: Nikita Kalyazin <[email protected]>

feat(vmm): add secret free userfault definitions

906a186

These are used for communication of page faults between Firecracker and a UFFD handler. Signed-off-by: Nikita Kalyazin <[email protected]>

test: enable secret freedom in uffd tests

e54ae5f

This includes both functional and performance tests. Signed-off-by: Nikita Kalyazin <[email protected]>

test(uffd/valid_handler): do not use balloon if secret free

96391b8

Do not add a balloon device to a Secret Free VM as it is not currently supported. Signed-off-by: Nikita Kalyazin <[email protected]>

test: update expected error strings

e06f5f5

This is because the error type has changed due the implementation of snapshot restore support for Secret Free VMs. Signed-off-by: Nikita Kalyazin <[email protected]>

tmp(test/api): disable x86 tests that use reboot

3de46d0

Graceful shutdown is currently broken on x86_64. Signed-off-by: Nikita Kalyazin <[email protected]>

tmp: Stop tweaking turbo/pstates in perf tests

76e5550

Writing to the noturbo sysfs immediately locks up the entire instance, so stop doing this for now. Signed-off-by: Patrick Roy <[email protected]>

fix: pass -y to yum in build_and_install_kernel.sh

e8eac67

Without this, the script will ask for user input and get stuck if run unattended. Signed-off-by: Patrick Roy <[email protected]>

fix(vmm): propagate errors in secret freedom

68f3557

Return errors up the stack instead of panicking. Signed-off-by: Nikita Kalyazin <[email protected]>

fix(vmm): do not unwrap in gpa_to_offset

5ef8e53

Return None if file_offset() is None instead. Signed-off-by: Nikita Kalyazin <[email protected]>

fix(vmm): write_all in process_vcpu_userfault

d091b0f

This is to make sure that we always write the entire FaultRequest message even if the syscall was interrupted. Signed-off-by: Nikita Kalyazin <[email protected]>

fix(vmm): handle EINTR in process_uffd_socket

10cf583

Make sure we continue reading the FaultReply if the syscall was interrupted. Signed-off-by: Nikita Kalyazin <[email protected]>

fix(vmm): simplify vcpus_handles dereferencing in process_uffd_socket

7bd20d2

Get rid of the expect by using indexing. Signed-off-by: Nikita Kalyazin <[email protected]>

roypat force-pushed the sh_rebased branch from b0acfa6 to 7bd20d2 Compare August 14, 2025 16:32

roypat closed this Aug 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP] rebase of feature/secret-hiding onto main #5379

[WIP] rebase of feature/secret-hiding onto main #5379

Uh oh!

roypat commented Aug 13, 2025

Uh oh!

codecov bot commented Aug 14, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

kalyazin Aug 14, 2025

Uh oh!

roypat Aug 14, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

[WIP] rebase of feature/secret-hiding onto main #5379

[WIP] rebase of feature/secret-hiding onto main #5379

Uh oh!

Conversation

roypat commented Aug 13, 2025

Changes

Reason

License Acceptance

PR Checklist

Uh oh!

codecov bot commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

kalyazin Aug 14, 2025

Choose a reason for hiding this comment

Uh oh!

roypat Aug 14, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov bot commented Aug 14, 2025 •

edited

Loading