@roypat roypat commented Mar 31, 2025

It turns out that some of our snapshot tests do not reliably clean up snapshot memory files after they're done. There are two causes. First, Microvm.restore_from_snapshot was largely oblivious to the fact that a uffd-restored snapshot doesn't need to be copied into the jail again, since the mmap that copy would serve never happens. Second, I messed up the build_n_from_snapshot function, which did not clean up snapshots in "incremental" mode.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • I have read and understand CONTRIBUTING.md.
  • I have run tools/devtool checkstyle to verify that the PR passes the
    automated style checks.
  • I have described what is done in these changes, why they are needed, and
    how they are solving the problem in a clear and encompassing way.
  • I have updated any relevant documentation (both in code and in the docs)
    in the PR.
  • I have mentioned all user-facing changes in CHANGELOG.md.
  • If a specific issue led to this PR, this PR closes the issue.
  • When making API changes, I have followed the
    Runbook for Firecracker API changes.
  • I have tested all new and changed functionalities in unit tests and/or
    integration tests.
  • I have linked an issue to every new TODO.

  • This functionality cannot be added in rust-vmm.

The restore_from_snapshot function did not integrate well with
uffd-based snapshot restore: even if a UFFD path was specified, it still
created a copy of the snapshot memory file inside the chroot, even
though the UFFD handler had already set this up in spawn_pf_handler.

Fix this, and while we're at it, also remove the need to pass in the
uffd handler and snapshot file explicitly when using uffd-based restore,
as spawn_pf_handler sets the uffd_handler field of the microvm
object, and can also easily be made to actually contain the snapshot
from which page faults are being served.

Signed-off-by: Patrick Roy <[email protected]>
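
To illustrate the behavior described in the commit message above, here is a minimal sketch of the decision it boils down to: only copy the memory file into the chroot when the restore is file-backed. This is not the actual test framework code. The function name `stage_snapshot_memory` and its parameters are hypothetical, and a plain directory stands in for the jail.

```python
import shutil
from pathlib import Path
from typing import Optional


def stage_snapshot_memory(
    mem_file: Path, chroot: Path, uffd_socket: Optional[Path]
) -> Optional[Path]:
    """Copy the snapshot memory file into the chroot only if it will be mmapped.

    Hypothetical helper, not part of the Firecracker test framework.
    Returns the path of the copy, or None if no copy was made.
    """
    if uffd_socket is not None:
        # UFFD restore: page faults are served by the handler process,
        # so a copy inside the chroot would never be used.
        return None

    # File-backed restore: the memory file must be visible inside the chroot.
    dest = chroot / mem_file.name
    shutil.copyfile(mem_file, dest)
    return dest
```

Returning `None` in the UFFD case means there is nothing for the caller to delete later, which is the cleanup problem this change is about.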

codecov bot commented Mar 31, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.09%. Comparing base (b38ec33) to head (b5652f5).
Report is 2 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5126   +/-   ##
=======================================
  Coverage   83.09%   83.09%           
=======================================
  Files         250      250           
  Lines       26920    26920           
=======================================
  Hits        22368    22368           
  Misses       4552     4552           
Flag Coverage Δ
5.10-c5n.metal 83.55% <ø> (+<0.01%) ⬆️
5.10-m5n.metal 83.55% <ø> (-0.01%) ⬇️
5.10-m6a.metal 82.72% <ø> (-0.01%) ⬇️
5.10-m6g.metal 79.43% <ø> (ø)
5.10-m6i.metal 83.54% <ø> (ø)
5.10-m7a.metal-48xl 82.71% <ø> (?)
5.10-m7g.metal 79.43% <ø> (ø)
6.1-c5n.metal 83.60% <ø> (ø)
6.1-m5n.metal 83.59% <ø> (ø)
6.1-m6a.metal 82.76% <ø> (ø)
6.1-m6g.metal 79.43% <ø> (+<0.01%) ⬆️
6.1-m6i.metal 83.59% <ø> (+<0.01%) ⬆️
6.1-m7a.metal-48xl 82.76% <ø> (?)
6.1-m7g.metal 79.43% <ø> (+<0.01%) ⬆️


@roypat roypat force-pushed the snapshot-test-cleanup branch from 565afda to eca928f Compare April 1, 2025 08:15
@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Apr 1, 2025
pb8o previously approved these changes Apr 1, 2025
If build_n_from_snapshot is asked to build many snapshots incrementally,
it doesn't clean up after itself properly: each iteration ends by
creating a new snapshot of the VM, and this snapshot is passed to
iteration n+1. Here, a copy is created inside the new VM's chroot, and
iteration n+1 ends by creating a new snapshot for iteration n+2, and
deleting the copy of the snapshot inside the chroot. However, we never
delete the snapshot created in iteration n, and so with each iteration
more snapshots accumulate.

Fix this by having the function delete the snapshot created in iteration
n after iteration n+2 has finished successfully. The idea here is that in
case of failure, we will have the snapshot created in iteration n+1 (the
one which caused a failure in n+2), and also the snapshot created in n
(which was the last known snapshot to successfully go through a test
iteration, namely iteration n+1).

Signed-off-by: Patrick Roy <[email protected]>
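
The retention rule from the commit message above can be sketched as a small loop that keeps at most two snapshots alive. This is only an illustration under assumptions, not the real build_n_from_snapshot: the name `run_incremental_snapshots` and the two callbacks are hypothetical, snapshots are represented by opaque values, and the seed snapshot is assumed to be deleted like any other.

```python
from typing import Callable, Optional, Tuple, TypeVar

S = TypeVar("S")


def run_incremental_snapshots(
    seed: S,
    iterations: int,
    restore_and_snapshot: Callable[[S, int], S],
    delete: Callable[[S], None],
) -> Tuple[Optional[S], S]:
    """Run `iterations` restore/snapshot cycles, keeping the last two snapshots.

    Hypothetical sketch. `restore_and_snapshot(snapshot, i)` restores a VM
    from `snapshot`, exercises it, and returns a fresh snapshot; it is
    expected to raise on failure.
    """
    last_good: Optional[S] = None  # created in iteration n, survived n+1
    current: S = seed              # created in iteration n+1, consumed by n+2

    for i in range(iterations):
        # If this raises, both `current` (the suspect) and `last_good`
        # (the last snapshot known to work) are still on disk.
        new = restore_and_snapshot(current, i)

        # The iteration succeeded, so the older snapshot is no longer needed.
        if last_good is not None:
            delete(last_good)
        last_good, current = current, new

    return last_good, current
```

With this shape, the number of snapshots on disk stays bounded no matter how many iterations run, and a failure always leaves behind the two snapshots needed to reproduce it.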
@roypat roypat force-pushed the snapshot-test-cleanup branch from 698edb2 to b5652f5 Compare April 1, 2025 09:39
@roypat roypat requested a review from pb8o April 1, 2025 10:13
roypat commented Apr 1, 2025

Force-pushed to fix a typo and remove a leftover commented-out debug statement from testing.

@zulinx86 zulinx86 merged commit 85622c6 into firecracker-microvm:main Apr 1, 2025
6 of 7 checks passed