Skip to content

Conversation

roypat
Copy link
Contributor

@roypat roypat commented Sep 3, 2024

Changes

Adds a bunch of missing wait_for_ups, and clears up some excessive logging

Reason

Missing wait_for_up in various balloon tests caused RSS measurements to be taken during the boot process, which later results in intermittent failures. For example, in test_size_reduction, we try to inflate the balloon based on an RSS reading taken during boot, which leaves too little memory available for a booted guest.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

  • If a specific issue led to this PR, this PR closes the issue.
  • The description of changes is clear and encompassing.
  • Any required documentation changes (code and docs) are included in this
    PR.
  • API changes follow the Runbook for Firecracker API changes.
  • User-facing changes are mentioned in CHANGELOG.md.
  • All added/changed functionality is tested.
  • New TODOs link to an issue.
  • Commits meet
    contribution quality standards.

  • This functionality cannot be added in rust-vmm.

Copy link

codecov bot commented Sep 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.34%. Comparing base (4f0909a) to head (7ee1397).
Report is 7 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4782   +/-   ##
=======================================
  Coverage   84.34%   84.34%           
=======================================
  Files         249      249           
  Lines       27460    27460           
=======================================
  Hits        23160    23160           
  Misses       4300     4300           
Flag Coverage Δ
5.10-c5n.metal 84.56% <ø> (ø)
5.10-m5n.metal 84.54% <ø> (ø)
5.10-m6a.metal 83.83% <ø> (ø)
5.10-m6g.metal 80.91% <ø> (ø)
5.10-m6i.metal 84.54% <ø> (ø)
5.10-m7g.metal 80.91% <ø> (ø)
6.1-c5n.metal 84.55% <ø> (ø)
6.1-m5n.metal 84.54% <ø> (ø)
6.1-m6a.metal 83.83% <ø> (ø)
6.1-m6g.metal 80.91% <ø> (ø)
6.1-m6i.metal 84.54% <ø> (ø)
6.1-m7g.metal 80.91% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Tests that boot a microvm via a config file do not use `.start()`, as
`.spawn()` already start the bootup process. This is probably why these
tests were overlooked when we did the `.wait_for_up()` refactor.

Fix potential issues by adding the `wait_for_up()` calls here, to
replace various homegrown waiting solutions.

Some tests do not have network devices, and also don't really need to
wait for the guest to have booted as they only validate firecracker API
responses. Don't try to use wait_for_up in those (since we can't
anyway).

Signed-off-by: Patrick Roy <[email protected]>
The assigned value was immediately overwritten without ever being read.

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat changed the title First intermittent failures of test_balloon.py::test_size_reduction Fix intermittent failures of test_balloon.py::test_size_reduction Sep 3, 2024
@roypat roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Sep 4, 2024
bchalios
bchalios previously approved these changes Sep 4, 2024
Copy link
Contributor

@bchalios bchalios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments

Remove assignment to a variable that was never read

Signed-off-by: Patrick Roy <[email protected]>
"Some times the ..." -> "Sometimes the ..." (no space)

Signed-off-by: Patrick Roy <[email protected]>
This test serves no purpose, as all it does is take a snapshot without
verifying anything about it. In this sense, it is perfectly subsumed by
`test_balloon_snapshot`, which also creates snapshots with balloon
devices, but actually verifies they restore as well (something that is
not given in this test, because we do not wait for the guest to boot
before taking a snapshot).

A look at the commit history shows that this test used to be responsible
for verifying cross-release snapshot support for the balloon device, but
since we no longer support such a thing, it became superfluous.

Signed-off-by: Patrick Roy <[email protected]>
Some of the tests in test_balloon.py were missing
`microvm.wait_for_up()` calls. This was mainly visible from those that
read the RSS of the firecracker process taking a long time to calibrate,
due to RSS being very unstable during guest boot up.

Signed-off-by: Patrick Roy <[email protected]>
Major page fault occur when a page fault can only be satisfied via disk
I/O [1]. In `test_stats`, we asserted that the number of major page
faults increased during execution of `make_guest_dirty_memory`, but this
cannot happen, as it uses MAP_ANONYMOUS, and thus will not trigger any
major page faults. The reason the test used to work in the past is that
it was not waiting for the VM to boot, so the first reading of the
balloon statistics happened before the boot process was finished.
However, during boot we _do_ expect major faults to happen, as things
need to be read from the rootfs.

Fix the test to instead just assert a non-zero number of major faults
happened during boot.

[1]: https://en.wikipedia.org/wiki/Page_fault#Major

Signed-off-by: Patrick Roy <[email protected]>
@roypat roypat requested review from bchalios and pb8o September 4, 2024 11:33
Copy link
Contributor

@bchalios bchalios left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@bchalios bchalios merged commit 18a2a05 into firecracker-microvm:main Sep 4, 2024
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Status: Awaiting review Indicates that a pull request is ready to be reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants