Fix intermittent failures of test_balloon.py::test_size_reduction #4782

roypat · 2024-09-03T15:54:06Z

Changes

Adds a bunch of missing wait_for_ups, and clears up some excessive logging

Reason

Missing wait_for_up in various balloon tests caused RSS measurements to be taken during the boot process, which later results in intermittent failures. For example, in test_size_reduction, we try to inflate the balloon based on an RSS reading taken during boot, which leaves too little memory available for a booted guest.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.

PR Checklist

If a specific issue led to this PR, this PR closes the issue.
The description of changes is clear and encompassing.
Any required documentation changes (code and docs) are included in this
PR.
API changes follow the Runbook for Firecracker API changes.
User-facing changes are mentioned in CHANGELOG.md.
All added/changed functionality is tested.
New TODOs link to an issue.
Commits meet
contribution quality standards.

This functionality cannot be added in rust-vmm.

codecov · 2024-09-03T16:03:44Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.34%. Comparing base (4f0909a) to head (7ee1397).
Report is 7 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #4782   +/-   ##
=======================================
  Coverage   84.34%   84.34%           
=======================================
  Files         249      249           
  Lines       27460    27460           
=======================================
  Hits        23160    23160           
  Misses       4300     4300

Flag	Coverage Δ
5.10-c5n.metal	`84.56% <ø> (ø)`
5.10-m5n.metal	`84.54% <ø> (ø)`
5.10-m6a.metal	`83.83% <ø> (ø)`
5.10-m6g.metal	`80.91% <ø> (ø)`
5.10-m6i.metal	`84.54% <ø> (ø)`
5.10-m7g.metal	`80.91% <ø> (ø)`
6.1-c5n.metal	`84.55% <ø> (ø)`
6.1-m5n.metal	`84.54% <ø> (ø)`
6.1-m6a.metal	`83.83% <ø> (ø)`
6.1-m6g.metal	`80.91% <ø> (ø)`
6.1-m6i.metal	`84.54% <ø> (ø)`
6.1-m7g.metal	`80.91% <ø> (ø)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Tests that boot a microvm via a config file do not use `.start()`, as `.spawn()` already start the bootup process. This is probably why these tests were overlooked when we did the `.wait_for_up()` refactor. Fix potential issues by adding the `wait_for_up()` calls here, to replace various homegrown waiting solutions. Some tests do not have network devices, and also don't really need to wait for the guest to have booted as they only validate firecracker API responses. Don't try to use wait_for_up in those (since we can't anyway). Signed-off-by: Patrick Roy <[email protected]>

The assigned value was immediately overwritten without ever being read. Signed-off-by: Patrick Roy <[email protected]>

tests/framework/microvm.py

tests/framework/utils.py

bchalios

Some minor comments

tests/integration_tests/functional/test_cmd_line_start.py

tests/integration_tests/functional/test_balloon.py

Remove assignment to a variable that was never read Signed-off-by: Patrick Roy <[email protected]>

"Some times the ..." -> "Sometimes the ..." (no space) Signed-off-by: Patrick Roy <[email protected]>

This test serves no purpose, as all it does is take a snapshot without verifying anything about it. In this sense, it is perfectly subsumed by `test_balloon_snapshot`, which also creates snapshots with balloon devices, but actually verifies they restore as well (something that is not given in this test, because we do not wait for the guest to boot before taking a snapshot). A look at the commit history shows that this test used to be responsible for verifying cross-release snapshot support for the balloon device, but since we no longer support such a thing, it became superfluous. Signed-off-by: Patrick Roy <[email protected]>

Some of the tests in test_balloon.py were missing `microvm.wait_for_up()` calls. This was mainly visible from those that read the RSS of the firecracker process taking a long time to calibrate, due to RSS being very unstable during guest boot up. Signed-off-by: Patrick Roy <[email protected]>

Major page fault occur when a page fault can only be satisfied via disk I/O [1]. In `test_stats`, we asserted that the number of major page faults increased during execution of `make_guest_dirty_memory`, but this cannot happen, as it uses MAP_ANONYMOUS, and thus will not trigger any major page faults. The reason the test used to work in the past is that it was not waiting for the VM to boot, so the first reading of the balloon statistics happened before the boot process was finished. However, during boot we _do_ expect major faults to happen, as things need to be read from the rootfs. Fix the test to instead just assert a non-zero number of major faults happened during boot. [1]: https://en.wikipedia.org/wiki/Page_fault#Major Signed-off-by: Patrick Roy <[email protected]>

bchalios

roypat added 2 commits September 3, 2024 17:13

test: Remove useless variable assignment

6b3ead7

The assigned value was immediately overwritten without ever being read. Signed-off-by: Patrick Roy <[email protected]>

roypat force-pushed the misc-test-fixes branch from e446005 to 540c78b Compare September 3, 2024 16:19

roypat changed the title ~~First intermittent failures of test_balloon.py::test_size_reduction~~ Fix intermittent failures of test_balloon.py::test_size_reduction Sep 3, 2024

pb8o reviewed Sep 4, 2024

View reviewed changes

tests/framework/microvm.py Outdated Show resolved Hide resolved

tests/framework/utils.py Outdated Show resolved Hide resolved

tests/framework/utils.py Outdated Show resolved Hide resolved

roypat force-pushed the misc-test-fixes branch from 6d0faf3 to 4a97d6a Compare September 4, 2024 08:31

roypat added the Status: Awaiting review Indicates that a pull request is ready to be reviewed label Sep 4, 2024

bchalios previously approved these changes Sep 4, 2024

View reviewed changes

tests/integration_tests/functional/test_cmd_line_start.py Show resolved Hide resolved

tests/integration_tests/functional/test_balloon.py Show resolved Hide resolved

roypat added 5 commits September 4, 2024 12:30

test: remove useless statement

dfdd510

Remove assignment to a variable that was never read Signed-off-by: Patrick Roy <[email protected]>

test: fix typo in comment

2b2f9f3

"Some times the ..." -> "Sometimes the ..." (no space) Signed-off-by: Patrick Roy <[email protected]>

roypat dismissed bchalios’s stale review via 4c35583 September 4, 2024 11:31

roypat force-pushed the misc-test-fixes branch from 75806ca to 4c35583 Compare September 4, 2024 11:31

Merge branch 'main' into misc-test-fixes

7ee1397

roypat requested review from bchalios and pb8o September 4, 2024 11:33

pb8o approved these changes Sep 4, 2024

View reviewed changes

bchalios approved these changes Sep 4, 2024

View reviewed changes

bchalios merged commit 18a2a05 into firecracker-microvm:main Sep 4, 2024
7 checks passed

kalyazin mentioned this pull request Oct 7, 2024

[1.9] add vsock fix and test for tap offload features #4836

Merged

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix intermittent failures of test_balloon.py::test_size_reduction #4782

Fix intermittent failures of test_balloon.py::test_size_reduction #4782

Uh oh!

roypat commented Sep 3, 2024

Uh oh!

codecov bot commented Sep 3, 2024 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bchalios left a comment

Uh oh!

Uh oh!

Uh oh!

bchalios left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Fix intermittent failures of test_balloon.py::test_size_reduction #4782

Fix intermittent failures of test_balloon.py::test_size_reduction #4782

Uh oh!

Conversation

roypat commented Sep 3, 2024

Changes

Reason

License Acceptance

PR Checklist

Uh oh!

codecov bot commented Sep 3, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

Uh oh!

bchalios left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

bchalios left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov bot commented Sep 3, 2024 •

edited

Loading