-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Fix intermittent failures of test_balloon.py::test_size_reduction #4782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #4782 +/- ##
=======================================
Coverage 84.34% 84.34%
=======================================
Files 249 249
Lines 27460 27460
=======================================
Hits 23160 23160
Misses 4300 4300
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Tests that boot a microvm via a config file do not use `.start()`, as `.spawn()` already start the bootup process. This is probably why these tests were overlooked when we did the `.wait_for_up()` refactor. Fix potential issues by adding the `wait_for_up()` calls here, to replace various homegrown waiting solutions. Some tests do not have network devices, and also don't really need to wait for the guest to have booted as they only validate firecracker API responses. Don't try to use wait_for_up in those (since we can't anyway). Signed-off-by: Patrick Roy <[email protected]>
The assigned value was immediately overwritten without ever being read. Signed-off-by: Patrick Roy <[email protected]>
e446005
to
540c78b
Compare
6d0faf3
to
4a97d6a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments
Remove assignment to a variable that was never read Signed-off-by: Patrick Roy <[email protected]>
"Some times the ..." -> "Sometimes the ..." (no space) Signed-off-by: Patrick Roy <[email protected]>
This test serves no purpose, as all it does is take a snapshot without verifying anything about it. In this sense, it is perfectly subsumed by `test_balloon_snapshot`, which also creates snapshots with balloon devices, but actually verifies they restore as well (something that is not given in this test, because we do not wait for the guest to boot before taking a snapshot). A look at the commit history shows that this test used to be responsible for verifying cross-release snapshot support for the balloon device, but since we no longer support such a thing, it became superfluous. Signed-off-by: Patrick Roy <[email protected]>
Some of the tests in test_balloon.py were missing `microvm.wait_for_up()` calls. This was mainly visible from those that read the RSS of the firecracker process taking a long time to calibrate, due to RSS being very unstable during guest boot up. Signed-off-by: Patrick Roy <[email protected]>
Major page fault occur when a page fault can only be satisfied via disk I/O [1]. In `test_stats`, we asserted that the number of major page faults increased during execution of `make_guest_dirty_memory`, but this cannot happen, as it uses MAP_ANONYMOUS, and thus will not trigger any major page faults. The reason the test used to work in the past is that it was not waiting for the VM to boot, so the first reading of the balloon statistics happened before the boot process was finished. However, during boot we _do_ expect major faults to happen, as things need to be read from the rootfs. Fix the test to instead just assert a non-zero number of major faults happened during boot. [1]: https://en.wikipedia.org/wiki/Page_fault#Major Signed-off-by: Patrick Roy <[email protected]>
75806ca
to
4c35583
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes
Adds a bunch of missing
wait_for_up
s, and clears up some excessive loggingReason
Missing
wait_for_up
in various balloon tests caused RSS measurements to be taken during the boot process, which later results in intermittent failures. For example, intest_size_reduction
, we try to inflate the balloon based on an RSS reading taken during boot, which leaves too little memory available for a booted guest.License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
PR.
CHANGELOG.md
.TODO
s link to an issue.contribution quality standards.
rust-vmm
.