[1.9] add vsock fix and test for tap offload features #4836
Merged
bchalios merged 17 commits into firecracker-microvm:firecracker-v1.9 from kalyazin:tap_offload_test_1.9 on Oct 8, 2024
ShadowCurse previously approved these changes on Oct 7, 2024
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## firecracker-v1.9 #4836 +/- ##
====================================================
- Coverage 84.35% 84.34% -0.01%
====================================================
Files 249 249
Lines 27447 27449 +2
====================================================
Hits 23152 23152
- Misses 4295 4297 +2
roypat previously approved these changes on Oct 7, 2024
Tests that boot a microvm via a config file do not use `.start()`, as `.spawn()` already starts the boot-up process. This is probably why these tests were overlooked when we did the `.wait_for_up()` refactor. Fix potential issues by adding `wait_for_up()` calls to these tests, replacing various homegrown waiting solutions. Some tests do not have network devices and also don't really need to wait for the guest to boot, as they only validate Firecracker API responses. Don't try to use `wait_for_up` in those (it requires a network device anyway). Signed-off-by: Patrick Roy <[email protected]>
The assigned value was immediately overwritten without ever being read. Signed-off-by: Patrick Roy <[email protected]>
Remove assignment to a variable that was never read Signed-off-by: Patrick Roy <[email protected]>
"Some times the ..." -> "Sometimes the ..." (no space) Signed-off-by: Patrick Roy <[email protected]>
This test serves no purpose, as all it does is take a snapshot without verifying anything about it. In this sense, it is fully subsumed by `test_balloon_snapshot`, which also creates snapshots with balloon devices but additionally verifies that they restore (something this test cannot guarantee, because it does not wait for the guest to boot before taking a snapshot). The commit history shows that this test used to be responsible for verifying cross-release snapshot support for the balloon device, but since we no longer support that, it became superfluous. Signed-off-by: Patrick Roy <[email protected]>
Some of the tests in test_balloon.py were missing `microvm.wait_for_up()` calls. This was mainly visible from those that read the RSS of the firecracker process taking a long time to calibrate, due to RSS being very unstable during guest boot up. Signed-off-by: Patrick Roy <[email protected]>
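To see why sampling during guest boot makes RSS-based calibration slow, it helps to look at how RSS can be sampled. The following is a minimal sketch, assuming a Linux host with procfs: it reads `VmRSS` from `/proc/<pid>/status` and polls until a few consecutive readings stay within a tolerance. `read_rss_kib`, `wait_for_stable_rss` and their parameters are hypothetical illustrations, not the integration-test framework's calibration code.

```python
import os
import time


def read_rss_kib(pid: int) -> int:
    """Return the resident set size (VmRSS) of `pid` in KiB, read from procfs (Linux only)."""
    with open(f"/proc/{pid}/status", encoding="utf-8") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # Line format: "VmRSS:     12345 kB"
                return int(line.split()[1])
    raise RuntimeError(f"no VmRSS entry for pid {pid}")


def wait_for_stable_rss(
    pid: int,
    tolerance_kib: int = 1024,
    samples: int = 3,
    interval_s: float = 0.1,
    timeout_s: float = 10.0,
) -> int:
    """Poll RSS until `samples` consecutive readings differ by at most `tolerance_kib`.

    Hypothetical helper: while a guest is still booting, RSS keeps changing,
    so this loop keeps polling and may run into the timeout.
    """
    readings = [read_rss_kib(pid)]
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        time.sleep(interval_s)
        readings.append(read_rss_kib(pid))
        window = readings[-samples:]
        if len(window) == samples and max(window) - min(window) <= tolerance_kib:
            return window[-1]
    raise TimeoutError(f"RSS of pid {pid} did not stabilize within {timeout_s}s")


if __name__ == "__main__":
    # Example: sample this Python process itself.
    print(read_rss_kib(os.getpid()), "KiB")
```

In a test, `pid` would be the Firecracker process; sampling only after `wait_for_up()` avoids measuring while the guest is still booting.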
Major page faults occur when a page fault can only be satisfied via disk I/O [1]. In `test_stats`, we asserted that the number of major page faults increased during execution of `make_guest_dirty_memory`, but this cannot happen: it uses MAP_ANONYMOUS and thus will not trigger any major page faults. The test used to pass because it did not wait for the VM to boot, so the first reading of the balloon statistics happened before the boot process had finished. During boot we _do_ expect major faults, as data needs to be read from the rootfs. Fix the test to instead assert that a non-zero number of major faults happened during boot. [1]: https://en.wikipedia.org/wiki/Page_fault#Major Signed-off-by: Patrick Roy <[email protected]>
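The minor/major distinction can be observed from user space. The sketch below, assuming Linux and CPython's `mmap` and `resource` modules, dirties a private anonymous mapping (similar in spirit to what the commit says `make_guest_dirty_memory` does, though that helper's code is not shown here) and reports the fault-count deltas from `getrusage`. Anonymous pages are zero-filled on first touch rather than read from disk, so the writes show up as minor faults. `dirty_anonymous_memory` and `fault_counts` are hypothetical names.

```python
from __future__ import annotations

import mmap
import resource


def fault_counts() -> tuple[int, int]:
    """Return (minor, major) page-fault counts of the current process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def dirty_anonymous_memory(pages: int = 64) -> tuple[int, int]:
    """Write to every page of a fresh private anonymous mapping.

    Returns the (minor, major) fault deltas observed across the writes.
    """
    size = pages * mmap.PAGESIZE
    minor_before, major_before = fault_counts()
    # fileno=-1 makes CPython request anonymous memory (MAP_ANONYMOUS);
    # MAP_PRIVATE mirrors a typical MAP_PRIVATE | MAP_ANONYMOUS allocation.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    try:
        for offset in range(0, size, mmap.PAGESIZE):
            buf[offset] = 1  # first write to a page faults in a zero-filled page
    finally:
        buf.close()
    minor_after, major_after = fault_counts()
    return minor_after - minor_before, major_after - major_before


if __name__ == "__main__":
    minor, major = dirty_anonymous_memory()
    print(f"minor faults: {minor}, major faults: {major}")
```

On an otherwise idle host the major-fault delta would be expected to stay at 0, since no page has to be read from disk; the exact minor-fault count depends on the kernel.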
This is a regression test for the following fix:
commit a9e5f13
Author: Nikita Kalyazin <[email protected]>
Date: Mon Sep 30 13:05:28 2024 +0000
fix(net): set tap offload features on restore
The test verifies that tap offload features are configured for both booted and restored VMs.
Signed-off-by: Nikita Kalyazin <[email protected]>
Add entry for vsock fix into CHANGELOG. Signed-off-by: Egor Lazarchuk <[email protected]>
The test starts a `socat` server on the host and a `socat` client in the guest. It then validates that the `socat` client stops after a snapshot is taken. Signed-off-by: Egor Lazarchuk <[email protected]>
We need to kick the vsock queue during resume in order for the VM to process the `TRANSPORT_RESET_EVENT` we sent during snapshot creation. Otherwise, the VM will wait for it forever. Signed-off-by: Egor Lazarchuk <[email protected]>
Always block inside `Microvm.start()` until the VM has finished booting (if a network device is present in the microvm). This means that we no longer need `wait_for_up` calls after every `start()` call. Signed-off-by: Patrick Roy <[email protected]>
If we start Firecracker with a config file, then the microvm will boot in `.spawn()` instead of `.start()` (since we are not driving Firecracker via API requests). Thus, if this is the case, put the `wait_for_up` into `Microvm.spawn()`. Signed-off-by: Patrick Roy <[email protected]>
After restoring a snapshot, we are not waiting for the VM to finish booting (it needs to have done that before we took a snapshot), but having a `wait_for_up` here is still a good sanity check for ensuring the VM is still responsive. So far our tests have been inconsistent on whether to put a `wait_for_up` here. This makes them consistent by always doing `wait_for_up` if a network device is available. Signed-off-by: Patrick Roy <[email protected]>
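Taken together, these commits put the boot wait where the boot actually happens: in `start()` when the VM is driven through the API, in `spawn()` when it is configured from a file, and after restoring a snapshot as a responsiveness check, in each case only if a network device exists. The toy class below sketches that control flow under stated assumptions; `ToyMicrovm`, `boot_probe` and `restore_from_snapshot` are hypothetical stand-ins, not the framework's real `Microvm` API.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyMicrovm:
    """Hypothetical stand-in for the test framework's Microvm, modeling only boot waits."""

    has_network: bool
    boot_probe: Callable[[], bool]  # hypothetical: True once the guest responds
    config_file: Optional[str] = None
    events: List[str] = field(default_factory=list)

    def wait_for_up(self, attempts: int = 50) -> None:
        # Block until the probe succeeds, or give up after `attempts` tries.
        for _ in range(attempts):
            if self.boot_probe():
                self.events.append("up")
                return
        raise TimeoutError("guest did not come up")

    def _maybe_wait(self) -> None:
        # Waiting requires a network device to probe the guest with.
        if self.has_network:
            self.wait_for_up()

    def spawn(self) -> None:
        self.events.append("spawn")
        # With a config file, Firecracker boots the guest during spawn.
        if self.config_file is not None:
            self._maybe_wait()

    def start(self) -> None:
        # API-driven boot: InstanceStart is issued here, so wait here.
        self.events.append("InstanceStart")
        self._maybe_wait()

    def restore_from_snapshot(self) -> None:
        # The guest booted before the snapshot; waiting is a responsiveness check.
        self.events.append("restore")
        self._maybe_wait()
```

Centralizing the wait in these methods is what lets the callsites drop their own `wait_for_up()` calls.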
The now-redundant `wait_for_up` calls were removed with:
```
grep -rl wait_for_up tests/integration_tests/ \
    | xargs sed '/wait_for_up/d' -i
```
Signed-off-by: Patrick Roy <[email protected]>
`vm.start()` already asserts `vm.state == "Running"`, so no need to do it again at the callsite. Signed-off-by: Patrick Roy <[email protected]>
dbd5660 to ffaad42
This is a fix for a fix introduced in firecracker-microvm#4796. The issue was the vsock device hanging after snapshot restoration because the guest was not notified about the termination packet. However, there was a bug in that fix: we saved the vsock state before the notification was sent, thus discarding all modifications made to send the notification. The original fix appeared to work because we were only testing with one iteration of snapshot/restore: even though we lost synchronization with the guest in the event queue state, it worked fine once. Doing more iterations causes vsock to hang as before. This commit fixes the issue by storing the vsock state after the notification is sent, and modifies the vsock test to run multiple iterations of snapshot/restore. Signed-off-by: Egor Lazarchuk <[email protected]>
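The ordering bug can be reproduced with a deliberately simplified model. In the sketch below (hypothetical names; the virtqueue is reduced to a single used-index counter, which is an assumption and not Firecracker's Rust implementation), each snapshot/restore cycle sends the reset event by publishing one more used entry into guest memory. Restoring device state captured before that write leaves the device one index behind the guest, so the next cycle republishes an index the guest has already consumed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyVsockQueue:
    """Device-side event-queue state: the used index the device has published so far."""

    next_used: int = 0


@dataclass
class ToyGuestMemory:
    """Guest-side state kept in guest memory across snapshot/restore."""

    published_used: int = 0  # used->idx as written by the device
    consumed_used: int = 0  # last used index the guest driver processed

    def process_events(self) -> bool:
        """Return True if the guest finds a new event (e.g. TRANSPORT_RESET_EVENT)."""
        if self.published_used != self.consumed_used:
            self.consumed_used = self.published_used
            return True
        return False


def snapshot_cycles(iterations: int, save_after_notify: bool) -> List[bool]:
    """Per snapshot/restore cycle, report whether the guest saw the reset event."""
    device, guest = ToyVsockQueue(), ToyGuestMemory()
    results = []
    for _ in range(iterations):
        state_before = ToyVsockQueue(device.next_used)
        # Snapshot creation sends TRANSPORT_RESET_EVENT: bump used->idx in guest memory.
        device.next_used += 1
        guest.published_used = device.next_used
        state_after = ToyVsockQueue(device.next_used)
        # Guest memory is snapshotted as-is; device state comes from whichever copy was saved.
        device = state_after if save_after_notify else state_before
        # On resume, the queue is kicked and the guest processes pending events.
        results.append(guest.process_events())
    return results


if __name__ == "__main__":
    print("state saved before notify:", snapshot_cycles(3, save_after_notify=False))
    print("state saved after notify: ", snapshot_cycles(3, save_after_notify=True))
```

With the early save, only the first cycle delivers the event (`[True, False, False]`), while saving after the notification delivers it every time (`[True, True, True]`). This mirrors the commit's observation that a single snapshot/restore iteration hid the bug.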
roypat approved these changes on Oct 8, 2024
ShadowCurse approved these changes on Oct 8, 2024
pb8o approved these changes on Oct 8, 2024
Merged commit f6f21b4 into firecracker-microvm:firecracker-v1.9 (6 of 7 checks passed)
Changes
Backport #4835 and its dependencies: #4782, #4796, #4798, #4812.
Reason
Regression testing in the release branch.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.
PR Checklist
[ ] If a specific issue led to this PR, this PR closes the issue.
[ ] API changes follow the Runbook for Firecracker API changes.
[ ] User-facing changes are mentioned in CHANGELOG.md.
[ ] New TODOs link to an issue.
contribution quality standards.
rust-vmm.