-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Test rebase #5358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Test rebase #5358
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is just code organization changes. Create a new module under `virtio`, called `transport`. For the time being the only transport supported is `mmio`. Also, move `IrqInterrupt` type within the MMIO transport code, as it is MMIO specific. Signed-off-by: Babis Chalios <[email protected]>
`IrqTrigger::new()` returns a `Result` because creating an `EventFd` might fail with an `std::io::Error` error. All users of `IrqTrigger` create the object and directly unwrap the error. To avoid unwraps all over the place, change `IrqTrigger::new()` to unwrap a potential error while creating the EventFd internally and just return `Self`. Signed-off-by: Babis Chalios <[email protected]>
The MMIO transport for VirtIO devices uses an `IrqTrigger` object as the object that models the logic for sending interrupts from the device to the guest. We create one such object for every VirtIO device when creating it. The MMIO device manager associates this object with an IRQ number and registers it with KVM. This commit changes the timing of association of an `IrqTrigger` with a VirtIO-mmio device. It only assigns such an object to the device during its activation. We do this to prepare for supporting a PCI transport for VirtIO devices. The cloud hypervisor implementation for these passes the interrupt objects used by the device during activation, so we make this change to have a uniform way to handle interrupts for both transports. Functionally, nothing changes for MMIO devices, as before activation we don't trigger any interrupts. Signed-off-by: Babis Chalios <[email protected]>
Describing the APIs that need to implement types that are used as interrupts for VirtIO devices. Currently, we only use `IrqInterrupt` interrupts, but this will change once we have MSI-X with PCIe devices. Signed-off-by: Babis Chalios <[email protected]>
VirtIO devices assume they're operating under an MMIO transport and as a consequence they use IrqTrigger as interrupts. Switch that to using VirtioInterrupt for all VirtIO device objects. Only assume a VirtioInterrupt is an IrqTrigger in MMIO specific code. Signed-off-by: Babis Chalios <[email protected]>
Bring in the vm-device crate from CloudHypervisor. We will be using it for adding PCIe support. Signed-off-by: Babis Chalios <[email protected]>
We use `SerialDevice` with Stdin as the input source. Encode this in the type so that we don't spill the generic all over the place. Signed-off-by: Babis Chalios <[email protected]> Co-authored-by: Egor Lazarchuk <[email protected]> Signed-off-by: Egor Lazarchuk <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
Use the vm_device::Bus bus for all MMIO devices. This is mainly to prepare for using it for PCIe devices. Also, sepate VirtIO devices from other MMIO devices inside the MMIODeviceManager struct. This makes iterating over VirtIO devices easier since we don't need to access two data structures to get a reference to a VirtIO device any more. Signed-off-by: Babis Chalios <[email protected]>
We were always constructing RTCDevice using a set of metrics that were defined in the RTC module itself. Don't leak the metrics to other modules. Instead, create a new() function that always constructs it the correct way. Signed-off-by: Babis Chalios <[email protected]>
Use the vm_device::Bus bus for PortIO devices on x86. PCIe devices will use this as well. Signed-off-by: Babis Chalios <[email protected]>
PCIe spec mandates that software can access the configuration space of PCIe devices both via MMIO and Port IO accesses. As a result, PCIe devices will need to register to both buses (on x86). Change the organization of devices, so that MMIO and PIO device managers do not own the buses. Instead, introduce a DeviceManager object which holds the buses, the resource allocator and includes also all types of device managers (at the moment MMIO, PIO and ACPI). Signed-off-by: Babis Chalios <[email protected]>
We always create anew the keyboard interrupt event. Just create it inside `I8042Device::new()` and return an error if that fails. Signed-off-by: Babis Chalios <[email protected]>
test_serial_dos test checks that when we send a lot of bytes in the serial device the emulation logic does not increase indefinitely the underlying buffer that we use for when the device is set in loopback mode. However, the test does not wait for the microVM to start and sometimes the virtual memory allocation may increase between readings. Add a network device to the microVM so that we implicitly wait until it has booted before taking the first measurement. Signed-off-by: Babis Chalios <[email protected]>
Bring in pci crate from cloud hypervisor with a few modifications. We use the rust-vmm vm-allocator crate instead of Cloud Hypervisor's downstream one. For the time being, rust-vmm's implementation should include all we need for supporting the devices we care about. If we need more functionality from our allocators, we will implement the logic directly in the rust-vmm vm-allocator crate. Signed-off-by: Babis Chalios <[email protected]>
PCIe distinguishes MMIO regions between 32bit and 64bit, caring for devices that can't deal with 64-bit addresses. This commit defines the appropriate regions for both x86 and aarch64 architectures, extends the resource allocator to handle allocations for both of these regions and adjusts the logic that calculates the memory regions for the architecture. Also, un-do the change that added an `offset` argument `arch_memory_regions` function. We won't be using this for "secret hiding" so it just made the logic (especially for kani proofs) too convoluted. Signed-off-by: Babis Chalios <[email protected]>
PCIe devices need some times to relocate themselves in memory. To do so, they need to keep an (atomic) reference to a type that implements `DeviceRelocation` trait. The logic for relocation involves removing the device from the bus it has been registered to, allocate a new address range for it and reinsert it. Instead of creating a new type for it, reuse `ResourceAllocator`. This means that we need to move the buses from the `DeviceManager` inside `ResourceAllocator`. Signed-off-by: Babis Chalios <[email protected]>
Add a PCIe segment which includes a single PCIe root port and a bus. At the moment, the PCIe segment is always enabled. Later commit will make it optional and enable it only when a command line argument flag is passed to Firecracker binary. Signed-off-by: Babis Chalios <[email protected]>
So that we can declare which memory region is used by PCIe devices for MMCONFIG. Signed-off-by: Babis Chalios <[email protected]>
Write the PCI root bridge in FDT when PCI is enabled. Signed-off-by: Babis Chalios <[email protected]>
Add a command line argument to enable PCIe support. By default, PCIe is disabled. The reason for making PCIe off by default is that users need to explicitly enable PCI support in their kernels. Requiring users to explicitly enable it, does not break existing deployments, i.e. users can upgrade Firecracker within their existing environments without breaking any deployment. Signed-off-by: Babis Chalios <[email protected]>
At the moment, the logic just restores the device manager and add the PCIe root complex if PCI is enabled. Signed-off-by: Babis Chalios <[email protected]>
Add an integration test that checks that `lspci` correctly locates the PCIe root complex if PCI is enabled for the microVM. Also, add a negative test that checks that PCIe root complex doesn't exist when PCI is not enabled. Also, extend coverage of, at least some of, the tests to ensure that they run with and without PCI configuration enabled. Do that by extending the `uvm_any*` fixtures to yield both variants. Signed-off-by: Babis Chalios <[email protected]>
PCI-enabled guest kernels enable the `extd_apicid` CPU feature for AMD CPU families after 16h. Our supported AMD families (Milan & Genoa) are both 19h. This is irrespective of whether PCI is enabled in Firecracker. Do not mark this as host-only when running with PCI enabled kernels, i.e. all kernels that support ACPI. Signed-off-by: Babis Chalios <[email protected]>
We have some Rust integration tests that check building and booting of microVMs works correctly. Add variants for PCI-enabled microVMs. Signed-off-by: Babis Chalios <[email protected]>
Tests test_spectre_meltdown_checker_on_guest and test_check_vulnerability_files_ab run A/B tests between the HEAD of the target branch and the tip of a PR branch. This will currently fail, because Firecracker builds from the HEAD of the target branch know nothing about the `--enable-pci` command line flag, so launching the Firecracker binary for revision A will fail. Only run these tests for non-PCI uVMs for now. Once this commit gets merged we will re-enable and make sure that everything works as expected. Signed-off-by: Babis Chalios <[email protected]>
1. build the kernel with PCI/e support. 2. fix a race condition between udev renaming the network devices and fcnet setting up the network interfaces 3. install pciutils on the image Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
I've rebuilt the CI artifacts for the new development version. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
The memory monitor was only assuming a single MMIO gap on x86_64 when calculating the memory regions that corresponded to guest memory. Now we need to account for two MMIO gaps in x86 and one in ARM. Signed-off-by: Babis Chalios <[email protected]>
When we re-arranged the MMIO address space in commit 9a165d1 (arch: define 64-bit capable MMIO memory regions) we moved the MMIO region of the boot timer device for x86 systems, but we didn't update the init scripts that hardcode it and use it to report boot time timestamp back to Firecracker. Update the init.c and initramfs values for the region. Also, add a functional test that runs during CI PR tests and makes sure the boot timer works. Signed-off-by: Babis Chalios <[email protected]>
Commit be5a600 (tests: fix MMIO gaps in memory monitor tool) that fixed the memory monitor to account for the 64-bit MMIO region included a left-over debug print. Remove it. Signed-off-by: Babis Chalios <[email protected]>
To have a more consistent naming, it's best to use GSI instead of IRQ, at least in places where it's meant just as an abstract index. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
This patch makes 2 changes to make the test work on PCI: - simplify logic to find device address to be generic irrespective of ACPI/no-ACPI, PCI/no-PCI - move config offset from within the C program to the python test, as it's different between MMIO (0x100) and PCI (0x4000) Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
Tell systemd not to use "predictable names" for network devices (eg enp0s1), but keep the ethN set by the kernel. This is equivalent to passing net.ifnames=0 to the kernel command line. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
pci=off is just an optimization to skip the probing, it shouldn't matter to the functionality of the tests. Dropping it to allow them to run with PCI. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
All tests using uvm_plain or uvm_plain_any will start using PCI as well, allowing more coverage for the PCI code. This requires moving the PCI configuration to the VM factory from the spawn method. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
This patch updates all the places in the code with a uvm_plain* fixture when that was equivalent to the previous behaviour. In particular: - microvm_factory.build(guest_kernel_linux_5_10, rootfs) => uvm_plain - microvm_factory.build(guest_kernel, rootfs) => uvm_plain_any Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
Simplify the test code by introducing two new fixtures that are used in a few places in the code. This will also allow these tests to run on PCI. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
Run the test_run_concurrency with PCI enabled as well. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
Refactor the code to use common fixtures and run all the tests with PCI enabled as well. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
Run the initrd tests also with PCI enabled to verify everything is still working correctly. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
Run test_memory_overhead performance test also with PCI enabled. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
Run the restore latency tests also with PCI enabled to verify there is no change. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
Fix the block tests by using the correct fixture name. Fixes: eb7248f ("refactor(test): add uvm_plain_acpi and _6_1 fixtures") Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
Probably missed during a rebase of `feature/pcie` on top of `main` branch. Signed-off-by: Babis Chalios <[email protected]>
It is true that writes/reads of an MSI-X table are either 32 or 64 bits long. However, we do check for this invariant in the `match` expression just after the assertion. If the invariant is not held (the guest tried to read/write with an invalid length) we just print an error and continue. This branch of the `match` block is never reached due to the assertion itself. To simplify things, just remove the assertion and let the `match` block logic handle invalid memory accesses. This should also help us better fuzz the bus accesses. Do add a check that the data access is up to 8 bytes long. These are all MMIO or Port IO accesses and they can't be bigger than 8 bytes. So this assertion should never fail in production (unless there's a KVM bug or we try to run Firecracker in some architecture that allows more than 64bit memory accesses). Signed-off-by: Babis Chalios <[email protected]>
We were using a Hashmap to store the GSIs that were used by the vectors of an MSI-X group. These vectors were always indexed starting by 0, so we can just use a simple Vec. Signed-off-by: Babis Chalios <[email protected]>
We are using a HashMap to track the interrupt routes we use in the system. The index to the HashMap is the GSI of the interrupt route. We know the maximum number of GSIs we have available so pre-allocate the space for the HashMap to avoid reallocations at runtime. Signed-off-by: Babis Chalios <[email protected]>
Fix a bug in the common VirtIO configuration for PCI transport where we would use `queue_select` to read the queue's MSI vector without validating it matches a valid queue. This could lead in panics when accessing the `msix_queue` array. The spec states that in such cases we should return `NO_VECTOR` (0xffff), so do that. Signed-off-by: Babis Chalios <[email protected]>
We were shifting by the wrong number of bits for 2-byte accesses. Signed-off-by: Babis Chalios <[email protected]>
This is the equivalent to the fix in 741c29f but for guest writes. We need to make sure that `queue_select` points to a valid queue before setting the MSI-X vector otherwise we'll hit a panic when accessing the underlying `Vec`. Signed-off-by: Babis Chalios <[email protected]>
swiotlb=noforce disables SWIOTLB, which is enabled by the kernel if the physical addresses exceed 32b. This is not needed for us and causes 64MB to be wasted on the microvm. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
We are vending PCI code taken from Cloud Hypervisor's implementation. There is code in there that we don't actually use. So drop it to reduce the dead code in the project. Signed-off-by: Babis Chalios <[email protected]>
Cloud Hypervisor code is passing a resource type argument in the logic that allocates BARs for devices. `Resource` is an Enum where one of its variants is `PciBar`. Not sure what Cloud Hypervisor uses this for, but it seems redundant since this method is specifically used to allocate BAR memory for devices. We definitely don't use it, so remove it. Signed-off-by: Babis Chalios <[email protected]>
Despite the fact we are using a single 64bit MMIO BARs for VirtIO devices, we had code that allowed for multiple BARs including BARs of other types (IO and 32bit MMIO). Remove this code and always assume we are only using 64bit BAR at index 0. Signed-off-by: Babis Chalios <[email protected]>
This method is used to notify the guest about queue events in a way that's most performant with the underlying interrupt implementation. As in IrqTrigger there is no distinction between different queues, it's best to send just one interrupt notifiying that "some queues" have a pending event. Conversely, in VirtioInterruptMsix, we need to trigger a MSI for each distinct queue. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
We noticed that, compared to main, the vsock device is up to 30% slower when PCI is disabled. This is due to the refactor in 087e185 ("fix(vsock): pass correct index when triggering interrupts"), as we're now sending interrupts as soon as we detect the event, rather than deferring them and sending only one interrupt at the end. While with MSI we need to send multiple interrupts, so there is little difference (still, tests show up to 5% improvement with this change), with legacy IRQ there's only one interrupt line so we end up sending multiple back to back interrupts rather than a single one. This patch reverts to the previous behaviour and uses the newly introduced `trigger_queues` method to deduplicate interrupts in the case of IrqTrigger. Signed-off-by: Riccardo Mancini <[email protected]> Signed-off-by: Babis Chalios <[email protected]>
According to the VirtIO spec section 2.1.2, when we enter a failed state we need to to set `DEVICE_NEEDS_RESET` status field and if `DRIVER_OK` is set we need to must send a device configuration change interrupt to the driver. In MMIO code we were trying to follow this logic, but we were trying to get the interrupt object from the device, which fails (and panics) because interrupts are only set in the device upon successful activation. In the PCI transport, instead, we were not doing this at all. The transport layers hold a reference to the interrupts at all times. So, add the missing logic to the PCI transport and use the transport layer reference in both transports to avoid panics. Signed-off-by: Babis Chalios <[email protected]>
MMIO transport layer holds a Mutex to the VirtIO device. Within the logic that handles the handshake between the driver and the device, there were cases where we would take and release the lock multiple times within the same code branch. Change this to only take the lock once. Signed-off-by: Babis Chalios <[email protected]>
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5358 +/- ##
==========================================
- Coverage 82.98% 80.64% -2.35%
==========================================
Files 250 265 +15
Lines 26837 30674 +3837
==========================================
+ Hits 22271 24737 +2466
- Misses 4566 5937 +1371
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Testing rebase of
feature/pcie
on top ofmain
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md
.PR Checklist
tools/devtool checkbuild --all
to verify that the PR passesbuild checks on all supported architectures.
tools/devtool checkstyle
to verify that the PR passes theautomated style checks.
how they are solving the problem in a clear and encompassing way.
in the PR.
CHANGELOG.md
.Runbook for Firecracker API changes.
integration tests.
TODO
.rust-vmm
.