This directory contains the necessary configurations to deploy OpenStack with Nova configured for full GPU device passthrough (VFIO). This setup allows entire physical GPUs on compute nodes to be passed directly to virtual machines, providing near-native performance. The Nova control plane is configured to request PCI devices from Placement.
This configuration performs the following actions:
- Host Kernel Configuration: It configures the compute node's kernel to enable IOMMU and bind specific GPUs to the vfio-pci driver, preventing the host from using them.
- Nova Scheduler Configuration: It configures the Nova scheduler to be aware of the PCI devices available for passthrough.
- Nova Compute Configuration: It whitelists the passthrough-capable GPUs in Nova on the compute nodes.
Unlike SR-IOV or mdev (mediated device) setups, this configuration does not require installing the NVIDIA driver on the host. The driver is only installed inside the guest VM that consumes the GPU.
The following parameters are crucial for host-level configuration:
- BareMetalHost configuration: The baremetalhosts section contains information required by Metal3 to provision baremetal nodes.
  - bmc.address: The IP address of the Baseboard Management Controller (BMC).
  - bootMACAddress: The MAC address of the network interface that the node will use to PXE boot.
  - rootDeviceHints: Hints for Metal3 to identify the root device for the OS installation.
  - preprovisioningNetworkData: Static nmstate network config to be applied to a BaremetalHost via the ironic-python-agent ramdisk during provisioning. The config is embedded in the ISO attached as virtual media via the BMC, so no DHCP is required.
  - baremetalHostsNetworkData: Final nmstate network configuration for EDPM nodes.
- edpm_kernel_args: Appends the kernel arguments required for VFIO passthrough.
  - intel_iommu=on iommu=pt: Enables the IOMMU for device passthrough.
  - vfio-pci.ids=10de:20f1: Instructs the vfio-pci driver to claim the specified GPU(s) by their vendor and product IDs at boot time. The example IDs 10de:20f1 are for an NVIDIA A100 GPU.
  - rd.driver.pre=vfio-pci: Avoids race conditions during boot by loading the vfio-pci kernel module early.
- edpm_tuned_profile and edpm_tuned_isolated_cores: These parameters configure the tuned service. edpm_tuned_profile is set to cpu-partitioning-powersave to enable CPU isolation features. edpm_tuned_isolated_cores specifies the cores to be isolated. For CPU isolation we strongly recommend using the Tuned approach rather than the isolcpus kernel argument.
- VFIO-PCI Binding Service: The vfio-pci-bind service in dt/nova/nova04delta/edpm/nodeset/nova_gpu.yaml blacklists the nouveau and nvidia kernel modules to ensure they do not interfere with the vfio-pci driver. The service also regenerates the initramfs and grub configuration to apply these changes. A reboot is required for these changes to take effect.
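
After the reboot, it can help to confirm that the kernel arguments took effect and that each GPU is bound to vfio-pci. The following is a minimal sketch rather than part of this deployment: the helper names missing_kernel_args and bound_driver are hypothetical, the required arguments are copied from the list above, and the sysfs layout (/sys/bus/pci/devices/ADDRESS/driver as a symlink to the bound driver) is the standard Linux one.

```python
import os

# Kernel arguments named in this README. The vfio-pci.ids value is the
# example A100 ID; replace it with the IDs of your own GPUs.
REQUIRED_KERNEL_ARGS = [
    "intel_iommu=on",
    "iommu=pt",
    "vfio-pci.ids=10de:20f1",
    "rd.driver.pre=vfio-pci",
]


def missing_kernel_args(cmdline, required=REQUIRED_KERNEL_ARGS):
    """Return the required arguments that are absent from a kernel command line."""
    present = set(cmdline.split())
    return [arg for arg in required if arg not in present]


def bound_driver(pci_address, sysfs_root="/sys"):
    """Return the name of the driver bound to a PCI device, or None if unbound.

    sysfs_root is a parameter only so the function can be exercised
    against a fake directory tree.
    """
    link = os.path.join(sysfs_root, "bus", "pci", "devices", pci_address, "driver")
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))
```

On a compute node, pass the contents of /proc/cmdline to missing_kernel_args and expect an empty list, then call bound_driver with each GPU address (for example 0000:04:00.0) and expect "vfio-pci". Any other driver name, such as nouveau, suggests the blacklist or the initramfs regeneration did not apply.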
A count of X PCI devices may be requested through "pci_passthrough:alias"="nvidia_a2:X" flavor extra specs:
$ openstack --os-compute-api=2.86 flavor set --property "pci_passthrough:alias"="nvidia_a2:1" device_passthrough
See README.md for deployment instructions. The most essential configuration values to define are:
- [pci]alias: Creates an alias for a specific GPU type. This allows users to request a GPU by a friendly name (e.g., nvidia_a2) when creating a VM. This configuration should match the configuration found on the compute nodes.

      nova:
        apiServiceTemplate:
          customServiceConfig: |
            [pci]
            alias = { "vendor_id":"10de", "product_id":"20f1", "device_type":"type-PF", "name":"nvidia_a2" }

- [filter_scheduler]pci_in_placement: Enables PCI in Placement. Enable it only after every compute node in the deployment has been configured to report PCI inventory to Placement by setting [pci]report_in_placement in the EDPM nodeset configuration. This ordering needs to be enforced during major upgrades only, where the dataplane deployment that updates the EDPM compute configuration must come before the control plane resources are reconfigured.
- device_type in the alias depends on the actual hardware:
  - type-PF: The device supports SR-IOV and is the parent or root device.
  - type-VF: The device is a child device of a device that supports SR-IOV.
  - type-PCI: The device does not support SR-IOV.
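
To decide which device_type to put in the alias, you can inspect the device in sysfs on the compute node. The sketch below is an approximation and an assumption on our part: Nova derives the type from what libvirt reports, not from this code. It relies on two standard Linux sysfs entries: a virtual function exposes a physfn link to its parent, and an SR-IOV capable physical function exposes a sriov_totalvfs file. The function name classify_device_type is hypothetical.

```python
import os


def classify_device_type(pci_address, sysfs_root="/sys"):
    """Guess the Nova device_type for a PCI device from its sysfs entries.

    sysfs_root is a parameter only so the function can be exercised
    against a fake directory tree.
    """
    dev = os.path.join(sysfs_root, "bus", "pci", "devices", pci_address)
    if not os.path.isdir(dev):
        raise FileNotFoundError("no such PCI device: " + pci_address)
    # A virtual function links back to its parent physical function.
    if os.path.exists(os.path.join(dev, "physfn")):
        return "type-VF"
    # A physical function with the SR-IOV capability exposes this file.
    if os.path.exists(os.path.join(dev, "sriov_totalvfs")):
        return "type-PF"
    return "type-PCI"
```

If the result for a GPU differs from the device_type in the alias, treat that as a prompt to check what Nova and libvirt report for the device before changing the configuration.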
See the dataplane section for deployment instructions. The most essential configuration values to define are:
- [pci]report_in_placement: Required for PCI in Placement to work.
- [pci]device_spec: Whitelists the physical GPUs that are available for passthrough. You must create a device_spec entry for each physical GPU you want to make available. For example:

      nova:
        pci:
          conf: |
            [pci]
            device_spec = { "vendor_id":"10de", "product_id":"20f1", "address": "0000:04:00.0" }
            device_spec = { "vendor_id":"10de", "product_id":"20f1", "address": "0000:82:00.0" }
            alias = { "vendor_id":"10de", "product_id":"20f1", "device_type":"type-PF", "name":"nvidia_a2" }

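
Because one device_spec entry is needed per physical GPU, nodes with many GPUs are easier to handle by generating the entries. The following is a minimal sketch under stated assumptions: render_device_specs is a hypothetical helper, not something shipped with this repository, and it only accepts full domain:bus:device.function addresses in the form used in the example above.

```python
import json
import re

# Full PCI address as used in this README, e.g. 0000:04:00.0
PCI_ADDRESS_RE = re.compile(r"^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def render_device_specs(vendor_id, product_id, addresses):
    """Render a [pci] section with one device_spec line per GPU address."""
    lines = ["[pci]"]
    for address in addresses:
        if not PCI_ADDRESS_RE.match(address):
            raise ValueError("not a full PCI address: " + address)
        spec = {
            "vendor_id": vendor_id,
            "product_id": product_id,
            "address": address,
        }
        # Each device_spec value is a JSON object on a single line.
        lines.append("device_spec = " + json.dumps(spec))
    return "\n".join(lines)
```

For example, render_device_specs("10de", "20f1", ["0000:04:00.0", "0000:82:00.0"]) produces the two device_spec lines shown above, apart from whitespace inside the JSON objects. The alias line still has to be added by hand.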
In addition to PCI device configuration, the nova.compute.conf section includes parameters for resource management on the compute node:
- [DEFAULT]reserved_host_memory_mb: Specifies the amount of memory (in megabytes) to reserve for the host operating system and other non-OpenStack services. This memory will not be available for allocation to virtual machines.
- [compute]cpu_shared_set: A list of physical CPUs that are available for host processes and for virtual machines that do not have dedicated CPUs (i.e., unpinned VMs). These should be the CPUs that are not isolated by edpm_tuned_isolated_cores.
- [compute]cpu_dedicated_set: A list of physical CPUs that are exclusively reserved for virtual machines with dedicated CPU pinning policies. To ensure performance isolation, this list should correspond directly to the CPUs isolated using the edpm_tuned_isolated_cores parameter.
- [DEFAULT]reserved_huge_pages: Defines the number and size of huge pages to reserve for the host, making them unavailable for guest VMs. This configuration works in conjunction with the hugepages and hugepagesz kernel arguments, which define the total pool of huge pages on the host.
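
The relationship between edpm_tuned_isolated_cores, cpu_dedicated_set and cpu_shared_set is easy to get wrong when editing CPU lists by hand. The sketch below checks the two rules stated above. It is illustrative only: parse_cpu_list and check_cpu_partitioning are hypothetical helpers, and the parser handles only comma-separated CPUs and ranges such as 0-3,8, not the ^ exclusion syntax that Nova also accepts.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0-3,8" into a set of integers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_cpu_partitioning(isolated_cores, cpu_dedicated_set, cpu_shared_set):
    """Return a list of human-readable problems; an empty list means consistent."""
    isolated = parse_cpu_list(isolated_cores)
    dedicated = parse_cpu_list(cpu_dedicated_set)
    shared = parse_cpu_list(cpu_shared_set)
    problems = []
    # Dedicated CPUs should be exactly the cores isolated by tuned.
    if dedicated != isolated:
        problems.append(
            "cpu_dedicated_set differs from edpm_tuned_isolated_cores: "
            + str(sorted(dedicated ^ isolated))
        )
    # Shared CPUs should not include any isolated core.
    overlap = shared & isolated
    if overlap:
        problems.append(
            "cpu_shared_set contains isolated cores: " + str(sorted(overlap))
        )
    return problems
```

With isolated cores 4-7, a dedicated set of 4-7 and a shared set of 0-3, the function returns an empty list. Moving CPU 4 into the shared set, or dropping CPU 7 from the dedicated set, each produces one reported problem.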
Note: In a full device passthrough scenario, the [devices]enabled_vgpu_types option in Nova's configuration is not used. This option is specific to mediated device (mdev) configurations.
To use the passthrough GPU, the guest operating system inside the VM must have the appropriate native NVIDIA driver installed. You will need a standard NVIDIA driver. Do not use vGPU-enabled guest drivers. The GPU will appear as a physical PCI device within the guest.
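
To confirm from inside the guest that the GPU arrived as a physical PCI device, you can look for the NVIDIA vendor ID (10de) in the output of lspci -nn. The following is a small sketch for parsing that output. The function name find_devices is hypothetical, and the code assumes the usual lspci -nn line layout, where the slot comes first and the vendor:device pair appears in square brackets.

```python
import re

# Matches the [vendor:device] pair that lspci -nn prints, e.g. [10de:20f1].
PCI_ID_RE = re.compile(r"\[([0-9a-f]{4}):([0-9a-f]{4})\]", re.IGNORECASE)


def find_devices(lspci_output, vendor_id="10de"):
    """Return (slot, product_id) tuples for devices from the given vendor."""
    matches = []
    for line in lspci_output.splitlines():
        found = PCI_ID_RE.search(line)
        if found and found.group(1).lower() == vendor_id.lower():
            slot = line.split()[0]
            matches.append((slot, found.group(2).lower()))
    return matches
```

Feed it the text captured from lspci -nn in the guest. An empty result means no device from that vendor is visible, in which case the flavor alias and the device_spec entries on the compute node are the first things to check.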