|
| 1 | +.. include:: vars.rst |
| 2 | + |
| 3 | +============================= |
| 4 | +Support for GPUs in OpenStack |
| 5 | +============================= |
| 6 | + |
| 7 | +This guide has been developed for Nvidia GPUs and CentOS 8. |
| 8 | + |
| 9 | +See `Kayobe Ops <https://github.com/stackhpc/kayobe-ops>`_ for |
| 10 | +a playbook implementation of host setup for GPU. |
| 11 | + |
| 12 | +BIOS Configuration Requirements |
| 13 | +------------------------------- |
| 14 | + |
| 15 | +On an Intel system: |
| 16 | + |
| 17 | +* Enable ``VT-x`` in the BIOS for virtualisation support. |
| 18 | +* Enable ``VT-d`` in the BIOS for IOMMU support. |
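Whether the IOMMU is actually active can be checked from the running host: the kernel creates one directory per IOMMU group under ``/sys/kernel/iommu_groups``. The following is a minimal sketch of such a check; ``iommu_group_count`` is a hypothetical helper name, not part of any tool referenced in this guide. A count of zero can mean either that ``VT-d`` is disabled in the BIOS or that the kernel boot parameters described later in this guide have not yet been applied.

```shell
# Hypothetical helper: count the IOMMU groups exposed by the kernel.
# $1: directory to inspect (defaults to the standard sysfs location).
iommu_group_count() {
    dir="${1:-/sys/kernel/iommu_groups}"
    # Each IOMMU group appears as one sub-directory; a missing or empty
    # directory yields a count of 0.
    find "$dir" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | wc -l
}
```

For example, ``[ "$(iommu_group_count)" -gt 0 ] && echo "IOMMU active"`` prints a message only when at least one group exists.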
| 19 | + |
| 20 | +Hypervisor Configuration Requirements |
| 21 | +------------------------------------- |
| 22 | + |
| 23 | +Find the GPU device IDs |
| 24 | +~~~~~~~~~~~~~~~~~~~~~~~ |
| 25 | + |
| 26 | +From the host OS, use ``lspci -nn`` to find the PCI vendor ID and |
| 27 | +device ID for the GPU device and supporting components. These are |
| 28 | +4-digit hex numbers. |
| 29 | + |
| 30 | +For example: |
| 31 | + |
| 32 | +.. code-block:: text |
| 33 | +
|
| 34 | + 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204M [GeForce GTX 980M] [10de:13d7] (rev a1) (prog-if 00 [VGA controller]) |
| 35 | + 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1) |
| 36 | +
|
| 37 | +In this case the vendor ID is ``10de``, display ID is ``13d7`` and audio ID is ``0fbb``. |
| 38 | + |
| 39 | +Alternatively, for an Nvidia Quadro RTX 6000: |
| 40 | + |
| 41 | +.. code-block:: yaml |
| 42 | +
|
| 43 | + # NVIDIA Quadro RTX 6000/8000 PCI device IDs |
| 44 | + vendor_id: "10de" |
| 45 | + display_id: "1e30" |
| 46 | + audio_id: "10f7" |
| 47 | + usba_id: "1ad6" |
| 48 | + usba_class: "0c0330" |
| 49 | + usbc_id: "1ad7" |
| 50 | + usbc_class: "0c8000" |
| 51 | +
|
| 52 | +These parameters will be used for device-specific configuration. |
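The ``vendor:device`` pairs can also be pulled out of the ``lspci -nn`` output with a short filter rather than read by eye. This is a minimal sketch, assuming the Nvidia vendor ID ``10de`` shown above; ``extract_nvidia_ids`` is a hypothetical helper name.

```shell
# Hypothetical helper: read `lspci -nn` output on stdin and print the
# vendor:device pair of every NVIDIA (vendor 10de) PCI function.
extract_nvidia_ids() {
    # A bracketed pair such as [10de:13d7] holds the IDs; class codes such
    # as [0300] contain no colon, so they are not matched.
    grep -oE '\[10de:[0-9a-f]{4}\]' | tr -d '[]'
}
```

Usage: ``lspci -nn | extract_nvidia_ids``. For the GTX 980M example above this yields the pairs ``10de:13d7`` and ``10de:0fbb``, one per line.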
| 53 | + |
| 54 | +Kernel Ramdisk Reconfiguration |
| 55 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 56 | + |
| 57 | +The ramdisk loaded during kernel boot can be extended to include the |
| 58 | +vfio PCI drivers and ensure they are loaded early in system boot. |
| 59 | + |
| 60 | +.. code-block:: yaml |
| 61 | +
|
| 62 | +   - name: Template dracut config |
| 63 | +     blockinfile: |
| 64 | +       path: /etc/dracut.conf.d/gpu-vfio.conf |
| 65 | +       block: | |
| 66 | +         add_drivers+=" vfio vfio_iommu_type1 vfio_pci vfio_virqfd " |
| 67 | +       owner: root |
| 68 | +       group: root |
| 69 | +       mode: 0660 |
| 70 | +       create: true |
| 71 | +     become: true |
| 72 | +     notify: |
| 73 | +       - Regenerate initramfs |
| 74 | +       - reboot |
| 75 | +
|
| 76 | +The handler for regenerating the Dracut initramfs is: |
| 77 | + |
| 78 | +.. code-block:: yaml |
| 79 | +
|
| 80 | +   - name: Regenerate initramfs |
| 81 | +     shell: |- |
| 82 | +       #!/bin/bash |
| 83 | +       set -eux |
| 84 | +       dracut -v -f /boot/initramfs-$(uname -r).img $(uname -r) |
| 85 | +     become: true |
| 86 | +
|
| 87 | +Kernel Boot Parameters |
| 88 | +~~~~~~~~~~~~~~~~~~~~~~ |
| 89 | + |
| 90 | +Set the following kernel parameters by adding to |
| 91 | +``GRUB_CMDLINE_LINUX_DEFAULT`` or ``GRUB_CMDLINE_LINUX`` in |
| 92 | +``/etc/default/grub``. We can use the |
| 93 | +`stackhpc.grubcmdline <https://galaxy.ansible.com/stackhpc/grubcmdline>`_ |
| 94 | +role from Ansible Galaxy: |
| 95 | + |
| 96 | +.. code-block:: yaml |
| 97 | +
|
| 98 | +   - name: Add vfio-pci.ids kernel args |
| 99 | +     include_role: |
| 100 | +       name: stackhpc.grubcmdline |
| 101 | +     vars: |
| 102 | +       kernel_cmdline: |
| 103 | +         - intel_iommu=on |
| 104 | +         - iommu=pt |
| 105 | +         - "vfio-pci.ids={{ vendor_id }}:{{ display_id }},{{ vendor_id }}:{{ audio_id }}" |
| 106 | +       kernel_cmdline_remove: |
| 107 | +         - iommu |
| 108 | +         - intel_iommu |
| 109 | +         - vfio-pci.ids |
| 110 | +
|
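After the reboot, it is worth confirming that the parameters reached the running kernel. The sketch below compares a kernel command line against a list of expected parameters; ``cmdline_has_params`` is a hypothetical helper name, and the parameter values are the ones used in the task above.

```shell
# Hypothetical helper: succeed only if every expected parameter appears as
# a whole word in the kernel command line passed as the first argument.
cmdline_has_params() {
    chp_cmdline="$1"
    shift
    for chp_param in "$@"; do
        # Pad with spaces so that only complete parameters match.
        case " $chp_cmdline " in
            *" $chp_param "*) ;;
            *) echo "missing: $chp_param" >&2; return 1 ;;
        esac
    done
}
```

Usage: ``cmdline_has_params "$(cat /proc/cmdline)" intel_iommu=on iommu=pt``. The function returns non-zero and names the first missing parameter on standard error.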
| 111 | +Kernel Device Management |
| 112 | +~~~~~~~~~~~~~~~~~~~~~~~~ |
| 113 | + |
| 114 | +On the hypervisor, we must prevent the kernel from initialising |
| 115 | +the GPU and prevent host OS drivers from binding to it. |
| 116 | +We do this using ``udev`` rules: |
| 117 | + |
| 118 | +.. code-block:: yaml |
| 119 | +
|
| 120 | +   - name: Template udev rules to blacklist GPU usb controllers |
| 121 | +     blockinfile: |
| 122 | +       # We want this to execute as soon as possible |
| 123 | +       path: /etc/udev/rules.d/99-gpu.rules |
| 124 | +       block: | |
| 125 | +         #Remove NVIDIA USB xHCI Host Controller Devices, if present |
| 126 | +         ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x{{ vendor_id }}", ATTR{class}=="0x{{ usba_class }}", ATTR{remove}="1" |
| 127 | +         #Remove NVIDIA USB Type-C UCSI devices, if present |
| 128 | +         ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x{{ vendor_id }}", ATTR{class}=="0x{{ usbc_class }}", ATTR{remove}="1" |
| 129 | +       owner: root |
| 130 | +       group: root |
| 131 | +       mode: 0644 |
| 132 | +       create: true |
| 133 | +     become: true |
| 134 | +
|
| 135 | +Kernel Drivers |
| 136 | +~~~~~~~~~~~~~~ |
| 137 | + |
| 138 | +Prevent the ``nouveau`` kernel driver from loading by |
| 139 | +blacklisting the module: |
| 140 | + |
| 141 | +.. code-block:: yaml |
| 142 | +
|
| 143 | +   - name: Blacklist nouveau |
| 144 | +     blockinfile: |
| 145 | +       path: /etc/modprobe.d/blacklist-nouveau.conf |
| 146 | +       block: | |
| 147 | +         blacklist nouveau |
| 148 | +         options nouveau modeset=0 |
| 149 | +       mode: 0664 |
| 150 | +       owner: root |
| 151 | +       group: root |
| 152 | +       create: true |
| 153 | +     become: true |
| 154 | +     notify: |
| 155 | +       - reboot |
| 156 | +       - Regenerate initramfs |
| 157 | +
|
| 158 | +Ensure that the ``vfio`` drivers are loaded into the kernel on boot: |
| 159 | + |
| 160 | +.. code-block:: yaml |
| 161 | +
|
| 162 | +   - name: Add vfio to modules-load.d |
| 163 | +     blockinfile: |
| 164 | +       path: /etc/modules-load.d/vfio.conf |
| 165 | +       block: | |
| 166 | +         vfio |
| 167 | +         vfio_iommu_type1 |
| 168 | +         vfio_pci |
| 169 | +         vfio_virqfd |
| 170 | +       owner: root |
| 171 | +       group: root |
| 172 | +       mode: 0664 |
| 173 | +       create: true |
| 174 | +     become: true |
| 175 | +     notify: reboot |
| 176 | +
|
| 177 | +Once this code has taken effect (after a reboot), the VFIO kernel drivers should be loaded on boot: |
| 178 | + |
| 179 | +.. code-block:: text |
| 180 | +
|
| 181 | + # lsmod | grep vfio |
| 182 | + vfio_pci 49152 0 |
| 183 | + vfio_virqfd 16384 1 vfio_pci |
| 184 | + vfio_iommu_type1 28672 0 |
| 185 | + vfio 32768 2 vfio_iommu_type1,vfio_pci |
| 186 | + irqbypass 16384 5 vfio_pci,kvm |
| 187 | +
|
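Loaded modules alone do not show that the GPU itself is claimed by ``vfio-pci``. As a rough check, the ``Kernel driver in use`` line printed by ``lspci -nnk`` can be summarised per device. This is a sketch only; ``devices_and_drivers`` is a hypothetical helper name.

```shell
# Hypothetical helper: read `lspci -nnk` output on stdin and print
# "<slot> <driver>" for every device that has a kernel driver bound.
devices_and_drivers() {
    awk '
        # Device header lines start in column 0 with the PCI slot address.
        /^[0-9a-f]/ { slot = $1 }
        # Indented detail line naming the bound driver (last field).
        /Kernel driver in use:/ { print slot, $NF }
    '
}
```

Usage: ``lspci -nnk -d 10de: | devices_and_drivers``, where ``-d 10de:`` restricts the listing to the Nvidia vendor ID. Each GPU function intended for passthrough should report ``vfio-pci`` as its driver.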
| 188 | +OpenStack Nova configuration |
| 189 | +---------------------------- |
| 190 | + |
| 191 | +Testing GPU in a Guest VM |
| 192 | +------------------------- |
| 193 | + |
| 194 | +The Nvidia drivers must be installed first. For example, on an Ubuntu guest: |
| 195 | + |
| 196 | +.. code-block:: text |
| 197 | +
|
| 198 | + sudo apt install nvidia-headless-440 nvidia-utils-440 nvidia-compute-utils-440 |
| 199 | +
|
| 200 | +The ``nvidia-smi`` command will generate detailed output if the driver has loaded |
| 201 | +successfully. |
| 202 | + |
| 203 | +Further Reference |
| 204 | +----------------- |
| 205 | + |
| 206 | +For PCI Passthrough and GPUs in OpenStack: |
| 207 | + |
| 208 | +* Consumer-grade GPUs: https://gist.github.com/claudiok/890ab6dfe76fa45b30081e58038a9215 |
| 209 | +* https://www.jimmdenton.com/gpu-offloading-openstack/ |
| 210 | +* https://docs.openstack.org/nova/latest/admin/pci-passthrough.html |
| 211 | +* https://docs.openstack.org/nova/latest/admin/virtual-gpu.html (vGPU only) |
| 212 | +* Tesla models in OpenStack: https://egallen.com/openstack-nvidia-tesla-gpu-passthrough/ |
| 213 | +* https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF |
| 214 | +* https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt |
| 215 | +* https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/installation_guide/appe-configuring_a_hypervisor_host_for_pci_passthrough |
| 216 | +* https://www.gresearch.co.uk/article/utilising-the-openstack-placement-service-to-schedule-gpu-and-nvme-workloads-alongside-general-purpose-instances/ |
| 217 | + |