Commit 7b2bc8b

Add docs
1 parent a2ebafe commit 7b2bc8b

2 files changed

+144
-256
lines changed

doc/source/operations/gpu-in-openstack.rst

Lines changed: 126 additions & 256 deletions
Original file line number | Diff line number | Diff line change
@@ -2,6 +2,132 @@
22
Support for GPUs in OpenStack
33
=============================
44

5+
PCI Passthrough
6+
###############
7+
8+
Prerequisite - BIOS Configuration
9+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
10+
11+
On an Intel system:
12+
13+
* Enable ``VT-x`` in the BIOS for virtualisation support.
14+
* Enable ``VT-d`` in the BIOS for IOMMU support.
15+
16+
On an AMD system:
17+
18+
* Enable ``AMD-V`` in the BIOS for virtualisation support.
19+
* Enable ``AMD-Vi`` (also just called ``IOMMU`` on older hardware) in the BIOS
20+
for IOMMU support.
21+
22+
It may be possible to configure passthrough without these settings, though
23+
stability or performance may be affected.
24+
25+
Host and Service Configuration
26+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
27+
28+
PCI passthrough GPU variables can be found in the
29+
``etc/kayobe/stackhpc-compute.yml`` file.
30+
31+
The ``gpu_group_map`` is a dictionary mapping inventory groups to GPU types.
32+
This is used to determine which GPU types each compute node should pass through
33+
to OpenStack. The keys are group names and the values are lists of GPU types.
34+
35+
Possible GPU types are defined in the ``stackhpc_gpu_data`` dictionary. It
36+
contains data for many common GPUs. If you have a GPU that is not included,
37+
extend the dictionary following the same pattern.
38+
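A minimal sketch of such an extension, using the vendor ID ``10de`` and
display device ID ``1e30`` of an NVIDIA Quadro RTX 6000. The key ``rtx_6000``
and the field names ``resource_name``, ``vendor_id`` and ``product_id`` are
illustrative assumptions; copy the shape of an existing entry in
``stackhpc_gpu_data`` rather than relying on this one:

.. code-block:: yaml
   :caption: $KAYOBE_CONFIG_PATH/stackhpc-compute.yml

   stackhpc_gpu_data:
     # Existing entries stay as they are; only the new one is shown here.
     # Hypothetical entry: key and field names are assumed, not confirmed.
     rtx_6000:
       resource_name: "{{ rtx_6000_resource_name | default('rtx_6000') }}"
       vendor_id: "10de"
       product_id: "1e30"
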
39+
The ``resource_name`` is the name that will be used in the flavor extra specs.
40+
It can be overridden per GPU type, e.g. ``a100_80_resource_name: "big_gpu"``.
41+
42+
Example configuration for three groups containing A100s, V100s, and both:
43+
44+
.. code-block:: yaml
45+
:caption: $KAYOBE_CONFIG_PATH/stackhpc-compute.yml
46+
47+
gpu_group_map:
48+
compute_a100:
49+
- a100_80
50+
compute_v100:
51+
- v100_32
52+
compute_multi_gpu:
53+
- a100_80
54+
- v100_32
55+
56+
All groups in the ``gpu_group_map`` must also be added to
57+
``kolla_overcloud_inventory_top_level_group_map`` in ``etc/kayobe/kolla.yml``.
58+
Keep the Kayobe default entries in the map unless you intend to remove them.
59+
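As an illustration, the map for the three example groups might look like the
sketch below. The first five entries are what we understand the Kayobe
defaults to be; treat them as an assumption and check the defaults shipped
with your Kayobe release before copying:

.. code-block:: yaml
   :caption: $KAYOBE_CONFIG_PATH/kolla.yml

   kolla_overcloud_inventory_top_level_group_map:
     # Assumed Kayobe defaults, kept so the standard groups still exist.
     control:
       groups:
         - controllers
     network:
       groups:
         - network
     compute:
       groups:
         - compute
     monitoring:
       groups:
         - monitoring
     storage:
       groups:
         - storage
     # One entry per group named in gpu_group_map.
     compute_a100:
       groups:
         - compute_a100
     compute_v100:
       groups:
         - compute_v100
     compute_multi_gpu:
       groups:
         - compute_multi_gpu
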
60+
When ``gpu_group_map`` is populated, the ``pci-passthrough.yml`` playbook will
61+
be added as a pre-hook to ``kayobe overcloud host configure``. Either run host
62+
configuration or trigger the playbook manually:
63+
64+
.. code-block:: console
65+
66+
kayobe overcloud host configure --limit compute_a100,compute_v100,compute_multi_gpu
67+
# OR
68+
kayobe playbook run --playbook $KAYOBE_CONFIG_PATH/ansible/pci-passthrough.yml --limit compute_a100,compute_v100,compute_multi_gpu
69+
70+
The playbook will apply the necessary configuration and reboot the hosts if
71+
required.
72+
73+
Once host configuration is complete, deploy the OpenStack services:
74+
.. code-block:: console
75+
76+
kayobe overcloud service deploy -kt nova --kolla-limit compute_a100,compute_v100,compute_multi_gpu
77+
78+
Create a flavor
79+
^^^^^^^^^^^^^^^
80+
81+
For example, to request two of the GPUs with alias **v100_32**:
82+
83+
.. code-block:: text
84+
85+
openstack flavor set m1.medium-gpu --property "pci_passthrough:alias"="v100_32:2"
86+
87+
This can also be defined in the openstack-config repository.
88+
89+
Add ``extra_specs`` to the flavor in ``etc/openstack-config/openstack-config.yml``:
90+
91+
.. code-block:: console
92+
93+
cd src/openstack-config
94+
vim etc/openstack-config/openstack-config.yml
95+
96+
name: "m1.medium-gpu"
97+
ram: 4096
98+
disk: 40
99+
vcpus: 2
100+
extra_specs:
101+
"pci_passthrough:alias": "v100_32:2"
102+
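For reference, a sketch of the same flavor with its YAML nesting written out.
The enclosing ``openstack_flavors`` list is an assumption about how
``openstack-config.yml`` is laid out; the fields are the ones shown above:

.. code-block:: yaml
   :caption: etc/openstack-config/openstack-config.yml

   # Assumed top-level key; adapt to the variable your configuration uses.
   openstack_flavors:
     - name: "m1.medium-gpu"
       ram: 4096
       disk: 40
       vcpus: 2
       extra_specs:
         # Request two devices matching the v100_32 PCI alias.
         "pci_passthrough:alias": "v100_32:2"
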
103+
Invoke configuration playbooks afterwards:
104+
105+
.. code-block:: console
106+
107+
source src/kayobe-config/etc/kolla/public-openrc.sh
108+
source venvs/openstack/bin/activate
109+
tools/openstack-config --vault-password-file <Vault password file path>
110+
111+
Create instance with GPU passthrough
112+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
113+
114+
.. code-block:: text
115+
116+
openstack server create --flavor m1.medium-gpu --image ubuntu22.04 --wait test-pci
117+
118+
Testing GPU in a Guest VM
119+
^^^^^^^^^^^^^^^^^^^^^^^^^
120+
121+
The Nvidia drivers must be installed first. For example, on an Ubuntu guest:
122+
123+
.. code-block:: text
124+
125+
sudo apt install nvidia-headless-440 nvidia-utils-440 nvidia-compute-utils-440
126+
127+
The ``nvidia-smi`` command will generate detailed output if the driver has
128+
loaded successfully.
129+
130+
5131
Virtual GPUs
6132
############
7133

@@ -535,262 +661,6 @@ Changing VGPU device types
535661

536662
See upstream documentation: `Changing VGPU device types <https://docs.openstack.org/kayobe/latest/configuration/reference/vgpu.html#changing-vgpu-device-types>`__
537663

538-
PCI Passthrough
539-
###############
540-
541-
This guide has been developed for Nvidia GPUs and CentOS 8.
542-
543-
See `Kayobe Ops <https://github.com/stackhpc/kayobe-ops>`_ for
544-
a playbook implementation of host setup for GPU.
545-
546-
BIOS Configuration Requirements
547-
-------------------------------
548-
549-
On an Intel system:
550-
551-
* Enable `VT-x` in the BIOS for virtualisation support.
552-
* Enable `VT-d` in the BIOS for IOMMU support.
553-
554-
Hypervisor Configuration Requirements
555-
-------------------------------------
556-
557-
Find the GPU device IDs
558-
^^^^^^^^^^^^^^^^^^^^^^^
559-
560-
From the host OS, use ``lspci -nn`` to find the PCI vendor ID and
561-
device ID for the GPU device and supporting components. These are
562-
4-digit hex numbers.
563-
564-
For example:
565-
566-
.. code-block:: text
567-
568-
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204M [GeForce GTX 980M] [10de:13d7] (rev a1) (prog-if 00 [VGA controller])
569-
01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)
570-
571-
In this case the vendor ID is ``10de``, display ID is ``13d7`` and audio ID is ``0fbb``.
572-
573-
Alternatively, for an Nvidia Quadro RTX 6000:
574-
575-
.. code-block:: yaml
576-
577-
# NVIDIA Quadro RTX 6000/8000 PCI device IDs
578-
vendor_id: "10de"
579-
display_id: "1e30"
580-
audio_id: "10f7"
581-
usba_id: "1ad6"
582-
usba_class: "0c0330"
583-
usbc_id: "1ad7"
584-
usbc_class: "0c8000"
585-
586-
These parameters will be used for device-specific configuration.
587-
588-
Kernel Ramdisk Reconfiguration
589-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
590-
591-
The ramdisk loaded during kernel boot can be extended to include the
592-
vfio PCI drivers and ensure they are loaded early in system boot.
593-
594-
.. code-block:: yaml
595-
596-
- name: Template dracut config
597-
blockinfile:
598-
path: /etc/dracut.conf.d/gpu-vfio.conf
599-
block: |
600-
add_drivers+="vfio vfio_iommu_type1 vfio_pci vfio_virqfd"
601-
owner: root
602-
group: root
603-
mode: 0660
604-
create: true
605-
become: true
606-
notify:
607-
- Regenerate initramfs
608-
- reboot
609-
610-
The handler for regenerating the Dracut initramfs is:
611-
612-
.. code-block:: yaml
613-
614-
- name: Regenerate initramfs
615-
shell: |-
616-
#!/bin/bash
617-
set -eux
618-
dracut -v -f /boot/initramfs-$(uname -r).img $(uname -r)
619-
become: true
620-
621-
Kernel Boot Parameters
622-
^^^^^^^^^^^^^^^^^^^^^^
623-
624-
Set the following kernel parameters by adding to
625-
``GRUB_CMDLINE_LINUX_DEFAULT`` or ``GRUB_CMDLINE_LINUX`` in
626-
``/etc/default/grub``. We can use the
627-
`stackhpc.grubcmdline <https://galaxy.ansible.com/stackhpc/grubcmdline>`_
628-
role from Ansible Galaxy:
629-
630-
.. code-block:: yaml
631-
632-
- name: Add vfio-pci.ids kernel args
633-
include_role:
634-
name: stackhpc.grubcmdline
635-
vars:
636-
kernel_cmdline:
637-
- intel_iommu=on
638-
- iommu=pt
639-
- "vfio-pci.ids={{ vendor_id }}:{{ display_id }},{{ vendor_id }}:{{ audio_id }}"
640-
kernel_cmdline_remove:
641-
- iommu
642-
- intel_iommu
643-
- vfio-pci.ids
644-
645-
Kernel Device Management
646-
^^^^^^^^^^^^^^^^^^^^^^^^
647-
648-
In the hypervisor, we must prevent kernel device initialisation of
649-
the GPU and prevent drivers from loading for binding the GPU in the
650-
host OS. We do this using ``udev`` rules:
651-
652-
.. code-block:: yaml
653-
654-
- name: Template udev rules to blacklist GPU usb controllers
655-
blockinfile:
656-
# We want this to execute as soon as possible
657-
path: /etc/udev/rules.d/99-gpu.rules
658-
block: |
659-
#Remove NVIDIA USB xHCI Host Controller Devices, if present
660-
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x{{ vendor_id }}", ATTR{class}=="0x{{ usba_class }}", ATTR{remove}="1"
661-
#Remove NVIDIA USB Type-C UCSI devices, if present
662-
ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x{{ vendor_id }}", ATTR{class}=="0x{{ usbc_class }}", ATTR{remove}="1"
663-
owner: root
664-
group: root
665-
mode: 0644
666-
create: true
667-
become: true
668-
669-
Kernel Drivers
670-
^^^^^^^^^^^^^^
671-
672-
Prevent the ``nouveau`` kernel driver from loading by
673-
blacklisting the module:
674-
675-
.. code-block:: yaml
676-
677-
- name: Blacklist nouveau
678-
blockinfile:
679-
path: /etc/modprobe.d/blacklist-nouveau.conf
680-
block: |
681-
blacklist nouveau
682-
options nouveau modeset=0
683-
mode: 0664
684-
owner: root
685-
group: root
686-
create: true
687-
become: true
688-
notify:
689-
- reboot
690-
- Regenerate initramfs
691-
692-
Ensure that the ``vfio`` drivers are loaded into the kernel on boot:
693-
694-
.. code-block:: yaml
695-
696-
- name: Add vfio to modules-load.d
697-
blockinfile:
698-
path: /etc/modules-load.d/vfio.conf
699-
block: |
700-
vfio
701-
vfio_iommu_type1
702-
vfio_pci
703-
vfio_virqfd
704-
owner: root
705-
group: root
706-
mode: 0664
707-
create: true
708-
become: true
709-
notify: reboot
710-
711-
Once this code has taken effect (after a reboot), the VFIO kernel drivers should be loaded on boot:
712-
713-
.. code-block:: text
714-
715-
# lsmod | grep vfio
716-
vfio_pci 49152 0
717-
vfio_virqfd 16384 1 vfio_pci
718-
vfio_iommu_type1 28672 0
719-
vfio 32768 2 vfio_iommu_type1,vfio_pci
720-
irqbypass 16384 5 vfio_pci,kvm
721-
722-
# lspci -nnk -s 3d:00.0
723-
3d:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL [Tesla M10] [10de:13bd] (rev a2)
724-
Subsystem: NVIDIA Corporation Tesla M10 [10de:1160]
725-
Kernel driver in use: vfio-pci
726-
Kernel modules: nouveau
727-
728-
IOMMU should be enabled at kernel level as well - we can verify that on the compute host:
729-
730-
.. code-block:: text
731-
732-
# docker exec -it nova_libvirt virt-host-validate | grep IOMMU
733-
QEMU: Checking for device assignment IOMMU support : PASS
734-
QEMU: Checking if IOMMU is enabled by kernel : PASS
735-
736-
OpenStack Nova configuration
737-
----------------------------
738-
739-
See upstream Nova documentation: `Attaching physical PCI devices to guests <https://docs.openstack.org/nova/latest/admin/pci-passthrough.html>`__
740-
741-
Configure a flavor
742-
^^^^^^^^^^^^^^^^^^
743-
744-
For example, to request two of the GPUs with alias **a1**
745-
746-
.. code-block:: text
747-
748-
openstack flavor set m1.medium --property "pci_passthrough:alias"="a1:2"
749-
750-
751-
This can be also defined in the openstack-config repository
752-
753-
add extra_specs to flavor in etc/openstack-config/openstack-config.yml:
754-
755-
.. code-block:: console
756-
757-
cd src/openstack-config
758-
vim etc/openstack-config/openstack-config.yml
759-
760-
name: "m1.medium-gpu"
761-
ram: 4096
762-
disk: 40
763-
vcpus: 2
764-
extra_specs:
765-
"pci_passthrough:alias": "a1:2"
766-
767-
Invoke configuration playbooks afterwards:
768-
769-
.. code-block:: console
770-
771-
source src/kayobe-config/etc/kolla/public-openrc.sh
772-
source venvs/openstack/bin/activate
773-
tools/openstack-config --vault-password-file <Vault password file path>
774-
775-
Create instance with GPU passthrough
776-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
777-
778-
.. code-block:: text
779-
780-
openstack server create --flavor m1.medium-gpu --image ubuntu22.04 --wait test-pci
781-
782-
Testing GPU in a Guest VM
783-
-------------------------
784-
785-
The Nvidia drivers must be installed first. For example, on an Ubuntu guest:
786-
787-
.. code-block:: text
788-
789-
sudo apt install nvidia-headless-440 nvidia-utils-440 nvidia-compute-utils-440
790-
791-
The ``nvidia-smi`` command will generate detailed output if the driver has loaded
792-
successfully.
793-
794664
Further Reference
795665
-----------------
796666
