Commit dd6bca1

Merge pull request #2 from stackhpc/gpu-config
First draft documentation for GPU config in Kayobe
2 parents 159b863 + 0ff66f8 commit dd6bca1

2 files changed

+342
-0
lines changed

source/gpus_in_openstack.rst

Lines changed: 341 additions & 0 deletions
@@ -0,0 +1,341 @@
.. include:: vars.rst

=============================
Support for GPUs in OpenStack
=============================

This guide has been developed for Nvidia GPUs and CentOS 8.

See `Kayobe Ops <https://github.com/stackhpc/kayobe-ops>`_ for
a playbook implementation of host setup for GPU.

BIOS Configuration Requirements
-------------------------------

On an Intel system:

* Enable ``VT-x`` in the BIOS for virtualisation support.
* Enable ``VT-d`` in the BIOS for IOMMU support.

Hypervisor Configuration Requirements
-------------------------------------

Find the GPU device IDs
^^^^^^^^^^^^^^^^^^^^^^^

From the host OS, use ``lspci -nn`` to find the PCI vendor ID and
device ID for the GPU device and supporting components. These are
4-digit hex numbers.

For example:

.. code-block:: text

   01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204M [GeForce GTX 980M] [10de:13d7] (rev a1) (prog-if 00 [VGA controller])
   01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1)

In this case the vendor ID is ``10de``, display ID is ``13d7`` and audio ID is ``0fbb``.

Alternatively, for an Nvidia Quadro RTX 6000:

.. code-block:: yaml

   # NVIDIA Quadro RTX 6000/8000 PCI device IDs
   vendor_id: "10de"
   display_id: "1e30"
   audio_id: "10f7"
   usba_id: "1ad6"
   usba_class: "0c0330"
   usbc_id: "1ad7"
   usbc_class: "0c8000"

These parameters will be used for device-specific configuration.
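
As a rough illustration of the lookup described above, the following sketch
pulls the ``vendor:device`` pairs out of plain ``lspci -nn`` output. The
function name ``nvidia_pci_ids`` is hypothetical, and the filter assumes the
Nvidia vendor ID ``10de`` shown in the examples. It is not part of Kayobe or
Kayobe Ops.

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch: list the unique vendor:device pairs for NVIDIA (vendor ID 10de)
   # functions found in `lspci -nn` style text read from standard input.
   # The bracketed class code (e.g. [0300]) is skipped because it has no
   # "10de:" prefix.
   nvidia_pci_ids() {
       grep -oE '\[10de:[0-9a-f]{4}\]' | tr -d '[]' | sort -u
   }

On a hypervisor this could be run as ``lspci -nn | nvidia_pci_ids``. It is
intended for plain ``lspci -nn`` output; with ``-k`` or ``-v`` the subsystem
IDs would be listed as well.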

Kernel Ramdisk Reconfiguration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ramdisk loaded during kernel boot can be extended to include the
vfio PCI drivers and ensure they are loaded early in system boot.

.. code-block:: yaml

   - name: Template dracut config
     blockinfile:
       path: /etc/dracut.conf.d/gpu-vfio.conf
       # dracut appends += values verbatim, so pad the list with spaces
       block: |
         add_drivers+=" vfio vfio_iommu_type1 vfio_pci vfio_virqfd "
       owner: root
       group: root
       mode: 0660
       create: true
     become: true
     notify:
       - Regenerate initramfs
       - reboot

The handler for regenerating the Dracut initramfs is:

.. code-block:: yaml

   - name: Regenerate initramfs
     shell: |-
       #!/bin/bash
       set -eux
       dracut -v -f /boot/initramfs-$(uname -r).img $(uname -r)
     become: true

Kernel Boot Parameters
^^^^^^^^^^^^^^^^^^^^^^

Set the following kernel parameters by adding to
``GRUB_CMDLINE_LINUX_DEFAULT`` or ``GRUB_CMDLINE_LINUX`` in
``/etc/default/grub``. We can use the
`stackhpc.grubcmdline <https://galaxy.ansible.com/stackhpc/grubcmdline>`_
role from Ansible Galaxy:

.. code-block:: yaml

   - name: Add vfio-pci.ids kernel args
     include_role:
       name: stackhpc.grubcmdline
     vars:
       kernel_cmdline:
         - intel_iommu=on
         - iommu=pt
         - "vfio-pci.ids={{ vendor_id }}:{{ display_id }},{{ vendor_id }}:{{ audio_id }}"
       kernel_cmdline_remove:
         - iommu
         - intel_iommu
         - vfio-pci.ids
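
After the host has rebooted, it may be worth confirming that the parameters
above reached the running kernel. The sketch below is one possible check. The
helper name ``cmdline_has_args`` is hypothetical, and the example IDs are the
GTX 980M values from the earlier ``lspci`` output.

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch: succeed only if every expected argument appears as a whole,
   # space-separated word in the given kernel command line.
   cmdline_has_args() {
       local cmdline="$1"
       shift
       local arg
       for arg in "$@"; do
           case " $cmdline " in
               *" $arg "*) ;;                               # argument present
               *) echo "missing: $arg" >&2; return 1 ;;     # argument absent
           esac
       done
   }

For example:
``cmdline_has_args "$(cat /proc/cmdline)" intel_iommu=on iommu=pt vfio-pci.ids=10de:13d7,10de:0fbb``.
Matching on whole words means ``iommu=pt`` is not satisfied by
``intel_iommu=on`` alone.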

Kernel Device Management
^^^^^^^^^^^^^^^^^^^^^^^^

In the hypervisor, we must prevent the kernel from initialising the
GPU and prevent host OS drivers from binding to it. We do this using
``udev`` rules:

.. code-block:: yaml

   - name: Template udev rules to blacklist GPU usb controllers
     blockinfile:
       # We want this to execute as soon as possible
       path: /etc/udev/rules.d/99-gpu.rules
       block: |
         #Remove NVIDIA USB xHCI Host Controller Devices, if present
         ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x{{ vendor_id }}", ATTR{class}=="0x{{ usba_class }}", ATTR{remove}="1"
         #Remove NVIDIA USB Type-C UCSI devices, if present
         ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x{{ vendor_id }}", ATTR{class}=="0x{{ usbc_class }}", ATTR{remove}="1"
       owner: root
       group: root
       mode: 0644
       create: true
     become: true

Kernel Drivers
^^^^^^^^^^^^^^

Prevent the ``nouveau`` kernel driver from loading by
blacklisting the module:

.. code-block:: yaml

   - name: Blacklist nouveau
     blockinfile:
       path: /etc/modprobe.d/blacklist-nouveau.conf
       block: |
         blacklist nouveau
         options nouveau modeset=0
       mode: 0664
       owner: root
       group: root
       create: true
     become: true
     notify:
       - reboot
       - Regenerate initramfs

Ensure that the ``vfio`` drivers are loaded into the kernel on boot:

.. code-block:: yaml

   - name: Add vfio to modules-load.d
     blockinfile:
       path: /etc/modules-load.d/vfio.conf
       block: |
         vfio
         vfio_iommu_type1
         vfio_pci
         vfio_virqfd
       owner: root
       group: root
       mode: 0664
       create: true
     become: true
     notify: reboot

Once these changes have taken effect (after a reboot), the VFIO kernel drivers should be loaded on boot
and the GPU should be bound to ``vfio-pci``:

.. code-block:: text

   # lsmod | grep vfio
   vfio_pci 49152 0
   vfio_virqfd 16384 1 vfio_pci
   vfio_iommu_type1 28672 0
   vfio 32768 2 vfio_iommu_type1,vfio_pci
   irqbypass 16384 5 vfio_pci,kvm

   # lspci -nnk -s 3d:00.0
   3d:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL [Tesla M10] [10de:13bd] (rev a2)
           Subsystem: NVIDIA Corporation Tesla M10 [10de:1160]
           Kernel driver in use: vfio-pci
           Kernel modules: nouveau

IOMMU should also be enabled at the kernel level. We can verify that on the compute host:

.. code-block:: text

   # docker exec -it nova_libvirt virt-host-validate | grep IOMMU
   QEMU: Checking for device assignment IOMMU support : PASS
   QEMU: Checking if IOMMU is enabled by kernel : PASS
202+
OpenStack Nova configuration
203+
----------------------------
204+
205+
Configure nova-scheduler
206+
^^^^^^^^^^^^^^^^^^^^^^^^
207+
208+
The nova-scheduler service must be configured to enable the ``PciPassthroughFilter``
209+
To enable it add it to the list of filters to Kolla-Ansible configuration file:
210+
``etc/kayobe/kolla/config/nova.conf``, for instance:
211+
212+
.. code-block:: yaml
213+
214+
[filter_scheduler]
215+
available_filters = nova.scheduler.filters.all_filters
216+
enabled_filters = AvailabilityZoneFilter, ComputeFilter, ComputeCapabilitiesFilter, ImagePropertiesFilter, ServerGroupAntiAffinityFilter, ServerGroupAffinityFilter, PciPassthroughFilter
217+
218+
Configure nova-compute
219+
^^^^^^^^^^^^^^^^^^^^^^
220+
221+
Configuration can be applied in flexible ways using Kolla-Ansible's
222+
methods for `inventory-driven customisation of configuration
223+
<https://docs.openstack.org/kayobe/latest/configuration/reference/kolla-ansible.html#service-configuration>`_.
224+
The following configuration could be added to
225+
``etc/kayobe/kolla/config/nova/nova-compute.conf`` to enable PCI
226+
passthrough of GPU devices for hosts in a group named ``compute_gpu``.
227+
Again, the 4-digit PCI Vendor ID and Device ID extracted from ``lspci
228+
-nn`` can be used here to specify the GPU device(s).
229+
230+
.. code-block:: jinja
231+
232+
[pci]
233+
{% raw %}
234+
{% if inventory_hostname in groups['compute_gpu'] %}
235+
# We could support multiple models of GPU.
236+
# This can be done more selectively using different inventory groups.
237+
# GPU models defined here:
238+
# NVidia Tesla V100 16GB
239+
# NVidia Tesla V100 32GB
240+
# NVidia Tesla P100 16GB
241+
passthrough_whitelist = [{ "vendor_id":"10de", "product_id":"1db4" },
242+
{ "vendor_id":"10de", "product_id":"1db5" },
243+
{ "vendor_id":"10de", "product_id":"15f8" }]
244+
alias = { "vendor_id":"10de", "product_id":"1db4", "device_type":"type-PCI", "name":"gpu-v100-16" }
245+
alias = { "vendor_id":"10de", "product_id":"1db5", "device_type":"type-PCI", "name":"gpu-v100-32" }
246+
alias = { "vendor_id":"10de", "product_id":"15f8", "device_type":"type-PCI", "name":"gpu-p100" }
247+
{% endif %}
248+
{% endraw %}

Configure nova-api
^^^^^^^^^^^^^^^^^^

``pci.alias`` also needs to be configured on the controller.
This configuration should match the configuration found on the compute nodes.
Add it to the Kolla-Ansible configuration file
``etc/kayobe/kolla/config/nova/nova-api.conf``, for instance:

.. code-block:: ini

   [pci]
   alias = { "vendor_id":"10de", "product_id":"1db4", "device_type":"type-PCI", "name":"gpu-v100-16" }
   alias = { "vendor_id":"10de", "product_id":"1db5", "device_type":"type-PCI", "name":"gpu-v100-32" }
   alias = { "vendor_id":"10de", "product_id":"15f8", "device_type":"type-PCI", "name":"gpu-p100" }
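
Because the alias definitions must match between the controller and compute
configuration, a quick comparison before reconfiguring can catch drift. The
sketch below is one way to do that. The function name ``pci_aliases_match`` is
hypothetical, it requires ``bash`` for process substitution, and it compares
the ``alias = ...`` lines as plain text, so differences in spacing count as
mismatches.

.. code-block:: bash

   #!/usr/bin/env bash
   # Sketch: succeed when both files contain the same set of `alias = ...`
   # lines, ignoring leading whitespace and line order. On mismatch, diff
   # prints the differing lines and the function returns non-zero.
   pci_aliases_match() {
       diff \
           <(grep -E '^[[:space:]]*alias[[:space:]]*=' "$1" | sed 's/^[[:space:]]*//' | sort) \
           <(grep -E '^[[:space:]]*alias[[:space:]]*=' "$2" | sed 's/^[[:space:]]*//' | sort)
   }

From the root of the Kayobe configuration this might be invoked as
``pci_aliases_match etc/kayobe/kolla/config/nova/nova-compute.conf etc/kayobe/kolla/config/nova/nova-api.conf``.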

Reconfigure nova service
^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

   kayobe overcloud service reconfigure --kolla-tags nova --kolla-skip-tags common --skip-prechecks

Configure a flavor
^^^^^^^^^^^^^^^^^^

For example, to request two of the GPUs with alias ``gpu-p100``:

.. code-block:: text

   openstack flavor set m1.medium --property "pci_passthrough:alias"="gpu-p100:2"


This can also be defined in the |project_config| repository:
|project_config_source_url|

Add ``extra_specs`` to the flavor in etc/|project_config|/|project_config|.yml:

.. code-block:: console
   :substitutions:

   admin# cd |base_path|/src/|project_config|
   admin# vim etc/|project_config|/|project_config|.yml

   name: "m1.medium"
   ram: 4096
   disk: 40
   vcpus: 2
   extra_specs:
     "pci_passthrough:alias": "gpu-p100:2"

Invoke configuration playbooks afterwards:

.. code-block:: console
   :substitutions:

   admin# source |base_path|/src/|kayobe_config|/etc/kolla/public-openrc.sh
   admin# source |base_path|/venvs/|project_config|/bin/activate
   admin# tools/|project_config| --vault-password-file |vault_password_file_path|

Create instance with GPU passthrough
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: text

   openstack server create --flavor m1.medium --image ubuntu2004 --wait test-pci

Testing GPU in a Guest VM
-------------------------

The Nvidia drivers must be installed first. For example, on an Ubuntu guest:

.. code-block:: text

   sudo apt install nvidia-headless-440 nvidia-utils-440 nvidia-compute-utils-440

The ``nvidia-smi`` command will generate detailed output if the driver has loaded
successfully.

Further Reference
-----------------

For PCI Passthrough and GPUs in OpenStack:

* Consumer-grade GPUs: https://gist.github.com/claudiok/890ab6dfe76fa45b30081e58038a9215
* https://www.jimmdenton.com/gpu-offloading-openstack/
* https://docs.openstack.org/nova/latest/admin/pci-passthrough.html
* https://docs.openstack.org/nova/latest/admin/virtual-gpu.html (vGPU only)
* Tesla models in OpenStack: https://egallen.com/openstack-nvidia-tesla-gpu-passthrough/
* https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
* https://www.kernel.org/doc/Documentation/Intel-IOMMU.txt
* https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/installation_guide/appe-configuring_a_hypervisor_host_for_pci_passthrough
* https://www.gresearch.co.uk/article/utilising-the-openstack-placement-service-to-schedule-gpu-and-nvme-workloads-alongside-general-purpose-instances/

source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ Contents
 managing_users_and_projects
 operations_and_monitoring
 customising_deployment
+gpus_in_openstack

 Indices and search
 ==================
