|
2 | 2 | Support for GPUs in OpenStack
|
3 | 3 | =============================
|
4 | 4 |
|
| 5 | +PCI Passthrough |
| 6 | +############### |
| 7 | + |
| 8 | +Prerequisite - BIOS Configuration |
| 9 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 10 | + |
| 11 | +On an Intel system: |
| 12 | + |
| 13 | +* Enable ``VT-x`` in the BIOS for virtualisation support. |
| 14 | +* Enable ``VT-d`` in the BIOS for IOMMU support. |
| 15 | + |
| 16 | +On an AMD system: |
| 17 | + |
| 18 | +* Enable ``AMD-V`` in the BIOS for virtualisation support. |
| 19 | +* Enable ``AMD-Vi`` (also just called ``IOMMU`` on older hardware) in the BIOS |
| 20 | + for IOMMU support. |
| 21 | + |
| 22 | +It may be possible to configure passthrough without these settings, though |
| 23 | +stability or performance may be affected. |
| 24 | + |
| 25 | +Host and Service Configuration |
| 26 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 27 | + |
| 28 | +PCI passthrough GPU variables can be found in the |
| 29 | +``etc/kayobe/stackhpc-compute.yml`` file. |
| 30 | + |
| 31 | +The ``gpu_group_map`` is a dictionary mapping inventory groups to GPU types. |
| 32 | +It determines which GPU types each compute node should pass through to |
| 33 | +OpenStack. The keys are group names and the values are lists of GPU types. |
| 34 | + |
| 35 | +Possible GPU types are defined in the ``stackhpc_gpu_data`` dictionary. It |
| 36 | +contains data for many common GPUs. If you have a GPU that is not included, |
| 37 | +extend the dictionary following the same pattern. |
| 38 | + |
| 39 | +The ``resource_name`` is the name that will be used in the flavor extra specs. |
| 40 | +It can be overridden per GPU type, e.g. ``a100_80_resource_name: "big_gpu"``. |
| 41 | + |
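|  | +As a minimal sketch of such an override, assuming the ``a100_80`` entry from |
|  | +``stackhpc_gpu_data`` is in use (the flavor fragment in the comments is |
|  | +illustrative only and follows the flavor format shown later in this section): |
|  | + |
|  | +.. code-block:: yaml |
|  | +   :caption: $KAYOBE_CONFIG_PATH/stackhpc-compute.yml |
|  | + |
|  | +   # Rename the resource exposed for A100 80GB GPUs. |
|  | +   a100_80_resource_name: "big_gpu" |
|  | + |
|  | +   # A flavor would then request the device by the new name, e.g.: |
|  | +   # extra_specs: |
|  | +   #   "pci_passthrough:alias": "big_gpu:1" |
|  | + |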
| 42 | +Example configuration for three groups containing A100s, V100s, and both: |
| 43 | + |
| 44 | +.. code-block:: yaml |
| 45 | +   :caption: $KAYOBE_CONFIG_PATH/stackhpc-compute.yml |
| 46 | +
|
| 47 | +   gpu_group_map: |
| 48 | +     compute_a100: |
| 49 | +       - a100_80 |
| 50 | +     compute_v100: |
| 51 | +       - v100_32 |
| 52 | +     compute_multi_gpu: |
| 53 | +       - a100_80 |
| 54 | +       - v100_32 |
| 55 | +
|
| 56 | +All groups in the ``gpu_group_map`` must also be added to |
| 57 | +``kolla_overcloud_inventory_top_level_group_map`` in ``etc/kayobe/kolla.yml``. |
| 58 | +Always include the Kayobe defaults unless you know what you are doing. |
| 59 | + |
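|  | +For the three example groups above, the additions might look like the |
|  | +following sketch. It assumes each Kayobe group is mapped to a Kolla Ansible |
|  | +group of the same name; the Kayobe default entries are deliberately not |
|  | +reproduced here and should be copied from the documentation for your release: |
|  | + |
|  | +.. code-block:: yaml |
|  | +   :caption: $KAYOBE_CONFIG_PATH/kolla.yml |
|  | + |
|  | +   kolla_overcloud_inventory_top_level_group_map: |
|  | +     # Keep the Kayobe default entries here, unchanged. |
|  | +     compute_a100: |
|  | +       groups: |
|  | +         - compute_a100 |
|  | +     compute_v100: |
|  | +       groups: |
|  | +         - compute_v100 |
|  | +     compute_multi_gpu: |
|  | +       groups: |
|  | +         - compute_multi_gpu |
|  | + |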
| 60 | +When ``gpu_group_map`` is populated, the ``pci-passthrough.yml`` playbook will |
| 61 | +be added as a pre-hook to ``kayobe overcloud host configure``. Either run host |
| 62 | +configuration or trigger the playbook manually: |
| 63 | + |
| 64 | +.. code-block:: console |
| 65 | +
|
| 66 | +   kayobe overcloud host configure --limit compute_a100,compute_v100,compute_multi_gpu |
| 67 | +   # OR |
| 68 | +   kayobe playbook run --playbook $KAYOBE_CONFIG_PATH/ansible/pci-passthrough.yml --limit compute_a100,compute_v100,compute_multi_gpu |
| 69 | +
|
| 70 | +The playbook will apply the necessary configuration and reboot the hosts if |
| 71 | +required. |
| 72 | + |
| 73 | +Once host configuration is complete, deploy the OpenStack services: |
|  | + |
| 74 | +.. code-block:: console |
| 75 | +
|
| 76 | +   kayobe overcloud service deploy -kt nova --kolla-limit compute_a100,compute_v100,compute_multi_gpu |
| 77 | +
|
| 78 | +Create a flavor |
| 79 | +^^^^^^^^^^^^^^^ |
| 80 | + |
| 81 | +For example, to request two of the GPUs with alias **v100_32**: |
| 82 | + |
| 83 | +.. code-block:: text |
| 84 | +
|
| 85 | +   openstack flavor set m1.medium-gpu --property "pci_passthrough:alias"="v100_32:2" |
| 86 | +
|
| 87 | +This can also be defined in the openstack-config repository. |
| 88 | + |
| 89 | +Add ``extra_specs`` to the flavor in ``etc/openstack-config/openstack-config.yml``: |
| 90 | + |
| 91 | +.. code-block:: console |
| 92 | +
|
| 93 | +   cd src/openstack-config |
| 94 | +   vim etc/openstack-config/openstack-config.yml |
| 95 | +
|
| 96 | +   name: "m1.medium-gpu" |
| 97 | +   ram: 4096 |
| 98 | +   disk: 40 |
| 99 | +   vcpus: 2 |
| 100 | +   extra_specs: |
| 101 | +     "pci_passthrough:alias": "v100_32:2" |
| 102 | +
|
| 103 | +Invoke configuration playbooks afterwards: |
| 104 | + |
| 105 | +.. code-block:: console |
| 106 | +
|
| 107 | +   source src/kayobe-config/etc/kolla/public-openrc.sh |
| 108 | +   source venvs/openstack/bin/activate |
| 109 | +   tools/openstack-config --vault-password-file <Vault password file path> |
| 110 | +
|
| 111 | +Create instance with GPU passthrough |
| 112 | +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 113 | + |
| 114 | +.. code-block:: text |
| 115 | +
|
| 116 | +   openstack server create --flavor m1.medium-gpu --image ubuntu22.04 --wait test-pci |
| 117 | +
|
| 118 | +Testing GPU in a Guest VM |
| 119 | +^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 120 | + |
| 121 | +The Nvidia drivers must be installed first. For example, on an Ubuntu guest: |
| 122 | + |
| 123 | +.. code-block:: text |
| 124 | +
|
| 125 | +   sudo apt install nvidia-headless-440 nvidia-utils-440 nvidia-compute-utils-440 |
| 126 | +
|
| 127 | +The ``nvidia-smi`` command will generate detailed output if the driver has |
| 128 | +loaded successfully. |
| 129 | + |
| 130 | + |
5 | 131 | Virtual GPUs
|
6 | 132 | ############
|
7 | 133 |
|
@@ -535,262 +661,6 @@ Changing VGPU device types
|
535 | 661 |
|
536 | 662 | See upstream documentation: `Changing VGPU device types <https://docs.openstack.org/kayobe/latest/configuration/reference/vgpu.html#changing-vgpu-device-types>`__
|
537 | 663 |
|
538 |
| -PCI Passthrough |
539 |
| -############### |
540 |
| - |
541 |
| -This guide has been developed for Nvidia GPUs and CentOS 8. |
542 |
| - |
543 |
| -See `Kayobe Ops <https://github.com/stackhpc/kayobe-ops>`_ for |
544 |
| -a playbook implementation of host setup for GPU. |
545 |
| - |
546 |
| -BIOS Configuration Requirements |
547 |
| -------------------------------- |
548 |
| - |
549 |
| -On an Intel system: |
550 |
| - |
551 |
| -* Enable `VT-x` in the BIOS for virtualisation support. |
552 |
| -* Enable `VT-d` in the BIOS for IOMMU support. |
553 |
| - |
554 |
| -Hypervisor Configuration Requirements |
555 |
| -------------------------------------- |
556 |
| - |
557 |
| -Find the GPU device IDs |
558 |
| -^^^^^^^^^^^^^^^^^^^^^^^ |
559 |
| - |
560 |
| -From the host OS, use ``lspci -nn`` to find the PCI vendor ID and |
561 |
| -device ID for the GPU device and supporting components. These are |
562 |
| -4-digit hex numbers. |
563 |
| - |
564 |
| -For example: |
565 |
| - |
566 |
| -.. code-block:: text |
567 |
| -
|
568 |
| - 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204M [GeForce GTX 980M] [10de:13d7] (rev a1) (prog-if 00 [VGA controller]) |
569 |
| - 01:00.1 Audio device [0403]: NVIDIA Corporation GM204 High Definition Audio Controller [10de:0fbb] (rev a1) |
570 |
| -
|
571 |
| -In this case the vendor ID is ``10de``, display ID is ``13d7`` and audio ID is ``0fbb``. |
572 |
| - |
573 |
| -Alternatively, for an Nvidia Quadro RTX 6000: |
574 |
| - |
575 |
| -.. code-block:: yaml |
576 |
| -
|
577 |
| - # NVIDIA Quadro RTX 6000/8000 PCI device IDs |
578 |
| - vendor_id: "10de" |
579 |
| - display_id: "1e30" |
580 |
| - audio_id: "10f7" |
581 |
| - usba_id: "1ad6" |
582 |
| - usba_class: "0c0330" |
583 |
| - usbc_id: "1ad7" |
584 |
| - usbc_class: "0c8000" |
585 |
| -
|
586 |
| -These parameters will be used for device-specific configuration. |
587 |
| - |
588 |
| -Kernel Ramdisk Reconfiguration |
589 |
| -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
590 |
| - |
591 |
| -The ramdisk loaded during kernel boot can be extended to include the |
592 |
| -vfio PCI drivers and ensure they are loaded early in system boot. |
593 |
| - |
594 |
| -.. code-block:: yaml |
595 |
| -
|
596 |
| - - name: Template dracut config |
597 |
| - blockinfile: |
598 |
| - path: /etc/dracut.conf.d/gpu-vfio.conf |
599 |
| - block: | |
600 |
| - add_drivers+="vfio vfio_iommu_type1 vfio_pci vfio_virqfd" |
601 |
| - owner: root |
602 |
| - group: root |
603 |
| - mode: 0660 |
604 |
| - create: true |
605 |
| - become: true |
606 |
| - notify: |
607 |
| - - Regenerate initramfs |
608 |
| - - reboot |
609 |
| -
|
610 |
| -The handler for regenerating the Dracut initramfs is: |
611 |
| - |
612 |
| -.. code-block:: yaml |
613 |
| -
|
614 |
| - - name: Regenerate initramfs |
615 |
| - shell: |- |
616 |
| - #!/bin/bash |
617 |
| - set -eux |
618 |
| - dracut -v -f /boot/initramfs-$(uname -r).img $(uname -r) |
619 |
| - become: true |
620 |
| -
|
621 |
| -Kernel Boot Parameters |
622 |
| -^^^^^^^^^^^^^^^^^^^^^^ |
623 |
| - |
624 |
| -Set the following kernel parameters by adding to |
625 |
| -``GRUB_CMDLINE_LINUX_DEFAULT`` or ``GRUB_CMDLINE_LINUX`` in |
626 |
| -``/etc/default/grub.conf``. We can use the |
627 |
| -`stackhpc.grubcmdline <https://galaxy.ansible.com/stackhpc/grubcmdline>`_ |
628 |
| -role from Ansible Galaxy: |
629 |
| - |
630 |
| -.. code-block:: yaml |
631 |
| -
|
632 |
| - - name: Add vfio-pci.ids kernel args |
633 |
| - include_role: |
634 |
| - name: stackhpc.grubcmdline |
635 |
| - vars: |
636 |
| - kernel_cmdline: |
637 |
| - - intel_iommu=on |
638 |
| - - iommu=pt |
639 |
| - - "vfio-pci.ids={{ vendor_id }}:{{ display_id }},{{ vendor_id }}:{{ audio_id }}" |
640 |
| - kernel_cmdline_remove: |
641 |
| - - iommu |
642 |
| - - intel_iommu |
643 |
| - - vfio-pci.ids |
644 |
| -
|
645 |
| -Kernel Device Management |
646 |
| -^^^^^^^^^^^^^^^^^^^^^^^^ |
647 |
| - |
648 |
| -In the hypervisor, we must prevent kernel device initialisation of |
649 |
| -the GPU and prevent drivers from loading for binding the GPU in the |
650 |
| -host OS. We do this using ``udev`` rules: |
651 |
| - |
652 |
| -.. code-block:: yaml |
653 |
| -
|
654 |
| - - name: Template udev rules to blacklist GPU usb controllers |
655 |
| - blockinfile: |
656 |
| - # We want this to execute as soon as possible |
657 |
| - path: /etc/udev/rules.d/99-gpu.rules |
658 |
| - block: | |
659 |
| - #Remove NVIDIA USB xHCI Host Controller Devices, if present |
660 |
| - ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x{{ vendor_id }}", ATTR{class}=="0x{{ usba_class }}", ATTR{remove}="1" |
661 |
| - #Remove NVIDIA USB Type-C UCSI devices, if present |
662 |
| - ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x{{ vendor_id }}", ATTR{class}=="0x{{ usbc_class }}", ATTR{remove}="1" |
663 |
| - owner: root |
664 |
| - group: root |
665 |
| - mode: 0644 |
666 |
| - create: true |
667 |
| - become: true |
668 |
| -
|
669 |
| -Kernel Drivers |
670 |
| -^^^^^^^^^^^^^^ |
671 |
| - |
672 |
| -Prevent the ``nouveau`` kernel driver from loading by |
673 |
| -blacklisting the module: |
674 |
| - |
675 |
| -.. code-block:: yaml |
676 |
| -
|
677 |
| - - name: Blacklist nouveau |
678 |
| - blockinfile: |
679 |
| - path: /etc/modprobe.d/blacklist-nouveau.conf |
680 |
| - block: | |
681 |
| - blacklist nouveau |
682 |
| - options nouveau modeset=0 |
683 |
| - mode: 0664 |
684 |
| - owner: root |
685 |
| - group: root |
686 |
| - create: true |
687 |
| - become: true |
688 |
| - notify: |
689 |
| - - reboot |
690 |
| - - Regenerate initramfs |
691 |
| -
|
692 |
| -Ensure that the ``vfio`` drivers are loaded into the kernel on boot: |
693 |
| - |
694 |
| -.. code-block:: yaml |
695 |
| -
|
696 |
| - - name: Add vfio to modules-load.d |
697 |
| - blockinfile: |
698 |
| - path: /etc/modules-load.d/vfio.conf |
699 |
| - block: | |
700 |
| - vfio |
701 |
| - vfio_iommu_type1 |
702 |
| - vfio_pci |
703 |
| - vfio_virqfd |
704 |
| - owner: root |
705 |
| - group: root |
706 |
| - mode: 0664 |
707 |
| - create: true |
708 |
| - become: true |
709 |
| - notify: reboot |
710 |
| -
|
711 |
| -Once this code has taken effect (after a reboot), the VFIO kernel drivers should be loaded on boot: |
712 |
| - |
713 |
| -.. code-block:: text |
714 |
| -
|
715 |
| - # lsmod | grep vfio |
716 |
| - vfio_pci 49152 0 |
717 |
| - vfio_virqfd 16384 1 vfio_pci |
718 |
| - vfio_iommu_type1 28672 0 |
719 |
| - vfio 32768 2 vfio_iommu_type1,vfio_pci |
720 |
| - irqbypass 16384 5 vfio_pci,kvm |
721 |
| -
|
722 |
| - # lspci -nnk -s 3d:00.0 |
723 |
| - 3d:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GL [Tesla M10] [10de:13bd] (rev a2) |
724 |
| - Subsystem: NVIDIA Corporation Tesla M10 [10de:1160] |
725 |
| - Kernel driver in use: vfio-pci |
726 |
| - Kernel modules: nouveau |
727 |
| -
|
728 |
| -IOMMU should be enabled at kernel level as well - we can verify that on the compute host: |
729 |
| - |
730 |
| -.. code-block:: text |
731 |
| -
|
732 |
| - # docker exec -it nova_libvirt virt-host-validate | grep IOMMU |
733 |
| - QEMU: Checking for device assignment IOMMU support : PASS |
734 |
| - QEMU: Checking if IOMMU is enabled by kernel : PASS |
735 |
| -
|
736 |
| -OpenStack Nova configuration |
737 |
| ----------------------------- |
738 |
| - |
739 |
| -See upstream Nova documentation: `Attaching physical PCI devices to guests <https://docs.openstack.org/nova/latest/admin/pci-passthrough.html>`__ |
740 |
| - |
741 |
| -Configure a flavor |
742 |
| -^^^^^^^^^^^^^^^^^^ |
743 |
| - |
744 |
| -For example, to request two of the GPUs with alias **a1** |
745 |
| - |
746 |
| -.. code-block:: text |
747 |
| -
|
748 |
| - openstack flavor set m1.medium --property "pci_passthrough:alias"="a1:2" |
749 |
| -
|
750 |
| -
|
751 |
| -This can be also defined in the openstack-config repository |
752 |
| - |
753 |
| -add extra_specs to flavor in etc/openstack-config/openstack-config.yml: |
754 |
| - |
755 |
| -.. code-block:: console |
756 |
| -
|
757 |
| - cd src/openstack-config |
758 |
| - vim etc/openstack-config/openstack-config.yml |
759 |
| -
|
760 |
| - name: "m1.medium-gpu" |
761 |
| - ram: 4096 |
762 |
| - disk: 40 |
763 |
| - vcpus: 2 |
764 |
| - extra_specs: |
765 |
| - "pci_passthrough:alias": "a1:2" |
766 |
| -
|
767 |
| -Invoke configuration playbooks afterwards: |
768 |
| - |
769 |
| -.. code-block:: console |
770 |
| -
|
771 |
| - source src/kayobe-config/etc/kolla/public-openrc.sh |
772 |
| - source venvs/openstack/bin/activate |
773 |
| - tools/openstack-config --vault-password-file <Vault password file path> |
774 |
| -
|
775 |
| -Create instance with GPU passthrough |
776 |
| -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
777 |
| - |
778 |
| -.. code-block:: text |
779 |
| -
|
780 |
| - openstack server create --flavor m1.medium-gpu --image ubuntu22.04 --wait test-pci |
781 |
| -
|
782 |
| -Testing GPU in a Guest VM |
783 |
| -------------------------- |
784 |
| - |
785 |
| -The Nvidia drivers must be installed first. For example, on an Ubuntu guest: |
786 |
| - |
787 |
| -.. code-block:: text |
788 |
| -
|
789 |
| - sudo apt install nvidia-headless-440 nvidia-utils-440 nvidia-compute-utils-440 |
790 |
| -
|
791 |
| -The ``nvidia-smi`` command will generate detailed output if the driver has loaded |
792 |
| -successfully. |
793 |
| - |
794 | 664 | Further Reference
|
795 | 665 | -----------------
|
796 | 666 |
|
|