
Commit 625fe3b

dandrushko (Dmytro Andrushko) and tinova committed
M #-: GPU pass-though and vLLM appliance for PoC ISO (#487)
Co-authored-by: Dmytro Andrushko <[email protected]>
Co-authored-by: Tino Vázquez <[email protected]>
(cherry picked from commit 33b497d)
1 parent 94d3944 commit 625fe3b

File tree

5 files changed: +130 −0 lines changed

[4 binary image files added: 118 KB, 167 KB, 136 KB, 60.9 KB]

content/getting_started/try_opennebula/opennebula_sandbox_deployment/deploy_opennebula_onprem_with_poc_iso.md

Lines changed: 130 additions & 0 deletions
@@ -431,6 +431,136 @@ On a workstation with access to the frontend, a local route to the virtual net c
After the route exists, the workstation should be able to reach the virtual machines running on the frontend without further configuration.
## GPU Configuration

If the OpenNebula evaluation involves GPU management, the GPU should be configured in pass-through mode. For the detailed process, check [this guide from the official documentation]({{% relref "/product/cluster_configuration/hosts_and_clusters/nvidia_gpu_passthrough" %}}). Overall, a GPU configuration in OpenNebula consists of two main stages:

- Host preparation and driver configuration
- OpenNebula settings for PCI pass-through devices

### Host Configuration

To prepare the OpenNebula host, complete the following steps:

- Check that IOMMU is enabled on the host using the following command:
```default
# dmesg | grep -i iommu
```

If IOMMU is not enabled on the host, follow the process specified in [the official documentation]({{% relref "/product/cluster_configuration/hosts_and_clusters/nvidia_gpu_passthrough" %}}) to enable it.
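
For reference, on RHEL-family systems IOMMU is typically enabled by adding the appropriate kernel boot parameter and rebooting; the parameter depends on the CPU vendor (`intel_iommu=on` for Intel, `amd_iommu=on` for AMD). A minimal sketch using `grubby`, assuming an Intel CPU:

```default
# grubby --update-kernel=ALL --args="intel_iommu=on iommu=pt"
# reboot
```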

Next, the GPU has to be bound to the `vfio-pci` driver. To do this, perform the following steps:

1. Install the `driverctl` utility:
```default
# dnf install driverctl
```
2. Ensure the `vfio-pci` module is loaded on boot:
```default
# echo "vfio-pci" | sudo tee /etc/modules-load.d/vfio-pci.conf
# modprobe vfio-pci
```
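
To confirm the module is active, check the loaded kernel modules; a `vfio_pci` entry in the output indicates that it is loaded:

```default
# lsmod | grep vfio
```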
3. Identify the GPU's PCI address:
```default
# lspci -D | grep -i nvidia
0000:e1:00.0 3D controller: NVIDIA Corporation GH100 [H100 PCIe] (rev a1)
```
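
The numeric `vendor:device` ID pair, which will be useful later when configuring the OpenNebula PCI filter, can be obtained by adding `-nn`:

```default
# lspci -Dnn | grep -i nvidia
```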
4. Set the driver override. Use the PCI address from the previous step to set an override for the device to use the `vfio-pci` driver.
```default
# driverctl set-override 0000:e1:00.0 vfio-pci
```
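`driverctl` overrides persist across reboots. To double-check that the override was recorded, list the current overrides; the GPU's address should appear alongside `vfio-pci`:

```default
# driverctl list-overrides
```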

5. Verify the driver binding. Check that the GPU is now using the `vfio-pci` driver:
```default
# lspci -Dnns e1:00.0 -k
Kernel driver in use: vfio-pci
```

#### VFIO Device Ownership

For OpenNebula to manage the GPU, the VFIO device files in `/dev/vfio/` must be owned by the `root:kvm` user and group. This is achieved by creating a `udev` rule.
1. Identify the IOMMU group for your GPU using its PCI address:
```default
# find /sys/kernel/iommu_groups/ -type l | grep e1:00.0
/sys/kernel/iommu_groups/85/devices/0000:e1:00.0
```
In this example, the IOMMU group is `85`.
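
Note that all devices in an IOMMU group are passed through together, so it is worth listing the group's members; if the group contains more than the GPU (for example, the GPU's audio function), those devices typically need to be bound to `vfio-pci` as well:

```default
# ls /sys/kernel/iommu_groups/85/devices/
```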

2. Create a `udev` rule: create the file `/etc/udev/rules.d/99-vfio.rules` with the following content:
```default
SUBSYSTEM=="vfio", GROUP="kvm", MODE="0666"
```
3. Reload `udev` rules:
```default
# udevadm control --reload
# udevadm trigger
```
4. Verify ownership. Check the ownership of the device file corresponding to your GPU's IOMMU group:
```default
# ls -la /dev/vfio/
crw-rw-rw- 1 root kvm 509, 0 Oct 16 10:00 85
```
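
A more targeted check is possible with `stat`, using the IOMMU group number identified earlier (`85` in this example); it should report `root:kvm` ownership:

```default
# stat -c '%U:%G %a' /dev/vfio/85
root:kvm 666
```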
### OpenNebula Configuration

To make the GPUs available in OpenNebula, configure the PCI probe on the Front-end node to monitor NVIDIA devices:

1. Edit the PCI probe configuration file at `/var/lib/one/remotes/etc/im/kvm-probes.d/pci.conf`.
2. Add a filter for NVIDIA devices (`10de` is the NVIDIA PCI vendor ID):
```default
:filter: '10de:*'
```
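
The filter accepts `vendor:device` pairs, so it can be narrowed to a specific model when the host carries several NVIDIA devices. A sketch with a hypothetical device ID (take the real one from the `lspci -nn` output above):

```default
# Hypothetical device ID; replace with the value reported by lspci -nn
:filter: '10de:abcd'
```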
3. Synchronize the hosts from the Front-end to apply the new configuration:
```default
# su - oneadmin
$ onehost sync -f
```
After a few moments, you can check if the GPU is being monitored correctly by showing the host information (`onehost show <HOST_ID>`). The GPU should appear in the `PCI DEVICES` section.
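
For example, assuming the host has ID `0` (adjust to your deployment), the relevant section can be extracted with:

```default
$ onehost show 0 | grep -A 3 "PCI DEVICES"
```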

### VM Instantiation with a GPU

To instantiate a VM with a GPU, log in to the OpenNebula GUI and navigate to the VMs tab. Click “Create”, then select one of the VM templates. On the next screen, enter the VM name and click “Next”.
![VM Instantiation](/images/ISO/06-vm-instantiate-1.png)

On the next screen, select the required Storage and Network options. In the “PCI Devices” section, click “Attach PCI device”.
![PCI Device attachment](/images/ISO/07-vm-instantiate-pci-device.png)

In the dropdown menu, select the available GPU device to be attached to the VM. Then click the “Accept” button and finalize the VM configuration.

![PCI Device selection](/images/ISO/08-vm-instantiate-pci-device-select.png)

Click the “Finish” button to start the VM instantiation. After a while, the VM will be instantiated and ready to use.
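
Once the VM is running, a quick way to confirm that the GPU was passed through is to list the PCI devices inside the guest; if the NVIDIA driver is installed in the guest image, `nvidia-smi` can query the card directly:

```default
$ lspci | grep -i nvidia
$ nvidia-smi
```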

### vLLM Appliance Validation

The vLLM appliance is available through the OpenNebula Marketplace. Follow the steps from [this guide from the official documentation]({{% relref "/solutions/deployment_blueprints/ai-ready_opennebula/llm_inference_certification" %}}). To download the vLLM appliance and instantiate it with a GPU in pass-through mode, perform the following steps:

1. Go to the Storage -> Apps section. Search for the vLLM appliance and import it. Select the Datastore where the image will be saved.

![vLLM appliance](/images/ISO/09-vllm-appliance.png)

2. Go to the VMs section and instantiate the vLLM appliance. Specify the common VM parameters. In the “Advanced Settings”, go to “PCI devices” and ensure that the required GPU device is selected for attachment to the VM. Click “Accept” and then “Finish” to instantiate the vLLM appliance.

3. Once the vLLM appliance is instantiated, follow the steps from [the LLM inference guide]({{% relref "/solutions/deployment_blueprints/ai-ready_opennebula/llm_inference_certification" %}}) to access the web chat app or execute benchmarking tests.
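
As a quick smoke test, assuming the appliance exposes vLLM's OpenAI-compatible API on the default port `8000` (the appliance configuration may differ), the model list endpoint can be queried from the workstation, where `<VM_IP>` is the address of the instantiated VM:

```default
$ curl http://<VM_IP>:8000/v1/models
```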
## Next Steps
Additionally, we recommend checking [Validate the environment]({{% relref "validate_the_environment" %}}), which describes how to explore the installed resources and how to download and run appliances from the [OpenNebula Marketplace](https://marketplace.opennebula.io/).
