@@ -458,6 +458,76 @@ Booting the VM:
458458 $ openstack server add security group nvidia-dls-1 nvidia-dls
459459
460460
461+ Manual VM driver and licence configuration
462+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
463+
464+ vGPU client VMs need to be configured with Nvidia drivers to run GPU workloads.
465+ The host drivers should already be applied to the hypervisor.
466+
467+ Google Cloud Platform (GCP) hosts compatible client drivers `here
468+ <https://cloud.google.com/compute/docs/gpus/grid-drivers-table>`__.
469+
470+ Find the correct version (when in doubt, use the same version as the host) and
471+ download it to the VM. The exact dependencies will depend on the base image you
472+ are using but at a minimum, you will need GCC installed.
473+
474+ Ubuntu Jammy example:
475+
476+ .. code-block:: bash
477+
478+    sudo apt update
479+    sudo apt install -y make gcc wget
480+    wget https://storage.googleapis.com/nvidia-drivers-us-public/GRID/vGPU17.1/NVIDIA-Linux-x86_64-550.54.15-grid.run
481+    sudo sh NVIDIA-Linux-x86_64-550.54.15-grid.run
482+
483+ Check the ``nvidia-smi`` client is available:
484+
485+ .. code-block:: bash
486+
487+    nvidia-smi
488+
489+ Generate a token from the licence server, and copy the token file to the client
490+ VM.
491+
492+ On the client, create an Nvidia grid config file from the template:
493+
494+ .. code-block:: bash
495+
496+    sudo cp /etc/nvidia/gridd.conf.template /etc/nvidia/gridd.conf
497+
498+ Edit it to set ``FeatureType=1`` and leave the rest of the settings as default.
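
The edit above can also be scripted. The following is a minimal sketch, assuming
GNU ``sed`` (as shipped with Ubuntu) and that the file holds at most a single
``FeatureType=`` entry, possibly commented out. ``set_feature_type`` is a
hypothetical helper name, not part of the NVIDIA tooling:

.. code-block:: bash

   # Hypothetical helper: set FeatureType in a gridd.conf-style file.
   # Replaces an existing (possibly commented-out) entry, or appends one
   # if the file has no FeatureType line at all.
   set_feature_type() {
       local conf="$1" value="$2"
       if grep -Eq '^#?[[:space:]]*FeatureType=' "$conf"; then
           sed -i -E "s/^#?[[:space:]]*FeatureType=.*/FeatureType=${value}/" "$conf"
       else
           printf 'FeatureType=%s\n' "$value" >> "$conf"
       fi
   }

Run from a root shell, ``set_feature_type /etc/nvidia/gridd.conf 1`` would apply
the setting described above.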
499+
500+ Copy the client configuration token into the ``/etc/nvidia/ClientConfigToken``
501+ directory.
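
As a rough sketch of this copy step, the helper below creates the token
directory if it is missing and copies the token file into it. The name
``install_token`` is hypothetical, and the sketch assumes the token has already
been transferred to the VM. Run it from a root shell when targeting
``/etc/nvidia``:

.. code-block:: bash

   # Hypothetical helper: copy a client configuration token into the token
   # directory, creating the directory first if it does not exist.
   # The second argument defaults to the directory named in this guide.
   install_token() {
       local token="$1" dest_dir="${2:-/etc/nvidia/ClientConfigToken}"
       mkdir -p "$dest_dir"
       cp "$token" "$dest_dir/"
   }

The destination is a parameter so that the helper can be tried against a
scratch directory before touching the real configuration.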
502+
503+ Ensure the correct permissions are set:
504+
505+ .. code-block:: bash
506+
507+    sudo chmod 744 /etc/nvidia/ClientConfigToken/client_configuration_token_<datetime>.tok
508+
509+ Restart the ``nvidia-gridd`` service:
510+
511+ .. code-block:: bash
512+
513+    sudo systemctl restart nvidia-gridd
514+
515+ Check that the token has been recognised:
516+
517+ .. code-block:: bash
518+
519+    nvidia-smi -q | grep 'License Status'
520+
521+ If not, an error should appear in the journal:
522+
523+ .. code-block:: bash
524+
525+    sudo journalctl -xeu nvidia-gridd
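
For use in scripts, the licence status check can be wrapped in a small helper
that returns a non-zero exit code when the VM is not licensed. This is a sketch
only: ``is_licensed`` is a hypothetical name, and it assumes that
``nvidia-smi -q`` reports the state on a line of the form
``License Status : Licensed ...``, which may vary between driver versions:

.. code-block:: bash

   # Hypothetical helper: read `nvidia-smi -q` output on stdin and succeed
   # only if a "License Status" line reports "Licensed".
   # "Unlicensed" does not match, because the value must start with "Licensed".
   is_licensed() {
       grep -Eq 'License Status[[:space:]]*:[[:space:]]*Licensed'
   }

   # Example use on a client VM:
   #   nvidia-smi -q | is_licensed || sudo journalctl -xeu nvidia-gridd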
526+
527+ A successfully licensed VM can be snapshotted to create an image in Glance that
528+ includes the drivers and licensing token. Alternatively, an image can be
529+ created using Diskimage Builder.
530+
461531Disk image builder recipe to automatically license VGPU on boot
462532^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
463533
@@ -536,6 +606,66 @@ when copying the contents as it can contain invisible characters. It is best to
536606into your openstack-config repository and vault encrypt it. The ``file`` lookup plugin can be used to decrypt
537607the file (as shown in the example above).
538608
609+ Testing vGPU VMs
610+ ^^^^^^^^^^^^^^^^
611+
612+ vGPU VMs can be validated using the following test workload. The test should
613+ succeed if the VM is correctly licensed and drivers are correctly installed for
614+ both the host and client VM.
615+
616+ Install ``cuda-toolkit`` using the instructions `here
617+ <https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html>`__.
618+
619+ Ubuntu Jammy example:
620+
621+ .. code-block:: bash
622+
623+    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
624+    sudo dpkg -i cuda-keyring_1.1-1_all.deb
625+    sudo apt update -y
626+    sudo apt install -y cuda-toolkit make
627+
628+ The VM may require a reboot at this point.
629+
630+ Clone the ``cuda-samples`` repo:
631+
632+ .. code-block:: bash
633+
634+    git clone https://github.com/NVIDIA/cuda-samples.git
635+
636+ Build and run a test workload:
637+
638+ .. code-block:: bash
639+
640+    cd cuda-samples/Samples/6_Performance/transpose
641+    make
642+    ./transpose
643+
644+ Example output:
645+
646+ .. code-block::
647+
648+    Transpose Starting...
649+
650+    GPU Device 0: "Ampere" with compute capability 8.0
651+
652+    > Device 0: "GRID A100D-1-10C MIG 1g.10gb"
653+    > SM Capability 8.0 detected:
654+    > [GRID A100D-1-10C MIG 1g.10gb] has 14 MP(s) x 64 (Cores/MP) = 896 (Cores)
655+    > Compute performance scaling factor = 1.00
656+
657+    Matrix size: 1024x1024 (64x64 tiles), tile size: 16x16, block size: 16x16
658+
659+    transpose simple copy , Throughput = 159.1779 GB/s, Time = 0.04908 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
660+    transpose shared memory copy, Throughput = 152.1922 GB/s, Time = 0.05133 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
661+    transpose naive , Throughput = 117.2670 GB/s, Time = 0.06662 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
662+    transpose coalesced , Throughput = 135.0813 GB/s, Time = 0.05784 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
663+    transpose optimized , Throughput = 145.4326 GB/s, Time = 0.05372 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
664+    transpose coarse-grained , Throughput = 145.2941 GB/s, Time = 0.05377 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
665+    transpose fine-grained , Throughput = 150.5703 GB/s, Time = 0.05189 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
666+    transpose diagonal , Throughput = 117.6831 GB/s, Time = 0.06639 ms, Size = 1048576 fp32 elements, NumDevsUsed = 1, Workgroup = 256
667+    Test passed
668+
539669 Changing VGPU device types
540670^^^^^^^^^^^^^^^^^^^^^^^^^^
541671