diff --git a/gpu-operator/gpu-operator-kubevirt.rst b/gpu-operator/gpu-operator-kubevirt.rst
index af1be98d2..dfb3adf96 100644
--- a/gpu-operator/gpu-operator-kubevirt.rst
+++ b/gpu-operator/gpu-operator-kubevirt.rst
@@ -224,7 +224,9 @@ The following example shows how to permit the A10 GPU device and A10-24Q vGPU de
         Subsystem: NVIDIA Corporation GA102GL [A10] [10de:1482]
         Kernel modules: nvidiafb, nouveau
 
-#. Modify the ``KubeVirt`` custom resource like the following partial example:
+#. Modify the ``KubeVirt`` custom resource like the following partial example.
+
+   Add an entry for each device that you want to permit:
 
    .. code-block:: yaml
 
@@ -235,12 +237,12 @@ The following example shows how to permit the A10 GPU device and A10-24Q vGPU de
             featureGates:
             - GPU
             - DisableMDEVConfiguration
-          permittedHostDevices:
-            pciHostDevices:
+          permittedHostDevices:  # Defines VM devices to import
+            pciHostDevices:  # Include for GPU passthrough
             - externalResourceProvider: true
              pciVendorSelector: 10DE:2236
              resourceName: nvidia.com/GA102GL_A10
-            mediatedDevices:
+            mediatedDevices:  # Include for vGPU
             - externalResourceProvider: true
              mdevNameSelector: NVIDIA A10-24Q
              resourceName: nvidia.com/NVIDIA_A10-24Q
@@ -248,7 +250,11 @@ The following example shows how to permit the A10 GPU device and A10-24Q vGPU de
 
    Replace the values in the YAML as follows:
 
-   * ``pciDeviceSelector`` and ``resourceName`` under ``pciHostDevices`` to correspond to your GPU model.
+   * Include ``permittedHostDevices`` for GPU passthrough.
+
+   * Include ``mediatedDevices`` for vGPU.
+
+   * ``pciVendorSelector`` and ``resourceName`` under ``pciHostDevices`` to correspond to your GPU model.
 
    * ``mdevNameSelector`` and ``resourceName`` under ``mediatedDevices`` to correspond to your vGPU type.
 
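+   After the devices are permitted, a virtual machine can request them through the ``gpus`` list under ``spec.template.spec.domain.devices``. The following partial example is a sketch; the virtual machine name is illustrative, and ``deviceName`` must match a ``resourceName`` that you permitted:
+
+   .. code-block:: yaml
+
+      apiVersion: kubevirt.io/v1
+      kind: VirtualMachine
+      metadata:
+        name: vgpu-vm  # Illustrative name
+      spec:
+        template:
+          spec:
+            domain:
+              devices:
+                gpus:
+                - deviceName: nvidia.com/NVIDIA_A10-24Q  # Matches the permitted resourceName
+                  name: vgpu1
+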
diff --git a/openshift/openshift-virtualization.rst b/openshift/openshift-virtualization.rst
index 39c633469..d5e9dfd79 100644
--- a/openshift/openshift-virtualization.rst
+++ b/openshift/openshift-virtualization.rst
@@ -96,6 +96,9 @@ Prerequisites
      hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched
 
+* If you plan to use NVIDIA vGPU and your GPUs are based on the NVIDIA Ampere architecture or later, SR-IOV must be enabled in the BIOS. Refer to the `NVIDIA vGPU Documentation `_ to ensure that you have met all of the prerequisites for using NVIDIA vGPU.
+
+
 **********************************
 Enabling the IOMMU driver on hosts
 **********************************
 
@@ -297,14 +300,20 @@ Create the cluster policy using the CLI:
 
 #. Modify the ``clusterpolicy.json`` file as follows:
 
-   .. note:: The ``vgpuManager`` options are only required if you want to use the NVIDIA vGPU. If you are only using GPU passthrough, these options should not be set.
-
    * sandboxWorkloads.enabled=true
    * vgpuManager.enabled=true
    * vgpuManager.repository=<repository>
    * vgpuManager.image=vgpu-manager
   * vgpuManager.version=<version>
    * vgpuManager.imagePullSecrets={}
+
+
+   The ``vgpuManager`` options are required only if you want to use NVIDIA vGPU. If you are using only GPU passthrough, do not set these options.
+
+   In general, the ``sandboxWorkloads.enabled`` flag in ``ClusterPolicy`` controls whether the GPU Operator can provision GPU worker nodes for virtual machine workloads, in addition to container workloads. This flag is disabled by default, meaning that all nodes are provisioned with the same software, which enables container workloads, and the ``nvidia.com/gpu.workload.config`` node label is not used.
+
+   The term ``sandboxing`` refers to running software in a separate, isolated environment, typically for added security (for example, a virtual machine). We use the term ``sandbox workloads`` to signify workloads that run in a virtual machine, irrespective of the virtualization technology used.
+
 #. Apply the changes:
 
@@ -370,12 +379,12 @@ The following example permits the A10 GPU device and A10-24Q vGPU device.
 
       spec:
         featureGates:
           disableMDevConfiguration: true
-        permittedHostDevices:
-          pciHostDevices:
+        permittedHostDevices:  # Defines VM devices to import
+          pciHostDevices:  # Include for GPU passthrough
           - externalResourceProvider: true
            pciDeviceSelector: 10DE:2236
            resourceName: nvidia.com/GA102GL_A10
-          mediatedDevices:
+          mediatedDevices:  # Include for vGPU
           - externalResourceProvider: true
            mdevNameSelector: NVIDIA A10-24Q
            resourceName: nvidia.com/NVIDIA_A10-24Q
@@ -383,6 +392,10 @@ The following example permits the A10 GPU device and A10-24Q vGPU device.
 
    Replace the values in the YAML as follows:
 
+   * Include ``permittedHostDevices`` for GPU passthrough.
+
+   * Include ``mediatedDevices`` for vGPU.
+
    * ``pciDeviceSelector`` and ``resourceName`` under ``pciHostDevices`` to correspond to your GPU model.
 
    * ``mdevNameSelector`` and ``resourceName`` under ``mediatedDevices`` to correspond to your vGPU type.
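+
+   As with the KubeVirt example, a virtual machine consumes a permitted device through the ``gpus`` list under ``spec.template.spec.domain.devices``. The following partial example is a sketch; the virtual machine name is illustrative, and ``deviceName`` must match a ``resourceName`` that you permitted:
+
+   .. code-block:: yaml
+
+      apiVersion: kubevirt.io/v1
+      kind: VirtualMachine
+      metadata:
+        name: gpu-passthrough-vm  # Illustrative name
+      spec:
+        template:
+          spec:
+            domain:
+              devices:
+                gpus:
+                - deviceName: nvidia.com/GA102GL_A10  # Matches the permitted resourceName
+                  name: gpu1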