16 changes: 11 additions & 5 deletions gpu-operator/gpu-operator-kubevirt.rst
@@ -224,7 +224,9 @@ The following example shows how to permit the A10 GPU device and A10-24Q vGPU device.
Subsystem: NVIDIA Corporation GA102GL [A10] [10de:1482]
Kernel modules: nvidiafb, nouveau

#. Modify the ``KubeVirt`` custom resource like the following partial example:
#. Modify the ``KubeVirt`` custom resource like the following partial example.

   Add the devices to permit under ``permittedHostDevices``.

.. code-block:: yaml

@@ -235,20 +237,24 @@ The following example shows how to permit the A10 GPU device and A10-24Q vGPU device.
      featureGates:
      - GPU
      - DisableMDEVConfiguration
    permittedHostDevices:
      pciHostDevices:
    permittedHostDevices: # Defines VM devices to import.
      pciHostDevices: # Include for GPU passthrough
      - externalResourceProvider: true
        pciVendorSelector: 10DE:2236
        resourceName: nvidia.com/GA102GL_A10
      mediatedDevices:
      mediatedDevices: # Include for vGPU
      - externalResourceProvider: true
        mdevNameSelector: NVIDIA A10-24Q
        resourceName: nvidia.com/NVIDIA_A10-24Q
...

Update the YAML as follows (a sketch of a virtual machine that requests one of the permitted devices appears after this list):

* ``pciDeviceSelector`` and ``resourceName`` under ``pciHostDevices`` to correspond to your GPU model.
* Include ``permittedHostDevices`` for GPU passthrough.

* Include ``mediatedDevices`` for vGPU.

* Set ``pciVendorSelector`` and ``resourceName`` under ``pciHostDevices`` to correspond to your GPU model.

* Set ``mdevNameSelector`` and ``resourceName`` under ``mediatedDevices`` to correspond to your vGPU type.
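After the devices are permitted, a virtual machine can request them through the KubeVirt ``spec.template.spec.domain.devices.gpus`` field. The following is a minimal, partial sketch only: the VM name and the device alias ``gpu1`` are placeholders, and the ``deviceName`` is assumed to match the ``resourceName`` permitted above.

.. code-block:: yaml

   apiVersion: kubevirt.io/v1
   kind: VirtualMachine
   metadata:
     name: vm-a10-vgpu                            # Placeholder name
   spec:
     template:
       spec:
         domain:
           devices:
             gpus:
             - deviceName: nvidia.com/NVIDIA_A10-24Q  # Matches the permitted resourceName
               name: gpu1                             # Placeholder device alias in the VM
   ...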

23 changes: 18 additions & 5 deletions openshift/openshift-virtualization.rst
@@ -96,6 +96,9 @@ Prerequisites
hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched


* If you plan to use NVIDIA vGPU and your GPUs are based on the NVIDIA Ampere architecture or later, SR-IOV must be enabled in the BIOS. Refer to the `NVIDIA vGPU Documentation <https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#prereqs-vgpu>`_ to ensure you have met all of the prerequisites for using NVIDIA vGPU.
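As a rough check (an assumption, not a substitute for the vGPU documentation), you can inspect from the host whether a GPU advertises the SR-IOV PCI capability; ``10de`` is the NVIDIA PCI vendor ID.

.. code-block:: console

   $ sudo lspci -d 10de: -vvv | grep -i "single root"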


**********************************
Enabling the IOMMU driver on hosts
**********************************
@@ -297,14 +300,20 @@ Create the cluster policy using the CLI:

#. Modify the ``clusterpolicy.json`` file as follows:

.. note:: The ``vgpuManager`` options are only required if you want to use the NVIDIA vGPU. If you are only using GPU passthrough, these options should not be set.

* sandboxWorkloads.enabled=true
* vgpuManager.enabled=true
* vgpuManager.repository=<path to private repository>
* vgpuManager.image=vgpu-manager
* vgpuManager.version=<driver version>
* vgpuManager.imagePullSecrets={<name of image pull secret>}


The ``vgpuManager`` options are only required if you want to use NVIDIA vGPU. If you are only using GPU passthrough, do not set these options.
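The following partial sketch shows how these settings might appear in ``clusterpolicy.json``; the angle-bracket values are the same placeholders as in the list above, and the surrounding fields of your existing file remain unchanged.

.. code-block:: json

   {
     "spec": {
       "sandboxWorkloads": {
         "enabled": true
       },
       "vgpuManager": {
         "enabled": true,
         "repository": "<path to private repository>",
         "image": "vgpu-manager",
         "version": "<driver version>",
         "imagePullSecrets": ["<name of image pull secret>"]
       }
     }
   }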

In general, the ``sandboxWorkloads.enabled`` flag in ``ClusterPolicy`` controls whether the GPU Operator can provision GPU worker nodes for virtual machine workloads in addition to container workloads. This flag is disabled by default, meaning that all nodes are provisioned with the same software, which enables container workloads, and the ``nvidia.com/gpu.workload.config`` node label is not used.

The term ``sandboxing`` refers to running software in a separate, isolated environment, typically for added security (for example, a virtual machine). We use the term ``sandbox workloads`` to signify workloads that run in a virtual machine, irrespective of the virtualization technology used.
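When ``sandboxWorkloads.enabled`` is ``true``, individual nodes can be steered toward virtual machine workloads with the ``nvidia.com/gpu.workload.config`` label. The following is a hedged example: the values ``vm-vgpu`` and ``vm-passthrough`` are assumptions based on the GPU Operator's sandbox workloads feature, and ``<node-name>`` is a placeholder.

.. code-block:: console

   $ oc label node <node-name> --overwrite nvidia.com/gpu.workload.config=vm-vgpu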


#. Apply the changes:

@@ -370,19 +379,23 @@ The following example permits the A10 GPU device and A10-24Q vGPU device.
spec:
  featureGates:
    disableMDevConfiguration: true
  permittedHostDevices:
    pciHostDevices:
  permittedHostDevices: # Defines VM devices to import.
    pciHostDevices: # Include for GPU passthrough
    - externalResourceProvider: true
      pciDeviceSelector: 10DE:2236
      resourceName: nvidia.com/GA102GL_A10
    mediatedDevices:
    mediatedDevices: # Include for vGPU
    - externalResourceProvider: true
      mdevNameSelector: NVIDIA A10-24Q
      resourceName: nvidia.com/NVIDIA_A10-24Q
...

Update the YAML as follows (a sketch of a virtual machine that requests one of the permitted devices appears after this list):

* Include ``permittedHostDevices`` for GPU passthrough.

* Include ``mediatedDevices`` for vGPU.

* Set ``pciDeviceSelector`` and ``resourceName`` under ``pciHostDevices`` to correspond to your GPU model.

* Set ``mdevNameSelector`` and ``resourceName`` under ``mediatedDevices`` to correspond to your vGPU type.
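As with the vGPU case earlier, a virtual machine can then request the passthrough GPU through ``spec.template.spec.domain.devices.gpus``. The following partial sketch assumes the ``resourceName`` permitted in the example above; the VM name and device alias are placeholders.

.. code-block:: yaml

   apiVersion: kubevirt.io/v1
   kind: VirtualMachine
   metadata:
     name: vm-a10-passthrough                     # Placeholder name
   spec:
     template:
       spec:
         domain:
           devices:
             gpus:
             - deviceName: nvidia.com/GA102GL_A10  # Matches the permitted resourceName
               name: gpu1                          # Placeholder device alias in the VM
   ...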