Commit bae04dc (1 parent: 4c2ceac)

update openshift docs (#174)

* update vgpu and gpu pass through example in gpu/openshift docs, update openshift docs for clarity

Signed-off-by: Abigail McCarthy <[email protected]>

File tree: 2 files changed (+29, -10 lines)

gpu-operator/gpu-operator-kubevirt.rst

Lines changed: 11 additions & 5 deletions

@@ -224,7 +224,9 @@ The following example shows how to permit the A10 GPU device and A10-24Q vGPU de

         Subsystem: NVIDIA Corporation GA102GL [A10] [10de:1482]
         Kernel modules: nvidiafb, nouveau

-#. Modify the ``KubeVirt`` custom resource like the following partial example:
+#. Modify the ``KubeVirt`` custom resource like the following partial example.
+
+   Add

    .. code-block:: yaml

@@ -235,20 +237,24 @@ The following example shows how to permit the A10 GPU device and A10-24Q vGPU de

       featureGates:
       - GPU
       - DisableMDEVConfiguration
-      permittedHostDevices:
-        pciHostDevices:
+      permittedHostDevices: # Defines VM devices to import.
+        pciHostDevices: # Include for GPU passthrough
         - externalResourceProvider: true
           pciVendorSelector: 10DE:2236
           resourceName: nvidia.com/GA102GL_A10
-      mediatedDevices:
+      mediatedDevices: # Include for vGPU
        - externalResourceProvider: true
          mdevNameSelector: NVIDIA A10-24Q
          resourceName: nvidia.com/NVIDIA_A10-24Q
     ...

 Replace the values in the YAML as follows:

-* ``pciDeviceSelector`` and ``resourceName`` under ``pciHostDevices`` to correspond to your GPU model.
+* Include ``permittedHostDevices`` for GPU passthrough.
+
+* Include ``mediatedDevices`` for vGPU.
+
+* ``pciVendorSelector`` and ``resourceName`` under ``pciHostDevices`` to correspond to your GPU model.

 * ``mdevNameSelector`` and ``resourceName`` under ``mediatedDevices`` to correspond to your vGPU type.
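For readers who want the end state rather than the diff, the patched ``KubeVirt`` custom resource would look roughly like the sketch below. The ``spec.configuration.developerConfiguration`` nesting and the metadata values are assumptions based on the KubeVirt API, since the diff only shows a partial example:

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt          # assumed default name
  namespace: kubevirt     # assumed default namespace
spec:
  configuration:
    developerConfiguration:
      featureGates:
      - GPU
      - DisableMDEVConfiguration
    permittedHostDevices:           # Defines VM devices to import.
      pciHostDevices:               # Include for GPU passthrough
      - externalResourceProvider: true
        pciVendorSelector: "10DE:2236"        # match your GPU model
        resourceName: nvidia.com/GA102GL_A10  # match your GPU model
      mediatedDevices:              # Include for vGPU
      - externalResourceProvider: true
        mdevNameSelector: NVIDIA A10-24Q           # match your vGPU type
        resourceName: nvidia.com/NVIDIA_A10-24Q    # match your vGPU type
```

As in the diff, include ``pciHostDevices`` only for GPU passthrough and ``mediatedDevices`` only for vGPU; both are shown here for completeness.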

openshift/openshift-virtualization.rst

Lines changed: 18 additions & 5 deletions
@@ -96,6 +96,9 @@ Prerequisites

    hyperconverged.hco.kubevirt.io/kubevirt-hyperconverged patched

+* If you plan to use NVIDIA vGPU and your GPUs are based on the NVIDIA Ampere architecture or later, SR-IOV must be enabled in the BIOS. Refer to the `NVIDIA vGPU Documentation <https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#prereqs-vgpu>`_ to ensure you have met all of the prerequisites for using NVIDIA vGPU.
+
+
 **********************************
 Enabling the IOMMU driver on hosts
 **********************************
@@ -297,14 +300,20 @@ Create the cluster policy using the CLI:

 #. Modify the ``clusterpolicy.json`` file as follows:

-   .. note:: The ``vgpuManager`` options are only required if you want to use the NVIDIA vGPU. If you are only using GPU passthrough, these options should not be set.
-
    * sandboxWorkloads.enabled=true
    * vgpuManager.enabled=true
    * vgpuManager.repository=<path to private repository>
    * vgpuManager.image=vgpu-manager
    * vgpuManager.version=<driver version>
    * vgpuManager.imagePullSecrets={<name of image pull secret>}
+
+   The ``vgpuManager`` options are only required if you want to use NVIDIA vGPU. If you are only using GPU passthrough, do not set these options.
+
+   In general, the ``sandboxWorkloads.enabled`` flag in ``ClusterPolicy`` controls whether the GPU Operator can provision GPU worker nodes for virtual machine workloads, in addition to container workloads. This flag is disabled by default, meaning all nodes are provisioned with the same software, which enables container workloads, and the ``nvidia.com/gpu.workload.config`` node label is not used.
+
+   The term ``sandboxing`` refers to running software in a separate, isolated environment, typically for added security (for example, a virtual machine). We use the term ``sandbox workloads`` for workloads that run in a virtual machine, irrespective of the virtualization technology used.

 #. Apply the changes:
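Taken together, the bullet items above could map into ``clusterpolicy.json`` roughly as in the following sketch. The nesting under ``spec`` is an assumption based on the GPU Operator ``ClusterPolicy`` schema; the diff itself only lists the option paths:

```json
{
  "spec": {
    "sandboxWorkloads": {
      "enabled": true
    },
    "vgpuManager": {
      "enabled": true,
      "repository": "<path to private repository>",
      "image": "vgpu-manager",
      "version": "<driver version>",
      "imagePullSecrets": ["<name of image pull secret>"]
    }
  }
}
```

Leave out the ``vgpuManager`` block entirely when using only GPU passthrough, per the note above.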

@@ -370,19 +379,23 @@ The following example permits the A10 GPU device and A10-24Q vGPU device.

    spec:
      featureGates:
        disableMDevConfiguration: true
-     permittedHostDevices:
-       pciHostDevices:
+     permittedHostDevices: # Defines VM devices to import.
+       pciHostDevices: # Include for GPU passthrough
        - externalResourceProvider: true
          pciDeviceSelector: 10DE:2236
          resourceName: nvidia.com/GA102GL_A10
-     mediatedDevices:
+     mediatedDevices: # Include for vGPU
       - externalResourceProvider: true
         mdevNameSelector: NVIDIA A10-24Q
         resourceName: nvidia.com/NVIDIA_A10-24Q
    ...

 Replace the values in the YAML as follows:

+* Include ``permittedHostDevices`` for GPU passthrough.
+
+* Include ``mediatedDevices`` for vGPU.
+
 * ``pciDeviceSelector`` and ``resourceName`` under ``pciHostDevices`` to correspond to your GPU model.

 * ``mdevNameSelector`` and ``resourceName`` under ``mediatedDevices`` to correspond to your vGPU type.
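On OpenShift Virtualization this configuration lives in the ``HyperConverged`` custom resource rather than ``KubeVirt`` directly. A sketch of the assembled end state of the hunk above follows; the ``apiVersion``, ``name``, and ``namespace`` shown are assumptions based on a typical OpenShift Virtualization install, not taken from the diff:

```yaml
apiVersion: hco.kubevirt.io/v1beta1
kind: HyperConverged
metadata:
  name: kubevirt-hyperconverged   # assumed default name
  namespace: openshift-cnv        # assumed default namespace
spec:
  featureGates:
    disableMDevConfiguration: true
  permittedHostDevices:           # Defines VM devices to import.
    pciHostDevices:               # Include for GPU passthrough
    - externalResourceProvider: true
      pciDeviceSelector: "10DE:2236"        # match your GPU model
      resourceName: nvidia.com/GA102GL_A10  # match your GPU model
    mediatedDevices:              # Include for vGPU
    - externalResourceProvider: true
      mdevNameSelector: NVIDIA A10-24Q          # match your vGPU type
      resourceName: nvidia.com/NVIDIA_A10-24Q   # match your vGPU type
```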
