* update vgpu and gpu pass through example in gpu/openshift docs, update openshift docs for clarity
Signed-off-by: Abigail McCarthy <[email protected]>
* If planning to use NVIDIA vGPU, SR-IOV must be enabled in the BIOS if your GPUs are based on the NVIDIA Ampere architecture or later. Refer to the `NVIDIA vGPU Documentation <https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#prereqs-vgpu>`_ to ensure you have met all of the prerequisites for using NVIDIA vGPU.
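  One way to spot-check this from the host (a sketch, assuming shell access to the host and that ``lspci`` is installed; not part of the official prerequisite steps) is to look for the SR-IOV capability on the NVIDIA devices:

  .. code-block:: console

     $ # List NVIDIA devices (vendor ID 10de) and check for the SR-IOV capability
     $ sudo lspci -d 10de: -vvv | grep -i "Single Root I/O Virtualization"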
**********************************
Enabling the IOMMU driver on hosts
**********************************
@@ -297,14 +300,20 @@ Create the cluster policy using the CLI:
#. Modify the ``clusterpolicy.json`` file as follows:

   * sandboxWorkloads.enabled=true
   * vgpuManager.enabled=true
   * vgpuManager.repository=<path to private repository>
   * vgpuManager.image=vgpu-manager
   * vgpuManager.version=<driver version>
   * vgpuManager.imagePullSecrets={<name of image pull secret>}

   The ``vgpuManager`` options are required only if you want to use NVIDIA vGPU. If you are using only GPU passthrough, do not set these options.

   In general, the ``sandboxWorkloads.enabled`` flag in ``ClusterPolicy`` controls whether the GPU Operator can provision GPU worker nodes for virtual machine workloads in addition to container workloads. This flag is disabled by default, meaning all nodes are provisioned with the same software, which enables container workloads, and the ``nvidia.com/gpu.workload.config`` node label is not used.

   The term ``sandboxing`` refers to running software in a separate, isolated environment, typically for added security (for example, a virtual machine). We use the term ``sandbox workloads`` to signify workloads that run in a virtual machine, irrespective of the virtualization technology used.
#. Apply the changes:
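   For example, assuming the modified policy is saved as ``clusterpolicy.json`` in the current directory (a sketch of the usual flow; the exact command is not shown in this excerpt):

   .. code-block:: console

      $ oc apply -f clusterpolicy.json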
@@ -370,19 +379,23 @@ The following example permits the A10 GPU device and A10-24Q vGPU device.
.. code-block:: yaml

   spec:
     featureGates:
       disableMDevConfiguration: true
     permittedHostDevices: # Defines VM devices to import.
       pciHostDevices: # Include for GPU passthrough.
       - externalResourceProvider: true
         pciDeviceSelector: 10DE:2236
         resourceName: nvidia.com/GA102GL_A10
       mediatedDevices: # Include for vGPU.
       - externalResourceProvider: true
         mdevNameSelector: NVIDIA A10-24Q
         resourceName: nvidia.com/NVIDIA_A10-24Q
   ...
Replace the values in the YAML as follows:

* Include ``permittedHostDevices`` for GPU passthrough.
* Include ``mediatedDevices`` for vGPU.
* ``pciDeviceSelector`` and ``resourceName`` under ``pciHostDevices`` to correspond to your GPU model.
* ``mdevNameSelector`` and ``resourceName`` under ``mediatedDevices`` to correspond to your vGPU type.
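If you need to confirm the PCI vendor and device ID to use for ``pciDeviceSelector``, one way to read it on the GPU worker node (a sketch, assuming shell access to the node and that ``lspci`` is available) is:

.. code-block:: console

   $ # The bracketed vendor:device pair, for example [10de:2236], is the value for pciDeviceSelector
   $ lspci -nn | grep -i nvidia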