gpu-operator/dra-cds.rst (2 additions, 2 deletions)
@@ -18,13 +18,13 @@ Motivation
NVIDIA's `GB200 NVL72 <https://www.nvidia.com/en-us/data-center/gb200-nvl72/>`_ and comparable systems are designed specifically around Multi-Node NVLink (`MNNVL <https://docs.nvidia.com/multi-node-nvlink-systems/mnnvl-user-guide/overview.html>`_) to turn a rack of GPU machines -- each with a small number of GPUs -- into a supercomputer with a large number of GPUs communicating at high bandwidth (1.8 TB/s chip-to-chip, and over `130 TB/s cumulative bandwidth <https://docs.nvidia.com/multi-node-nvlink-systems/multi-node-tuning-guide/overview.html#fifth-generation-nvlink>`_ on a GB200 NVL72).
NVIDIA's DRA Driver for GPUs enables MNNVL for Kubernetes workloads by introducing a new concept -- the **ComputeDomain**:
-when workload requests a ComputeDomain, NVIDIA's DRA Driver for GPUs performs all the heavy lifting required for sharing GPU memory **securely** via NVLink among all pods that comprise the workload.
+when a workload requests a ComputeDomain, NVIDIA's DRA Driver for GPUs performs all the heavy lifting required for sharing GPU memory **securely** via NVLink among all pods that comprise the workload.
.. note::
   Users may appreciate knowing that -- under the hood -- NVIDIA Internode Memory Exchange (`IMEX <https://docs.nvidia.com/multi-node-nvlink-systems/mnnvl-user-guide/overview.html#internode-memory-exchange-service>`_) primitives need to be orchestrated for mapping GPU memory over NVLink *securely*: IMEX provides an access control system to lock down GPU memory even between GPUs on the same NVLink partition.
-   A design goal of this DRA driver is to make IMEX, as much as possible, an implementation detail that workload authors and cluster operators do not need to be concerned with: the driver launches and/or reconfigures IMEX daemons and establishes and injects IMEX channels into containers as needed.
+   A design goal of this DRA driver is to make IMEX, as much as possible, an implementation detail that workload authors and cluster operators do not need to be concerned with: the driver launches and/or reconfigures IMEX daemons and establishes and injects `IMEX channels <https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/imexchannels.html>`_ into containers as needed.
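To make the ComputeDomain concept concrete, a minimal manifest might look like the sketch below. This is an illustrative assumption, not part of this changeset: the ``resource.nvidia.com/v1beta1`` API group and the ``numNodes`` and ``channel.resourceClaimTemplate`` fields follow the examples published with NVIDIA's DRA Driver for GPUs and may differ between driver versions.

```yaml
# Hypothetical sketch: a ComputeDomain spanning a two-node MNNVL workload.
# Field names follow NVIDIA's DRA driver examples and may vary by version.
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: demo-compute-domain
spec:
  # Number of nodes expected to join this domain (typically one per workload pod).
  numNodes: 2
  channel:
    # Workload pods reference this ResourceClaimTemplate; the driver then
    # injects an IMEX channel into their containers.
    resourceClaimTemplate:
      name: demo-compute-domain-channel
```

A workload pod would then reference ``demo-compute-domain-channel`` through ``resourceClaims`` in its pod spec, which is what triggers the driver to set up and inject the IMEX channel.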
gpu-operator/dra-intro-install.rst (15 additions, 5 deletions)
@@ -49,8 +49,8 @@ Prerequisites
- Kubernetes v1.32 or newer.
- DRA and corresponding API groups must be enabled (`see Kubernetes docs <https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#enabling-dynamic-resource-allocation>`_).
-- GPU Driver 565 or later.
-- NVIDIA's GPU Operator v25.3.0 or later, installed with CDI enabled (use the ``--set cdi.enabled=true`` commandline argument during ``helm install``). For reference, please refer to the GPU Operator `installation documentation <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#common-chart-customization-options>`__.
+- NVIDIA GPU Driver 565 or later.
+- While not strictly required, we recommend using NVIDIA's GPU Operator v25.3.0 or later, installed with CDI enabled (use the ``--set cdi.enabled=true`` command-line argument during ``helm install``). For reference, please refer to the GPU Operator `installation documentation <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#common-chart-customization-options>`__.
..
For convenience, the following example shows how to enable CDI upon GPU Operator installation:
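The concrete example lines are outside this hunk; as an illustration only, such an installation might look like the following sketch. The repository URL, release name, and namespace here are assumptions drawn from the GPU Operator documentation linked above, not from this changeset:

```shell
# Add NVIDIA's Helm repository and install the GPU Operator with CDI enabled.
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
helm install gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator --create-namespace \
    --set cdi.enabled=true
```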
@@ -80,15 +80,25 @@ Configure and Helm-install the driver