Commit 7788e32

GPU Operator 25.3.0 release docs (#165)
* update repo for new version
  Signed-off-by: Abigail McCarthy <[email protected]>
* update platform support, release notes
  Signed-off-by: Abigail McCarthy <[email protected]>
* update for kernelmoduletypes change
  Signed-off-by: Abigail McCarthy <[email protected]>
* update support matrix
  Signed-off-by: Abigail McCarthy <[email protected]>
* minor adjustments
  Signed-off-by: Abigail McCarthy <[email protected]>
* updates from review
  Signed-off-by: Abigail McCarthy <[email protected]>
* updates from review
  Signed-off-by: Abigail McCarthy <[email protected]>
* few mroe typos
  Signed-off-by: Abigail McCarthy <[email protected]>
* add ubuntu 24.04 and remove wrong GPU
  Signed-off-by: Abigail McCarthy <[email protected]>
* minor typos
  Signed-off-by: Abigail McCarthy <[email protected]>
* add coming soon for dra
  Signed-off-by: Abigail McCarthy <[email protected]>
* small typos
  Signed-off-by: Abigail McCarthy <[email protected]>
---------
Signed-off-by: Abigail McCarthy <[email protected]>
1 parent d1a50b7 commit 7788e32

10 files changed

+248
-85
lines changed

gpu-operator/getting-started.rst

Lines changed: 14 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -173,6 +173,17 @@ To view all the options, run ``helm show values nvidia/gpu-operator``.
173173
Set this value to ``false`` when using the Operator on systems with pre-installed drivers.
174174
- ``true``
175175

176+
* - ``driver.kernelModuleType``
177+
- Specifies the type of the NVIDIA GPU Kernel modules to use.
178+
Valid values are ``auto`` (default), ``proprietary``, and ``open``.
179+
180+
``auto`` means that the recommended kernel module type (open or proprietary) is chosen based on the GPU devices on the host and the driver branch used.
181+
Note that ``auto`` is supported only with the 570.86.15, 570.124.06, and later driver containers.
182+
550 and 535 branch drivers do not yet support this mode.
183+
``open`` means the open kernel module is used.
184+
``proprietary`` means the proprietary module is used.
185+
- ``auto``
186+
176187
* - ``driver.repository``
177188
- The images are downloaded from NGC. Specify another image repository when using
178189
custom driver images.
@@ -197,8 +208,9 @@ To view all the options, run ``helm show values nvidia/gpu-operator``.
197208
runs slowly in your cluster.
198209
- ``60s``
199210

200-
* - ``driver.useOpenKernelModules``
201-
- When set to ``true``, the driver containers install the NVIDIA Open GPU Kernel module driver.
211+
* - ``driver.useOpenKernelModules`` Deprecated.
212+
- This field is deprecated as of v25.3.0 and will be ignored. Use ``driver.kernelModuleType`` instead.
213+
When set to ``true``, the driver containers install the NVIDIA Open GPU Kernel module driver.
202214
- ``false``
203215

204216
* - ``driver.usePrecompiled``
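
The ``kernelModuleType`` option added in this file can also be set from a Helm values file instead of a ``--set`` argument. The following is a minimal sketch, assuming the option sits under the ``driver`` key as the ``--set driver.kernelModuleType=open`` argument elsewhere in this commit suggests. The file name is hypothetical.

```yaml
# values-kernel-module.yaml (hypothetical file name, not part of this commit)
driver:
  # Valid values per the table above: auto (default), proprietary, open.
  # "open" pins the open kernel module instead of letting the Operator choose.
  kernelModuleType: open
```

Passing this file to ``helm install`` with the ``-f`` option should have the same effect as ``--set driver.kernelModuleType=open``.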

gpu-operator/gpu-driver-configuration.rst

Lines changed: 10 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -195,6 +195,13 @@ The following table describes some of the fields in the custom resource.
195195
- Specifies the credentials to provide to the registry if the registry is secured.
196196
- None
197197

198+
* - ``kernelModuleType``
199+
- Specifies the type of the NVIDIA GPU Kernel modules to use.
200+
Valid values are ``auto`` (default), ``proprietary``, and ``open``.
201+
202+
``auto`` means that the recommended kernel module type is chosen based on the GPU devices on the host and the driver branch used.
203+
- ``auto``
204+
198205
* - ``labels``
199206
- Specifies a map of key and value pairs to add as custom labels to the driver pod.
200207
- None
@@ -217,8 +224,9 @@ The following table describes some of the fields in the custom resource.
217224
- Specifies the container registry that contains the driver container.
218225
- ``nvcr.io/nvidia``
219226

220-
* - ``useOpenKernelModules``
221-
- Specifies to use the NVIDIA Open GPU Kernel modules.
227+
* - ``useOpenKernelModules`` Deprecated.
228+
- This field is deprecated as of v25.3.0 and will be ignored. Use ``kernelModuleType`` instead.
229+
Specifies to use the NVIDIA Open GPU Kernel modules.
222230
- ``false``
223231

224232
* - ``tolerations``

gpu-operator/gpu-operator-rdma.rst

Lines changed: 6 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -30,10 +30,11 @@ To support GPUDirect RDMA, userspace CUDA APIs are required.
3030
The kernel mode support is provided by one of two approaches: DMA-BUF from the Linux kernel or the legacy ``nvidia-peermem`` kernel module.
3131
NVIDIA recommends using the DMA-BUF rather than using the ``nvidia-peermem`` kernel module from the GPU Driver.
3232

33-
Starting with v23.9.1 of the Operator, the Operator uses GDS driver version 2.17.5 or newer.
33+
The Operator uses GDS driver version 2.17.5 or newer.
3434
This version and higher is only supported with the NVIDIA Open GPU Kernel module driver.
35-
The sample commands for installing the Operator include the ``--set useOpenKernelModules=true``
36-
command-line argument for Helm.
35+
In GPU Operator v25.3.0 and later, ``driver.kernelModuleType`` defaults to ``auto`` for the supported driver versions.
36+
This configuration allows the GPU Operator to choose the recommended driver kernel module type depending on the driver branch and the GPU devices available.
37+
Newer driver versions use the open kernel module by default. To make sure that you are using the open kernel module, include the ``--set driver.kernelModuleType=open`` command-line argument in your Helm install command for the Operator.
3738

3839
In conjunction with the Network Operator, the GPU Operator can be used to
3940
set up the networking related components such as network device kernel drivers and Kubernetes device plugins to enable
@@ -128,7 +129,6 @@ To use DMA-BUF and network device drivers that are installed by the Network Oper
128129
-n gpu-operator --create-namespace \
129130
nvidia/gpu-operator \
130131
--version=${version} \
131-
--set driver.useOpenKernelModules=true
132132
133133
To use DMA-BUF and network device drivers that are installed on the host:
134134

@@ -138,11 +138,10 @@ To use DMA-BUF and network device drivers that are installed on the host:
138138
-n gpu-operator --create-namespace \
139139
nvidia/gpu-operator \
140140
--version=${version} \
141-
--set driver.useOpenKernelModules=true \
142141
--set driver.rdma.useHostMofed=true
143142
144143
To use the legacy ``nvidia-peermem`` kernel module instead of DMA-BUF, add ``--set driver.rdma.enabled=true`` to either of the preceding commands.
145-
The ``driver.useOpenKernelModules=true`` argument is optional for using the legacy kernel driver.
144+
Add ``--set driver.kernelModuleType=open`` if you are using a driver version from a branch earlier than R570.
146145

147146
Verifying the Installation of GPUDirect with RDMA
148147
=================================================
@@ -431,11 +430,11 @@ The following sample command applies to clusters that use the Network Operator t
431430
-n gpu-operator --create-namespace \
432431
nvidia/gpu-operator \
433432
--version=${version} \
434-
--set driver.useOpenKernelModules=true \
435433
--set gds.enabled=true
436434
437435
Add ``--set driver.rdma.enabled=true`` to the command to use the legacy ``nvidia-peermem`` kernel module.
438436

437+
Add ``--set driver.kernelModuleType=open`` if you are using a driver version from a branch earlier than R570.
439438

440439
Verification
441440
==============

gpu-operator/life-cycle-policy.rst

Lines changed: 18 additions & 23 deletions
Original file line number | Diff line number | Diff line change
@@ -55,13 +55,13 @@ The product life cycle and versioning are subject to change in the future.
5555
* - GPU Operator Version
5656
- Status
5757

58-
* - 24.9.x
58+
* - 25.3.x
5959
- Generally Available
6060

61-
* - 24.6.x
61+
* - 24.9.x
6262
- Maintenance
6363

64-
* - 24.3.x and lower
64+
* - 24.6.x and lower
6565
- EOL
6666

6767

@@ -89,60 +89,55 @@ Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.
8989
- ${version}
9090

9191
* - NVIDIA GPU Driver
92-
- | `570.86.15 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-86-15/index.html>`_ (recommended),
93-
| `565.57.01 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-565-57-01/index.html>`_
94-
| `560.35.03 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-560-35-03/index.html>`_
95-
| `550.144.03 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-550-144-03/index.html>`_ (default),
96-
| `550.127.08 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-550-127-08/index.html>`_
97-
| `535.230.02 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-230-02/index.html>`_
98-
| `535.216.03 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-216-03/index.html>`_
92+
- | `570.124.06 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-124-06/index.html>`_ (default, recommended),
93+
| `570.86.15 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-86-15/index.html>`_
9994
10095
* - NVIDIA Driver Manager for Kubernetes
101-
- `v0.7.0 <https://ngc.nvidia.com/catalog/containers/nvidia:cloud-native:k8s-driver-manager>`__
96+
- `v0.8.0 <https://ngc.nvidia.com/catalog/containers/nvidia:cloud-native:k8s-driver-manager>`__
10297

10398
* - NVIDIA Container Toolkit
104-
- `1.17.4 <https://github.com/NVIDIA/nvidia-container-toolkit/releases>`__
99+
- `1.17.5 <https://github.com/NVIDIA/nvidia-container-toolkit/releases>`__
105100

106101
* - NVIDIA Kubernetes Device Plugin
107-
- `0.17.0 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
102+
- `0.17.1 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
108103

109104
* - DCGM Exporter
110-
- `3.3.9-3.6.1 <https://github.com/NVIDIA/dcgm-exporter/releases>`__
105+
- `4.1.1-4.0.4 <https://github.com/NVIDIA/dcgm-exporter/releases>`__
111106

112107
* - Node Feature Discovery
113-
- v0.16.6
108+
- `v0.17.2 <https://github.com/kubernetes-sigs/node-feature-discovery/releases/>`__
114109

115110
* - | NVIDIA GPU Feature Discovery
116111
| for Kubernetes
117-
- `0.17.0 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
112+
- `0.17.1 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
118113

119114
* - NVIDIA MIG Manager for Kubernetes
120-
- `0.10.0 <https://github.com/NVIDIA/mig-parted/tree/main/deployments/gpu-operator>`__
115+
- `0.12.1 <https://github.com/NVIDIA/mig-parted/tree/main/deployments/gpu-operator>`__
121116

122117
* - DCGM
123-
- `3.3.9-1 <https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html>`__
118+
- `4.1.1-2 <https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html>`__
124119

125120
* - Validator for NVIDIA GPU Operator
126121
- ${version}
127122

128123
* - NVIDIA KubeVirt GPU Device Plugin
129-
- `v1.2.10 <https://github.com/NVIDIA/kubevirt-gpu-device-plugin>`__
124+
- `v1.3.1 <https://github.com/NVIDIA/kubevirt-gpu-device-plugin>`__
130125

131126
* - NVIDIA vGPU Device Manager
132-
- `v0.2.8 <https://github.com/NVIDIA/vgpu-device-manager>`__
127+
- `v0.3.0 <https://github.com/NVIDIA/vgpu-device-manager>`__
133128

134129
* - NVIDIA GDS Driver |gds|_
135130
- `2.20.5 <https://github.com/NVIDIA/gds-nvidia-fs/releases>`__
136131

137132
* - NVIDIA Kata Manager for Kubernetes
138-
- `v0.2.2 <https://github.com/NVIDIA/k8s-kata-manager>`__
133+
- `v0.2.3 <https://github.com/NVIDIA/k8s-kata-manager>`__
139134

140135
* - | NVIDIA Confidential Computing
141136
| Manager for Kubernetes
142137
- v0.1.1
143138

144139
* - NVIDIA GDRCopy Driver
145-
- `v2.4.1-1 <https://github.com/NVIDIA/gdrcopy/releases>`__
140+
- `v2.4.4 <https://github.com/NVIDIA/gdrcopy/releases>`__
146141

147142
.. _gds-open-kernel:
148143

@@ -156,4 +151,4 @@ Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.
156151
version downloaded from the `NVIDIA vGPU Software Portal <https://nvid.nvidia.com/dashboard/#/dashboard>`_.
157152
- The GPU Operator is supported on all active NVIDIA data center production drivers.
158153
Refer to `Supported Drivers and CUDA Toolkit Versions <https://docs.nvidia.com/datacenter/tesla/drivers/index.html#cuda-drivers>`_
159-
for more information.
154+
for more information.

gpu-operator/manifests/input/nvd-demo-gold.yaml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -16,6 +16,7 @@ spec:
1616
image: driver
1717
imagePullPolicy: IfNotPresent
1818
imagePullSecrets: []
19+
kernelModuleType: auto
1920
manager: {}
2021
nodeSelector:
2122
driver.config: "gold"
@@ -30,6 +31,5 @@ spec:
3031
initialDelaySeconds: 60
3132
periodSeconds: 10
3233
timeoutSeconds: 60
33-
useOpenKernelModules: false
3434
usePrecompiled: false
3535
version: 535.104.12
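
For orientation, the changed manifest field might look as follows in a complete, minimal NVIDIA driver custom resource. This is a sketch under stated assumptions, not the file from this commit: the ``apiVersion``, ``kind``, ``driverType``, and resource name are not visible in the diff and are assumed here, and the driver version is taken from the component matrix in ``life-cycle-policy.rst`` because the getting-started text states that ``auto`` is supported only with 570.86.15, 570.124.06, and later driver containers.

```yaml
# Hypothetical example resource; the name and API group/version are assumptions.
apiVersion: nvidia.com/v1alpha1
kind: NVIDIADriver
metadata:
  name: demo-auto
spec:
  driverType: gpu
  image: driver
  repository: nvcr.io/nvidia
  # Replaces the deprecated useOpenKernelModules field.
  # auto lets the Operator pick open or proprietary modules.
  kernelModuleType: auto
  # A 570-branch version, since 550 and 535 branch drivers do not yet support auto.
  version: "570.124.06"
```

With an older driver branch, such as the 535.104.12 version in the manifest above, setting ``kernelModuleType`` explicitly to ``open`` or ``proprietary`` is likely the safer choice.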
