Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.

version downloaded from the `NVIDIA Licensing Portal <https://ui.licensing.nvidia.com>`_.

- The GPU Operator is supported on all active NVIDIA data center production drivers.
  Refer to `Supported Drivers and CUDA Toolkit Versions <https://docs.nvidia.com/datacenter/tesla/drivers/index.html#supported-drivers-and-cuda-toolkit-versions>`_
gpu-operator/release-notes.rst
Refer to the :ref:`GPU Operator Component Matrix` for a list of software components.

----
.. _v25.10.1:

25.10.1
=======

New Features
------------
* Updated software component versions:

  - NVIDIA Container Toolkit v1.18.1
  - NVIDIA DCGM v4.4.2-1
  - NVIDIA DCGM Exporter v4.4.2-4.7.0
  - NVIDIA Kubernetes Device Plugin v0.18.1
  - NVIDIA GPU Feature Discovery v0.18.1
  - NVIDIA MIG Manager for Kubernetes v0.13.1
  - NVIDIA Driver Manager for Kubernetes v0.9.1

* Added support for this NVIDIA Data Center GPU Driver version:

  - 580.105.08 (default)
* Added HPC job mapping support to DCGM Exporter to collect metrics for HPC jobs running on the cluster.

  Configure HPC job mapping by setting the ``dcgmExporter.hpcJobMapping.enabled`` field to ``true`` in the ClusterPolicy custom resource.
  Set ``dcgmExporter.hpcJobMapping.directory`` to the directory path where the workload manager creates the HPC job mapping files.
  The default directory is ``/var/lib/dcgm-exporter/job-mapping``.

* Improved the cluster policy reconciler to be more resilient to race conditions during node updates.
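The HPC job mapping settings described above can be sketched as a ClusterPolicy fragment. This is a minimal, illustrative excerpt, not a complete ClusterPolicy: only the ``dcgmExporter`` fields named in these release notes are shown, and the resource name ``cluster-policy`` is an assumed example.

.. code-block:: yaml

   # Illustrative excerpt: enables HPC job mapping in DCGM Exporter.
   # All other ClusterPolicy fields are omitted for brevity.
   apiVersion: nvidia.com/v1
   kind: ClusterPolicy
   metadata:
     name: cluster-policy   # example name; use the name of your ClusterPolicy resource
   spec:
     dcgmExporter:
       enabled: true
       hpcJobMapping:
         enabled: true
         directory: "/var/lib/dcgm-exporter/job-mapping"   # default path per the note above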
Fixed Issues
------------

* Fixed the following known issues introduced in GPU Operator v25.10.0:

  * When using CRI-O as the container runtime, several GPU Operator pods could become stuck in the ``Init:RunContainerError`` or ``Init:CreateContainerError`` state during GPU Operator installation or upgrade, or during a GPU driver daemonset upgrade.
  * NVIDIA Container Toolkit 1.18.0 overwrote the ``imports`` field in the top-level containerd configuration file, so any previously imported paths were lost.
    This was fixed in NVIDIA Container Toolkit v1.18.1.
* Fixed a race condition where user-supplied NVIDIA kernel module parameters were sometimes not applied by the driver daemonset.
  For more information, refer to `PR #1939 <https://github.com/NVIDIA/gpu-operator/pull/1939>`__.

* Fixed a bug where driver images were incorrectly assigned in multi-nodepool clusters.
  For more information, refer to `Issue #1622 <https://github.com/NVIDIA/gpu-operator/issues/1622>`__.

* Fixed a bug where the GPU Operator Helm chart template did not assign the correct namespace to the resources it created.

* Fixed a bug where the k8s-driver-manager waited indefinitely when MOFED is enabled and ``USE_HOST_MOFED`` is set to ``true``, even though MOFED is pre-installed on the host.