Update docs for 25.10.1 (#326)

a-mccarthy · web-flow · commit 41120aec9b93 · 2025-12-04T12:24:15.000-05:00
small updates

add additional components

Apply suggestions from code review

add in known issues and component updates

Signed-off-by: Abigail McCarthy <20771501+a-mccarthy@users.noreply.github.com>
diff --git a/gpu-operator/life-cycle-policy.rst b/gpu-operator/life-cycle-policy.rst
@@ -87,9 +87,10 @@ Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.
    :header-rows: 2
 
    * - :rspan:`1` Component
-     - GPU Operator Version
+     - :cspan:`2` GPU Operator Version
 
    * - v25.10.0
+     - v25.10.1
 
    * - NVIDIA GPU Driver |ki|_
      - | `580.95.05 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-580-95-05/index.html>`_ (**D**, **R**)
@@ -98,32 +99,44 @@ Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.
        | `570.195.03 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-195-03/index.html>`_
        | `550.163.01 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-550-163-01/index.html>`_
        | `535.274.02 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-274-03/index.html>`_
+     - | `580.105.08 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-580-105-08/index.html>`_ (**D**, **R**)
+       | `580.95.05 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-580-95-05/index.html>`_ 
+       | `580.82.07 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-580-82-07/index.html>`_ 
+       | `575.57.08 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-575-57-08/index.html>`_
+       | `570.195.03 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-570-195-03/index.html>`_
+       | `550.163.01 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-550-163-01/index.html>`_
+       | `535.274.02 <https://docs.nvidia.com/datacenter/tesla/tesla-release-notes-535-274-02/index.html>`_
 
 
    * - NVIDIA Driver Manager for Kubernetes
      - `v0.9.0 <https://ngc.nvidia.com/catalog/containers/nvidia:cloud-native:k8s-driver-manager>`__
+     - `v0.9.1 <https://ngc.nvidia.com/catalog/containers/nvidia:cloud-native:k8s-driver-manager>`__
 
    * - NVIDIA Container Toolkit
      - `1.18.0 <https://github.com/NVIDIA/nvidia-container-toolkit/releases>`__
 
    * - NVIDIA Kubernetes Device Plugin
      - `0.18.0 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
+     - `0.18.1 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
 
    * - DCGM Exporter
      - `v4.4.1-4.6.0 <https://github.com/NVIDIA/dcgm-exporter/releases>`__
+     - `v4.4.2-4.7.0 <https://github.com/NVIDIA/dcgm-exporter/releases>`__
 
    * - Node Feature Discovery
      - `v0.18.2 <https://github.com/kubernetes-sigs/node-feature-discovery/releases/>`__
 
    * - | NVIDIA GPU Feature Discovery
        | for Kubernetes
-     - `0.18.0 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
+     - `0.18.1 <https://github.com/NVIDIA/k8s-device-plugin/releases>`__
 
    * - NVIDIA MIG Manager for Kubernetes
      - `0.13.0 <https://github.com/NVIDIA/mig-parted/blob/main/CHANGELOG.md>`__
+     - `0.13.1 <https://github.com/NVIDIA/mig-parted/blob/main/CHANGELOG.md>`__
 
    * - DCGM
      - `4.4.1 <https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html>`__
+     - `4.4.2-1 <https://docs.nvidia.com/datacenter/dcgm/latest/release-notes/changelog.html>`__
 
    * - Validator for NVIDIA GPU Operator
      - v25.10.0
@@ -169,4 +182,5 @@ Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.
      version downloaded from the `NVIDIA Licensing Portal  <https://ui.licensing.nvidia.com>`_.
    - The GPU Operator is supported on all active NVIDIA data center production drivers.
      Refer to `Supported Drivers and CUDA Toolkit Versions <https://docs.nvidia.com/datacenter/tesla/drivers/index.html#supported-drivers-and-cuda-toolkit-versions>`_
-     for more information.
+     for more information.
+
diff --git a/gpu-operator/release-notes.rst b/gpu-operator/release-notes.rst
@@ -33,6 +33,56 @@ Refer to the :ref:`GPU Operator Component Matrix` for a list of software compone
 
 ----
 
+
+
+.. _v25.10.1:
+
+25.10.1
+=======
+
+New Features
+------------
+
+* Updated software component versions:
+
+  - NVIDIA Container Toolkit v1.18.1
+  - NVIDIA DCGM v4.4.2-1
+  - NVIDIA DCGM Exporter v4.4.2-4.7.0 
+  - NVIDIA Kubernetes Device Plugin v0.18.1
+  - NVIDIA GPU Feature Discovery v0.18.1
+  - NVIDIA MIG Manager for Kubernetes 0.13.1
+  - NVIDIA Driver Manager for Kubernetes v0.9.1
+
+* Added support for this NVIDIA Data Center GPU Driver version:
+
+  - 580.105.08 (default)
+
+* Added HPC job mapping support to DCGM Exporter to collect metrics for HPC jobs running on the cluster.
+
+  To enable HPC job mapping, set the ``dcgmExporter.hpcJobMapping.enabled`` field to ``true`` in the ClusterPolicy custom resource.
+  Set ``dcgmExporter.hpcJobMapping.directory`` to the directory path where the workload manager creates HPC job mapping files.
+  The default directory is ``/var/lib/dcgm-exporter/job-mapping``.
+
+* Improved the cluster policy reconciler to be more resilient to race conditions during node updates.
+
+Fixed Issues
+------------
+
+* Fixed the following known issues introduced in GPU Operator v25.10.0:
+
+  * When using cri-o as the container runtime, several GPU Operator pods can be stuck in the ``Init:RunContainerError`` or ``Init:CreateContainerError`` state during GPU Operator installation or upgrade, or during GPU driver daemonset upgrade.
+  * NVIDIA Container Toolkit 1.18.0 overwrites the imports field in the top-level containerd configuration file, so any previously imported paths are lost.
+    This was fixed in NVIDIA Container Toolkit v1.18.1.
+
+* Fixed a race condition where user-supplied NVIDIA kernel module parameters were sometimes not applied by the driver daemonset.
+  For more information, refer to `PR #1939 <https://github.com/NVIDIA/gpu-operator/pull/1939>`__.
+
+* Fixed a bug where driver images were incorrectly assigned in multi-nodepool clusters.
+  For more information, refer to `Issue #1622 <https://github.com/NVIDIA/gpu-operator/issues/1622>`__.
+* Fixed a bug where the GPU Operator Helm chart template was not assigning the correct namespace to resources it created.
+* Fixed a bug where the k8s-driver-manager would wait indefinitely when MOFED is enabled and ``USE_HOST_MOFED`` is set to ``true``, even though MOFED is pre-installed on the host.
+
+
 .. _v25.10.0:
 
 25.10.0
diff --git a/repo.toml b/repo.toml
@@ -168,7 +168,7 @@ docs_root = "${root}/gpu-operator"
 project = "gpu-operator"
 name = "NVIDIA GPU Operator"
 version = "25.10"  # Update repo_docs.projects.openshift.version to match latest patch version maj.min.patch
-source_substitutions = { minor_version = "25.10", version = "v25.10.0", recommended = "580.95.05" }
+source_substitutions = { minor_version = "25.10", version = "v25.10.1", recommended = "580.105.08" }
 copyright_start = 2020
 sphinx_exclude_patterns = [
   "life-cycle-policy.rst",