Commit cc01e96

Andrew Chen authored
Add driver container image tag note for OCP 4.19+ (#245)
* add driver container image tag note for OCP 4.19+
* formatting and style guide fixes
* add reference links to RH docs
* accept suggestions

Signed-off-by: Andrew Chen <[email protected]>
1 parent 8be04f1 commit cc01e96

File tree

6 files changed: +46, -36 lines


openshift/appendix-ocp.rst

Lines changed: 1 addition & 1 deletion
@@ -63,5 +63,5 @@ For additional troubleshooting resources:
  * `Node Feature Discovery documentation <https://kubernetes-sigs.github.io/node-feature-discovery/>`_.
  * `Red Hat Node Feature Discovery Operator documentation <https://docs.openshift.com/container-platform/latest/hardware_enablement/psap-node-feature-discovery-operator.html>`_
  * `OpenShift Driver Toolkit documentation <https://docs.redhat.com/en/documentation/openshift_container_platform/latest/html/specialized_hardware_and_driver_enablement/driver-toolkit>`_
- * `OpenShift Driver Toolkit GihHub repository <https://github.com/openshift/driver-toolkit/>`_
+ * `OpenShift Driver Toolkit GitHub repository <https://github.com/openshift/driver-toolkit/>`_
  * `OpenShift troubleshooting guide <https://docs.openshift.com/container-platform/latest/support/troubleshooting/>`_

openshift/gpu-operator-with-precompiled-drivers.rst

Lines changed: 20 additions & 11 deletions
@@ -17,7 +17,7 @@ About Precompiled Driver Containers
  ***********************************

  By default, NVIDIA GPU drivers are built on the cluster nodes when you deploy the GPU Operator.
- Driver compilation and packaging is done on every Kubernetes node, which leads to bursts of compute demand, waste of resources, and long provisioning times.
+ Driver compilation and packaging is done on every Kubernetes node, leading to bursts of compute demand, waste of resources, and long provisioning times.
  In contrast, using container images with precompiled drivers makes the drivers immediately available on all nodes, resulting in faster provisioning and cost savings in public cloud deployments.

  ***********************************
@@ -43,19 +43,19 @@ Perform the following steps to build a custom driver image for use with Red Hat
  .. rubric:: Prerequisites

- * You have access to a container registry, such as NVIDIA NGC Private Registry, Red Hat Quay, or the OpenShift internal container registry, and can push container images to the registry.
+ * You have access to a container registry such as NVIDIA NGC Private Registry, Red Hat Quay, or the OpenShift internal container registry and can push container images to the registry.

  * You have a valid Red Hat subscription with an activation key.

  * You have a Red Hat OpenShift pull secret.

  * Your build machine has access to the internet to download operating system packages.

- * You know a CUDA version, such as ``12.1.0``, that you want to use.
+ * You know a CUDA version such as ``12.1.0`` that you want to use.

-   One way to find a supported CUDA version for your operating system is to access the NVIDIA GPU Cloud registry at `CUDA | NVIDIA NGC <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags>`_ and view the tags. Use the search field to filter the tags, such as ``base-ubi8`` for RHEL 8 and ``base-ubi9`` for RHEL 9. The filtered results show the CUDA versions, such as ``12.1.0``, ``12.0.1``, ``12.0.0``, and so on.
+   One way to find a supported CUDA version for your operating system is to access the NVIDIA GPU Cloud registry at `CUDA | NVIDIA NGC <https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags>`_ and view the tags. Use the search field to filter the tags such as ``base-ubi8`` for RHEL 8 and ``base-ubi9`` for RHEL 9. The filtered results show the CUDA versions such as ``12.1.0``, ``12.0.1``, and ``12.0.0``.
- * You know the GPU driver version, such as ``525.105.17``, that you want to use.
+ * You know the GPU driver version such as ``525.105.17`` that you want to use.
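Regarding the CUDA tag lookup mentioned above, the tags can also be listed from a terminal. A sketch, assuming ``skopeo`` is installed:

.. code-block:: console

   # Sketch: lists available CUDA image tags from the NGC registry,
   # filtered to the RHEL 8 base images.
   $ skopeo list-tags docker://nvcr.io/nvidia/cuda | grep base-ubi8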
  .. rubric:: Procedure

@@ -65,26 +65,26 @@ Perform the following steps to build a custom driver image for use with Red Hat
        $ git clone https://gitlab.com/nvidia/container-images/driver

- #. Change the directory to ``rhel8/precompiled`` under the cloned repository. You can build precompiled driver images for versions 8 and 9 of RHEL from this directory:
+ #. Change to the ``rhel8/precompiled`` directory under the cloned repository. You can build precompiled driver images for versions 8 and 9 of RHEL from this directory:

     .. code-block:: console

        $ cd driver/rhel8/precompiled

- #. Create a Red Hat Customer Portal Activation Key and note your Red Hat Subscription Management (RHSM) organization ID. These are to install packages during a build. Save the values to files, for example, ``$HOME/rhsm_org`` and ``$HOME/rhsm_activationkey``:
+ #. Create a Red Hat Customer Portal Activation Key and note your Red Hat Subscription Management (RHSM) organization ID. These are used to install packages during a build. Save the values to files such as ``$HOME/rhsm_org`` and ``$HOME/rhsm_activationkey``:

     .. code-block:: console

        export RHSM_ORG_FILE=$HOME/rhsm_org
        export RHSM_ACTIVATIONKEY_FILE=$HOME/rhsm_activationkey
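For context, populating those files might look like the following sketch; the organization ID and activation key shown are hypothetical placeholders:

.. code-block:: console

   # Hypothetical values; substitute your own RHSM organization ID and activation key.
   $ echo "0123456" > $HOME/rhsm_org
   $ echo "example-activation-key" > $HOME/rhsm_activationkey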
- #. Download your Red Hat OpenShift pull secret and store it in a file, for example, ``${HOME}/pull-secret``:
+ #. Download your Red Hat OpenShift pull secret and store it in a file such as ``${HOME}/pull-secret.txt``:

     .. code-block:: console

        export PULL_SECRET_FILE=$HOME/pull-secret.txt

- #. Set the Red Hat OpenShift version and target architecture of your cluster, for example, ``x86_64``:
+ #. Set the Red Hat OpenShift version and target architecture of your cluster such as ``x86_64``:

     .. code-block:: console
@@ -121,15 +121,24 @@ Perform the following steps to build a custom driver image for use with Red Hat
        export DRIVER_VERSION=525.105.17
        export OS_TAG=rhcos4.12

+    .. note:: The driver container image tag for OpenShift changed with the OCP 4.19 release.
+
+       - Before OCP 4.19: The driver image tag is formed with the suffix ``-rhcos4.17`` (such as with OCP 4.17).
+       - Starting with OCP 4.19: The driver image tag is formed with the suffix ``-rhel9.6`` (such as with OCP 4.19).
+
+       Refer to `RHEL Versions Utilized by RHEL CoreOS and OCP <https://access.redhat.com/articles/6907891>`_
+       and `Split RHCOS into layers: /etc/os-release <https://github.com/openshift/enhancements/blob/master/enhancements/rhcos/split-rhcos-into-layers.md#etcos-release>`_
+       for more information.
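Assuming the tag suffix corresponds to the ``OS_TAG`` variable set above (an inference from this hunk, not stated explicitly), the setting would change along these lines:

.. code-block:: console

   # Sketch; version values are illustrative only.
   # Before OCP 4.19 (for example, OCP 4.17):
   export OS_TAG=rhcos4.17
   # OCP 4.19 and later:
   export OS_TAG=rhel9.6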
  #. Build and push the image:

     .. code-block:: console

        make image image-push

- Optionally, override the ``IMAGE_REGISTRY``, ``IMAGE_NAME``, and ``CONTAINER_TOOL``. You can also override ``BUILDER_USER`` and ``BUILDER_EMAIL`` if you want, otherwise your Git username and email are used. See the Makefile for all available variables.
+ Optionally, override the ``IMAGE_REGISTRY``, ``IMAGE_NAME``, and ``CONTAINER_TOOL``. You can also override ``BUILDER_USER`` and ``BUILDER_EMAIL`` if you want. Otherwise, your Git username and email are used. Refer to the Makefile for all available variables.
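For illustration, a hypothetical invocation overriding those variables; the registry, image name, and identity values are placeholders:

.. code-block:: console

   # Placeholder values for illustration only.
   make image image-push \
       IMAGE_REGISTRY=quay.io/example-org \
       IMAGE_NAME=nvidia-gpu-driver \
       CONTAINER_TOOL=podman \
       BUILDER_USER="Jane Doe" \
       BUILDER_EMAIL="jane.doe@example.com"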

- .. note:: Do not set the ``DRIVER_TYPE``. The only supported value is currently ``passthrough``, which is set by default.
+ .. note:: Do not set the ``DRIVER_TYPE``. The only supported value is currently ``passthrough``, and this is set by default.

  *********************************************
  Enabling Precompiled Driver Container Support

openshift/install-gpu-ocp.rst

Lines changed: 17 additions & 16 deletions
@@ -13,18 +13,19 @@ Installing the NVIDIA GPU Operator by using the web console
  #. In the OpenShift Container Platform web console, from the side menu, navigate to **Operators** > **OperatorHub** and select **All Projects**.

+ #. In **Operators** > **OperatorHub**, search for the **NVIDIA GPU Operator**. For additional information, refer to the `Red Hat OpenShift Container Platform documentation <https://docs.openshift.com/container-platform/latest/operators/admin/olm-adding-operators-to-cluster.html>`_.

  #. Select the **NVIDIA GPU Operator**, click **Install**. In the following screen, click **Install**.

     .. note:: Here, you can select the namespace where you want to deploy the GPU Operator. The suggested namespace to use is the ``nvidia-gpu-operator``. You can choose any existing namespace or create a new namespace under **Select a Namespace**.

-       If you install in any other namespace other than ``nvidia-gpu-operator``, the GPU Operator will **not** automatically enable namespace monitoring, and metrics and alerts will **not** be collected by Prometheus.
-       If only trusted operators are installed in this namespace, you can manually enable namespace monitoring with this command:
+       If you install in any namespace other than ``nvidia-gpu-operator``, the GPU Operator does **not** automatically enable namespace monitoring, and metrics and alerts are **not** collected by Prometheus.
+       If only trusted operators are installed in this namespace, you can manually enable namespace monitoring with this command:

-       .. code-block:: console
+       .. code-block:: console

-          $ oc label ns/$NAMESPACE_NAME openshift.io/cluster-monitoring=true
+          $ oc label ns/$NAMESPACE_NAME openshift.io/cluster-monitoring=true

  Proceed to :ref:`Create the cluster policy for the NVIDIA GPU Operator <create-cluster-policy>`.

@@ -198,7 +199,7 @@ When you install the **NVIDIA GPU Operator** in the OpenShift Container Platform
  .. note:: If you create a ClusterPolicy that contains an empty specification such as ``spec{}``, the ClusterPolicy fails to deploy.

  As a cluster administrator, you can create a ClusterPolicy using the OpenShift Container Platform CLI or the web console. Also, these steps differ
- when using **NVIDIA vGPU**. Refer to the appropriate sections that follow.
+ when using **NVIDIA vGPU**. Refer to the appropriate sections below.

  .. _create-cluster-policy-web-console:

@@ -209,7 +210,7 @@ Create the cluster policy using the web console
  #. Select the **ClusterPolicy** tab, then click **Create ClusterPolicy**. The platform assigns the default name *gpu-cluster-policy*.

-    .. note:: You can use this screen to customize the ClusterPolicy; although, the default values are sufficient to get the GPU configured and running in most cases.
+    .. note:: You can use this screen to customize the ClusterPolicy. However, the default values are sufficient to get the GPU configured and running in most cases.

     .. note:: For OpenShift 4.12 with GPU Operator 25.3.1 or later, you must expand the **Driver** section and set the following fields:

@@ -219,7 +220,7 @@ Create the cluster policy using the web console
  #. Click **Create**.

-    At this point, the GPU Operator proceeds and installs all the required components to set up the NVIDIA GPUs in the OpenShift 4 cluster. Wait at least 10-20 minutes before digging deeper into any form of troubleshooting because this may take a period of time to finish.
+    At this point, the GPU Operator proceeds and installs all the required components to set up the NVIDIA GPUs in the OpenShift 4 cluster. Wait at least 10 to 20 minutes before troubleshooting because this process can take some time to finish.

  #. The status of the newly deployed ClusterPolicy *gpu-cluster-policy* for the NVIDIA GPU Operator changes to ``State:ready`` when the installation succeeds.
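One way to confirm the state from the CLI, as a sketch; this assumes the default policy name *gpu-cluster-policy* and that the ClusterPolicy resource reports ``status.state``:

.. code-block:: console

   # Sketch: prints the ClusterPolicy state; "ready" indicates success.
   $ oc get clusterpolicies.nvidia.com gpu-cluster-policy -o jsonpath='{.status.state}'
   ready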

@@ -237,7 +238,7 @@ Create the cluster policy using the CLI
        $ oc get csv -n nvidia-gpu-operator gpu-operator-certified.v22.9.0 -ojsonpath={.metadata.annotations.alm-examples} | jq .[0] > clusterpolicy.json

-    .. note:: For OpenShift 4.12 with GPU Operator 25.3.1 or later, modify the clusterpolicy.json file to specify ``driver.licensingConfig``, ``driver.repository``, ``driver.image``, ``driver.version``, and ``driver.imagePullSecrets`` (optional). The following snippet is shown as an example. Change values accordingly. Refer to :ref:`operator-release-notes` for recommended driver versions.
+    .. note:: For OpenShift 4.12 with GPU Operator 25.3.1 or later, modify the ``clusterpolicy.json`` file to specify ``driver.licensingConfig``, ``driver.repository``, ``driver.image``, ``driver.version``, and ``driver.imagePullSecrets`` (optional). The following snippet is shown as an example. Change values accordingly. Refer to :ref:`operator-release-notes` for recommended driver versions.

     .. code-block:: json
@@ -275,13 +276,13 @@ Create the cluster policy using the web console
     .. image:: graphics/cluster_policy_vgpu_1.png

- #. Specify ``repository`` path, ``image`` name and NVIDIA vGPU driver ``version`` bundled under **Driver** section. If the registry is not public, please specify the ``imagePullSecret`` created during pre-requisite step under **Driver** advanced configurations section.
+ #. Specify the ``repository`` path, ``image`` name, and NVIDIA vGPU driver ``version`` bundled under the **Driver** section. If the registry is not public, specify the ``imagePullSecret`` created during the prerequisite step under the **Driver** advanced configurations section.

     .. image:: graphics/cluster_policy_vgpu_2.png

  #. Click **Create**.

-    At this point, the GPU Operator proceeds and installs all the required components to set up the NVIDIA GPUs in the OpenShift 4 cluster. Wait at least 10-20 minutes before digging deeper into any form of troubleshooting because this may take a period of time to finish.
+    At this point, the GPU Operator proceeds and installs all the required components to set up the NVIDIA GPUs in the OpenShift 4 cluster. Wait at least 10 to 20 minutes before troubleshooting because this process can take some time to finish.

  #. The status of the newly deployed ClusterPolicy *gpu-cluster-policy* for the NVIDIA GPU Operator changes to ``State:ready`` when the installation succeeds.

@@ -297,7 +298,7 @@ Create the cluster policy using the CLI
        $ oc get csv -n nvidia-gpu-operator gpu-operator-certified.v22.9.0 -ojsonpath={.metadata.annotations.alm-examples} | jq .[0] > clusterpolicy.json

- Modify clusterpolicy.json file to specify ``driver.licensingConfig``, ``driver.repository``, ``driver.image``, ``driver.version`` and ``driver.imagePullSecrets`` created during pre-requiste steps. Below snippet is shown as an example, please change values accordingly.
+ Modify the ``clusterpolicy.json`` file to specify ``driver.licensingConfig``, ``driver.repository``, ``driver.image``, ``driver.version``, and ``driver.imagePullSecrets`` created during the prerequisite steps. The following snippet is shown as an example. Change values accordingly.

  .. code-block:: json
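The JSON body itself falls outside this hunk. A minimal hypothetical sketch of the fields named above, with placeholder values only (the ``licensingConfig`` sub-fields shown are the commonly documented ``configMapName`` and ``nlsEnabled``), might look like:

.. code-block:: json

   {
     "driver": {
       "repository": "registry.example.com/nvidia",
       "image": "vgpu-guest-driver",
       "version": "525.105.17",
       "imagePullSecrets": ["example-registry-secret"],
       "licensingConfig": {
         "configMapName": "licensing-config",
         "nlsEnabled": true
       }
     }
   }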
@@ -372,7 +373,7 @@ The GPU Operator generates GPU performance metrics (DCGM-export), status metrics
  When the GPU Operator is installed in the suggested ``nvidia-gpu-operator`` namespace, the GPU Operator automatically enables monitoring if the ``openshift.io/cluster-monitoring`` label is not defined.
  If the label is defined, the GPU Operator will not change its value.

- Disable cluster monitoring in the ``nvidia-gpu-operator`` namespace by setting ``openshift.io/cluster-monitoring=false`` as shown:
+ Disable cluster monitoring in the ``nvidia-gpu-operator`` namespace by setting ``openshift.io/cluster-monitoring=false``:

  .. code-block:: console
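The command body lies outside this hunk; based on the analogous labeling command earlier in this file, it presumably resembles this sketch:

.. code-block:: console

   # Sketch modeled on the cluster-monitoring=true command shown earlier.
   $ oc label ns/nvidia-gpu-operator openshift.io/cluster-monitoring=false --overwrite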
@@ -459,7 +460,7 @@ Run a simple CUDA VectorAdd sample that adds two vectors together to ensure the
  Getting information about the GPU
  *************************************************************

- The ``nvidia-smi`` shows memory usage, GPU utilization, and the temperature of the GPU. Test the GPU access by running the popular ``nvidia-smi`` command within the pod.
+ The ``nvidia-smi`` command shows memory usage, GPU utilization, and the temperature of the GPU. Test GPU access by running the ``nvidia-smi`` command within the pod.

  To view GPU utilization, run ``nvidia-smi`` from a pod in the GPU Operator daemonset.

@@ -481,7 +482,7 @@ To view GPU utilization, run ``nvidia-smi`` from a pod in the GPU Operator daemo
        nvidia-driver-daemonset-410.84.202203290245-0-xxgdv   2/2   Running   0   23m   10.130.2.18   ip-10-0-143-147.ec2.internal   <none>   <none>

-    .. note:: With the Pod and node name, run the ``nvidia-smi`` on the correct node.
+    .. note:: With the pod and node name, run the ``nvidia-smi`` command on the correct node.

  #. Run the ``nvidia-smi`` command within the pod:
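The invocation itself is outside this hunk; one plausible form, reusing the pod name from the listing above, is this sketch:

.. code-block:: console

   # Sketch: pod name taken from the example listing above; adjust the namespace if needed.
   $ oc exec -it nvidia-driver-daemonset-410.84.202203290245-0-xxgdv -n nvidia-gpu-operator -- nvidia-smi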

@@ -513,6 +514,6 @@ To view GPU utilization, run ``nvidia-smi`` from a pod in the GPU Operator daemo
        |  No running processes found                                                 |
        +-----------------------------------------------------------------------------+

- Two tables are generated. The first table reflects the information about all available GPUs (the example shows one GPU). The second table provides details on the processes using the GPUs.
+ Two tables are generated. The first table reflects the information about all available GPUs (the example shows one GPU). The second table provides details about the processes using the GPUs.

- For more information describing the contents of the tables see the man page for ``nvidia-smi``.
+ For more information describing the contents of the tables, refer to the man page for ``nvidia-smi``.

openshift/install-nfd.rst

Lines changed: 3 additions & 3 deletions
@@ -26,7 +26,7 @@ The Node Feature Discovery (NFD) Operator is a prerequisite for the **NVIDIA GPU
        NAME                                      READY   STATUS    RESTARTS   AGE
        nfd-controller-manager-7f86ccfb58-nqgxm   2/2     Running   0          11m

- #. When the Node Feature Discovery is installed, create an instance of Node Feature Discovery using the **NodeFeatureDiscovery** tab.
+ #. When the Node Feature Discovery is installed, create an instance of Node Feature Discovery using the **NodeFeatureDiscovery** tab:

  #. Click **Operators** > **Installed Operators** from the side menu.

@@ -38,7 +38,7 @@ The Node Feature Discovery (NFD) Operator is a prerequisite for the **NVIDIA GPU
  #. In the following screen, click **Create**. This starts the Node Feature Discovery Operator that proceeds to label the nodes in the cluster that have GPUs.

- .. note:: The values prepopulated by the OperatorHub are valid for the GPU Operator.
+    .. note:: The values prepopulated by the OperatorHub are valid for the GPU Operator.

  *************************************************************************
  Verify that the Node Feature Discovery Operator is functioning correctly
@@ -61,7 +61,7 @@ The Node Feature Discovery Operator uses vendor PCI IDs to identify hardware in
  .. note:: ``0x10de`` is the PCI vendor ID assigned to NVIDIA.

- #. Verify that the GPU device (``pci-10de``) is discovered on the GPU node.
+ #. Verify that the GPU device (``pci-10de``) is discovered on the GPU node:

     .. code-block:: console
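The verification command is outside this hunk; a typical check, as a sketch (the exact label key NFD applies can vary by version), looks like:

.. code-block:: console

   # Sketch: looks for the NVIDIA PCI device label that NFD applies to GPU nodes.
   $ oc describe node | grep 'feature.node.kubernetes.io/pci-10de'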

openshift/introduction.rst

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ Red Hat OpenShift Container Platform includes enhancements to Kubernetes so user
  The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPU. These components include the NVIDIA drivers (to enable CUDA),
  Kubernetes device plugin for GPUs, the `NVIDIA Container Toolkit <https://github.com/NVIDIA/nvidia-container-toolkit>`_,
- automatic node labeling using `GFD <https://github.com/NVIDIA/gpu-feature-discovery>`_, `DCGM <https://developer.nvidia.com/dcgm>`_-based monitoring and others.
+ automatic node labeling using `GFD <https://github.com/NVIDIA/gpu-feature-discovery>`_, `DCGM <https://developer.nvidia.com/dcgm>`_-based monitoring, and others.

  For guidance on the specific NVIDIA support entitlement needs,
  refer |essug|_ if you have an NVIDIA AI Enterprise entitlement.
