
Commit 354c6a4

Fix broken links (#271)
1 parent 56258bb commit 354c6a4

13 files changed (+20 -19 lines)

gpu-operator/amazon-eks.rst

Lines changed: 3 additions & 4 deletions
@@ -102,11 +102,10 @@ without any limitations, you perform the following high-level actions:
  the instance type to meet your needs:

  * Table of accelerated computing
- `instance types <https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing>`_
+ `instance types <https://aws.amazon.com/ec2/instance-types/accelerated-computing/>`_
  for information about GPU model and count, RAM, and storage.

- * Table of
- `maximum network interfaces <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#enis-acceleratedcomputing>`_
+ * `Maximum IP addresses per network interface <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AvailableIpPerENI.html>`_
  for accelerated computing instance types.
  Make sure the instance type supports enough IP addresses for your workload.
  For example, the ``g4dn.xlarge`` instance type supports ``29`` IP addresses for pods on the node.
@@ -132,7 +131,7 @@ Prerequisites
  and `Configuring the AWS CLI <https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html>`_
  in the AWS CLI documentation.
  * You installed the ``eksctl`` CLI if you prefer it as your client application.
- The CLI is available from https://eksctl.io/introduction/#installation.
+ The CLI is available from https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html#eksctl-install-update.
  * You have the AMI value from https://cloud-images.ubuntu.com/aws-eks/.
  * You have the EC2 instance type to use for your nodes.
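
Note: the prerequisites in this hunk mention the ``eksctl`` CLI, an Ubuntu EKS AMI, and an accelerated instance type. As a rough sketch only (cluster name, region, and node count are placeholders, not values from this commit), an eksctl node group for a GPU instance type could look like:

# Minimal eksctl ClusterConfig sketch; names and counts are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: gpu-demo           # placeholder cluster name
  region: us-west-2        # placeholder region
nodeGroups:
  - name: gpu-nodes
    instanceType: g4dn.xlarge   # the instance type used as an example in the doc
    desiredCapacity: 2          # placeholder node count

Running ``eksctl create cluster -f cluster.yaml`` against such a file would provision the node group; the chosen accelerated instance type determines the GPU count and the per-node IP address limit referenced above.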

gpu-operator/dra-cds.rst

Lines changed: 2 additions & 1 deletion
@@ -49,7 +49,8 @@ For more detail on the security properties of a ComputeDomain, see `Security <dr
  A deeper dive: related resources
  ================================

- For more background on how ComputeDomains facilitate orchestrating MNNVL workloads on Kubernetes, see `this doc <https://docs.google.com/document/d/1PrdDofsPFVJuZvcv-vtlI9n2eAh-YVf_fRQLIVmDwVY/edit?tab=t.0#heading=h.qkogm924v5so>`_ and `this slide deck <https://docs.google.com/presentation/d/1Xupr8IZVAjs5bNFKJnYaK0LE7QWETnJjkz6KOfLu87E/edit?pli=1&slide=id.g28ac369118f_0_1647#slide=id.g28ac369118f_0_1647>`_.
+ For more background on how ComputeDomains facilitate orchestrating MNNVL workloads on Kubernetes, refer to the `Kubernetes support for GH200 / GB200 <https://docs.google.com/document/d/1PrdDofsPFVJuZvcv-vtlI9n2eAh-YVf_fRQLIVmDwVY/edit?tab=t.0#heading=h.nfp9friarxam>`_ document
+ and the `Supporting GB200 on Kubernetes <https://docs.google.com/presentation/d/1Xupr8IZVAjs5bNFKJnYaK0LE7QWETnJjkz6KOfLu87E/edit?pli=1&slide=id.g373e0ebfa8e_1_142#slide=id.g373e0ebfa8e_1_142>`_ slide deck.
  For an outlook on planned improvements on the ComputeDomain concept, please refer to `this document <https://github.com/NVIDIA/k8s-dra-driver-gpu/releases/tag/v25.3.0-rc.3>`_.

  Details about IMEX and its relationship to NVLink may be found in `NVIDIA's IMEX guide <https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/overview.html>`_, and in `NVIDIA's NVLink guide <https://docs.nvidia.com/multi-node-nvlink-systems/mnnvl-user-guide/overview.html#internode-memory-exchange-service>`_.
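
Note: for readers unfamiliar with the ComputeDomain resource discussed in this file, the following is a loose sketch of such a manifest; the API group, version, and field names are recalled from the NVIDIA k8s-dra-driver-gpu project and should be checked against the release you use.

# Illustrative ComputeDomain manifest; verify fields against the DRA driver documentation.
apiVersion: resource.nvidia.com/v1beta1
kind: ComputeDomain
metadata:
  name: mnnvl-domain              # placeholder name
spec:
  numNodes: 2                     # placeholder: nodes expected to join the domain
  channel:
    resourceClaimTemplate:
      name: mnnvl-domain-channel  # claim template that workload pods reference

Roughly, the MNNVL workload pods then reference the generated ResourceClaimTemplate so that an IMEX channel is made available to each of them.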

gpu-operator/dra-gpus.rst

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ NVIDIA DRA Driver for GPUs
  GPU allocation
  **************

- Compared to `traditional GPU allocation <https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins/>`_ using coarse-grained count-based requests, the GPU allocation side of this driver enables fine-grained control and powerful features long desired by the community, such as:
+ Compared to `traditional GPU allocation <https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins>`_ using coarse-grained count-based requests, the GPU allocation side of this driver enables fine-grained control and powerful features long desired by the community, such as:

  #. Controlled sharing of individual GPUs between multiple pods and/or containers.
  #. GPU selection via complex constraints expressed via `CEL <https://kubernetes.io/docs/reference/using-api/cel/>`_.
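
Note: the "complex constraints expressed via CEL" in the context above can be pictured with a small ResourceClaim sketch. The device class name and attribute key below are assumptions about the NVIDIA DRA driver, and the DRA API version depends on the Kubernetes release; treat this as illustrative only.

# Sketch of fine-grained GPU selection with DRA and CEL (not taken from the patched docs).
apiVersion: resource.k8s.io/v1beta1       # DRA API group; version varies by Kubernetes release
kind: ResourceClaim
metadata:
  name: one-a100                          # placeholder name
spec:
  devices:
    requests:
      - name: gpu
        deviceClassName: gpu.nvidia.com   # assumed device class published by the NVIDIA DRA driver
        selectors:
          - cel:
              expression: device.attributes['gpu.nvidia.com'].productName.contains('A100')  # assumed attribute name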

gpu-operator/dra-intro-install.rst

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ Prerequisites
  =============

  - Kubernetes v1.32 or newer.
- - DRA and corresponding API groups must be enabled (`see Kubernetes docs <https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#enabling-dynamic-resource-allocation>`_).
+ - DRA and corresponding API groups must be enabled (`see Kubernetes docs <https://kubernetes.io/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster/#enable-dra>`_).
  - `CDI <https://github.com/cncf-tags/container-device-interface?tab=readme-ov-file#how-to-configure-cdi>`_ must be enabled in the underlying container runtime (such as containerd or CRI-O).
  - NVIDIA GPU Driver 565 or later.
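
Note: as a hedged illustration of the "DRA and corresponding API groups must be enabled" prerequisite, a throwaway test cluster can be created with kind using a config along these lines; the feature gate and runtime-config names follow the upstream Kubernetes DRA documentation, and the API version should be adjusted to your release.

# kind cluster config sketch for experimenting with DRA (not part of the patched docs).
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  DynamicResourceAllocation: true        # DRA feature gate
runtimeConfig:
  "resource.k8s.io/v1beta1": "true"      # enable the DRA API group/version
nodes:
  - role: control-plane
  - role: worker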

gpu-operator/getting-started.rst

Lines changed: 1 addition & 1 deletion
@@ -168,7 +168,7 @@ To view all the options, run ``helm show values nvidia/gpu-operator``.
  - ``true``

  * - ``dcgmExporter.service.internalTrafficPolicy``
- - Specifies the `internalTrafficPolicy <https://kubernetes.io/docs/concepts/services-networking/service/#internal-traffic-policy>`_ for the DCGM Exporter service.
+ - Specifies the `internalTrafficPolicy <https://kubernetes.io/docs/concepts/services-networking/service/#traffic-policies>`_ for the DCGM Exporter service.
  Available values are ``Cluster`` (default) or ``Local``.
  - ``Cluster``
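
Note: the table row touched here documents the ``dcgmExporter.service.internalTrafficPolicy`` chart option. Expressed as a Helm values override it corresponds to a snippet like the following; choosing ``Local`` is only an example, the chart default is ``Cluster``.

# values override for the GPU Operator chart (example value only)
dcgmExporter:
  service:
    internalTrafficPolicy: Local   # or Cluster (default)

Such a file can be passed with ``helm upgrade --install ... -f values.yaml`` or the equivalent ``--set`` flag.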

gpu-operator/gpu-operator-kubevirt.rst

Lines changed: 2 additions & 2 deletions
@@ -70,7 +70,7 @@ Assumptions, constraints, and dependencies

  * The GPU Operator will not automate the installation of NVIDIA drivers inside KubeVirt virtual machines with GPUs/vGPUs attached.

- * Users must manually add all passthrough GPU and vGPU resources to the ``permittedDevices`` list in the KubeVirt CR before assigning them to KubeVirt virtual machines. Refer to the `KubeVirt documentation <https://kubevirt.io/user-guide/virtual_machines/host-devices/#listing-permitted-devices>`_ for more information.
+ * Users must manually add all passthrough GPU and vGPU resources to the ``permittedDevices`` list in the KubeVirt CR before assigning them to KubeVirt virtual machines. Refer to the `KubeVirt documentation <https://kubevirt.io/user-guide/compute/host-devices/#listing-permitted-devices>`_ for more information.

  * MIG-backed vGPUs are not supported.

@@ -512,7 +512,7 @@ Building the NVIDIA vGPU Manager image

  This section covers building the NVIDIA vGPU Manager container image and pushing it to a private registry.

- Download the vGPU Software from the `NVIDIA Licensing Portal <https://nvid.nvidia.com/dashboard/#/dashboard>`_.
+ Download the vGPU Software from the `NVIDIA Licensing Portal <https://stg.ui.licensing.nvidia.com/>`_.

  * Login to the NVIDIA Licensing Portal and navigate to the **Software Downloads** section.
  * The NVIDIA vGPU Software is located in the **Software Downloads** section of the NVIDIA Licensing Portal.
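
Note: for context on the ``permittedDevices`` requirement in the first hunk, a KubeVirt CR fragment typically looks roughly like the sketch below. The field name ``permittedHostDevices`` and the selector/resource values are recalled from the KubeVirt API and are placeholders to adapt, not values from this commit.

# Rough sketch of permitting GPU/vGPU devices in the KubeVirt CR (verify against the KubeVirt docs).
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    permittedHostDevices:
      pciHostDevices:
        - pciVendorSelector: "10de:1eb8"               # example vendor:device ID (Tesla T4)
          resourceName: "nvidia.com/TU104GL_Tesla_T4"  # resource name advertised for passthrough GPUs
      mediatedDevices:
        - mdevNameSelector: "GRID T4-1Q"               # example vGPU type
          resourceName: "nvidia.com/GRID_T4-1Q"        # resource name advertised for vGPUs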

gpu-operator/gpu-operator-rdma.rst

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ The prerequisites for configuring GPUDirect RDMA or GPUDirect Storage depend on
  * ``pciPassthru.64bitMMIOSizeGB = 128``

  For information about configuring the settings, refer to the
- `Deploy an AI-Ready Enterprise Platform on vSphere 7 <https://core.vmware.com/resource/deploy-ai-ready-vsphere-7#vm-settings-A>`_
+ `Deploy an AI-Ready Enterprise Platform on vSphere 7 <https://www.vmware.com/docs/deploy-an-ai-ready-enterprise-platform-on-vsphere-7-update-2#vm-settings-A>`_
  document from VMWare.

  **************************

gpu-operator/install-gpu-operator-nvaie.rst

Lines changed: 2 additions & 2 deletions
@@ -82,7 +82,7 @@ Prerequisites
  in the *NVIDIA License System User Guide* for more information.
  - An NGC CLI API key that is used to create an image pull secret.
  The secret is used to pull the prebuilt vGPU driver image from NVIDIA NGC.
- Refer to `Generating Your NGC API Key <https://docs.nvidia.com/ngc/gpu-cloud/ngc-private-registry-user-guide/index.html#generating-api-key>`__
+ Refer to `Generating Your NGC API Key <https://docs.nvidia.com/ngc/latest/ngc-private-registry-user-guide.html#prug-generating-personal-api-key>`__
  in the *NVIDIA NGC Private Registry User Guide* for more information.

  Procedure
@@ -179,7 +179,7 @@ The following list summarizes the driver branches for each release.

  For newer releases, you can confirm the the supported driver branch by performing the following steps:

- #. Refer to the `release documentation <https://docs.nvidia.com/ai-enterprise/#release-documentation>`__
+ #. Refer to the `NVIDIA AI Enterprise Infra Release Branches <https://docs.nvidia.com/ai-enterprise/#infrastructure-software>`__
  for NVIDIA AI Enterprise and access the documentation for your release.

  #. In the release notes, identify the supported NVIDIA Data Center GPU Driver branch.

gpu-operator/life-cycle-policy.rst

Lines changed: 2 additions & 2 deletions
@@ -167,7 +167,7 @@ Refer to :ref:`Upgrading the NVIDIA GPU Operator` for more information.
  .. note::

  - Driver version could be different with NVIDIA vGPU, as it depends on the driver
- version downloaded from the `NVIDIA vGPU Software Portal <https://nvid.nvidia.com/dashboard/#/dashboard>`_.
+ version downloaded from the `NVIDIA Licensing Portal <https://ui.licensing.nvidia.com>`_.
  - The GPU Operator is supported on all active NVIDIA data center production drivers.
- Refer to `Supported Drivers and CUDA Toolkit Versions <https://docs.nvidia.com/datacenter/tesla/drivers/index.html#cuda-drivers>`_
+ Refer to `Supported Drivers and CUDA Toolkit Versions <https://docs.nvidia.com/datacenter/tesla/drivers/index.html#supported-drivers-and-cuda-toolkit-versions>`_
  for more information.

gpu-operator/microsoft-aks.rst

Lines changed: 2 additions & 2 deletions
@@ -48,8 +48,8 @@ When you follow this approach, you can install the Operator without any special
  considerations or arguments.
  Refer to :ref:`Install NVIDIA GPU Operator`.

- For more information about this preview feature, see
- `Skip GPU driver installation (preview) <https://learn.microsoft.com/en-us/azure/aks/gpu-cluster?source=recommendations&tabs=add-ubuntu-gpu-node-pool#skip-gpu-driver-installation-preview>`__
+ For more information about this feature, see
+ `Skip GPU driver installation <https://learn.microsoft.com/en-us/azure/aks/use-nvidia-gpu?source=recommendations&tabs=add-ubuntu-gpu-node-pool#skip-gpu-driver-installation>`__
  in the Azure Kubernetes Service documentation.