
Commit ca5798c

Merge branch 'NVIDIA:main' into docs/265
2 parents: 82c73f9 + b69fe43

21 files changed (+65, -88 lines)

.gitlab-ci.yml

Lines changed: 2 additions & 2 deletions
@@ -138,7 +138,7 @@ publish_docs:
   script:
     - echo "Pushing docs live to https://docs.nvidia.com/datacenter/cloud-native"
     - |+
-      if [[ "${CI_COMMIT_REF_NAME}" =~ (.+)-v([0-9]+\.[0-9]+\.[0-9]+) ]]; then
+      if [[ "${CI_COMMIT_REF_NAME}" =~ (.+)-v([0-9]+\.[0-9]+(\.[a-zA-Z0-9]+)?) ]]; then
         export DOCSET="${BASH_REMATCH[1]}"
         export VERSION="${BASH_REMATCH[2]}"
       fi
@@ -148,7 +148,7 @@ publish_docs:
         exit 1
       fi
     - |+
-      if [[ "${CI_COMMIT_MESSAGE}" =~ $'\n/not-latest\n' ]]; then
+      if [[ "${CI_COMMIT_MESSAGE}" =~ $'/not-latest\n' ]]; then
         export FORCE_LATEST=false
       fi
     - echo "Publishing docs for ${DOCSET} and version ${VERSION}"

container-toolkit/install-guide.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ where `systemd` cgroup drivers are used that cause containers to lose access to
 Optionally, configure the repository to use experimental packages:
 
 ```console
-$ sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
+$ sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
 ```
 
 1. Update the packages list from the repository:
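For context on the change above: the ``sed`` expression uncomments any line mentioning ``experimental``, and editing a file under ``/etc`` requires root, hence the added ``sudo``. A sketch against an illustrative repository list file:

```console
$ cat /etc/apt/sources.list.d/nvidia-container-toolkit.list
deb https://nvidia.github.io/libnvidia-container/stable/deb/$(ARCH) /
#deb https://nvidia.github.io/libnvidia-container/experimental/deb/$(ARCH) /
$ sudo sed -i -e '/experimental/ s/^#//g' /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ grep experimental /etc/apt/sources.list.d/nvidia-container-toolkit.list
deb https://nvidia.github.io/libnvidia-container/experimental/deb/$(ARCH) /
```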

gpu-operator/amazon-eks.rst

Lines changed: 3 additions & 4 deletions
@@ -102,11 +102,10 @@ without any limitations, you perform the following high-level actions:
    the instance type to meet your needs:
 
    * Table of accelerated computing
-     `instance types <https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing>`_
+     `instance types <https://aws.amazon.com/ec2/instance-types/accelerated-computing/>`_
      for information about GPU model and count, RAM, and storage.
 
-   * Table of
-     `maximum network interfaces <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html#enis-acceleratedcomputing>`_
+   * `Maximum IP addresses per network interface <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AvailableIpPerENI.html>`_
      for accelerated computing instance types.
      Make sure the instance type supports enough IP addresses for your workload.
      For example, the ``g4dn.xlarge`` instance type supports ``29`` IP addresses for pods on the node.
@@ -132,7 +131,7 @@ Prerequisites
    and `Configuring the AWS CLI <https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html>`_
    in the AWS CLI documentation.
 * You installed the ``eksctl`` CLI if you prefer it as your client application.
-  The CLI is available from https://eksctl.io/introduction/#installation.
+  The CLI is available from https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html#eksctl-install-update.
 * You have the AMI value from https://cloud-images.ubuntu.com/aws-eks/.
 * You have the EC2 instance type to use for your nodes.
 
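As a sanity check on the ``29`` figure quoted for ``g4dn.xlarge``: it matches the usual EKS max-pods arithmetic, assuming the instance's 3 ENIs with 10 IPv4 addresses each from the AWS table linked above:

```console
$ # max pods = ENIs x (IPv4 addresses per ENI - 1) + 2
$ echo $(( 3 * (10 - 1) + 2 ))
29
```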

gpu-operator/custom-driver-params.rst

Lines changed: 2 additions & 1 deletion
@@ -49,7 +49,8 @@ To pass custom parameters, execute the following steps.
 Example using ``nvidia-uvm`` module
 -----------------------------------
 
-This example shows the High Memory Mode being disabled in the ``nvidia-uvm`` module.
+This example shows the Heterogeneous Memory Management (HMM) being disabled in the ``nvidia-uvm`` module.
+Refer to `Simplifying GPU Application Development with Heterogeneous Memory Management <https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/>`_ for more information about HMM.
 
 #. Create a configuration file named ``nvidia-uvm.conf``:
 
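A sketch of what that configuration file might contain for this example; ``uvm_disable_hmm`` is the parameter name used by the open NVIDIA kernel modules, so verify it against your driver release:

```console
$ cat nvidia-uvm.conf
options nvidia-uvm uvm_disable_hmm=1
```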
gpu-operator/dra-cds.rst

Lines changed: 2 additions & 1 deletion
@@ -49,7 +49,8 @@ For more detail on the security properties of a ComputeDomain, see `Security <dr
 A deeper dive: related resources
 ================================
 
-For more background on how ComputeDomains facilitate orchestrating MNNVL workloads on Kubernetes, see `this doc <https://docs.google.com/document/d/1PrdDofsPFVJuZvcv-vtlI9n2eAh-YVf_fRQLIVmDwVY/edit?tab=t.0#heading=h.qkogm924v5so>`_ and `this slide deck <https://docs.google.com/presentation/d/1Xupr8IZVAjs5bNFKJnYaK0LE7QWETnJjkz6KOfLu87E/edit?pli=1&slide=id.g28ac369118f_0_1647#slide=id.g28ac369118f_0_1647>`_.
+For more background on how ComputeDomains facilitate orchestrating MNNVL workloads on Kubernetes, refer to the `Kubernetes support for GH200 / GB200 <https://docs.google.com/document/d/1PrdDofsPFVJuZvcv-vtlI9n2eAh-YVf_fRQLIVmDwVY/edit?tab=t.0#heading=h.nfp9friarxam>`_ document
+and the `Supporting GB200 on Kubernetes <https://docs.google.com/presentation/d/1Xupr8IZVAjs5bNFKJnYaK0LE7QWETnJjkz6KOfLu87E/edit?pli=1&slide=id.g373e0ebfa8e_1_142#slide=id.g373e0ebfa8e_1_142>`_ slide deck.
 For an outlook on planned improvements on the ComputeDomain concept, please refer to `this document <https://github.com/NVIDIA/k8s-dra-driver-gpu/releases/tag/v25.3.0-rc.3>`_.
 
 Details about IMEX and its relationship to NVLink may be found in `NVIDIA's IMEX guide <https://docs.nvidia.com/multi-node-nvlink-systems/imex-guide/overview.html>`_, and in `NVIDIA's NVLink guide <https://docs.nvidia.com/multi-node-nvlink-systems/mnnvl-user-guide/overview.html#internode-memory-exchange-service>`_.

gpu-operator/dra-gpus.rst

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ NVIDIA DRA Driver for GPUs
 GPU allocation
 **************
 
-Compared to `traditional GPU allocation <https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins/>`_ using coarse-grained count-based requests, the GPU allocation side of this driver enables fine-grained control and powerful features long desired by the community, such as:
+Compared to `traditional GPU allocation <https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#using-device-plugins>`_ using coarse-grained count-based requests, the GPU allocation side of this driver enables fine-grained control and powerful features long desired by the community, such as:
 
 #. Controlled sharing of individual GPUs between multiple pods and/or containers.
 #. GPU selection via complex constraints expressed via `CEL <https://kubernetes.io/docs/reference/using-api/cel/>`_.
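To make the CEL point concrete, here is a minimal sketch of a ResourceClaim using the Kubernetes DRA structured-parameters API; the device class and attribute names below are assumptions for illustration, not confirmed identifiers from this driver:

```console
$ cat <<'EOF' | kubectl apply -f -
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: one-a100
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com   # assumed class name
      selectors:
      - cel:
          # assumed attribute name; consult the driver's published attributes
          expression: device.attributes["gpu.nvidia.com"].productName == "NVIDIA A100-SXM4-40GB"
EOF
```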

gpu-operator/dra-intro-install.rst

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ Prerequisites
 =============
 
 - Kubernetes v1.32 or newer.
-- DRA and corresponding API groups must be enabled (`see Kubernetes docs <https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/#enabling-dynamic-resource-allocation>`_).
+- DRA and corresponding API groups must be enabled (`see Kubernetes docs <https://kubernetes.io/docs/tasks/configure-pod-container/assign-resources/set-up-dra-cluster/#enable-dra>`_).
 - `CDI <https://github.com/cncf-tags/container-device-interface?tab=readme-ov-file#how-to-configure-cdi>`_ must be enabled in the underlying container runtime (such as containerd or CRI-O).
 - NVIDIA GPU Driver 565 or later.
 
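One quick way to confirm that the ``resource.k8s.io`` API group is actually being served by the API server:

```console
$ kubectl api-resources --api-group=resource.k8s.io
```

An empty result means DRA, or the corresponding API group, is not enabled on the cluster.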
gpu-operator/getting-started.rst

Lines changed: 1 addition & 1 deletion
@@ -168,7 +168,7 @@ To view all the options, run ``helm show values nvidia/gpu-operator``.
      - ``true``
 
    * - ``dcgmExporter.service.internalTrafficPolicy``
-     - Specifies the `internalTrafficPolicy <https://kubernetes.io/docs/concepts/services-networking/service/#internal-traffic-policy>`_ for the DCGM Exporter service.
+     - Specifies the `internalTrafficPolicy <https://kubernetes.io/docs/concepts/services-networking/service/#traffic-policies>`_ for the DCGM Exporter service.
        Available values are ``Cluster`` (default) or ``Local``.
      - ``Cluster``
 
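Like any other chart option, the value can be set at install or upgrade time; the release and namespace names below are illustrative:

```console
$ helm install gpu-operator nvidia/gpu-operator \
    --namespace gpu-operator --create-namespace \
    --set dcgmExporter.service.internalTrafficPolicy=Local
```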

gpu-operator/gpu-operator-kubevirt.rst

Lines changed: 2 additions & 2 deletions
@@ -70,7 +70,7 @@ Assumptions, constraints, and dependencies
 
 * The GPU Operator will not automate the installation of NVIDIA drivers inside KubeVirt virtual machines with GPUs/vGPUs attached.
 
-* Users must manually add all passthrough GPU and vGPU resources to the ``permittedDevices`` list in the KubeVirt CR before assigning them to KubeVirt virtual machines. Refer to the `KubeVirt documentation <https://kubevirt.io/user-guide/virtual_machines/host-devices/#listing-permitted-devices>`_ for more information.
+* Users must manually add all passthrough GPU and vGPU resources to the ``permittedDevices`` list in the KubeVirt CR before assigning them to KubeVirt virtual machines. Refer to the `KubeVirt documentation <https://kubevirt.io/user-guide/compute/host-devices/#listing-permitted-devices>`_ for more information.
 
 * MIG-backed vGPUs are not supported.
 
@@ -512,7 +512,7 @@ Building the NVIDIA vGPU Manager image
 
 This section covers building the NVIDIA vGPU Manager container image and pushing it to a private registry.
 
-Download the vGPU Software from the `NVIDIA Licensing Portal <https://nvid.nvidia.com/dashboard/#/dashboard>`_.
+Download the vGPU Software from the `NVIDIA Licensing Portal <https://stg.ui.licensing.nvidia.com/>`_.
 
 * Login to the NVIDIA Licensing Portal and navigate to the **Software Downloads** section.
 * The NVIDIA vGPU Software is located in the **Software Downloads** section of the NVIDIA Licensing Portal.
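As a sketch of the ``permittedDevices`` requirement above, passthrough GPUs are declared in the KubeVirt CR under ``spec.configuration.permittedHostDevices``; the PCI vendor:device ID and resource name below are illustrative (a Tesla T4), so substitute the values for your hardware:

```console
$ kubectl patch kubevirt kubevirt -n kubevirt --type merge --patch '
spec:
  configuration:
    permittedHostDevices:
      pciHostDevices:
      - pciVendorSelector: "10DE:1EB8"
        resourceName: "nvidia.com/TU104GL_Tesla_T4"
'
```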

gpu-operator/gpu-operator-rdma.rst

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ The prerequisites for configuring GPUDirect RDMA or GPUDirect Storage depend on
 * ``pciPassthru.64bitMMIOSizeGB = 128``
 
 For information about configuring the settings, refer to the
-`Deploy an AI-Ready Enterprise Platform on vSphere 7 <https://core.vmware.com/resource/deploy-ai-ready-vsphere-7#vm-settings-A>`_
+`Deploy an AI-Ready Enterprise Platform on vSphere 7 <https://www.vmware.com/docs/deploy-an-ai-ready-enterprise-platform-on-vsphere-7-update-2#vm-settings-A>`_
 document from VMWare.
 
 **************************
