Commit 1c03be5

Docs update (#300)
* update css Signed-off-by: Abigail McCarthy <[email protected]>
* install command and add configuration flags Signed-off-by: Abigail McCarthy <[email protected]>
* update versions Signed-off-by: Abigail McCarthy <[email protected]>
* remove confidential containers links Signed-off-by: Abigail McCarthy <[email protected]>
* fix repository name Signed-off-by: Abigail McCarthy <[email protected]>
* minor updates Signed-off-by: Abigail McCarthy <[email protected]>
* fix commands Signed-off-by: Abigail McCarthy <[email protected]>
---------
Signed-off-by: Abigail McCarthy <[email protected]>
1 parent 3e11d27 commit 1c03be5

7 files changed

+374
-3
lines changed

css/custom.css

Lines changed: 8 additions & 1 deletion
@@ -23,4 +23,11 @@ html[data-theme=light] .highlight .go {
2323
flex-basis: 15%;
2424
min-width: var(--pst-sidebar-secondary);
2525
}
26-
26+
27+
html[data-theme=light] .bd-toc-nav .nav-link-expand {
28+
display: none !important;
29+
}
30+
31+
.bd-sidebar-primary li.has-children>details>summary .toctree-toggle {
32+
display: none !important;
33+
}

gpu-operator/getting-started.rst

Lines changed: 13 additions & 1 deletion
@@ -138,7 +138,6 @@ To view all the options, run ``helm show values nvidia/gpu-operator``.
138138

139139
* - ``ccManager.enabled``
140140
- When set to ``true``, the Operator deploys NVIDIA Confidential Computing Manager for Kubernetes.
141-
Refer to :doc:`gpu-operator-confidential-containers` for more information.
142141
- ``false``
143142

144143
* - ``cdi.enabled``
@@ -187,6 +186,14 @@ To view all the options, run ``helm show values nvidia/gpu-operator``.
187186
Set this value to ``false`` when using the Operator on systems with pre-installed drivers.
188187
- ``true``
189188

189+
* - ``driver.image``
190+
- Name of the NVIDIA Driver Container image to use.
191+
- ``driver``
192+
193+
* - ``driver.imagePullSecrets``
194+
- List of image pull secrets used to pull the driver container image from the registry.
195+
- None
196+
190197
* - ``driver.kernelModuleType``
191198
- Specifies the type of the NVIDIA GPU Kernel modules to use.
192199
Valid values are ``auto`` (default), ``proprietary``, and ``open``.
@@ -215,6 +222,11 @@ To view all the options, run ``helm show values nvidia/gpu-operator``.
215222
- Indicate if MLNX_OFED (MOFED) drivers are pre-installed on the host.
216223
- ``false``
217224

225+
* - ``driver.secretEnv``
226+
- The name of the secret to pass to the driver container.
227+
A common use case is passing your Ubuntu Pro token secret when you deploy the GPU Operator with government-ready components. Refer to :doc:`install-gpu-operator-gov-ready` for more information.
228+
- None
229+
218230
* - ``driver.startupProbe``
219231
- By default, the driver container has an initial delay of ``60s`` before starting liveness probes.
220232
The probe runs the ``nvidia-smi`` command with a timeout duration of ``60s``.
gpu-operator/install-gpu-operator-gov-ready.rst

Lines changed: 227 additions & 0 deletions
@@ -0,0 +1,227 @@
1+
.. license-header
2+
SPDX-FileCopyrightText: Copyright (c) 2023 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
3+
SPDX-License-Identifier: Apache-2.0
4+
5+
Licensed under the Apache License, Version 2.0 (the "License");
6+
you may not use this file except in compliance with the License.
7+
You may obtain a copy of the License at
8+
9+
http://www.apache.org/licenses/LICENSE-2.0
10+
11+
Unless required by applicable law or agreed to in writing, software
12+
distributed under the License is distributed on an "AS IS" BASIS,
13+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14+
See the License for the specific language governing permissions and
15+
limitations under the License.
16+
17+
.. headings # #, * *, =, -, ^, "
18+
19+
20+
.. _install-gpu-operator-gov-ready:
21+
22+
####################################
23+
NVIDIA GPU Operator Government Ready
24+
####################################
25+
26+
The NVIDIA GPU Operator now offers government-ready components for NVIDIA AI Enterprise customers.
27+
Government ready is NVIDIA's designation for software that meets applicable security requirements for deployment in your FedRAMP High or equivalent sovereign use case.
28+
For more information on NVIDIA's government-ready support, refer to the white paper `AI Software for Regulated Environments <https://docs.nvidia.com/ai-enterprise/planning-resource/ai-software-regulated-environments-white-paper/latest/index.html>`_.
29+
30+
31+
Supported GPU Operator Components
32+
==================================
33+
The government-ready NVIDIA GPU Operator includes the following components:
34+
35+
.. _fn1: #base-image
36+
.. |fn1| replace:: :sup:`1`
37+
38+
.. list-table::
39+
:header-rows: 1
40+
41+
* - Component
42+
- Version
43+
* - NVIDIA GPU Operator
44+
- v25.10.0
45+
* - NVIDIA GPU Feature Discovery
46+
- 0.18.0
47+
* - NVIDIA Container Toolkit
48+
- 1.18.0
49+
* - NVIDIA Device Plugin
50+
- 0.18.0
51+
* - NVIDIA DCGM-exporter
52+
- 4.4.1-4.6.0
53+
* - NVIDIA MIG Manager
54+
- 0.13.0
55+
* - NVIDIA Driver
56+
- 580.95.05 |fn1|_
57+
58+
:sup:`1`
59+
Hardened for STIG/FIPS compliance
60+
61+
Artifacts for these components are available from the `NVIDIA NGC Catalog <https://registry.ngc.nvidia.com/orgs/nvstaging/teams/cloud-native/containers/gpu-driver-stig-fips>`_.
62+
63+
.. note::
64+
65+
Not all GPU Operator components and features are available as government-ready containers in the v25.10.0 release.
66+
For example, GPUDirect Storage and KubeVirt are not yet supported.
67+
68+
69+
Validated Kubernetes Distributions
70+
===================================
71+
72+
The government-ready NVIDIA GPU Operator has been validated on the following Kubernetes distributions:
73+
74+
- Canonical Kubernetes 1.34 with Ubuntu Pro 24.04 and FIPS-compliant kernel
75+
- Red Hat OpenShift 4.19 in FIPS mode
76+
77+
Install Government-Ready NVIDIA GPU Operator
78+
=============================================
79+
80+
Once you have your :ref:`gov-ready-prerequisites` configured, use the following steps to install the NVIDIA GPU Operator on Canonical Kubernetes distributions:
81+
82+
#. :ref:`install-nfd`
83+
#. :ref:`create-ngc-api-pull-secret`
84+
#. :ref:`create-ubuntu-pro-token-secret`
85+
#. :ref:`deploy-nvidia-gpu-operator-gov-ready`
86+
87+
.. note::
88+
89+
For deployment on OpenShift, refer to the :external+ocp:doc:`install-gpu-operator-gov-ready-openshift` page.
90+
91+
.. _gov-ready-prerequisites:
92+
93+
Prerequisites
94+
-------------
95+
96+
- An active NVIDIA AI Enterprise subscription and NGC API token to access GPU Operator government-ready containers.
97+
Refer to `Generating Your NGC API Key <https://docs.nvidia.com/ngc/gpu-cloud/ngc-user-guide/index.html#generating-api-key>`_ in the NVIDIA NGC User Guide for more information on NGC API tokens.
98+
99+
- An Ubuntu Pro token for Canonical Kubernetes deployments.
100+
This token is required for the driver container to download kernel headers and other necessary packages from the Canonical repository when using the FIPS-enabled kernel on Ubuntu 24.04.
101+
Refer to the `Ubuntu Pro documentation <https://documentation.ubuntu.com/pro-client/en/v30/howtoguides/get_token_and_attach/>`_ for more information on accessing Ubuntu Pro tokens.
102+
103+
- The ``helm`` CLI installed on a client machine.
104+
105+
You can run the following commands to install the Helm CLI:
106+
107+
.. code-block:: console
108+
109+
$ curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 \
110+
&& chmod 700 get_helm.sh \
111+
&& ./get_helm.sh
112+
113+
- A namespace to deploy the NVIDIA GPU Operator.
114+
The example install commands below use ``gpu-operator`` as the namespace.
115+
116+
- Optionally, a service mesh for intra-cluster traffic encryption.
117+
By default, the NVIDIA GPU Operator does not encrypt traffic between its controller (and operands) and the Kubernetes API server.
118+
If you wish to encrypt this communication, you should deploy and maintain a service mesh application within the Kubernetes cluster to enable secure traffic.
119+
120+
.. _install-nfd:
121+
122+
Install Node Feature Discovery (NFD)
123+
-------------------------------------
124+
125+
NFD is an open-source project that the Operator requires on each node in your cluster.
126+
It must be deployed before installing the NVIDIA GPU Operator.
127+
128+
The GPU Operator does not maintain a government-ready version of NFD. Install the upstream NFD version that aligns with the :ref:`operator-component-matrix`.
129+
The NFD container is built on a scratch image, so it contains no base operating system packages, which reduces its attack surface.
130+
For information on NFD CVEs and security updates, refer to the `NFD GitHub repository <https://github.com/kubernetes-sigs/node-feature-discovery/security>`_.
131+
132+
Refer to the NFD documentation for `installation instructions <https://kubernetes-sigs.github.io/node-feature-discovery/stable/get-started/index.html>`_.
133+
134+
135+
.. _create-ngc-api-pull-secret:
136+
137+
Create NGC API Pull Secret
138+
---------------------------
139+
140+
Create a Docker registry secret for pulling the GPU Operator artifacts from NVIDIA NGC. Create it in the namespace where you plan to deploy the NVIDIA GPU Operator.
141+
Replace ``<ngc-api-key>`` in the following command with your NGC API key.
142+
143+
.. code-block:: console
144+
145+
$ kubectl create secret -n gpu-operator docker-registry ngc-secret \
146+
--docker-server=nvcr.io \
147+
--docker-username='$oauthtoken' \
148+
--docker-password=<ngc-api-key>
149+
150+
.. _create-ubuntu-pro-token-secret:
151+
152+
Create Ubuntu Pro Token Secret
153+
-------------------------------
154+
155+
Create a Kubernetes secret to hold your Ubuntu Pro token.
156+
This secret will be used in the install command in the next step.
157+
158+
The Ubuntu Pro Token is required for the driver container to download kernel headers and other necessary packages from the Canonical repository when using the FIPS-enabled kernel on Ubuntu 24.04.
159+
160+
1. Save your Ubuntu Pro token to an environment file:
161+
162+
.. code-block:: console
163+
164+
$ echo UBUNTU_PRO_TOKEN=<your Ubuntu Pro token> > ubuntu-fips.env
165+
166+
Replace ``<your Ubuntu Pro token>`` with your actual Ubuntu Pro token.
167+
168+
2. Create the Ubuntu Pro token secret:
169+
170+
.. code-block:: console
171+
172+
$ kubectl create secret generic ubuntu-fips-secret \
173+
--from-env-file=./ubuntu-fips.env --namespace gpu-operator
174+
175+
Note that the namespace in the above command is ``gpu-operator``.
176+
Update this to the namespace you are planning to use for the NVIDIA GPU Operator.
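
The env file from step 1 holds the token in plain text, so it is worth limiting who can read it. The following is a minimal sketch, assuming a POSIX shell. The function name ``write_token_env`` is hypothetical and is not part of the GPU Operator tooling.

```shell
# Hypothetical helper: write the Ubuntu Pro token to an env file that only
# the current user can read. The line format (UBUNTU_PRO_TOKEN=<token>)
# is the same one used in step 1.
write_token_env() {
    token="$1"
    out="${2:-ubuntu-fips.env}"
    if [ -z "$token" ]; then
        echo "error: empty Ubuntu Pro token" >&2
        return 1
    fi
    # Remove any existing file so the new permissions apply to a fresh file.
    rm -f -- "$out"
    # Set umask in a subshell so the change does not leak into the caller.
    ( umask 077 && printf 'UBUNTU_PRO_TOKEN=%s\n' "$token" > "$out" )
}
```

Call ``write_token_env "<your Ubuntu Pro token>"``, run the ``kubectl create secret generic`` command from step 2, and then delete ``ubuntu-fips.env`` once the secret exists.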
177+
178+
.. _deploy-nvidia-gpu-operator-gov-ready:
179+
180+
Install NVIDIA GPU Operator Government-Ready Components
181+
--------------------------------------------------------
182+
183+
#. Label your ``gpu-operator`` namespace to set the pod security enforcement policy to ``privileged``.
184+
185+
.. code-block:: console
186+
187+
$ kubectl label --overwrite ns gpu-operator pod-security.kubernetes.io/enforce=privileged
188+
189+
#. Add the NVIDIA Helm repository:
190+
191+
.. code-block:: console
192+
193+
$ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
194+
&& helm repo update
195+
196+
#. Install the NVIDIA GPU Operator.
197+
198+
.. code-block:: console
199+
200+
$ helm install gpu-operator nvidia/gpu-operator \
201+
--namespace gpu-operator \
202+
--set driver.secretEnv=ubuntu-fips-secret \
203+
--set driver.repository=nvcr.io/nvidia \
204+
--set driver.version=580.95.05-stig-fips \
205+
--set driver.image=gpu-driver-stig-fips \
206+
--set driver.imagePullSecrets={ngc-secret} \
207+
--set nfd.enabled=false
208+
209+
Refer to `Common Chart Customization Options <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#common-chart-customization-options>`_ for more information about installation options.
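
The ``--set`` flags above can also be kept in a Helm values file, which is easier to review and keep under version control. The following is a sketch of that mapping, assuming standard Helm ``--set`` semantics (``{ngc-secret}`` becomes a one-item list). The function name ``write_gov_ready_values`` and the file name ``gov-ready-values.yaml`` are hypothetical choices.

```shell
# Hypothetical helper: write a values file equivalent to the --set flags
# used in the install command above.
write_gov_ready_values() {
    out="${1:-gov-ready-values.yaml}"
    # Quoted heredoc delimiter: the YAML is written literally, with no
    # shell expansion.
    cat > "$out" <<'EOF'
driver:
  secretEnv: ubuntu-fips-secret
  repository: nvcr.io/nvidia
  version: 580.95.05-stig-fips
  image: gpu-driver-stig-fips
  imagePullSecrets:
    - ngc-secret
nfd:
  enabled: false
EOF
}
```

After calling ``write_gov_ready_values``, the install command becomes ``helm install gpu-operator nvidia/gpu-operator --namespace gpu-operator -f gov-ready-values.yaml``.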
210+
211+
.. _update-ubuntu-pro-token-in-clusterpolicy:
212+
213+
Update Ubuntu Pro Token in ClusterPolicy
214+
=========================================
215+
216+
You can update your Ubuntu Pro Token after installation by editing your Ubuntu Pro Token secret.
217+
The name of this secret is set as the value of ``driver.secretEnv`` in the GPU Operator ClusterPolicy.
218+
219+
Edit your Ubuntu Pro Token secret.
220+
221+
.. code-block:: console
222+
223+
$ kubectl edit secrets <ubuntu-fips-secret> --namespace gpu-operator
224+
225+
Then update the secret with your new Ubuntu Pro Token.
226+
This token is required for the driver container to download kernel headers and other necessary packages from the Canonical repository when using the FIPS-enabled kernel on Ubuntu 24.04.
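
When you open a secret with ``kubectl edit``, Kubernetes shows the values under ``data`` in base64-encoded form, so the new token must be encoded before you paste it in. The following is a minimal sketch, assuming a POSIX shell with the ``base64`` utility available. The function name ``encode_secret_value`` is hypothetical.

```shell
# Hypothetical helper: base64-encode a value for the "data" field of a
# Kubernetes Secret.
encode_secret_value() {
    # printf avoids the trailing newline that echo would add to the token.
    # tr removes the line wraps that some base64 implementations insert.
    printf '%s' "$1" | base64 | tr -d '\n'
}
```

Run ``encode_secret_value "<new Ubuntu Pro token>"`` and paste the output as the value of the ``UBUNTU_PRO_TOKEN`` key under ``data``.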
227+

gpu-operator/install-gpu-operator-nvaie.rst

Lines changed: 4 additions & 0 deletions
@@ -202,4 +202,8 @@ Specify the ``--version=<supported-version>`` argument to install a supported ve
202202
Related Information
203203
*******************
204204

205+
.. toctree::
206+
207+
Government Ready <install-gpu-operator-gov-ready.rst>
208+
205209
- `NVIDIA AI Enterprise <https://www.nvidia.com/en-us/data-center/products/ai-enterprise-suite/>`_ web page.

gpu-operator/release-notes.rst

Lines changed: 1 addition & 1 deletion
@@ -1448,7 +1448,7 @@ New Features
14481448
* Added support for configuring Confidential Containers for GPU workloads as a technology preview feature.
14491449
This feature builds on the work for configuring Kata Containers and
14501450
introduces NVIDIA Confidential Computing Manager for Kubernetes as an operand of GPU Operator.
1451-
Refer to :doc:`gpu-operator-confidential-containers` for more information.
1451+
Refer to gpu-operator-confidential-containers for more information.
14521452

14531453
* Added support for the NVIDIA Data Center GPU Driver version 535.86.10.
14541454
Refer to the :ref:`GPU Operator Component Matrix`
