Commit 66d2bab

Add gke known issue
Signed-off-by: Abigail McCarthy <[email protected]>
1 parent 49b10b4 commit 66d2bab

2 files changed (+25, -0 lines)

gpu-operator/google-gke.rst

Lines changed: 14 additions & 0 deletions
@@ -80,6 +80,20 @@ Prerequisites
 Refer to `GPU platforms <https://cloud.google.com/compute/docs/gpus>`_
 in the Google Cloud documentation.

+.. note::
+
+   When you install NVIDIA GPU Operator v25.10.0 on GKE, a known issue in NVIDIA Container Toolkit v1.18.0, the default toolkit version, misconfigures the ``config.toml`` file and prevents GPU Operator containers from starting correctly.
+
+   To resolve this issue, set the ``RUNTIME_CONFIG_SOURCE=file`` environment variable in the toolkit container.
+   You can set it by adding the following to the ClusterPolicy CR:
+
+   .. code-block:: yaml
+
+      toolkit:
+        env:
+        - name: RUNTIME_CONFIG_SOURCE
+          value: "file"
+

 *********************************
 Using the Google Driver Installer
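For orientation, the following sketch shows where that ``toolkit.env`` stanza sits inside a full ClusterPolicy object, namely under ``spec``. It assumes the resource is named ``cluster-policy`` (the name the Helm chart typically creates; check with ``kubectl get clusterpolicies``) and omits every other field, so treat it as a partial view to guide an edit of the existing resource, for example with ``kubectl edit clusterpolicy cluster-policy``, not as a complete manifest to apply.

```yaml
# Partial ClusterPolicy, shown only to locate the toolkit env override.
# Assumption: the resource is named "cluster-policy"; all other spec fields are omitted.
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  toolkit:
    # Environment variables passed to the NVIDIA Container Toolkit container.
    env:
      - name: RUNTIME_CONFIG_SOURCE
        value: "file"
```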

gpu-operator/release-notes.rst

Lines changed: 11 additions & 0 deletions
@@ -188,6 +188,17 @@ Known Issues

   Create the ConfigMap, then update the ClusterPolicy with the name of the ConfigMap in the ``vgpuDeviceManager.config.name`` field, and restart the vgpu-device-manager pod.

+- When using GKE, a known issue in NVIDIA Container Toolkit v1.18.0 misconfigures the ``config.toml`` file and prevents GPU Operator containers from starting correctly.
+  To resolve this issue, set the ``RUNTIME_CONFIG_SOURCE=file`` environment variable in the toolkit container.
+  You can set it by adding the following to the ClusterPolicy CR:
+
+  .. code-block:: yaml
+
+     toolkit:
+       env:
+       - name: RUNTIME_CONFIG_SOURCE
+         value: "file"
+
 .. _v25.3.4:

 25.3.4
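If the operator is not installed yet, the same override can likely be supplied at install time through Helm values instead of editing the ClusterPolicy afterwards. This is a sketch that assumes the ``nvidia/gpu-operator`` chart exposes a ``toolkit.env`` value and renders it into ``spec.toolkit.env`` of the ClusterPolicy it creates; the file name ``values-gke.yaml`` is hypothetical. You would pass the file to ``helm install`` with ``-f values-gke.yaml``.

```yaml
# values-gke.yaml (hypothetical file name): Helm values for the GPU Operator chart.
# Assumption: the chart maps toolkit.env onto the ClusterPolicy's spec.toolkit.env.
toolkit:
  env:
    - name: RUNTIME_CONFIG_SOURCE
      value: "file"
```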
