Commit d425664

Add caveat about containerd for nvidia gpu operator
1 parent f73dab8 commit d425664

File tree

1 file changed: +30 −9 lines changed


docs/vendor/embedded-using.mdx

Lines changed: 30 additions & 9 deletions
````diff
@@ -235,18 +235,39 @@ This section outlines some additional use cases for Embedded Cluster. These are
 
 ### NVIDIA GPU Operator
 
-The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. For more information about this operator, see the [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/overview.html) documentation. You can include the operator in your release as an additional Helm chart, or using the Embedded Cluster Helm extensions. For information about Helm extensions, see [extensions](/reference/embedded-config#extensions) in _Embedded Cluster Config_.
+The NVIDIA GPU Operator uses the operator framework within Kubernetes to automate the management of all NVIDIA software components needed to provision GPUs. For more information about this operator, see the [NVIDIA GPU Operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/overview.html) documentation.
 
-Using this operator with Embedded Cluster requires configuring the containerd options in the operator as follows:
+You can include the NVIDIA GPU Operator in your release as an additional Helm chart, or by using Embedded Cluster Helm extensions. For information about adding Helm extensions, see [extensions](/reference/embedded-config#extensions) in _Embedded Cluster Config_.
+
+Using the NVIDIA GPU Operator with Embedded Cluster requires configuring the containerd options in the operator as follows:
 
 ```yaml
-toolkit:
-  env:
-  - name: CONTAINERD_CONFIG
-    value: /etc/k0s/containerd.d/nvidia.toml
-  - name: CONTAINERD_SOCKET
-    value: /run/k0s/containerd.sock
-```
+# Embedded Cluster Config
+
+extensions:
+  helm:
+    repositories:
+    - name: nvidia
+      url: https://nvidia.github.io/gpu-operator
+    charts:
+    - name: gpu-operator
+      chartname: nvidia/gpu-operator
+      namespace: gpu-operator
+      version: "v24.9.1"
+      values: |
+        # configure the containerd options
+        toolkit:
+          env:
+          - name: CONTAINERD_CONFIG
+            value: /etc/k0s/containerd.d/nvidia.toml
+          - name: CONTAINERD_SOCKET
+            value: /run/k0s/containerd.sock
+```
+When the containerd options are configured as shown above, the NVIDIA GPU Operator automatically creates the required configuration in the `/etc/k0s/containerd.d/nvidia.toml` file. It is not necessary to create this file manually or to modify any other configuration on the hosts.
+
+:::note
+If the host has an existing containerd service running (which might have been installed by Docker), the install will fail. Remove any existing containerd services before installing.
+:::
 
 ## Troubleshoot with Support Bundles
 
````

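The diff above also mentions including the operator in a release as an additional Helm chart rather than as an Embedded Cluster Helm extension. A minimal sketch of that alternative route, assuming the Replicated `HelmChart` custom resource (`kots.io/v1beta2`) is used to package the chart; the chart version and containerd values are taken from the diff, while the resource name is illustrative:

```yaml
# HelmChart custom resource for the NVIDIA GPU Operator (sketch)
apiVersion: kots.io/v1beta2
kind: HelmChart
metadata:
  name: gpu-operator   # hypothetical resource name
spec:
  chart:
    name: gpu-operator
    chartVersion: v24.9.1
  namespace: gpu-operator
  values:
    # same containerd options as in the extensions-based example above
    toolkit:
      env:
      - name: CONTAINERD_CONFIG
        value: /etc/k0s/containerd.d/nvidia.toml
      - name: CONTAINERD_SOCKET
        value: /run/k0s/containerd.sock
```

Either way, the key point is the same: the toolkit must be pointed at the k0s containerd config directory and socket paths used by Embedded Cluster.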