Detect vendor before crafting cdiDeviceIDs for --gpus

This detects the GPU vendor from the CDI spec files while generating the device IDs corresponding to the values passed to the --gpus option. With this, users can also use AMD GPUs if a corresponding CDI spec is present.

Signed-off-by: Shiv Tyagi <Shiv.Tyagi@amd.com>
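The idea described in the commit message can be sketched roughly as follows. This is an illustrative sketch only, not nerdctl's actual internals: `craftCDIDeviceIDs` is a hypothetical name, and the vendor kind (e.g. `nvidia.com/gpu` or `amd.com/gpu`) is assumed to have already been read from the `kind` field of the CDI spec files on the system.

```go
package main

import "fmt"

// craftCDIDeviceIDs turns the values passed to --gpus (e.g. "all", an index,
// or a UUID) into fully qualified CDI device IDs, using the vendor kind
// detected from the CDI spec files instead of hard-coding "nvidia.com/gpu".
func craftCDIDeviceIDs(kind string, requested []string) []string {
	ids := make([]string, 0, len(requested))
	for _, dev := range requested {
		// Each requested device is qualified as "<kind>=<device>".
		ids = append(ids, fmt.Sprintf("%s=%s", kind, dev))
	}
	return ids
}

func main() {
	fmt.Println(craftCDIDeviceIDs("nvidia.com/gpu", []string{"all"}))
	fmt.Println(craftCDIDeviceIDs("amd.com/gpu", []string{"0", "1"}))
}
```

With the vendor detected up front, the same request (`--gpus all`) resolves to `nvidia.com/gpu=all` or `amd.com/gpu=all` depending on which spec is present.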
nerdctl provides docker-compatible NVIDIA and AMD GPU support.
## Prerequisites
- GPU Drivers
  - Same requirement as when you use GPUs on Docker. For details, please refer to these docs by [NVIDIA](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#pre-requisites) and [AMD](https://instinct.docs.amd.com/projects/container-toolkit/en/latest/container-runtime/quick-start-guide.html#step-2-install-the-amdgpu-driver).
- Container Toolkit
  - containerd relies on vendor Container Toolkits to make GPUs available to the containers. You can install those by following the official installation instructions from [NVIDIA](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) and [AMD](https://instinct.docs.amd.com/projects/container-toolkit/en/latest/container-runtime/quick-start-guide.html).
- CDI Specification
  - A Container Device Interface (CDI) specification for the GPU devices is required for GPU support to work. Follow the official documentation from [NVIDIA](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html) and [AMD](https://instinct.docs.amd.com/projects/container-toolkit/en/latest/container-runtime/cdi-guide.html) to ensure that the required CDI specifications are present on the system.
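For reference, a CDI spec is a YAML (or JSON) file, typically under `/etc/cdi` or `/var/run/cdi`, whose vendor-qualified `kind` field is what identifies the GPU vendor. The fragment below is a heavily trimmed illustration, not the output of either toolkit; real generated specs contain many more device nodes, mounts, and hooks under `containerEdits`:

```yaml
cdiVersion: "0.6.0"
kind: nvidia.com/gpu        # vendor-qualified kind; AMD specs use amd.com/gpu
devices:
  - name: "0"               # requested via --gpus 'device=0'
    containerEdits:
      deviceNodes:
        - path: /dev/nvidia0
  - name: all               # requested via --gpus all
    containerEdits:
      deviceNodes:
        - path: /dev/nvidia0
```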
## Options for `nerdctl run --gpus`
`nerdctl run --gpus` is compatible with [`docker run --gpus`](https://docs.docker.com/engine/reference/commandline/run/#access-an-nvidia-gpu).
You can specify the number of GPUs to use via the `--gpus` option.
The following examples expose all available GPUs to the container.
```
nerdctl run -it --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
```
or
```
nerdctl run -it --rm --gpus=all rocm/rocm-terminal rocm-smi
```
You can also pass detailed configuration to the `--gpus` option as a list of key-value pairs. The following options are provided:
- `count`: number of GPUs to use. `all` exposes all available GPUs.
- `device`: IDs of GPUs to use. UUIDs or index numbers of GPUs can be specified. This only works for NVIDIA GPUs.
The following example exposes a specific NVIDIA GPU to the container.
```
nerdctl run -it --rm --gpus 'device=GPU-3a23c669-1f69-c64e-cf85-44e9b07e7a2a' nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
```
### `nerdctl run --gpus` fails due to an unresolvable CDI device
If the required CDI specifications for your GPU devices are not available on the
system, the `nerdctl run` command will fail with an error similar to: `CDI device injection failed: unresolvable CDI devices nvidia.com/gpu=all` (the
exact error message will depend on the vendor and the device(s) requested).
This should be the same error message that is reported when the `--device` flag
is used to request a CDI device:
```
nerdctl run --device=nvidia.com/gpu=all
```
Ensure that the NVIDIA (or AMD) Container Toolkit is installed and the requested CDI devices are present in the output of `nvidia-ctk cdi list` (or `amd-ctk cdi list` for AMD GPUs):
For NVIDIA Container Toolkit, version >= v1.18.0 is recommended. See the NVIDIA Container Toolkit [CDI documentation](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html) for more information.
For AMD Container Toolkit, version >= v1.2.0 is recommended. See the AMD Container Toolkit [CDI documentation](https://instinct.docs.amd.com/projects/container-toolkit/en/latest/container-runtime/cdi-guide.html) for more information.
### `nerdctl run --gpus` fails when using the NVIDIA GPU Operator