container-toolkit/cdi-support.md
Lines changed: 62 additions & 76 deletions
Original file line number
Diff line number
Diff line change
@@ -34,32 +34,46 @@ CDI also improves the compatibility of the NVIDIA container stack with certain f
34
34
35
35
As of NVIDIA Container Toolkit `v1.18.0`, the CDI specification is automatically generated and updated by a systemd service called `nvidia-cdi-refresh`. This service:
36
36
37
-
- Automatically generates the CDI specification at `/var/run/cdi/nvidia.yaml` when NVIDIA drivers are installed or upgraded
38
-
- Runs automatically on system boot to ensure the specification is up to date
37
+
- Automatically generates the CDI specification at `/var/run/cdi/nvidia.yaml` when:
38
+
- The NVIDIA Container Toolkit is installed or upgraded
39
+
- The NVIDIA GPU drivers are installed or upgraded
40
+
- The system is rebooted
39
41
40
-
```{note}
41
-
The automatic CDI refresh service does not handle:
42
-
- Driver removal (the CDI file is intentionally preserved)
43
-
- MIG device reconfiguration
42
+
This ensures that the CDI specifications are up to date for the current driver
43
+
and device configuration and that CDI Devices defined in these specifications are
44
+
available when using native CDI support in container engines such as Docker or Podman.
44
45
45
-
For these scenarios, you may still need to manually regenerate the CDI specification. See [Manual CDI Specification Generation](#manual-cdi-specification-generation) for instructions.
46
+
Running the following command lists the available CDI Devices:
47
+
```console
48
+
nvidia-ctk cdi list
46
49
```
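
As a quick complement to `nvidia-ctk cdi list`, the sketch below checks whether a specification file exists and is non-empty. It defaults to `/var/run/cdi/nvidia.yaml`, the path the refresh service writes to. The helper name `cdi_spec_present` is hypothetical and is not part of the NVIDIA Container Toolkit.

```bash
#!/usr/bin/env bash
# Sketch: report whether a CDI specification file exists and is non-empty.
# cdi_spec_present is a hypothetical helper, not an nvidia-ctk command.
cdi_spec_present() {
    # Default to the path written by the nvidia-cdi-refresh service.
    local spec="${1:-/var/run/cdi/nvidia.yaml}"
    if [ -s "$spec" ]; then
        echo "present: $spec"
    else
        echo "missing or empty: $spec"
        return 1
    fi
}

# Check the default location; do not abort the shell if the file is absent.
cdi_spec_present || true
```

A non-zero return status from the helper only indicates that the file is absent or empty; it does not validate the contents of the specification.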
47
50
48
-
#### Customizing the Automatic CDI Refresh Service
51
+
#### Known limitations
52
+
The `nvidia-cdi-refresh` service does not currently handle the following situations:
53
+
54
+
- The removal of NVIDIA GPU drivers
55
+
- The reconfiguration of MIG devices
56
+
57
+
For these scenarios, the regeneration of CDI specifications must be [manually triggered](#manual-cdi-specification-generation).
49
58
50
-
You can customize the behavior of the `nvidia-cdi-refresh` service by adding environment variables to `/etc/nvidia-container-toolkit/cdi-refresh.env`. This file is read by the service and allows you to modify the `nvidia-ctk cdi generate` command behavior.
59
+
#### Customizing the Automatic CDI Refresh Service
60
+
The behavior of the `nvidia-cdi-refresh` service can be customized by adding
61
+
environment variables to `/etc/nvidia-container-toolkit/cdi-refresh.env` to
62
+
affect the behavior of the `nvidia-ctk cdi generate` command.
51
63
52
-
Example configuration file:
64
+
As an example, to enable debug logging, the configuration file should be updated
65
+
as follows:
53
66
```bash
54
67
# /etc/nvidia-container-toolkit/cdi-refresh.env
55
68
NVIDIA_CTK_DEBUG=1
56
-
# Add other nvidia-ctk environment variables as needed
57
69
```
58
70
59
71
For a complete list of available environment variables, run `nvidia-ctk cdi generate --help` to see the command's documentation.
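
If the environment file is managed by a script, one possible approach is to set a variable idempotently so that repeated runs do not accumulate duplicate assignments. The sketch below assumes simple `KEY=VALUE` lines; `set_env_var` is a hypothetical helper, and the demonstration writes to a scratch file rather than to `/etc/nvidia-container-toolkit/cdi-refresh.env`.

```bash
#!/usr/bin/env bash
# Sketch: set KEY=VALUE in an env file, replacing any existing assignment.
# set_env_var is a hypothetical helper, not part of the NVIDIA Container Toolkit.
set_env_var() {
    local file="$1" key="$2" value="$3" tmp
    touch "$file"
    tmp="$(mktemp)"
    # Keep every line except previous assignments of the key.
    # grep exits non-zero when nothing is kept, which is fine here.
    grep -v "^${key}=" "$file" > "$tmp" || true
    # Append the new assignment.
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the original file's ownership and mode are kept.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a scratch file. On a real host the target would be
# /etc/nvidia-container-toolkit/cdi-refresh.env, which requires root.
demo_env="$(mktemp)"
set_env_var "$demo_env" NVIDIA_CTK_DEBUG 1
cat "$demo_env"
```

Writing the file does not by itself change the running service; the reload and restart steps described for the environment file still apply.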
60
72
61
73
```{important}
62
-
After modifying the environment file, you must reload the systemd daemon and restart the service for changes to take effect:
74
+
Modifications to the environment file require a systemd reload and restarting the
Jun 27 00:04:30 ipp2-0502 nvidia-ctk[1623461]: time="2025-06-27T00:04:30-04:00" level=info msg="Selecting /usr/bin/nvidia-smi as /usr/bin/nvidia-smi"
96
-
Jun 27 00:04:30 ipp2-0502 nvidia-ctk[1623461]: time="2025-06-27T00:04:30-04:00" level=info msg="Selecting /usr/bin/nvidia-debugdump as /usr/bin/nvidia-debugdump"
97
-
Jun 27 00:04:30 ipp2-0502 nvidia-ctk[1623461]: time="2025-06-27T00:04:30-04:00" level=info msg="Selecting /usr/bin/nvidia-persistenced as /usr/bin/nvidia-persistenced"
98
-
Jun 27 00:04:30 ipp2-0502 nvidia-ctk[1623461]: time="2025-06-27T00:04:30-04:00" level=info msg="Selecting /usr/bin/nvidia-cuda-mps-control as /usr/bin/nvidia-cuda-mps-control"
99
-
Jun 27 00:04:30 ipp2-0502 nvidia-ctk[1623461]: time="2025-06-27T00:04:30-04:00" level=info msg="Selecting /usr/bin/nvidia-cuda-mps-server as /usr/bin/nvidia-cuda-mps-server"
100
-
Jun 27 00:04:30 ipp2-0502 nvidia-ctk[1623461]: time="2025-06-27T00:04:30-04:00" level=warning msg="Could not locate nvidia-imex: pattern nvidia-imex not found"
101
-
Jun 27 00:04:30 ipp2-0502 nvidia-ctk[1623461]: time="2025-06-27T00:04:30-04:00" level=warning msg="Could not locate nvidia-imex-ctl: pattern nvidia-imex-ctl not found"
102
-
Jun 27 00:04:30 ipp2-0502 nvidia-ctk[1623461]: time="2025-06-27T00:04:30-04:00" level=info msg="Generated CDI spec with version 1.0.0"
103
-
Jun 27 00:04:30 ipp2-0502 systemd[1]: nvidia-cdi-refresh.service: Succeeded.
104
-
Jun 27 00:04:30 ipp2-0502 systemd[1]: Started Refresh NVIDIA CDI specification file.
113
+
...
105
114
```
106
115
107
-
You can enable/disable the automatic CDI refresh service using the following commands:
116
+
If these are not enabled as expected, they can be enabled by running:
You can also view the service logs to see the output of the CDI generation process.
123
+
#### Troubleshooting CDI Specification Generation and Resolution
124
+
125
+
If CDI specifications for available devices are not generated or updated as expected, it is
126
+
recommended that the logs for the `nvidia-cdi-refresh.service` be checked. This can be
127
+
done by running:
117
128
118
129
```console
119
-
# View service logs
120
130
$ sudo journalctl -u nvidia-cdi-refresh.service
121
131
```
122
132
123
-
### Manual CDI Specification Generation
124
-
125
-
If you need to manually generate a CDI specification, for example, after MIG configuration changes or if you are using a Container Toolkit version before v1.18.0, follow this procedure:
126
-
127
-
Two common locations for CDI specifications are `/etc/cdi/` and `/var/run/cdi/`.
128
-
The contents of the `/var/run/cdi/` directory are cleared on boot.
129
-
130
-
However, the path to create and use can depend on the container engine that you use.
131
-
132
-
1. Generate the CDI specification file:
133
+
In most cases, restarting the service should be sufficient to trigger the (re)generation
The following example output is for a machine with a single GPU that does not support MIG.
142
+
```console
143
+
$ nvidia-ctk --debug cdi list
144
+
```
145
+
will show a list of available CDI Devices as well as any errors that may have
146
+
occurred when loading CDI Specifications from `/etc/cdi` or `/var/run/cdi`.
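
Before digging into load errors, it can help to see which specification files are actually present in those two directories. The following is a minimal sketch, assuming specifications use the `.yaml` or `.json` extension; `list_cdi_specs` is a hypothetical helper, not an `nvidia-ctk` subcommand.

```bash
#!/usr/bin/env bash
# Sketch: list CDI specification files found in the given directories.
# list_cdi_specs is a hypothetical helper, not part of nvidia-ctk.
list_cdi_specs() {
    local dir file
    for dir in "$@"; do
        # Skip directories that do not exist on this host.
        [ -d "$dir" ] || continue
        for file in "$dir"/*.yaml "$dir"/*.json; do
            # An unmatched glob stays literal, so print regular files only.
            [ -f "$file" ] && printf '%s\n' "$file"
        done
    done
    return 0
}

# The two directories named in the text above.
list_cdi_specs /etc/cdi /var/run/cdi
```

An empty result suggests that no specification has been generated yet, in which case the service logs are the next place to look.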
159
147
160
-
```output
161
-
INFO[0000] Found 9 CDI devices
162
-
nvidia.com/gpu=all
163
-
nvidia.com/gpu=0
164
-
```
148
+
### Manual CDI Specification Generation
165
149
166
-
```{important}
167
-
You must generate a new CDI specification after any of the following changes:
150
+
As of NVIDIA Container Toolkit `v1.18.0`, the recommended mechanism to regenerate CDI specifications is to restart the `nvidia-cdi-refresh.service`:
168
151
169
-
- You change the device or CUDA driver configuration.
170
-
- You use a location such as `/var/run/cdi` that is cleared on boot.