fix: add /usr/bin symlinks for nvidia-ctk and nvidia-cdi-hook #1019

Closed
ormandj wants to merge 1 commit into siderolabs:main from ormandj:fix/nvidia-cdi-hook-symlinks

ormandj commented Mar 21, 2026

What? (description)

Adds symlinks from /usr/bin/nvidia-ctk to /usr/local/bin/nvidia-ctk and from /usr/bin/nvidia-cdi-hook to /usr/local/bin/nvidia-cdi-hook in the nvidia-container-toolkit extension build.
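For illustration only, here is a minimal Python sketch of the symlink layout this change produces, built inside a throwaway staging directory. It is not the extension's build step: the `add_usr_bin_symlinks` helper, the staging-root layout, and the choice of relative link targets are all assumptions made for this example.

```python
import os
import tempfile
from pathlib import Path

# Binaries named in this PR; the real files live under /usr/local/bin.
BINARIES = ("nvidia-ctk", "nvidia-cdi-hook")


def add_usr_bin_symlinks(rootfs: Path) -> list[Path]:
    """Create <rootfs>/usr/bin/<name> -> ../local/bin/<name> for each binary.

    Hypothetical helper: relative targets are an assumption of this sketch,
    chosen so the links stay valid wherever the staging root is mounted.
    """
    bin_dir = rootfs / "usr" / "bin"
    bin_dir.mkdir(parents=True, exist_ok=True)
    links = []
    for name in BINARIES:
        link = bin_dir / name
        if link.is_symlink():
            link.unlink()  # keep the helper idempotent across repeated runs
        os.symlink(os.path.join("..", "local", "bin", name), link)
        links.append(link)
    return links


def demo() -> dict[str, bool]:
    """Build a fake rootfs and report whether each link reaches the real file."""
    with tempfile.TemporaryDirectory() as tmp:
        rootfs = Path(tmp)
        real_dir = rootfs / "usr" / "local" / "bin"
        real_dir.mkdir(parents=True)
        for name in BINARIES:
            (real_dir / name).write_text("#!/bin/sh\n")  # stand-in binary
        links = add_usr_bin_symlinks(rootfs)
        return {
            link.name: link.resolve() == (real_dir / link.name).resolve()
            for link in links
        }


if __name__ == "__main__":
    print(demo())
```

Each entry in the result is true when the `/usr/bin` name resolves to the file under `/usr/local/bin` in the staging root.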

Why? (reasoning)

The gpu-operator device plugin generates CDI specs with hooks pointing to /usr/bin/nvidia-ctk (DefaultNvidiaCTKPath) and /usr/bin/nvidia-cdi-hook (defaultNvidiaCDIHookPath). Talos extensions install these binaries under /usr/local/bin/, so pods requesting nvidia.com/gpu resource limits fail with a "no such file" error.

This follows the exact pattern of the existing /usr/bin/ldconfig symlink. With these symlinks, the device plugin's default CDI hook paths resolve correctly and users no longer need to set NVIDIA_CDI_HOOK_PATH in their gpu-operator values.
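The failure and the fix can be sketched as a check over a CDI spec's hook paths. The spec below is a trimmed, hand-written example that uses the CDI `containerEdits.hooks` layout, not real device plugin output; `SAMPLE_SPEC`, `hook_paths`, and `missing_hook_paths` are hypothetical names introduced for this example.

```python
import os
import tempfile
from pathlib import Path

# Illustrative spec only: the field names follow the CDI format, the values
# are assumptions made for this sketch.
SAMPLE_SPEC = {
    "cdiVersion": "0.5.0",
    "kind": "nvidia.com/gpu",
    "devices": [
        {"name": "0", "containerEdits": {"deviceNodes": [{"path": "/dev/nvidia0"}]}}
    ],
    "containerEdits": {
        "hooks": [
            {"hookName": "createContainer", "path": "/usr/bin/nvidia-cdi-hook"},
            {"hookName": "createContainer", "path": "/usr/bin/nvidia-ctk"},
        ]
    },
}


def hook_paths(spec: dict) -> list[str]:
    """Collect hook paths from the spec-level and per-device containerEdits."""
    edits = [spec.get("containerEdits", {})]
    edits += [dev.get("containerEdits", {}) for dev in spec.get("devices", [])]
    return [hook["path"] for e in edits for hook in e.get("hooks", [])]


def missing_hook_paths(spec: dict, rootfs: Path) -> list[str]:
    """Return the hook paths that do not resolve to a file under rootfs."""
    return [p for p in hook_paths(spec) if not (rootfs / p.lstrip("/")).exists()]


def demo() -> tuple[list[str], list[str]]:
    """Check the hooks before and after adding the /usr/bin symlinks."""
    names = ("nvidia-ctk", "nvidia-cdi-hook")
    with tempfile.TemporaryDirectory() as tmp:
        rootfs = Path(tmp)
        (rootfs / "usr" / "local" / "bin").mkdir(parents=True)
        (rootfs / "usr" / "bin").mkdir(parents=True)
        for name in names:
            (rootfs / "usr" / "local" / "bin" / name).write_text("#!/bin/sh\n")
        before = missing_hook_paths(SAMPLE_SPEC, rootfs)
        for name in names:
            os.symlink(
                os.path.join("..", "local", "bin", name),
                rootfs / "usr" / "bin" / name,
            )
        after = missing_hook_paths(SAMPLE_SPEC, rootfs)
        return before, after


if __name__ == "__main__":
    print(demo())
```

Before the symlinks exist, both hook paths are reported missing, which corresponds to the "no such file" failure described above. Afterwards the list is empty.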

Depends on: siderolabs/talos#13022 (validator allowlist update to permit /usr/bin/nvidia-ctk and /usr/bin/nvidia-cdi-hook)

Ref: siderolabs/talos#13021

The gpu-operator device plugin generates CDI specs with hooks pointing
to /usr/bin/nvidia-ctk and /usr/bin/nvidia-cdi-hook (hardcoded defaults
in NVIDIA/k8s-device-plugin and NVIDIA/nvidia-container-toolkit). Talos
extensions install these binaries under /usr/local/bin/, causing pods
requesting nvidia.com/gpu resource limits to fail.

Add symlinks from /usr/bin/nvidia-ctk and /usr/bin/nvidia-cdi-hook to
their /usr/local/bin/ counterparts, following the same pattern as the
existing /usr/bin/ldconfig symlink. This eliminates the need for users
to set NVIDIA_CDI_HOOK_PATH in the gpu-operator values.

Requires siderolabs/talos#13021 validator allowlist update.

Signed-off-by: David Orman <ormandj@corenode.com>
ormandj force-pushed the fix/nvidia-cdi-hook-symlinks branch from 6e2c82d to 72e89c3 on March 21, 2026 18:57
smira (Member) commented Mar 23, 2026

Thank you. We went with the symlink approach in siderolabs/talos#13022.

smira closed this on Mar 23, 2026
github-project-automation bot moved this from In Review to Done in Planning on Mar 23, 2026