
ucx-cuda DEB package Recommends pulls in mismatched NVIDIA driver libraries, breaking existing GPU environments #11257

@amahussein

Describe the bug

Starting with UCX 1.20.0, the ucx-cuda DEB package declares Recommends: libnvidia-compute | libnvidia-ml1. Because apt installs Recommends by default, installing UCX now automatically pulls in NVIDIA driver userspace libraries, even in environments that already have a working GPU driver.

When the version of the recommended libnvidia-compute package (resolved from the apt repository) does not match the kernel driver already installed on the host, the result is a driver/library version mismatch that breaks GPU functionality:

$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
NVML library version: 595.45
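To see which side of the mismatch changed, you can compare the kernel module version (the first line of /proc/driver/nvidia/version, present when the NVIDIA kernel module is loaded) with the version of the userspace package that apt installed. The sketch below is a minimal, hypothetical diagnostic helper, not part of UCX or the NVIDIA tools. The function names first_version and same_driver_branch are illustrative. It compares only major.minor because the NVML message above reports just "595.45". NVML itself requires an exact match, so treat a "match" here as a rough check only.

```shell
#!/bin/sh
# Hypothetical helper: print the first dotted version number found on stdin,
# e.g. the kernel module version from /proc/driver/nvidia/version, or the
# upstream part of a Debian package version such as "595.45.04-0ubuntu1".
first_version() {
  grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Hypothetical helper: succeed (exit 0) if two driver versions agree on their
# first two components (major.minor). This is a simplification; the real
# NVML check is stricter.
same_driver_branch() {
  a=$(printf '%s\n' "$1" | cut -d. -f1-2)
  b=$(printf '%s\n' "$2" | cut -d. -f1-2)
  [ "$a" = "$b" ]
}

# Example use on an affected host (assumes the module is loaded and the
# package is named libnvidia-compute, as in the apt output in this report):
#   kmod=$(first_version < /proc/driver/nvidia/version)
#   lib=$(dpkg-query -W -f='${Version}\n' libnvidia-compute | first_version)
#   same_driver_branch "$kmod" "$lib" || echo "mismatch: kernel $kmod, userspace $lib"
```

With the versions from this report (kernel 590.44.01, pulled package 595.45.04), the check reports a mismatch. Separately, passing apt's --no-install-recommends flag when installing the UCX packages should stop apt from pulling in the Recommends at all. This has not been tested against these specific packages.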

In UCX 1.19.x and earlier, the ucx-cuda package had no Recommends field, so installing UCX was harmless to the system's existing driver setup.

Steps to Reproduce

  1. Start with a system or container that has a working NVIDIA GPU driver (e.g., kernel driver 590.44.01)
  2. Install UCX 1.20.0 DEB packages:
    wget https://github.com/openucx/ucx/releases/download/v1.20.0/ucx-1.20.0-ubuntu22.04-mofed5-cuda12-x86_64.tar.bz2
    tar -xvf ucx-1.20.0-ubuntu22.04-mofed5-cuda12-x86_64.tar.bz2
    apt install -y *.deb
  3. Observe that apt automatically installs additional NVIDIA packages as recommended dependencies:
    The following additional packages will be installed:
      libnvidia-cfg1 libnvidia-common libnvidia-compute libnvidia-decode
      libnvidia-gpucomp nvidia-persistenced
    
  4. Run nvidia-smi; it fails with "Driver/library version mismatch".

Expected behavior

UCX should not pull in driver packages, even as soft dependencies. UCX uses the CUDA Driver API through libcuda.so, which is supplied by whatever driver is installed on the system and is designed to work across driver versions. The driver is a system-level component managed independently of UCX.

Setup and versions

  • UCX version: 1.20.0
  • OS: Ubuntu 22.04
  • Package: ucx-1.20.0-ubuntu22.04-mofed5-cuda12-x86_64.tar.bz2
  • Host driver: 590.44.01
  • Pulled driver: 595.45.04 (from NVIDIA CUDA apt repository)
