running error - initialization error: nvml error: driver/library version mismatch: unknown. #48

@helxsz

Description

This is my host info (Tesla T4 GPU):

Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
CPU(s):                             48
Vendor ID:                          GenuineIntel
Model name:                         Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz

Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Mitigation; TSX disabled

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.5.2
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cudnn-frontend==1.16.0
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-cufile-cu12==1.13.1.3
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-cutlass-dsl==4.3.1
[pip3] nvidia-ml-py==13.580.82
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvshmem-cu12==3.3.20
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] pyzmq==27.1.0
[pip3] torch==2.9.0+cu128
[pip3] torchaudio==2.9.0+cu128
[pip3] torchvision==0.24.0+cu128
[pip3] transformers==4.57.3
[pip3] triton==3.5.0
[conda] flashinfer-python                    0.5.2            pypi_0                pypi
[conda] numpy                                2.2.6            pypi_0                pypi
[conda] nvidia-cublas-cu12                   12.8.4.1         pypi_0                pypi
[conda] nvidia-cuda-cupti-cu12               12.8.90          pypi_0                pypi
[conda] nvidia-cuda-nvrtc-cu12               12.8.93          pypi_0                pypi
[conda] nvidia-cuda-runtime-cu12             12.8.90          pypi_0                pypi
[conda] nvidia-cudnn-cu12                    9.10.2.21        pypi_0                pypi
[conda] nvidia-cudnn-frontend                1.16.0           pypi_0                pypi
[conda] nvidia-cufft-cu12                    11.3.3.83        pypi_0                pypi
[conda] nvidia-cufile-cu12                   1.13.1.3         pypi_0                pypi
[conda] nvidia-curand-cu12                   10.3.9.90        pypi_0                pypi
[conda] nvidia-cusolver-cu12                 11.7.3.90        pypi_0                pypi
[conda] nvidia-cusparse-cu12                 12.5.8.93        pypi_0                pypi
[conda] nvidia-cusparselt-cu12               0.7.1            pypi_0                pypi
[conda] nvidia-cutlass-dsl                   4.3.1            pypi_0                pypi
[conda] nvidia-ml-py                         13.580.82        pypi_0                pypi
[conda] nvidia-nccl-cu12                     2.27.5           pypi_0                pypi
[conda] nvidia-nvjitlink-cu12                12.8.93          pypi_0                pypi
[conda] nvidia-nvshmem-cu12                  3.3.20           pypi_0                pypi
[conda] nvidia-nvtx-cu12                     12.8.90          pypi_0                pypi
[conda] pyzmq                                27.1.0           pypi_0                pypi
[conda] torch                                2.9.0+cu128      pypi_0                pypi
[conda] torchaudio                           2.9.0+cu128      pypi_0                pypi
[conda] torchvision                          0.24.0+cu128     pypi_0                pypi
[conda] transformers                         4.57.3           pypi_0                pypi
[conda] triton                               3.5.0            pypi_0                pypi

==============================
     Environment Variables
==============================
LD_LIBRARY_PATH=/root/.local/lib::/usr/local/cuda-12.4/lib64
CUDA_HOME=:/usr/local/cuda-12.4
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
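As an aside, the dump above shows a doubled colon in `LD_LIBRARY_PATH` and a leading colon in `CUDA_HOME`. A minimal illustration of what those stray colons become when the values are split into path components (an empty `LD_LIBRARY_PATH` entry is treated by the loader as the current directory):

```python
# Splitting the values from the env dump above on ":" shows the
# empty path components produced by the stray colons.
ld_library_path = "/root/.local/lib::/usr/local/cuda-12.4/lib64"
cuda_home = ":/usr/local/cuda-12.4"

print(ld_library_path.split(":"))  # ['/root/.local/lib', '', '/usr/local/cuda-12.4/lib64']
print(cuda_home.split(":"))        # ['', '/usr/local/cuda-12.4']
```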

The host machine runs Ollama fine, with this log output:

time=2025-11-30T07:24:49.679+08:00 level=INFO source=gpu.go:217 msg="looking for compatible GPUs"
time=2025-11-30T07:24:50.126+08:00 level=INFO source=types.go:130 msg="inference compute" id=GPU-cd68373a-b857-7671-860b-0d17f4c2d4cf library=cuda variant=v12 compute=7.5 driver=12.2 name="Tesla T4" total="14.6 GiB" available="13.9 GiB"
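The field I care about here is `driver=12.2`, the CUDA version the installed driver reports. It can be pulled out of such a line with a small parser (a sketch of my own; it only assumes the Go slog-style `key=value` format, nothing Ollama-specific):

```python
import re

def parse_log_fields(line: str) -> dict:
    # Extract key=value and key="quoted value" pairs from a slog-style line.
    return {k: v.strip('"') for k, v in re.findall(r'(\w+)=("[^"]*"|\S+)', line)}

line = ('time=2025-11-30T07:24:50.126+08:00 level=INFO source=types.go:130 '
        'msg="inference compute" id=GPU-cd68373a-b857-7671-860b-0d17f4c2d4cf '
        'library=cuda variant=v12 compute=7.5 driver=12.2 name="Tesla T4" '
        'total="14.6 GiB" available="13.9 GiB"')

fields = parse_log_fields(line)
print(fields["driver"], fields["name"])  # 12.2 Tesla T4
```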

Installing with Docker:

docker run -d --gpus all -p 1312:1312 ghcr.io/psalias2006/gpu-hot:latest

gives me this error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown.
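As I understand it, "driver/library version mismatch" means the userspace NVML library (`libnvidia-ml.so`) and the loaded NVIDIA kernel module come from different driver releases; NVML refuses to initialize unless both match. A hypothetical sketch of that check (the version strings below are made up for illustration, not taken from my machine):

```python
import re

def kernel_module_version(proc_text: str) -> str:
    # Parse the driver release out of /proc/driver/nvidia/version content.
    m = re.search(r"Kernel Module\s+([0-9.]+)", proc_text)
    return m.group(1) if m else ""

def mismatch(kernel: str, library: str) -> bool:
    # NVML requires the kernel module and userspace library to come
    # from the same driver release.
    return kernel != library

proc_text = "NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.183.01"
print(mismatch(kernel_module_version(proc_text), "550.54.15"))  # True
```

On a real host the two sides would be `cat /proc/driver/nvidia/version` versus the driver version `nvidia-smi` (or NVML) reports; the usual cause is a driver upgrade without a reboot, so the old kernel module is still loaded.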

I'm wondering how to deal with this error. Is it happening in this part of the code, during the NVML setup?

https://github.com/psalias2006/gpu-hot/blob/main/core/monitor.py

    try:
        pynvml.nvmlInit()
        self.initialized = True
        version = pynvml.nvmlSystemGetDriverVersion()
        if isinstance(version, bytes):
            version = version.decode('utf-8')
        logger.info(f"NVML initialized - Driver: {version}")

        # Detect which GPUs need nvidia-smi (once at boot)
        self._detect_smi_gpus()

    except Exception as e:
        logger.error(f"Failed to initialize NVML: {e}")
        self.initialized = False
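For what it's worth, that init path can be reproduced standalone to check whether the mismatch also occurs outside the container (a sketch mirroring the excerpt, not the project's actual API; it assumes pynvml / nvidia-ml-py is installed, as in the pip list above):

```python
def normalize_version(version):
    # Mirrors the bytes/str handling in the excerpt: older NVML bindings
    # return bytes, newer ones return str.
    if isinstance(version, bytes):
        version = version.decode("utf-8")
    return version

def probe():
    # Run this on the host and inside the container; if nvmlInit fails in
    # both, the mismatch is a host driver issue rather than a Docker one.
    import pynvml
    try:
        pynvml.nvmlInit()
        print("Driver:", normalize_version(pynvml.nvmlSystemGetDriverVersion()))
        pynvml.nvmlShutdown()
    except pynvml.NVMLError as e:
        print("NVML init failed:", e)
```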
