ggml_cuda_init: failed to initialize CUDA: (null) on Windows with CUDA 12.9 #2062

@sequeirawilson2021

Description

System Information:

  • OS: Windows
  • GPU: NVIDIA GeForce RTX 5060 Ti
  • NVIDIA Driver Version: 577.00
  • CUDA Version (from nvidia-smi): 12.9
  • Python Version: 3.12
  • Visual Studio: Visual Studio 2019 with "Desktop development with C++" workload

Problem Description:
I am unable to get llama-cpp-python to use my GPU. When I run a script to load a model with n_gpu_layers=-1, I get the error ggml_cuda_init: failed to initialize CUDA: (null), and all layers are loaded on the CPU.

Troubleshooting Steps Taken:

  1. Installed llama-cpp-python using the following command in the "x64 Native Tools Command Prompt for VS 2019" with a Python virtual environment activated:

     set CMAKE_ARGS="-DGGML_CUDA=on" && pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
  2. Verified that the command completes successfully, but the resulting installation does not use the GPU.
  3. Tried using the deprecated LLAMA_CUBLAS flag, which resulted in a build error (as expected).
  4. Performed a full cleanup of the environment:
    • pip uninstall llama-cpp-python
    • pip cache purge
    • Manually deleted leftover ~* directories from site-packages.
  5. Reinstalled after the cleanup, but the problem persists.
  6. Installed PyTorch with CUDA 12.1 support (pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121) before reinstalling llama-cpp-python, but this did not resolve the issue.
  7. Confirmed that the correct Python interpreter and virtual environment are being used.
  8. The run_with_llama_cpp.py script being used is:

     from llama_cpp import Llama

     llm = Llama(
         model_path="models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
         n_gpu_layers=-1,
         n_ctx=4096,
         verbose=True,
     )

     output = llm(
         "AI is going to ",
         max_tokens=32,
         stop=["."],
         echo=True,
     )

     print(output)
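One additional check I can run (a sketch based on my assumption that a missing CUDA runtime DLL could explain the (null) error, since the CUDA 12.x build of llama.cpp loads cudart64_12.dll at startup; the DLL name is my guess from CUDA 12.x naming, not confirmed from the logs):

```python
import os

def find_on_path(dll_name: str):
    """Return the first PATH directory containing dll_name, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, dll_name)
        if os.path.isfile(candidate):
            return candidate
    return None

# cudart64_12.dll is the expected CUDA 12.x runtime DLL name (assumption;
# check your CUDA installation's bin directory for the exact file name).
print(find_on_path("cudart64_12.dll"))
```

If this prints None inside the same virtual environment the script runs in, the CUDA bin directory is not on PATH for that process, which would explain why initialization fails at runtime even though the build succeeded.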

Request:
Could you please provide any insights into why the CUDA initialization might be failing, or suggest any further diagnostic steps? I can provide the full verbose build log if needed.
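One thing I plan to re-verify myself (an assumption, not yet confirmed): cmd.exe keeps the surrounding quotes when a variable is set as set VAR="value", so the build in step 1 may have received the literal string "-DGGML_CUDA=on" (quotes included) rather than the flag itself. Setting the variable without quotes and echoing it before rebuilding would rule this out:

```shell
:: In the x64 Native Tools Command Prompt: set the variable without quotes,
:: then confirm its exact contents before rebuilding.
set CMAKE_ARGS=-DGGML_CUDA=on
echo %CMAKE_ARGS%
pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
```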
