ggml_cuda_init: failed to initialize CUDA: (null) on Windows with CUDA 12.9 #2062

@sequeirawilson2021

Description

System Information:

  • OS: Windows
  • GPU: NVIDIA GeForce RTX 5060 Ti
  • NVIDIA Driver Version: 577.00
  • CUDA Version (from nvidia-smi): 12.9
  • Python Version: 3.12
  • Visual Studio: Visual Studio 2019 with "Desktop development with C++" workload

Problem Description:
I am unable to get llama-cpp-python to use my GPU. When I run a script to load a model with n_gpu_layers=-1, I get the error ggml_cuda_init: failed to initialize CUDA: (null), and all layers are loaded on the CPU.

Troubleshooting Steps Taken:

  1. Installed llama-cpp-python using the following command in the "x64 Native Tools Command Prompt for VS 2019" with a Python virtual environment activated:

     set CMAKE_ARGS="-DGGML_CUDA=on" && pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
  2. Verified that the command completes successfully, but the resulting installation does not use the GPU.
  3. Tried using the deprecated LLAMA_CUBLAS flag, which resulted in a build error (as expected).
  4. Performed a full cleanup of the environment:
    • pip uninstall llama-cpp-python
    • pip cache purge
    • Manually deleted leftover ~* directories from site-packages.
  5. Reinstalled after the cleanup, but the problem persists.
  6. Installed PyTorch with CUDA 12.1 support (pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121) before reinstalling llama-cpp-python, but this did not resolve the issue.
  7. Confirmed that the correct Python interpreter and virtual environment are being used.
  8. The run_with_llama_cpp.py script being used is:

     from llama_cpp import Llama

     llm = Llama(
         model_path="models/mistral-7b-instruct-v0.2.Q4_K_M.gguf",
         n_gpu_layers=-1,
         n_ctx=4096,
         verbose=True,
     )

     output = llm(
         "AI is going to ",
         max_tokens=32,
         stop=["."],
         echo=True,
     )

     print(output)
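One additional check I can run (a sketch based on my assumption that a missing CUDA runtime DLL could explain the (null) error, since the CUDA 12.x build of llama.cpp loads cudart64_12.dll at startup; the DLL name is my guess from CUDA 12.x naming, not confirmed from the logs):

```python
import os

def find_on_path(dll_name: str):
    """Return the first PATH directory containing dll_name, or None."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, dll_name)
        if os.path.isfile(candidate):
            return candidate
    return None

# cudart64_12.dll is the expected CUDA 12.x runtime DLL name (assumption;
# check your CUDA installation's bin directory for the exact file name).
print(find_on_path("cudart64_12.dll"))
```

If this prints None inside the same virtual environment the script runs in, the CUDA bin directory is not on PATH for that process, which would explain why initialization fails at runtime even though the build succeeded.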

Request:
Could you please provide any insights into why the CUDA initialization might be failing, or suggest any further diagnostic steps? I can provide the full verbose build log if needed.
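One thing I plan to re-verify myself (an assumption, not yet confirmed): cmd.exe keeps the surrounding quotes when a variable is set as set VAR="value", so the build in step 1 may have received the literal string "-DGGML_CUDA=on" (quotes included) rather than the flag itself. Setting the variable without quotes and echoing it before rebuilding would rule this out:

```shell
:: In the x64 Native Tools Command Prompt: set the variable without quotes,
:: then confirm its exact contents before rebuilding.
set CMAKE_ARGS=-DGGML_CUDA=on
echo %CMAKE_ARGS%
pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
```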
