Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
Device 0: NVIDIA GeForce RTX 4090, compute capability 8.9, VMM: yes
Device 1: NVIDIA GeForce RTX 5090, compute capability 12.0, VMM: yes
version: 6098 (2241453)
built with cc (Debian 12.2.0-14) 12.2.0 for x86_64-linux-gnu
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server --model models/DeepSeek-R1-Distill-Llama-8B-GGUF/DeepSeek-R1-Distill-Llama-8B-F16.gguf -ngl 70 -mg 1 --device CUDA1,CUDA0 -ts 2,1 -b 1 -c 60000 --host 0.0.0.0 --port 11435 --threads 32 --mlock --numa numactl --cont-batching --flash-attn
Problem description & steps to reproduce
A commit to vscode-copilot-chat prevents llama.cpp from being used with it: https://github.com/microsoft/vscode-copilot-chat/commit/0dd4ce55a75c68bb2a8b3d96ff345db871e0a418

It appears to be connected to PR #12896.
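
To check that the server itself is not the problem, it can be queried directly over its OpenAI-compatible API (a minimal sketch; the host/port come from the command line above, and the endpoint paths assume llama-server's standard OpenAI-compatible routes):

# List the loaded model(s); this is typically what client integrations probe first
curl http://localhost:11435/v1/models

# Send a simple chat completion request (llama-server serves the model it was
# started with, so the model field can usually be omitted)
curl http://localhost:11435/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'

If both requests return valid responses, the OpenAI-compatible endpoint works as expected and the failure is on the vscode-copilot-chat side.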
First Bad Commit
No response