Name and Version
build: d1e2adb (6382)
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
llama-server
Command line
```shell
/root/llama-builds/llama.cpp/bin/llama-server \
  --metrics \
  --port ${PORT} \
  --api-key needakey \
  -sm none \
  -mg 1 \
  -t 10 \
  --swa-full \
  --model /mnt/models/unsloth/gemma-3-4b-it-qat-Q4_0.gguf \
  --mmproj /mnt/models/unsloth/gemma-3-4b-it-qat-mmproj-F16.gguf \
  --jinja \
  -ngl 99 \
  --temp 1.0 \
  --top_k 64 \
  --top_p 0.95 \
  --min_p 0.0 \
  --ctx-size 32000
```
Problem description & steps to reproduce
When I specify -sm none -mg 1 and load a model that comes with a separate projector file (--mmproj), the projector is loaded onto GPU 0, even though there is plenty of spare VRAM on GPU 1.
I would expect the projector to respect the -mg 1 setting as well, so that both files load on GPU 1.
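To make the expectation concrete, here is a minimal sketch of the device-selection behavior I would expect. The function and variable names are hypothetical for illustration only, not llama.cpp's actual internals:

```python
# Hypothetical sketch (NOT llama.cpp's real API): with split mode "none",
# the main-GPU index should apply to every file being loaded, including
# the mmproj projector, rather than the projector defaulting to device 0.

def pick_device(split_mode: str, main_gpu: int) -> int:
    """Return the device index a single-device load should target."""
    if split_mode == "none":
        # Honor -mg for everything loaded in single-device mode.
        return main_gpu
    # Layer/row split modes distribute across devices starting at 0.
    return 0

# Expected behavior for the command line above (-sm none -mg 1):
model_device = pick_device("none", 1)
mmproj_device = pick_device("none", 1)
assert model_device == mmproj_device == 1  # both files on GPU 1
```

The observed behavior instead corresponds to the projector ignoring the main-GPU setting and landing on device 0.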
First Bad Commit
No response