Closed
Name and Version
$ ./build/bin/llama-cli --version
version: 4941 (ba932df)
built with cc (GCC) 14.2.1 20250110 (Red Hat 14.2.1-7) for x86_64-redhat-linux
Operating systems
Linux
Which llama.cpp modules do you know to be affected?
Other (Please specify in the next section)
Command line
./build/bin/llama-bench -ngl 99 -m models/Qwen2.5-Coder-14B-Instruct-Q4_K_L.gguf -fa 0,1
./build/bin/llama-bench -ngl 99 -m models/Qwen2.5-Coder-1.5B-Instruct-Q8_0.gguf -fa 0,1
Problem description & steps to reproduce
It seems that some flash attention (FA) operations are not yet handled by the Vulkan backend and fall back to the CPU. However, I can't find any open issue about this, only several closed ones, so maybe only some model architectures, or just my GPU, are not supported yet?
Using Mesa RADV 25.0.1 on Fedora Linux 41. GPU: AMD Radeon RX 6700 XT, gfx1030 (0x1030), VMM: no, Wave Size: 32.
| model | size | params | backend | ngl | fa | test | t/s |
|---|---|---|---|---|---|---|---|
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | ROCm | 99 | 0 | pp512 | 408.32 ± 0.13 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | ROCm | 99 | 0 | tg128 | 27.97 ± 0.01 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | ROCm | 99 | 1 | pp512 | 352.63 ± 3.31 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | ROCm | 99 | 1 | tg128 | 26.74 ± 0.01 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | Vulkan | 99 | 0 | pp512 | 256.50 ± 0.14 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | Vulkan | 99 | 0 | tg128 | 34.37 ± 0.09 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | Vulkan | 99 | 1 | pp512 | 100.24 ± 1.31 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | Vulkan | 99 | 1 | tg128 | 22.07 ± 0.03 |
build: ba932df (4941)
The performance loss with FA enabled seems to be more pronounced with smaller models:
| model | size | params | backend | ngl | fa | test | t/s |
|---|---|---|---|---|---|---|---|
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | ROCm | 99 | 0 | pp512 | 4682.96 ± 2.85 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | ROCm | 99 | 0 | tg128 | 113.17 ± 0.03 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | ROCm | 99 | 1 | pp512 | 3458.94 ± 3.00 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | ROCm | 99 | 1 | tg128 | 100.59 ± 0.22 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | Vulkan | 99 | 0 | pp512 | 2778.95 ± 4.85 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | Vulkan | 99 | 0 | tg128 | 140.45 ± 0.39 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | Vulkan | 99 | 1 | pp512 | 685.72 ± 5.62 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | Vulkan | 99 | 1 | tg128 | 42.96 ± 0.83 |
build: ba932df (4941)
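To put the "more pronounced with smaller models" observation into numbers, here is a small Python sketch (not part of llama.cpp) that divides the FA-on throughput by the FA-off throughput for each matching row pair in the two tables above. The values are copied by hand from those tables. The dictionary layout and the helper names `fa_retention` and `retention_table` are hypothetical, chosen only for this illustration.

```python
# Throughput numbers (t/s) copied by hand from the two llama-bench tables above.
# Key: (model, backend, test) -> (t/s with fa=0, t/s with fa=1)
RESULTS = {
    ("14B Q4_K", "ROCm", "pp512"): (408.32, 352.63),
    ("14B Q4_K", "ROCm", "tg128"): (27.97, 26.74),
    ("14B Q4_K", "Vulkan", "pp512"): (256.50, 100.24),
    ("14B Q4_K", "Vulkan", "tg128"): (34.37, 22.07),
    ("1.5B Q8_0", "ROCm", "pp512"): (4682.96, 3458.94),
    ("1.5B Q8_0", "ROCm", "tg128"): (113.17, 100.59),
    ("1.5B Q8_0", "Vulkan", "pp512"): (2778.95, 685.72),
    ("1.5B Q8_0", "Vulkan", "tg128"): (140.45, 42.96),
}


def fa_retention(fa_off: float, fa_on: float) -> float:
    """Fraction of the FA-off throughput that remains with FA enabled (hypothetical helper)."""
    return fa_on / fa_off


def retention_table(results: dict) -> dict:
    """Map each (model, backend, test) key to its FA-on / FA-off throughput ratio."""
    return {key: fa_retention(off, on) for key, (off, on) in results.items()}


if __name__ == "__main__":
    # Print one line per benchmark row pair, ratio shown as a percentage.
    for (model, backend, test), ratio in retention_table(RESULTS).items():
        print(f"{model:10} {backend:7} {test:6} {ratio:6.1%}")
```

Lower values mean a larger FA penalty. On these numbers, Vulkan keeps roughly 39% (pp512) and 64% (tg128) of its FA-off throughput for the 14B model, but only about 25% and 31% for the 1.5B model, while ROCm stays above 70% in every case.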
First Bad Commit
No response