
Misc. bug: Flash attention on Vulkan #12526

@Nindaleth


Name and Version

$ ./build/bin/llama-cli --version
version: 4941 (ba932df)
built with cc (GCC) 14.2.1 20250110 (Red Hat 14.2.1-7) for x86_64-redhat-linux

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

Other (Please specify in the next section)

Command line

./build/bin/llama-bench -ngl 99 -m models/Qwen2.5-Coder-14B-Instruct-Q4_K_L.gguf -fa 0,1
./build/bin/llama-bench -ngl 99 -m models/Qwen2.5-Coder-1.5B-Instruct-Q8_0.gguf -fa 0,1

Problem description & steps to reproduce

It seems that some flash attention (FA) operations are not yet handled by the Vulkan backend and fall back to the CPU. However, I can't find any open issue about this, only several closed ones. Maybe only some model architectures, or just my GPU, can't do it yet?

Using Mesa RADV 25.0.1 on Fedora Linux 41 with an AMD Radeon RX 6700 XT (gfx1030 (0x1030), VMM: no, Wave Size: 32).

| model                   |     size |  params | backend | ngl | fa |  test |           t/s |
| ----------------------- | -------: | ------: | ------- | --: | -: | ----: | ------------: |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | ROCm    |  99 |  0 | pp512 | 408.32 ± 0.13 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | ROCm    |  99 |  0 | tg128 |  27.97 ± 0.01 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | ROCm    |  99 |  1 | pp512 | 352.63 ± 3.31 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | ROCm    |  99 |  1 | tg128 |  26.74 ± 0.01 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | Vulkan  |  99 |  0 | pp512 | 256.50 ± 0.14 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | Vulkan  |  99 |  0 | tg128 |  34.37 ± 0.09 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | Vulkan  |  99 |  1 | pp512 | 100.24 ± 1.31 |
| qwen2 14B Q4_K - Medium | 8.90 GiB | 14.77 B | Vulkan  |  99 |  1 | tg128 |  22.07 ± 0.03 |

build: ba932df (4941)

The performance loss seems more pronounced with the smaller model:

| model           |     size | params | backend | ngl | fa |  test |            t/s |
| --------------- | -------: | -----: | ------- | --: | -: | ----: | -------------: |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | ROCm    |  99 |  0 | pp512 | 4682.96 ± 2.85 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | ROCm    |  99 |  0 | tg128 |  113.17 ± 0.03 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | ROCm    |  99 |  1 | pp512 | 3458.94 ± 3.00 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | ROCm    |  99 |  1 | tg128 |  100.59 ± 0.22 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | Vulkan  |  99 |  0 | pp512 | 2778.95 ± 4.85 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | Vulkan  |  99 |  0 | tg128 |  140.45 ± 0.39 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | Vulkan  |  99 |  1 | pp512 |  685.72 ± 5.62 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | Vulkan  |  99 |  1 | tg128 |   42.96 ± 0.83 |

build: ba932df (4941)
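To put the "more pronounced with smaller models" observation into numbers, here is a small standalone Python sketch (not part of llama.cpp; the `results` dict and `fa_slowdown` helper are illustrative names) that computes how much throughput is lost when going from `-fa 0` to `-fa 1`, using only the mean t/s values from the two tables above and ignoring the ± spread.

```python
# Mean t/s values copied from the llama-bench tables above (build ba932df).
# Key: (model, backend, test) -> (t/s with -fa 0, t/s with -fa 1)
results = {
    ("14B Q4_K", "ROCm", "pp512"): (408.32, 352.63),
    ("14B Q4_K", "ROCm", "tg128"): (27.97, 26.74),
    ("14B Q4_K", "Vulkan", "pp512"): (256.50, 100.24),
    ("14B Q4_K", "Vulkan", "tg128"): (34.37, 22.07),
    ("1.5B Q8_0", "ROCm", "pp512"): (4682.96, 3458.94),
    ("1.5B Q8_0", "ROCm", "tg128"): (113.17, 100.59),
    ("1.5B Q8_0", "Vulkan", "pp512"): (2778.95, 685.72),
    ("1.5B Q8_0", "Vulkan", "tg128"): (140.45, 42.96),
}


def fa_slowdown(fa_off: float, fa_on: float) -> float:
    """Percentage of throughput lost when flash attention is enabled."""
    return 100.0 * (1.0 - fa_on / fa_off)


if __name__ == "__main__":
    for (model, backend, test), (off, on) in results.items():
        loss = fa_slowdown(off, on)
        print(f"{model:10} {backend:6} {test:6} {loss:5.1f}% slower with -fa 1")
```

With these numbers, Vulkan loses roughly 61% (14B) vs. 75% (1.5B) of pp512 throughput and roughly 36% vs. 69% of tg128 throughput when FA is enabled, while ROCm loses considerably less in every case.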

First Bad Commit

No response

Relevant log output
