
Conversation

@0cc4m (Collaborator) commented Aug 25, 2025

This fixes a bug in the selection of the subgroup mmid optimization introduced in #15524.

@MrLavender please give it a try, it should be working for you with this fix.
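
For context, the selection logic gates the subgroup-optimized mul_mat_id (mmid) path on the subgroup sizes the device reports. A minimal, hypothetical C++ sketch of that kind of capability check (the struct and function names are illustrative, not the actual ggml-vulkan code):

```cpp
#include <cstdint>

// Illustrative device properties, mirroring what Vulkan reports via
// VkPhysicalDeviceSubgroupSizeControlProperties (minSubgroupSize / maxSubgroupSize).
struct device_subgroup_props {
    uint32_t min_size;
    uint32_t max_size;
    bool     size_control_supported; // can a specific subgroup size be requested?
};

// Select the subgroup-optimized mmid pipeline only when the device can
// actually run the shader at the subgroup size it was compiled for.
static bool use_subgroup_mmid(const device_subgroup_props & p, uint32_t required) {
    return p.size_control_supported &&
           p.min_size <= required &&
           p.max_size >= required;
}
```

A selection bug in this kind of check typically means the optimized path is chosen on devices that cannot support it, or skipped on devices that can.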

@0cc4m requested a review from jeffbolznv on August 25, 2025 at 13:43
github-actions bot added the `Vulkan` (Issues specific to the Vulkan backend) and `ggml` (changes relating to the ggml tensor library for machine learning) labels on Aug 25, 2025
@MrLavender commented

Yes that works, thank you! :)

This is a fantastic win: with flash attention enabled, the ROCm backend is now faster than Vulkan only at small pp sizes, where the difference doesn't really matter, and ROCm performance degrades quickly as pp size increases.

```
llama-bench -fa 0,1 -p 512,1024,2048,4096 -m gpt-oss-20b-mxfp4.gguf
```

Vulkan

| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | --: | -: | --- | ---: |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp512 | 976.97 ± 8.17 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp1024 | 969.50 ± 1.57 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp2048 | 910.61 ± 1.93 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | pp4096 | 849.46 ± 2.31 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 0 | tg128 | 125.82 ± 0.05 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp512 | 980.48 ± 4.63 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp1024 | 975.67 ± 7.09 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp2048 | 961.13 ± 3.07 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | pp4096 | 929.77 ± 3.94 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | tg128 | 123.70 ± 0.10 |

ROCm

| model | size | params | backend | ngl | fa | test | t/s |
| --- | ---: | ---: | --- | --: | -: | --- | ---: |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp512 | 1712.99 ± 14.24 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp1024 | 1652.80 ± 4.07 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp2048 | 1537.22 ± 4.35 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | pp4096 | 1376.64 ± 2.84 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 0 | tg128 | 95.96 ± 0.19 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp512 | 1244.60 ± 3.71 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp1024 | 1078.54 ± 3.17 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp2048 | 884.51 ± 2.27 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | pp4096 | 640.66 ± 0.59 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | ROCm | 99 | 1 | tg128 | 93.37 ± 0.25 |

@0cc4m merged commit 4d917cd into master on Aug 25, 2025 (48 checks passed)
Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 26, 2025
Minh141120 pushed a commit to menloresearch/llama.cpp that referenced this pull request Aug 27, 2025
@0cc4m deleted the 0cc4m/vulkan-mmid-subgroup-fix branch on August 31, 2025 at 09:16
rasbid pushed a commit to rasbid/llama.cpp that referenced this pull request on Oct 11, 2025:
- Make subgroup_min_size_16 condition less restrictive for GCN (subgroup_max_size >= 8)
- Add GCN-specific pipeline configurations with 64 subgroup sizes
- Enable more aggressive subgroup usage for GCN architecture
- Target: orders of magnitude performance improvement like PR ggml-org#15565
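
A rough, hypothetical sketch of the relaxed GCN selection that commit message describes (names are illustrative; this is not the commit's actual code):

```cpp
#include <cstdint>

// Hypothetical sketch: pick an mmid subgroup size, preferring the full
// 64-lane wavefront on GCN and otherwise accepting any subgroup of >= 8 lanes.
static uint32_t pick_mmid_subgroup_size(uint32_t subgroup_max_size, bool is_gcn) {
    if (is_gcn && subgroup_max_size >= 64) {
        return 64; // GCN wavefronts are 64 lanes wide
    }
    return subgroup_max_size >= 8 ? subgroup_max_size : 0; // 0 = fall back to the non-subgroup path
}
```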
