0cc4m (Collaborator) commented Apr 5, 2025

Retune the DP4A matmul shaders I added in #12135 to extract more performance from them after the bugfix in #12722 slowed them down. Here are my test results:

RTX 3090:

| model | size | params | backend | ngl | test | fp16 (t/s) | int dot master (t/s) | int dot PR (t/s) | coopmat1 (t/s) | coopmat2 (t/s) | CUDA (t/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 1025.95 ± 1.11 | 1928.35 ± 8.34 | 1925.12 ± 6.65 | 3138.60 ± 28.29 | 4247.01 ± 60.18 | 5069.38 ± 18.59 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 1000.95 ± 2.89 | 1898.78 ± 4.69 | 1894.99 ± 7.38 | 2749.46 ± 3.65 | 4329.58 ± 16.31 | 4932.49 ± 14.11 |

AMD Radeon RX 6800 XT:

| model | size | params | backend | ngl | test | fp16 (t/s) | int dot master (t/s) | int dot PR (t/s) | ROCm (t/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 922.14 ± 1.04 | 1284.95 ± 2.76 | 1463.92 ± 1.74 | 1678.14 ± 2.28 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 902.82 ± 0.80 | 1060.93 ± 0.74 | 1250.73 ± 1.23 | 1618.84 ± 1.25 |

AMD Radeon Pro VII:

| model | size | params | backend | ngl | test | fp16 (t/s) | int dot master (t/s) | int dot PR (t/s) | ROCm (t/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 313.00 ± 0.60 | 390.98 ± 1.52 | 588.09 ± 0.42 | 1012.39 ± 0.46 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 309.01 ± 0.48 | 317.19 ± 0.91 | 501.33 ± 0.55 | 398.53 ± 0.06 |

Intel A770:

| model | size | params | backend | ngl | test | fp16 (t/s) | int dot master (t/s) | int dot PR (t/s) | SYCL (t/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llama 8B Q4_0 | 4.33 GiB | 8.03 B | Vulkan | 99 | pp512 | 165.47 ± 0.14 | 508.05 ± 0.95 | 734.05 ± 1.60 | 917.26 ± 6.47 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | Vulkan | 99 | pp512 | 157.58 ± 0.17 | 493.80 ± 0.76 | 678.78 ± 0.75 | 893.41 ± 4.35 |
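
For readers unfamiliar with the operation the "int dot" columns rely on: DP4A treats two 32-bit words as four packed 8-bit integers each, multiplies them lane by lane, and adds the sum of the products to a 32-bit accumulator. The following is a minimal Python sketch of the signed variant, for illustration only. The helper names `pack4`, `unpack4` and `dp4a` are hypothetical, the little-endian lane order is an assumption, and 32-bit accumulator overflow is ignored. It is not the shader code from this PR.

```python
def pack4(values):
    """Pack four signed 8-bit integers into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (8 * i)
    return word


def unpack4(word):
    """Split a 32-bit word back into four signed 8-bit integers."""
    lanes = []
    for i in range(4):
        byte = (word >> (8 * i)) & 0xFF
        # Reinterpret the unsigned byte as a two's-complement int8.
        lanes.append(byte - 256 if byte >= 128 else byte)
    return lanes


def dp4a(a, b, acc):
    """Emulate signed DP4A: acc + sum of lane-wise products of a and b."""
    return acc + sum(x * y for x, y in zip(unpack4(a), unpack4(b)))


a = pack4([1, -2, 3, -4])
b = pack4([5, 6, -7, 8])
print(dp4a(a, b, 10))  # 10 + (5 - 12 - 21 - 32) = -50
```

On hardware with integer dot product support the whole `dp4a` step is a single instruction, which is why a matmul over quantized int8 data can outperform the fp16 path in the tables above.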

github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Apr 5, 2025
0cc4m requested a review from jeffbolznv April 5, 2025 07:34
LostRuins added a commit to LostRuins/koboldcpp that referenced this pull request Apr 5, 2025
0cc4m merged commit 6bf28f0 into master Apr 5, 2025
51 checks passed
0cc4m deleted the 0cc4m/vulkan-mmq-dp4a-tune branch April 5, 2025 16:04
colout pushed a commit to colout/llama.cpp that referenced this pull request Apr 29, 2025
timwu pushed a commit to timwu/llama.cpp that referenced this pull request May 5, 2025