
Conversation

@jeffbolznv (Collaborator)

Do some of the logic ops in packed u32.
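
For readers unfamiliar with the trick, here is a minimal C sketch of what "logic ops in packed u32" means (illustrative only; the function names, masks, and values are hypothetical, not the actual shader code). Q5_K stores 4-bit quants plus a separate high bit per value, so four byte lanes can share one mask/shift/OR:

```c
#include <stdint.h>
#include <stdio.h>

// Scalar reference: combine one 4-bit quant with its high bit.
static uint8_t dequant_scalar(uint8_t q4, uint8_t hbit) {
    return (uint8_t)((q4 & 0x0F) | (hbit << 4));
}

// Packed version: four quants and four high bits per call. The masks keep
// each byte lane independent, so a single 32-bit AND/shift/OR covers four
// values instead of one.
static uint32_t dequant_packed(uint32_t q4x4, uint32_t hbits) {
    return (q4x4 & 0x0F0F0F0Fu) | ((hbits & 0x01010101u) << 4);
}

int main(void) {
    uint32_t q4x4  = 0x0D0C0B0Au; // four example 4-bit quants, one per byte
    uint32_t hbits = 0x01000100u; // matching high bits in bit 0 of each byte
    printf("packed: 0x%08X\n", dequant_packed(q4x4, hbits)); // 0x1D0C1B0A
    printf("lane 1: 0x%02X\n", dequant_scalar(0x0B, 0x01));  // 0x1B, matches
    return 0;
}
```

The win is simply fewer instructions: one 32-bit logic op replaces four byte-wise ones.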

Perf results on RTX 4070. Note that this "phi3 3B Q4_K" model uses Q5_K for maybe a third of its weights.

before
  MUL_MAT(type_a=q5_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                  51120 runs -    98.65 us/run - 117.44 MFLOP/run -   1.19 TFLOPS
| model                          |       size |     params | backend    |  ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ---: | ------------: | -------------------: |
| phi3 3B Q4_K - Medium          |   2.23 GiB |     3.82 B | Vulkan     | 1000 |         tg128 |        108.54 ± 1.25 |
| llama 3B Q5_K - Medium         |   2.16 GiB |     3.21 B | Vulkan     | 1000 |         tg128 |        112.41 ± 2.25 |

after
  MUL_MAT(type_a=q5_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                  60492 runs -    82.96 us/run - 117.44 MFLOP/run -   1.42 TFLOPS
| model                          |       size |     params | backend    |  ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ---: | ------------: | -------------------: |
| phi3 3B Q4_K - Medium          |   2.23 GiB |     3.82 B | Vulkan     | 1000 |         tg128 |        109.39 ± 0.47 |
| llama 3B Q5_K - Medium         |   2.16 GiB |     3.21 B | Vulkan     | 1000 |         tg128 |        117.24 ± 1.19 |

@jeffbolznv jeffbolznv requested a review from 0cc4m November 25, 2024 04:04
@daniandtheweb (Contributor)

These changes make quite a big difference on my Radeon 5700XT.

| model                 |     size |   params | backend | ngl | threads |  test | branch |          t/s |
| --------------------- | -------: | -------: | ------- | --: | ------: | ----: | ------ | -----------: |
| qwen2 7B Q5_K - Small | 4.94 GiB |   7.62 B | Vulkan  |  99 |       4 | tg128 | master | 41.07 ± 0.06 |
| qwen2 7B Q5_K - Small | 4.94 GiB |   7.62 B | Vulkan  |  99 |       4 | tg128 | PR     | 49.23 ± 0.42 |

@netrunnereve (Collaborator)

I haven't tried it with an actual model, but our tests show that it's now 6% faster on an RX 570.

Master:
  MUL_MAT(type_a=q5_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                   1704 runs -  1195.05 us/run - 117.44 MFLOP/run -  98.27 GFLOPS
PR:
  MUL_MAT(type_a=q5_K,type_b=f32,m=4096,n=1,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                   1704 runs -  1124.40 us/run - 117.44 MFLOP/run - 104.45 GFLOPS

@0cc4m merged commit 249a790 into ggml-org:master Nov 27, 2024
7 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024