
Conversation

@jeffbolznv (Collaborator)

Use vector loads when possible in mul_mat_split_k_reduce. Use split_k when there aren't enough workgroups to fill the shaders.

Split out from #10206.

I did a quick touch test to verify split_k helps the non-coopmat shaders as well:

before:
  MUL_MAT(type_a=f32,type_b=f32,m=128,n=128,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                    426 runs -  2600.37 us/run - 469.76 MFLOP/run - 180.65 GFLOPS
  MUL_MAT(type_a=f32,type_b=f32,m=256,n=128,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                    428 runs -  2569.83 us/run - 939.52 MFLOP/run - 365.60 GFLOPS
  MUL_MAT(type_a=f32,type_b=f32,m=384,n=128,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                    426 runs -  2579.22 us/run -   1.41 GFLOP/run - 546.40 GFLOPS
  MUL_MAT(type_a=f32,type_b=f32,m=512,n=128,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                    432 runs -  2582.09 us/run -   1.88 GFLOP/run - 727.72 GFLOPS

after:
  MUL_MAT(type_a=f32,type_b=f32,m=128,n=128,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                   1704 runs -   664.08 us/run - 469.76 MFLOP/run - 707.39 GFLOPS
  MUL_MAT(type_a=f32,type_b=f32,m=256,n=128,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                   1605 runs -   656.67 us/run - 939.52 MFLOP/run -   1.43 TFLOPS
  MUL_MAT(type_a=f32,type_b=f32,m=384,n=128,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                   1562 runs -   659.93 us/run -   1.41 GFLOP/run -   2.14 TFLOPS
  MUL_MAT(type_a=f32,type_b=f32,m=512,n=128,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                   1512 runs -   678.08 us/run -   1.88 GFLOP/run -   2.77 TFLOPS
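The split_k idea described above can be sketched on the CPU roughly as follows. This is an illustrative sketch, not the actual GLSL in mul_mat_split_k_reduce, and all names here are hypothetical: each K-slice produces a partial result, then a reduce pass sums the partials, using 4-wide accesses (a stand-in for vec4 loads) where the element count allows and a scalar tail otherwise, mirroring the vector + scalar paths combined in one shader.

```cpp
#include <cstddef>
#include <vector>

// One "workgroup": compute a partial MxN result over the K-slice [k0, k1).
void matmul_partial(const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& partial, size_t M, size_t N, size_t K,
                    size_t k0, size_t k1) {
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t k = k0; k < k1; ++k) {
                acc += A[i * K + k] * B[k * N + j];
            }
            partial[i * N + j] = acc;
        }
    }
}

// Reduce pass: sum split_k partial buffers into C. The first loop handles
// 4 elements per step (stand-in for vec4 loads); the second is the scalar
// tail for element counts not divisible by 4.
void split_k_reduce(const std::vector<float>& partials, std::vector<float>& C,
                    size_t elems, size_t split_k) {
    const size_t vec_end = elems / 4 * 4;
    for (size_t e = 0; e < vec_end; e += 4) {
        float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        for (size_t s = 0; s < split_k; ++s) {
            for (size_t v = 0; v < 4; ++v) {
                acc[v] += partials[s * elems + e + v]; // "vector" path
            }
        }
        for (size_t v = 0; v < 4; ++v) {
            C[e + v] = acc[v];
        }
    }
    for (size_t e = vec_end; e < elems; ++e) { // scalar tail
        float acc = 0.0f;
        for (size_t s = 0; s < split_k; ++s) {
            acc += partials[s * elems + e];
        }
        C[e] = acc;
    }
}
```

Keeping both the vector and scalar paths in one reduce kernel avoids maintaining (and dispatching) a separate scalar variant for sizes that are not a multiple of 4, which seems to be the point 0cc4m highlights below.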

@jeffbolznv jeffbolznv requested a review from 0cc4m December 3, 2024 14:52
@github-actions github-actions bot added the Vulkan and ggml labels Dec 3, 2024
@0cc4m (Collaborator) left a comment


Thank you! Pretty good improvement even without coopmat. I should have retested it myself. But I wouldn't have thought of the vector + scalar load in one shader thing, at best I'd have created a separate vector version.

@0cc4m 0cc4m merged commit cc98896 into ggml-org:master Dec 3, 2024
43 of 44 checks passed
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Dec 7, 2024
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024

Labels

ggml: changes relating to the ggml tensor library for machine learning
Vulkan: Issues specific to the Vulkan backend
