
Conversation

@jeffbolznv (Collaborator) commented:

Perf on RTX 4070:

before:
  MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 460 runs -  2174.26 us/run -  60.13 GFLOP/run -  27.66 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 288 runs -  3482.95 us/run -  60.13 GFLOP/run -  17.26 TFLOPS
  
after:
  MUL_MAT(type_a=iq1_s,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 726 runs -  1379.33 us/run -  60.13 GFLOP/run -  43.59 TFLOPS
  MUL_MAT(type_a=iq1_m,type_b=f32,m=4096,n=512,k=14336,bs=[1,1],nr=[1,1],per=[0,1,2,3]):                 412 runs -  2428.05 us/run -  60.13 GFLOP/run -  24.76 TFLOPS
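For context, the GFLOP/run and TFLOPS columns follow directly from the test shape. Below is a small sketch of that arithmetic, assuming the conventional 2·m·n·k FLOP count for a dense matmul; the constants are copied from the iq1_s "after" line above, not taken from the test harness itself.

```cpp
#include <cstdio>

int main() {
    // Shape from the MUL_MAT test case above.
    const double m = 4096, n = 512, k = 14336;

    // Conventional FLOP count for a dense matmul: one multiply and one add
    // per (m, n, k) triple => 2*m*n*k ~= 60.13 GFLOP per run.
    const double gflop_per_run = 2.0 * m * n * k / 1e9;

    // Timing from the iq1_s "after" line.
    const double us_per_run = 1379.33;

    // Throughput: FLOP per run divided by seconds per run, in TFLOPS.
    const double tflops = gflop_per_run / (us_per_run * 1e-6) / 1e3;

    printf("%.2f GFLOP/run, %.2f TFLOPS\n", gflop_per_run, tflops);  // ~60.13, ~43.59
    return 0;
}
```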

jeffbolznv requested a review from 0cc4m on March 17, 2025.
github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels on Mar 17, 2025.
@0cc4m (Collaborator) commented on Mar 19, 2025:

Master:

| model                       | size     | params | backend | ngl | test  | t/s             |
| --------------------------- | -------- | ------ | ------- | --- | ----- | --------------- |
| llama 8B IQ1_S - 1.5625 bpw | 1.87 GiB | 8.03 B | Vulkan  | 99  | pp512 | 2609.84 ± 16.94 |
| llama 8B IQ1_S - 1.5625 bpw | 1.87 GiB | 8.03 B | Vulkan  | 99  | tg128 | 65.52 ± 0.36    |

PR:

| model                       | size     | params | backend | ngl | test  | t/s             |
| --------------------------- | -------- | ------ | ------- | --- | ----- | --------------- |
| llama 8B IQ1_S - 1.5625 bpw | 1.87 GiB | 8.03 B | Vulkan  | 99  | pp512 | 3461.18 ± 47.70 |
| llama 8B IQ1_S - 1.5625 bpw | 1.87 GiB | 8.03 B | Vulkan  | 99  | tg128 | 66.59 ± 1.57    |
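A quick read of those (presumably llama-bench) numbers, as a sketch using only the t/s values from the two tables above: prompt processing improves by roughly a third end to end, smaller than the kernel-level gain above, presumably because pp512 time also includes work outside these shaders, while token generation stays flat within the reported error bars.

```cpp
#include <cstdio>

int main() {
    // pp512 / tg128 throughput (t/s) copied from the two tables above.
    const double pp512_master = 2609.84, pp512_pr = 3461.18;
    const double tg128_master = 65.52,   tg128_pr = 66.59;

    printf("pp512 speedup: %.2fx\n", pp512_pr / pp512_master);  // ~1.33x
    printf("tg128 speedup: %.2fx\n", tg128_pr / tg128_master);  // ~1.02x, within noise
    return 0;
}
```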

0cc4m merged commit a9b5928 into ggml-org:master on Mar 19, 2025; all 43 checks passed.