Commit 4776dd2

ikawrakow and Iwan Kawrakow authored
Much faster prompt processing for IQK quants (ARM_NEON) (#549)
* Faster GEMM for iq2_ks, iq4_ks
* iq5_ks 63.8 t/s -> 166 t/s. iq5_ks_r4 is at 107.4 t/s. But: iq5_ks_r4 TG performance is quite a bit better: 21.7 t/s vs 17.7 t/s for iq5_ks.
* iq6_k 44 t/s -> 164.3 t/s. There is no iq6_k_r4.
* iq5_k 46 t/s -> 167 t/s. iq5_k_r4 is at 99.5 t/s.
* iq4_k 46.4 t/s -> 167.2 t/s. iq4_k_r4 is at 115 t/s.
* iq3_k 47.3 t/s -> 166.5 t/s. iq3_k_r4 is at 96.5 t/s.
* iq2_k 47.4 t/s -> 167 t/s. iq2_k_r4 is at 113.3 t/s.

---------

Co-authored-by: Iwan Kawrakow <[email protected]>
1 parent cac763f commit 4776dd2

3 files changed: +518 -18 lines changed
