Commit 64f6c2d

ikawrakow and Iwan Kawrakow authored
Much faster prompt processing for k-quants (ARM_NEON) (#552)
* iq2_xxs 55.8 -> 167.5 t/s. iq2_xxs is at 93.7 t/s
* iq2_xs 46.4 -> 166.6 t/s. iq2_xs_r4 is at 72.3 t/s.
* iq2_s 42.8 t/s -> 166.8 t/s. iq2_s_r4 is at 71.1 t/s.
* iq3_xxs 51.8 t/s -> 165.6 t/s. iq3_xxs_r4 is at 84.6 t/s.
* iq3_s 46.0 t/s -> 162.0 t/s. iq3_s_r4 is at 79.4 t/s
* q2_k 85.7 t/s -> 168.1 t/s. q2_k_r4 is at 111.2 t/s.
* q3_K 45.7 t/s -> 170.8 t/s. q3_k_r4 is at 110.3 t/s.
* q6_k 47.7 t/s -> 124 t/s. q6_k_r4 is at 112.7 t/s.
* q4_k 58.2 t/s -> 114.8 t/s. iq4_k_r4 is at 130.9 t/s.
  As I had to add a new implementation for q8_1-quantized activations, TG became slightly faster too (25.1 -> 25.9 t/s).
* q5_k 54.9 -> 114.9 t/s. q5_k_r4 is at 116.2 t/s.
* iq4_xs 71.2 -> 167.8 t/s. iq4_xs_r4 is at 138.6 t/s.

---------

Co-authored-by: Iwan Kawrakow <[email protected]>
1 parent ddda4d9 commit 64f6c2d
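
To put the before/after figures from the commit message side by side, here is a small sketch that computes the prompt-processing speedup for each type. The numbers are copied from the message as written; the struct and function names (`pp_result`, `speedup`, `print_speedups`) are made up for this illustration and are not part of the repository.

```c
#include <assert.h>
#include <stdio.h>

/* One benchmark row copied from the commit message (tokens per second).
 * The names here are illustrative only, not taken from the code base. */
struct pp_result {
    const char *type;   /* quantization type as written in the message */
    double before;      /* prompt processing speed before the commit */
    double after;       /* prompt processing speed after the commit */
};

/* Ratio of new to old throughput; a value above 1 means faster. */
static double speedup(double before, double after) {
    assert(before > 0.0);
    return after / before;
}

static const struct pp_result results[] = {
    {"iq2_xxs", 55.8, 167.5},
    {"iq2_xs",  46.4, 166.6},
    {"iq2_s",   42.8, 166.8},
    {"iq3_xxs", 51.8, 165.6},
    {"iq3_s",   46.0, 162.0},
    {"q2_k",    85.7, 168.1},
    {"q3_K",    45.7, 170.8},
    {"q6_k",    47.7, 124.0},
    {"q4_k",    58.2, 114.8},
    {"q5_k",    54.9, 114.9},
    {"iq4_xs",  71.2, 167.8},
};

/* Print one line per type with its speedup factor. */
static void print_speedups(void) {
    size_t n = sizeof results / sizeof results[0];
    for (size_t i = 0; i < n; ++i) {
        printf("%-8s %6.1f -> %6.1f t/s  (x%.2f)\n",
               results[i].type, results[i].before, results[i].after,
               speedup(results[i].before, results[i].after));
    }
}
```

Calling `print_speedups()` from a `main` function lists the factors; by the quoted figures, the gains range from roughly 2x (q4_k, q5_k, q2_k) to more than 3x for most of the i-quants and q3_K.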

3 files changed, +725 -18 lines changed

ggml/src/ggml.c

Lines changed: 3 additions & 3 deletions
@@ -979,7 +979,7 @@ static const ggml_type_traits_t type_traits[GGML_TYPE_COUNT] = {
 #ifdef __AVX2__
         .vec_dot_type = GGML_TYPE_Q8_2_X4,
 #else
-        .vec_dot_type = GGML_TYPE_Q8_K,
+        .vec_dot_type = GGML_TYPE_Q8_1_X4,
 #endif
         .nrows = 1,
         .row_meta_size = 0,
@@ -1009,7 +1009,7 @@ static const ggml_type_traits_t type_traits[GGML_TYPE_COUNT] = {
 #ifdef __AVX2__
         .vec_dot_type = GGML_TYPE_Q8_2_X4,
 #else
-        .vec_dot_type = GGML_TYPE_Q8_K,
+        .vec_dot_type = GGML_TYPE_Q8_1_X4,
 #endif
         .nrows = 1,
         .row_meta_size = 0,
@@ -1039,7 +1039,7 @@ static const ggml_type_traits_t type_traits[GGML_TYPE_COUNT] = {
 #ifdef __AVX2__
         .vec_dot_type = GGML_TYPE_Q8_2_X4,
 #else
-        .vec_dot_type = GGML_TYPE_Q8_K,
+        .vec_dot_type = GGML_TYPE_Q8_0_X4,
 #endif
 // .vec_dot_type = GGML_TYPE_Q8_K,
         .nrows = 1,
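
The hunks above all follow one pattern: an entry in the `type_traits` table picks its `vec_dot_type` (the type the activations are quantized to before the dot product) at compile time, and on builds without `__AVX2__` (which includes ARM_NEON) the choice moves from `GGML_TYPE_Q8_K` to an `_X4` type. The following is a minimal sketch of that pattern, not the actual ggml code: the enum, struct, and helper names prefixed with `sketch_`/`SKETCH_` are hypothetical stand-ins, and the table holds a single placeholder entry because the hunks do not show which weight types are affected.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-ins for the ggml type enum. */
enum sketch_type {
    SKETCH_TYPE_WEIGHTS,   /* placeholder for one weight type touched by the diff */
    SKETCH_TYPE_Q8_1_X4,   /* activation type used on non-AVX2 builds after the change */
    SKETCH_TYPE_Q8_2_X4,   /* activation type used on AVX2 builds */
    SKETCH_TYPE_COUNT
};

/* Hypothetical subset of the per-type traits shown in the hunks. */
struct sketch_type_traits {
    enum sketch_type vec_dot_type;  /* type the activations are converted to */
    int nrows;
    int row_meta_size;
};

/* Table indexed by type; the preprocessor selects the activation type. */
static const struct sketch_type_traits sketch_traits[SKETCH_TYPE_COUNT] = {
    [SKETCH_TYPE_WEIGHTS] = {
#ifdef __AVX2__
        .vec_dot_type = SKETCH_TYPE_Q8_2_X4,
#else
        .vec_dot_type = SKETCH_TYPE_Q8_1_X4,
#endif
        .nrows = 1,
        .row_meta_size = 0,
    },
};

/* Look up which activation type a given weight type expects. */
static enum sketch_type activation_type_for(enum sketch_type weights) {
    assert((int)weights < SKETCH_TYPE_COUNT);
    return sketch_traits[weights].vec_dot_type;
}

/* Human-readable name, for printing only. */
static const char *sketch_type_name(enum sketch_type t) {
    switch (t) {
        case SKETCH_TYPE_WEIGHTS: return "WEIGHTS";
        case SKETCH_TYPE_Q8_1_X4: return "Q8_1_X4";
        case SKETCH_TYPE_Q8_2_X4: return "Q8_2_X4";
        default:                  return "unknown";
    }
}
```

For example, `puts(sketch_type_name(activation_type_for(SKETCH_TYPE_WEIGHTS)))` prints the selected activation type for the current build. Because the selection happens in the preprocessor, each binary carries exactly one choice per entry and there is no runtime dispatch cost.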
