Commit 4236e5d

Iwan Kawrakow committed
q8_KV_r8: don't use nrc_y = 16 on Zen4
Without the nrc_y = 16 kernel this is faster: 350 t/s. Why? That is much better than the 290 t/s we had before, but still slower than the 370 t/s for q8_k_r8.
1 parent 0526db2 commit 4236e5d

1 file changed: +3 -3 lines changed


ggml/src/iqk/iqk_mul_mat.cpp

Lines changed: 3 additions & 3 deletions
@@ -9367,9 +9367,9 @@ bool MulMat::prepare(int typeA, int typeB, int ne00, MulMat& mm, int Ny) {
             mm.funcs[5] = mul_mat_q8_KV_r8_q8_KV<6>;
             mm.funcs[6] = mul_mat_q8_KV_r8_q8_KV<7>;
             mm.funcs[7] = mul_mat_q8_KV_r8_q8_KV<8>;
-#ifdef HAVE_FANCY_SIMD
-            mm.func16 = mul_mat_q8_KV_r8_q8_KV<16>;
-#endif
+//#ifdef HAVE_FANCY_SIMD
+//            mm.func16 = mul_mat_q8_KV_r8_q8_KV<16>;
+//#endif
             expected_typeB = GGML_TYPE_Q8_KV;
             break;
         case GGML_TYPE_IQ4_K_R4:
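
To illustrate what leaving `func16` unset might mean for dispatch, here is a toy sketch. It is not the code in iqk_mul_mat.cpp: `ToyMulMat`, `toy_kernel`, `run`, and `make_toy` are hypothetical names, and the tiling order (16-column kernel first when present, then 8-column tiles, then one kernel for the remainder) is an assumption made only for illustration. The toy kernels do no arithmetic; they only record the tile width they were asked to handle.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical kernel type: appends the number of right-hand columns
// (nrc_y) it was instantiated for to a log, instead of multiplying.
using toy_kernel_t = void (*)(std::vector<int>& log);

template <int nrc_y>
void toy_kernel(std::vector<int>& log) { log.push_back(nrc_y); }

// Toy stand-in for a kernel table: funcs[k] handles k+1 columns,
// func16 is an optional kernel for 16 columns at once.
struct ToyMulMat {
    std::array<toy_kernel_t, 8> funcs = {};
    toy_kernel_t func16 = nullptr;

    // Assumed tiling: 16-wide tiles while func16 is set, then 8-wide
    // tiles, then a single kernel for whatever columns remain.
    std::vector<int> run(int ny) const {
        assert(ny >= 0);
        std::vector<int> log;
        int done = 0;
        while (func16 && ny - done >= 16) { func16(log); done += 16; }
        while (ny - done >= 8) { funcs[7](log); done += 8; }
        if (ny - done > 0) funcs[ny - done - 1](log);
        return log;
    }
};

// Fill the table the way the diff does, with the 16-column kernel optional.
ToyMulMat make_toy(bool with_func16) {
    ToyMulMat mm;
    mm.funcs[0] = toy_kernel<1>;
    mm.funcs[1] = toy_kernel<2>;
    mm.funcs[2] = toy_kernel<3>;
    mm.funcs[3] = toy_kernel<4>;
    mm.funcs[4] = toy_kernel<5>;
    mm.funcs[5] = toy_kernel<6>;
    mm.funcs[6] = toy_kernel<7>;
    mm.funcs[7] = toy_kernel<8>;
    if (with_func16) mm.func16 = toy_kernel<16>;
    return mm;
}
```

Under these assumptions, `make_toy(true).run(35)` tiles 35 columns as 16, 16, 3, while `make_toy(false).run(35)` falls back to 8, 8, 8, 8, 3. The commit's measurements suggest that on Zen4 the second pattern is the faster one for q8_KV_r8, though the sketch says nothing about why.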
