Commit 543e8eb
ggml : block interleaving support for Q4_K quantization for AArch64

* new quantization type: block_q4_kx4 with offline repack impl
* new quantize path: add NEON impl for ggml_quantize_mat_q8_K_4x8
* new gemv kernel: new ggml_gemv_q4_K_4x8_q8_K NEON kernel for GGML_OP_MUL_MAT_ID/GGML_OP_MUL_MAT
* new gemm kernel: new ggml_gemm_q4_K_4x8_q8_K NEON kernel for GGML_OP_MUL_MAT_ID/GGML_OP_MUL_MAT
* performance boost for both S_PP and S_TG

---------

Co-authored-by: yuanjia111 <[email protected]>
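The "offline repack" step in the message above can be pictured with a small sketch. It assumes the "4x8" suffix means four rows interleaved in 8-byte groups, which this page does not spell out. The function name `interleave_rows_4x8` and the flat byte layout are hypothetical and used only for illustration; the real block_q4_kx4 also has to carry scales and mins, which are not modeled here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parameters: 4 rows interleaved in 8-byte chunks. */
enum { NROWS = 4, CHUNK = 8 };

/*
 * Interleave NROWS source rows of row_bytes bytes each into dst.
 * Output order: chunk 0 of row 0, chunk 0 of row 1, ..., chunk 0 of row 3,
 * then chunk 1 of row 0, and so on. dst must hold NROWS * row_bytes bytes.
 * Returns 0 on success, -1 if row_bytes is not a multiple of CHUNK.
 */
static int interleave_rows_4x8(uint8_t *dst,
                               const uint8_t *const src[NROWS],
                               size_t row_bytes) {
    if (row_bytes % CHUNK != 0) {
        return -1;
    }
    const size_t nchunks = row_bytes / CHUNK;
    for (size_t c = 0; c < nchunks; ++c) {
        for (size_t r = 0; r < NROWS; ++r) {
            /* copy one 8-byte group from row r, then advance the output */
            memcpy(dst, src[r] + c * CHUNK, CHUNK);
            dst += CHUNK;
        }
    }
    return 0;
}
```

With this layout, a kernel that processes four output rows at once can read the matching 8-byte groups of all four rows from one contiguous region, which is the usual motivation for interleaving weights ahead of time.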
1 parent 77dee9d commit 543e8eb

5 files changed: +950 -227 lines changed

ggml/src/ggml-cpu/arch-fallback.h

Lines changed: 1 addition & 1 deletion
@@ -50,7 +50,7 @@
 #define ggml_gemm_iq4_nl_8x8_q8_0_generic ggml_gemm_iq4_nl_8x8_q8_0
 #elif defined(__aarch64__) || defined(__arm__) || defined(_M_ARM) || defined(_M_ARM64)
 // repack.cpp
-#define ggml_quantize_mat_q8_K_4x8_generic ggml_quantize_mat_q8_K_4x8
+//#define ggml_quantize_mat_q8_K_4x8_generic ggml_quantize_mat_q8_K_4x8
 #define ggml_gemv_q4_K_8x8_q8_K_generic ggml_gemv_q4_K_8x8_q8_K
 #define ggml_gemv_iq4_nl_8x8_q8_0_generic ggml_gemv_iq4_nl_8x8_q8_0
 #define ggml_gemv_q2_K_8x8_q8_K_generic ggml_gemv_q2_K_8x8_q8_K
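
The one-line change above reads as switching off a rename for ggml_quantize_mat_q8_K_4x8 on ARM targets, presumably because the commit now provides a NEON version under that name. The following is a minimal sketch of that macro pattern under that assumption. `demo_kernel`, `demo_kernel_generic`, and `DEMO_NO_ARCH_KERNEL` are hypothetical names, not ggml identifiers, and the real repack.cpp is not shown on this page.

```c
/* Hypothetical stand-in for an arch-fallback header. */
#if defined(DEMO_NO_ARCH_KERNEL)
/* No arch-specific kernel exists: the generic body takes the public name. */
#define demo_kernel_generic demo_kernel
#endif

/* Generic implementation, always compiled. When the macro above is active,
 * this definition is emitted under the public name demo_kernel. */
int demo_kernel_generic(int x) {
    return x + 1;
}

#if !defined(DEMO_NO_ARCH_KERNEL)
/* Arch-specific implementation owns the public name. A real kernel would
 * use SIMD intrinsics; this sketch simply delegates to the generic body. */
int demo_kernel(int x) {
    return demo_kernel_generic(x);
}
#endif
```

Building with `-DDEMO_NO_ARCH_KERNEL` corresponds to leaving the `#define` line active; building without it corresponds to commenting the line out, as the diff does, so that the generic and the arch-specific functions can coexist as separate symbols.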