Commit dbb852b

Alcpz and slaren authored
ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (ggml-org#16739)
* Enabled q4_K_8x8_q8_K path on ARM

* wip: I8mm qs multiplication, pending bias

* cpu : arm : REPACK gemm q4_K8x8 implementation

Signed-off-by: Alberto Cabrera <[email protected]>

* Guard gemm with proper features, improved superblock scale and min calc

Signed-off-by: Alberto Cabrera <[email protected]>

* cpu: arm: Implemented REPACK gemv for Q4_K

Signed-off-by: Alberto Cabrera <[email protected]>

* Removed completed TODO

* Fixed missing guards when selecting optimal repack type for Q4_K

Signed-off-by: Alberto Cabrera <[email protected]>

* Fixed macro guard for gemv

* Fixed wrong comment in GEMV

* Fixed warning for unused variable

* vdotq_s32 -> ggml_vdotq_s32

Signed-off-by: Alberto Cabrera <[email protected]>

* Clang-format issues

* Apply suggestions from code review

Co-authored-by: Diego Devesa <[email protected]>

* Removed unnecessary GGML_UNUSED

* Fixed guards in q4_k gemm and gemv (repack)

---------

Signed-off-by: Alberto Cabrera <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
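For readers unfamiliar with the i8mm extension named in the title, the following is a minimal scalar sketch of the arithmetic performed by one int8 matrix-multiply-accumulate step (the operation exposed by the NEON intrinsic `vmmlaq_s32`). It is not code from this commit: the function name `ref_smmla` is hypothetical, and the sketch says nothing about how the actual q4_K kernels arrange their data or apply scales and mins.

```c
#include <stdint.h>

/* Hypothetical scalar model of one i8mm multiply-accumulate step.
 * a and b each hold a 2x8 row-major tile of int8 values.
 * acc holds a 2x2 row-major tile of int32 accumulators.
 * Each accumulator gains the dot product of one row of a with one row of b:
 *   acc[2*i + j] += sum over k of a[8*i + k] * b[8*j + k]
 */
static void ref_smmla(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            int32_t sum = 0;
            for (int k = 0; k < 8; k++) {
                /* Widen to 32 bits before multiplying so nothing overflows. */
                sum += (int32_t) a[8 * i + k] * (int32_t) b[8 * j + k];
            }
            acc[2 * i + j] += sum;
        }
    }
}
```

One such step produces four int32 partial sums from 32 int8 inputs. That is the general reason this kind of instruction suits gemm on quantized blocks, where several rows of each operand are processed together.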
1 parent 5f55c38 commit dbb852b

3 files changed, +393 -2 lines changed

ggml/src/ggml-cpu/arch-fallback.h

Lines changed: 0 additions & 2 deletions
@@ -51,10 +51,8 @@
 #elif defined(__aarch64__) || defined(__arm__) || defined(_M_ARM) || defined(_M_ARM64)
 // repack.cpp
 #define ggml_quantize_mat_q8_K_4x8_generic ggml_quantize_mat_q8_K_4x8
-#define ggml_gemv_q4_K_8x8_q8_K_generic ggml_gemv_q4_K_8x8_q8_K
 #define ggml_gemv_iq4_nl_8x8_q8_0_generic ggml_gemv_iq4_nl_8x8_q8_0
 #define ggml_gemv_q2_K_8x8_q8_K_generic ggml_gemv_q2_K_8x8_q8_K
-#define ggml_gemm_q4_K_8x8_q8_K_generic ggml_gemm_q4_K_8x8_q8_K
 #define ggml_gemm_iq4_nl_8x8_q8_0_generic ggml_gemm_iq4_nl_8x8_q8_0
 #define ggml_gemm_q2_K_8x8_q8_K_generic ggml_gemm_q2_K_8x8_q8_K
 #elif defined(__x86_64__) || defined(__i386__) || defined(_M_IX86) || defined(_M_X64)
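The two removed lines are alias macros. As a rough illustration of why dropping such an alias matters, here is a minimal sketch of the pattern, assuming the `_generic` name is the portable reference kernel and the unsuffixed name is the one callers use. All `demo_*` and `DEMO_*` identifiers are hypothetical and do not appear in ggml.

```c
/* Hypothetical switch: define DEMO_HAVE_ARCH_KERNEL when an
 * architecture-specific kernel is available for this target. */
#ifndef DEMO_HAVE_ARCH_KERNEL
/* No arch kernel: rename the generic definition below so that it is
 * compiled directly under the public name demo_dot. */
#define demo_dot_generic demo_dot
#endif

/* Portable reference kernel. With the alias active, this definition
 * IS demo_dot; without it, it keeps the _generic suffix. */
int demo_dot_generic(const int *a, const int *b, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += a[i] * b[i];
    }
    return sum;
}

#ifdef DEMO_HAVE_ARCH_KERNEL
/* Arch-specific kernel takes the public name. A real one would use SIMD;
 * this placeholder simply delegates to the reference kernel. */
int demo_dot(const int *a, const int *b, int n) {
    return demo_dot_generic(a, b, n);
}
#endif
```

Under this reading, removing the q4_K alias lines for ARM frees the public `ggml_gemv_q4_K_8x8_q8_K` and `ggml_gemm_q4_K_8x8_q8_K` names for the new arch-specific kernels, while the `_generic` versions presumably remain available as fallbacks.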
