Commit dbb852b
ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) (ggml-org#16739)
* Enabled q4_K_8x8_q8_K path on ARM
* wip: I8mm qs multiplication, pending bias
* cpu : arm : REPACK gemm q4_K8x8 implementation
Signed-off-by: Alberto Cabrera <[email protected]>
* Guard gemm with proper features, improved superblock scale and min calc
Signed-off-by: Alberto Cabrera <[email protected]>
* cpu: arm: Implemented REPACK gemv for Q4_K
Signed-off-by: Alberto Cabrera <[email protected]>
* Removed completed TODO
* Fixed missing guards when selecting optimal repack type for Q4_K
Signed-off-by: Alberto Cabrera <[email protected]>
* Fixed macro guard for gemv
* Fixed wrong comment in GEMV
* Fixed warning for unused variable
* vdotq_s32 -> ggml_vdotq_s32
Signed-off-by: Alberto Cabrera <[email protected]>
* Clang-format issues
* Apply suggestions from code review
Co-authored-by: Diego Devesa <[email protected]>
* Removed unnecessary GGML_UNUSED
* Fixed guards in q4_k gemm and gemv (repack)
---------
Signed-off-by: Alberto Cabrera <[email protected]>
Co-authored-by: Diego Devesa <[email protected]>
1 parent 5f55c38 commit dbb852b
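For readers unfamiliar with what a Q4_K x Q8_K gemv computes, the following is a minimal scalar sketch of the arithmetic, including the "bias" (min) term that the commit message mentions. It is not the code from this commit: the struct names, the 32-element sub-block size, the nibble ordering, and the pre-combined `scale`/`min` floats are all assumptions made for illustration, and the real kernels use repacked 8x8 blocks and i8mm/dot-product instructions rather than a scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layout for illustration only.
 * This is NOT ggml's block_q4_K / block_q8_K layout. */
enum { SUB_BLOCK = 32 };

typedef struct {
    float   scale;              /* assumed pre-combined weight scale */
    float   min;                /* assumed pre-combined weight min (the "bias") */
    uint8_t qs[SUB_BLOCK / 2];  /* two unsigned 4-bit quants per byte, low nibble first */
} toy_q4_block;

typedef struct {
    float  d;                   /* activation scale */
    int8_t qs[SUB_BLOCK];       /* signed 8-bit activation quants */
} toy_q8_block;

/* Dot product of one sub-block. Each weight is modelled as
 * (scale * q - min), so the sum splits into an integer product term
 * and a min term that only needs the sum of the activations. */
static float toy_dot(const toy_q4_block *w, const toy_q8_block *a) {
    int32_t sum_qa = 0; /* sum of q_i * a_i */
    int32_t sum_a  = 0; /* sum of a_i, used for the min correction */
    for (size_t i = 0; i < SUB_BLOCK / 2; ++i) {
        const int32_t lo = w->qs[i] & 0x0F;
        const int32_t hi = w->qs[i] >> 4;
        sum_qa += lo * a->qs[2 * i] + hi * a->qs[2 * i + 1];
        sum_a  += a->qs[2 * i] + a->qs[2 * i + 1];
    }
    return a->d * (w->scale * (float) sum_qa - w->min * (float) sum_a);
}

/* One output element of a gemv: accumulate over all sub-blocks of a row. */
static float toy_gemv_row(const toy_q4_block *w, const toy_q8_block *a, size_t n_blocks) {
    assert(n_blocks > 0);
    float acc = 0.0f;
    for (size_t b = 0; b < n_blocks; ++b) {
        acc += toy_dot(&w[b], &a[b]);
    }
    return acc;
}
```

Keeping the integer accumulation separate from the floating-point scaling is what makes this shape of computation a natural fit for int8 dot-product and matrix-multiply instructions; a vectorised version would replace the inner loop while leaving the scale and min handling conceptually the same.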
File tree
3 files changed, +393 -2 lines changed
- ggml/src/ggml-cpu
  - arch/arm