Mirror "Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization" from llama.cpp #2605

I'd like to mirror this llama.cpp change in Candle: https://github.com/ggerganov/llama.cpp/pull/5780

I have some experience with Rust and have a repo that uses the same assembly instructions as the above PR (see [here](https://github.com/smpurkis/asm_rs_bench)), but I need some help/guidance integrating it with Candle.

1. The GEMV- and GEMM-specific code in llama.cpp seems to be under a separate license, and I'm not sure what the best option is here. llama.cpp keeps it in a separate file with its own license, see [here](https://github.com/ggerganov/llama.cpp/blob/b11f9ba9b8ce319f04b88afe40d264e6b7f4ba46/ggml/src/ggml-aarch64.c#L1C1-L2C1).
2. The [GgmlType](https://github.com/huggingface/candle/blob/main/candle-core/src/quantized/k_quants.rs#L22) trait exposes [vec_dot](https://github.com/huggingface/candle/blob/main/candle-core/src/quantized/k_quants.rs#L36), and the [matmul](https://github.com/huggingface/candle/blob/main/candle-core/src/quantized/k_quants.rs#L1876) is computed by looping over it. The assembly for these interleaved types runs the whole [matmul in assembly](https://github.com/smpurkis/asm_rs_bench/blob/main/src/gemm.rs#L140) (in [llama.cpp](https://github.com/ggerganov/llama.cpp/blob/b11f9ba9b8ce319f04b88afe40d264e6b7f4ba46/ggml/src/ggml-aarch64.c#L1117)), so there is no associated `vec_dot` function to place into the `GgmlType` trait. Should I modify this trait or create another one for these interleaved types? What is the best way to handle this?
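
To make question 2 more concrete, here is a rough sketch of the "separate trait" option. It is not Candle's actual API: `VecDot`, `MatmulKernel`, `Plain`, `Interleaved` and `matmul_via_vec_dot` are all hypothetical names, and plain `f32` slices stand in for the quantized block types. The only point is the shape of the two interfaces: one exposes a per-row dot product that a generic matmul loops over, the other owns the whole matmul, which is where an assembly kernel could plug in.

```rust
/// Hypothetical, simplified stand-in for the `vec_dot` part of `GgmlType`.
/// Plain f32 is used here instead of quantized blocks.
trait VecDot {
    fn vec_dot(n: usize, xs: &[f32], ys: &[f32]) -> f32;
}

/// Hypothetical second trait for interleaved layouts: the kernel owns the
/// whole matmul, so no `vec_dot` is required.
/// `lhs` is m x k row-major, `rhs_t` is n x k row-major (transposed rhs),
/// `dst` is m x n row-major.
trait MatmulKernel {
    fn gemm(mkn: (usize, usize, usize), lhs: &[f32], rhs_t: &[f32], dst: &mut [f32]);
}

/// A type that only knows how to do a dot product.
struct Plain;

impl VecDot for Plain {
    fn vec_dot(n: usize, xs: &[f32], ys: &[f32]) -> f32 {
        xs[..n].iter().zip(&ys[..n]).map(|(x, y)| x * y).sum()
    }
}

/// Generic matmul built by looping over `vec_dot`, one output element at a time.
fn matmul_via_vec_dot<T: VecDot>(
    mkn: (usize, usize, usize),
    lhs: &[f32],
    rhs_t: &[f32],
    dst: &mut [f32],
) {
    let (m, k, n) = mkn;
    for i in 0..m {
        for j in 0..n {
            dst[i * n + j] = T::vec_dot(k, &lhs[i * k..(i + 1) * k], &rhs_t[j * k..(j + 1) * k]);
        }
    }
}

/// A type whose kernel computes the full matmul in one call.
struct Interleaved;

impl MatmulKernel for Interleaved {
    fn gemm(mkn: (usize, usize, usize), lhs: &[f32], rhs_t: &[f32], dst: &mut [f32]) {
        let (m, k, n) = mkn;
        // Assumption: a real implementation would dispatch to the AArch64
        // assembly kernel here; this scalar loop is only a placeholder.
        for i in 0..m {
            for j in 0..n {
                let mut acc = 0.0f32;
                for p in 0..k {
                    acc += lhs[i * k + p] * rhs_t[j * k + p];
                }
                dst[i * n + j] = acc;
            }
        }
    }
}
```

With something like this, the matmul entry point would need to choose between the `vec_dot` loop and the direct `gemm` call depending on the type, which is the part I'm unsure how to fit into the existing code.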
