Mirror "Arm AArch64: optimized GEMV and GEMM kernels for q4_0_q8_0, and q8_0_q8_0 quantization" from llama.cpp #2605

@smpurkis

I'm looking to mirror this change ggml-org/llama.cpp#5780 for Candle.

I have some experience with Rust and have a repo that uses the same assembly instructions (see here) as the above PR, but I need some help/guidance integrating it with Candle.

  1. The license for the GEMV- and GEMM-specific code in llama.cpp seems to be separate from the rest of the project, and I'm not sure what the best option is here. In llama.cpp they kept it in a separate file with its own license, see here.
  2. The GgmlType trait uses vec_dot and loops over it to calculate the matmul. The assembly for these interleaved types runs the whole matmul directly in assembly (in llama.cpp), so there is no associated vec_dot function to place into the GgmlType trait. Should I modify this trait or create another one for these interleaved types? What is the best way to handle this?
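
To make the second question more concrete, here is a rough sketch of the "create another trait" option. It is only an illustration under assumptions: `VecDot`, `MatMulKernel`, `matmul_via_vec_dot`, `RowWise` and `Fused` are hypothetical names, not Candle's API, and plain `f32` slices stand in for the quantized block types. The idea is that existing types keep `vec_dot` and get their matmul from a generic loop, while an interleaved type implements only the whole-matmul trait, whose body would call into the assembly kernel.

```rust
/// Simplified stand-in for a `vec_dot`-style trait: one dot product per call.
/// (Plain f32 is used instead of quantized blocks to keep the sketch short.)
pub trait VecDot {
    fn vec_dot(n: usize, xs: &[f32], ys: &[f32]) -> f32;
}

/// Hypothetical trait for types whose kernel computes the whole product at once.
/// Computes dst (m x n) = lhs (m x k) * rhs_t^T, with rhs_t stored as n x k.
pub trait MatMulKernel {
    fn matmul(mkn: (usize, usize, usize), lhs: &[f32], rhs_t: &[f32], dst: &mut [f32]);
}

/// Generic fallback: build the matmul out of repeated `vec_dot` calls.
pub fn matmul_via_vec_dot<T: VecDot>(
    (m, k, n): (usize, usize, usize),
    lhs: &[f32],
    rhs_t: &[f32],
    dst: &mut [f32],
) {
    assert_eq!(lhs.len(), m * k);
    assert_eq!(rhs_t.len(), n * k);
    assert_eq!(dst.len(), m * n);
    for row in 0..m {
        let lhs_row = &lhs[row * k..(row + 1) * k];
        for col in 0..n {
            let rhs_col = &rhs_t[col * k..(col + 1) * k];
            dst[row * n + col] = T::vec_dot(k, lhs_row, rhs_col);
        }
    }
}

/// Stand-in for an existing type that only knows how to do dot products.
pub struct RowWise;

impl VecDot for RowWise {
    fn vec_dot(n: usize, xs: &[f32], ys: &[f32]) -> f32 {
        xs[..n].iter().zip(&ys[..n]).map(|(x, y)| x * y).sum()
    }
}

impl MatMulKernel for RowWise {
    fn matmul(mkn: (usize, usize, usize), lhs: &[f32], rhs_t: &[f32], dst: &mut [f32]) {
        matmul_via_vec_dot::<Self>(mkn, lhs, rhs_t, dst)
    }
}

/// Stand-in for an interleaved type: no `VecDot` impl, only a fused kernel.
pub struct Fused;

impl MatMulKernel for Fused {
    fn matmul((m, k, n): (usize, usize, usize), lhs: &[f32], rhs_t: &[f32], dst: &mut [f32]) {
        // Placeholder for a call into the GEMV/GEMM assembly kernel;
        // a plain triple loop is used here so the sketch stays portable.
        dst.fill(0.0);
        for row in 0..m {
            for col in 0..n {
                for i in 0..k {
                    dst[row * n + col] += lhs[row * k + i] * rhs_t[col * k + i];
                }
            }
        }
    }
}
```

With this split, callers would dispatch on `MatMulKernel` only, so `RowWise::matmul((m, k, n), ...)` and `Fused::matmul((m, k, n), ...)` are interchangeable, and the existing `vec_dot`-based types would not need to change beyond a one-line forwarding impl. Whether that is preferable to adding an optional matmul method to GgmlType itself is exactly what I'm unsure about.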
