
Conversation

@Vithulep (Contributor) commented on Aug 6, 2025

This PR adds SVE kernel support for the f16 data type (the ggml_vec_dot_f16() kernel) to reduce image-encoding time during multimodal (LMM) inference with llava-v1.6-mistral on ARM architecture. An illustrative sketch of the SVE dot-product pattern follows the change list below.

Major code changes:

In vec.cpp file:

  1. ggml_vec_dot_f16()

In vec.h file:

  1. ggml_vec_dot_f16_unroll()
  2. ggml_vec_mad_f16()
  3. ggml_vec_scale_f16()

In simd-mappings.h:

  1. Added #define directives for fp16 SVE.
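
As a rough illustration only (not the code merged in this PR): ggml's real kernels go through the GGML_F16 SIMD macro layer in simd-mappings.h and use ggml_fp16_t storage, whereas the sketch below uses plain __fp16 pointers and a hypothetical function name just to show the vector-length-agnostic SVE fp16 dot-product pattern with a predicated tail.

```c
// Illustrative sketch, not the merged kernel: the function name, signature, and
// tail handling are assumptions; ggml's actual ggml_vec_dot_f16() is written in
// terms of the GGML_F16_* SIMD macros and ggml_fp16_t storage.
#include <arm_sve.h>   // build with SVE enabled, e.g. -march=armv8.2-a+sve

static float sve_dot_f16_sketch(int n, const __fp16 * x, const __fp16 * y) {
    const int   vl  = (int) svcnth();   // fp16 lanes per SVE vector (VL-agnostic)
    svfloat16_t acc = svdup_f16(0);     // vector accumulator
    int         i   = 0;

    for (; i + vl <= n; i += vl) {
        const svbool_t    pg = svptrue_b16();
        const svfloat16_t vx = svld1_f16(pg, x + i);
        const svfloat16_t vy = svld1_f16(pg, y + i);
        acc = svmla_f16_x(pg, acc, vx, vy);          // acc += vx * vy (FMA)
    }

    // Tail: a predicate masks off lanes past n, so no scalar remainder loop.
    if (i < n) {
        const svbool_t    pg = svwhilelt_b16_s32(i, n);
        const svfloat16_t vx = svld1_f16(pg, x + i);
        const svfloat16_t vy = svld1_f16(pg, y + i);
        acc = svmla_f16_m(pg, acc, vx, vy);          // inactive lanes keep acc
    }

    return (float) svaddv_f16(svptrue_b16(), acc);   // horizontal reduction
}
```

The main practical difference from the NEON path is that the loop is vector-length agnostic and the tail is handled with a predicate rather than a scalar remainder loop; the same pattern extends to ggml_vec_mad_f16(), ggml_vec_scale_f16(), and the unrolled variant.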

Performance: Graviton3E

On Graviton3E, this change gives a 5-15% speedup in image-encoding time for multimodal (LMM) inference across different thread counts.

Model: llava-v1.6-mistral-7b.Q4_K_M
Machine: Graviton3E

| Threads | NEON Time (OSS) (ms) | SVE Time (This PR) (ms) | Speedup |
|--------:|---------------------:|------------------------:|--------:|
| 16      | 3702                 | 3509.6                   | 1.05    |
| 32      | 1972.2               | 1756.3                   | 1.12    |
| 64      | 1209.3               | 1046.9                   | 1.15    |

Command Used:

 ./llama-mtmd-cli -m llava-v1.6-mistral-7b.Q4_K_M.gguf --mmproj mmproj-model-f16.gguf -t 64

Perplexity

I ran perplexity with the NEON (original) and SVE (this PR) implementations; the summary is below.

| NEON (Original)    | SVE (This PR)      |
|--------------------|--------------------|
| 16.0865 +/- 0.16204 | 16.0915 +/- 0.16211 |

This change does not appear to have any impact on accuracy.

The github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Aug 6, 2025.
@abhijain1204fujitsu
@ggerganov, could you please review this PR and help get it merged?

@Vithulep (Contributor, Author)
@ggerganov, @compilade, please review this PR.

@ggerganov (Member) left a comment

Same comment as in #15057 (review). We don't even have CI hardware to test these changes, so it's difficult to approve them.

Let's merge after you fix the editor config errors.

@CISC merged commit a0c2b20 into ggml-org:master on Sep 1, 2025. 48 checks passed.
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
…rg#15115)

* Added sve implementation for vec_dot_fp16 Kernel

* removed white spaces

* Added comment

* removed white spaces

* changed GGML_F16x_VEC_FMA for code consistency

* Update vec.h

---------

Co-authored-by: vithulep <[email protected]>
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 7, 2025
