
Conversation

@Vithulep (Contributor) commented on Aug 6, 2025

This PR adds SVE kernel support for the f16 data type (the ggml_vec_dot_f16() kernel) to reduce image-encoding time during multimodal (LMM) inference with llava-v1.6-mistral on ARM architecture. An illustrative sketch of the SVE dot-product pattern follows the change list below.

Major code changes:

In vec.cpp file:

  1. ggml_vec_dot_f16()

In vec.h file:

  1. ggml_vec_dot_f16_unroll()
  2. ggml_vec_mad_f16()
  3. ggml_vec_scale_f16()

In simd-mappings.h:

  1. Added #define directives for fp16 SVE.
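
As a rough illustration only (not the code merged in this PR): ggml's real kernels go through the GGML_F16 SIMD macro layer in simd-mappings.h and use ggml_fp16_t storage, whereas the sketch below uses plain __fp16 pointers and a hypothetical function name just to show the vector-length-agnostic SVE fp16 dot-product pattern with a predicated tail.

```c
// Illustrative sketch, not the merged kernel: the function name, signature, and
// tail handling are assumptions; ggml's actual ggml_vec_dot_f16() is written in
// terms of the GGML_F16_* SIMD macros and ggml_fp16_t storage.
#include <arm_sve.h>   // build with SVE enabled, e.g. -march=armv8.2-a+sve

static float sve_dot_f16_sketch(int n, const __fp16 * x, const __fp16 * y) {
    const int   vl  = (int) svcnth();   // fp16 lanes per SVE vector (VL-agnostic)
    svfloat16_t acc = svdup_f16(0);     // vector accumulator
    int         i   = 0;

    for (; i + vl <= n; i += vl) {
        const svbool_t    pg = svptrue_b16();
        const svfloat16_t vx = svld1_f16(pg, x + i);
        const svfloat16_t vy = svld1_f16(pg, y + i);
        acc = svmla_f16_x(pg, acc, vx, vy);          // acc += vx * vy (FMA)
    }

    // Tail: a predicate masks off lanes past n, so no scalar remainder loop.
    if (i < n) {
        const svbool_t    pg = svwhilelt_b16_s32(i, n);
        const svfloat16_t vx = svld1_f16(pg, x + i);
        const svfloat16_t vy = svld1_f16(pg, y + i);
        acc = svmla_f16_m(pg, acc, vx, vy);          // inactive lanes keep acc
    }

    return (float) svaddv_f16(svptrue_b16(), acc);   // horizontal reduction
}
```

The main practical difference from the NEON path is that the loop is vector-length agnostic and the tail is handled with a predicate rather than a scalar remainder loop; the same pattern extends to ggml_vec_mad_f16(), ggml_vec_scale_f16(), and the unrolled variant.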

Performance: Graviton3E

On Graviton3E, this change gives a 5-15% speedup in image-encoding time for multimodal (LMM) inference across different thread counts.

Model: llava-v1.6-mistral-7b.Q4_K_M
Machine: Graviton3E

| Threads | NEON Time (OSS) (ms) | SVE Time (This PR) (ms) | Speedup |
|--------:|---------------------:|------------------------:|--------:|
| 16      | 3702                 | 3509.6                   | 1.05    |
| 32      | 1972.2               | 1756.3                   | 1.12    |
| 64      | 1209.3               | 1046.9                   | 1.15    |

Command Used:

 ./llama-mtmd-cli -m llava-v1.6-mistral-7b.Q4_K_M.gguf --mmproj mmproj-model-f16.gguf -t 64

Perplexity

I ran perplexity with the NEON (original) and SVE (this PR) implementations; the summary is below.

| NEON (Original)    | SVE (This PR)      |
|--------------------|--------------------|
| 16.0865 +/- 0.16204 | 16.0915 +/- 0.16211 |

This change does not appear to have any impact on accuracy.

The github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Aug 6, 2025.
@abhijain1204fujitsu
@ggerganov, could you please review this PR and help get it merged?

@Vithulep (Contributor, Author)
@ggerganov, @compilade, please review this PR.

@ggerganov (Member) left a comment

Same comment as in #15057 (review). We don't even have CI hardware to test these changes, so it's difficult to approve them.

Let's merge after you fix the editor config errors.

@CISC merged commit a0c2b20 into ggml-org:master on Sep 1, 2025. 48 checks passed.
walidbr pushed a commit to walidbr/llama.cpp that referenced this pull request Sep 7, 2025
…rg#15115)

* Added sve implementation for vec_dot_fp16 Kernel

* removed white spaces

* Added comment

* removed white spaces

* changed GGML_F16x_VEC_FMA for code consistency

* Update vec.h

---------

Co-authored-by: vithulep <[email protected]>
Nexesenex added a commit to Nexesenex/croco.cpp that referenced this pull request Oct 7, 2025
