Conversation

Contributor

@tc-mb tc-mb commented Aug 12, 2025

As stated in #14983, I have integrated Apple NPU (ANE) acceleration into llama.cpp.

Using MiniCPM-V 4.0 as an example, I will introduce a simple way to use ANE and hope we can discuss a better approach.

  1. Build llama.cpp locally. I added a -DENABLE_COREML option to control whether the ANE is used:

cmake -B build -DENABLE_COREML=ON
cmake --build build --config Release -j 8

  2. Download the ANE model from Hugging Face or ModelScope. If you downloaded the zip file, unzip it first.

  3. Pass it the same way as mmproj: I added an "--ane" flag. Its value is the path to the downloaded ane_minicpmv4_vit_f16.mlmodelc file.

./build/bin/llama-mtmd-cli -m {dir_path}/ggml-model-Q4_0.gguf --mmproj {dir_path}/mmproj-model-f16.gguf --ane {dir_path}/ane_minicpmv4_vit_f16.mlmodelc -c 4096 --temp 0.7 --top-p 0.8 --top-k 100 --repeat-penalty 1.05 --image {dir_path}/xx.png -p "Describe the content of the image in detail." 

I tested ANE acceleration on several devices. The benchmark results are as follows:

mac M2, q4_K_M, prefill time (ms):

  #  image size   MiniCPM-V 4.0 (ANE)   MiniCPM-V 4.0
  1  448×448       790.26                 5716.77
  2  600×600      1894.24                17961.35
  3  700×700      2954.34                27866.59
  4  800×800      2964.44                27946.48
  5  1024×625     2977.56                30111.43
  6  1024×768     2975.98                30415.11
  7  1280×960     4065.79                41889.12

mac M4, q4_K_M, prefill time (ms):

  #  image size   MiniCPM-V 4.0 (ANE)   MiniCPM-V 4.0
  1  448×448       412.57                  736.57
  2  600×600       989.44                 3365.09
  3  700×700      1564.61                 4031.90
  4  800×800      1555.85                 4124.81
  5  1024×625     1563.65                 5405.13
  6  1024×768     1567.45                 5169.05
  7  1280×960     2141.54                 7544.96

One point worth noting: the first time the ANE model is used there is a one-time loading cost, so the first run is slightly slower. After that, as long as the model is not updated, it remains loaded and ready in the system.

@github-actions github-actions bot added examples python python script changes labels Aug 12, 2025
Member

@ggerganov ggerganov left a comment


Generally looks OK. Need to improve encapsulation of the CoreML code (see comments). Would need a review from @ngxson.

Also:

  • Use "CoreML" instead of "ANE"
  • Would eventually need instructions for generating the CoreML inference code - can add those after the PR is approved

Comment on lines 115 to 117

// ANE support functions
void clip_set_ane_model_path(struct clip_ctx * ctx, const char * ane_model_path);
Member


We should find a way to avoid this. Maybe we can do something similar to whisper.cpp:

https://github.com/ggml-org/whisper.cpp/blob/f7502dca872866a310fe69d30b163fa87d256319/src/whisper.cpp#L3351-L3373

Comment on lines 3845 to 3852

static int flag = 0;
static const void* coremlEncoder = NULL;
static std::string cached_model_path = "";

// Check if we need to load a new model
if (flag == 0 || (ane_model_path && cached_model_path != ane_model_path)) {
if (coremlEncoder) {
Member


Avoid this global state. Figure out a way to move this to the clip context.

Collaborator

@ngxson ngxson left a comment


The overall idea is good. However, I think we should take the time to make sure this is useful in the long term.

The biggest issue at the moment is that many TODOs are being copied into the PR, which will make refactoring very difficult in the future. We must resolve this problem first.

Regarding UX: if we cannot have the embeddings and resampler all in one CoreML model, I think we should split it into two repos on Hugging Face or ModelScope, one with only the ggml implementation and one with CoreML. Having everything in the same place is very confusing for most users, and most of them don't even have time to look at this PR.

Comment on lines 3881 to 3883
ane_embedding(ctx, n_threads, &imgs, vit_embedding1);
clip_image_encode_ane(vit_embedding1, vit_embedding2, ctx->ane_model_path.c_str());
ane_resampler(ctx, n_threads, &imgs, vit_embedding2, vec);
Collaborator


It seems like only the ViT part is done by the ANE; the rest (embeddings, resampler) is still done by ggml. Is there any reason why we can't do the rest with the ANE too? I think it could be a cleaner approach, as we would then be able to load only the .mlmodelc file and no longer need the mmproj .gguf file.

Collaborator


Also, maybe we should try ggml_custom_4d and inject clip_image_encode_ane as a node in the ggml cgraph. If that works, it will make everything look much cleaner. Do you think this is a valid use case of ggml_custom_4d, @ggerganov?

Contributor Author


@ngxson Yes, only the ViT is currently replaced with the ANE.
Because the embed calculations aren't yet computed correctly with the ANE, I've bypassed the two embed calculations and replaced only the ViT itself.
I'm still trying other methods to see if there's a solution.

Member


Also, maybe we should try ggml_custom_4d and inject clip_image_encode_ane as a node in the ggml cgraph. If that works, it will make everything look much cleaner. Do you think this is a valid use case of ggml_custom_4d?

Haven't considered such a use case for ggml_custom_4d. Sounds worth exploring.

Contributor Author


@ngxson I'm sorry, I was delayed a bit last week while preparing for the release of V4.5.
However, your suggestion reminded me, and I've found a way to convert the entire ViT to CoreML. This will require a lot of changes, though, so I'll probably submit it next week.

Contributor Author


@ngxson I have integrated the entire ViT + resampler into CoreML for computation. The code has been updated; please review it when you have time.

Contributor Author

tc-mb commented Aug 13, 2025

@ggerganov @ngxson Yes, I understand that introducing a new feature requires more time to discuss its design, including its name, structure, and interface definition. All of this takes time, and I have plenty of time to prepare for it. I will follow the discussion and make sure this feature is incorporated into llama.cpp in a proper manner.
