
Commit e1b10fe (parent 1dea376)

Update Vision LLM LP

- Update performance numbers
- Remove "Known issues" section since it's fixed upstream

5 files changed: +25 additions, -38 deletions

content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/1-devenv-and-model.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -78,7 +78,7 @@ The table below gives you an explanation of the different arguments:
 
 To learn more about the parameters, refer to the [transformers README.md](https://github.com/alibaba/MNN/tree/master/transformers).
 
-Verify the model is built correct by checking the size of the resulting model. The `Qwen2-VL-2B-Instruct-convert-4bit-per_channel` directory should be atleast 1 GB in size.
+Verify the model is built correct by checking the size of the resulting model. The `Qwen2-VL-2B-Instruct-convert-4bit-per_channel` directory should be at least 1 GB in size.
 
 
 Push the model onto the device:
````

content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/2-generate-apk.md

Lines changed: 1 addition & 28 deletions

````diff
@@ -16,7 +16,7 @@ A fork of the upstream MNN repository is set up to enable building the app as an
 cd $HOME
 git clone https://github.com/HenryDen/MNN.git
 cd MNN
-git checkout origin/MNN_commit
+git checkout origin/llm_android_demo
 ```
 
 ## Build the app using Android Studio
@@ -30,33 +30,6 @@ This will trigger a build of the project, and you should see a similar output on
 ```output
 BUILD SUCCESSFUL in 1m 42s
 ```
-#### Known build issues
-
-Depending on your Android Studio environment, you may encounter dependency incompatibility with the MNN project. If the build is not successful, you can walk through the following steps to address two known build issues.
-
-1. Add Gradle namespace
-
-For some Gradle versions, you are required to add a `namespace` to your `build.gradle` file.
-
-
-![Gradle Build menu](gradle_build.png)
-
-From the Android menu, open the highlighted file in the above image and add the following to the `android` field.
-
-```output
-namespace "com.mnn.llm"
-```
-
-2. Align dependencies version
-
-You may see an error in dependencies not having aligned version. Open `app/build.gradle` update the `androidTestImplementation` version:
-
-```output
-dependencies {
-    androidTestImplementation 'androidx.test.espresso:espresso-core:3.5.1'
-    androidTestImplementation 'androidx.test.espresso:espresso-idling-resource:3.5.1'
-}
-```
 
 ### Generate and run the APK
 
````

content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/3-benchmark.md

Lines changed: 22 additions & 8 deletions

````diff
@@ -77,12 +77,12 @@ The image features a tiger standing in a grassy field, with its front paws raise
 #################################
 prompt tokens num = 243
 decode tokens num = 70
-vision time = 5.96 s
+vision time = 5.76 s
 audio time = 0.00 s
-prefill time = 1.80 s
-decode time = 2.09 s
-prefill speed = 135.29 tok/s
-decode speed = 33.53 tok/s
+prefill time = 1.26 s
+decode time = 2.02 s
+prefill speed = 192.28 tok/s
+decode speed = 34.73 tok/s
 ##################################
 ```
 
@@ -113,13 +113,27 @@ export LD_LIBRARY_PATH=$PWD
 ./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
 ```
 
+The same output should be displayed, with the benchmark printed at the end:
+```output
+#################################
+prompt tokens num = 243
+decode tokens num = 70
+vision time = 2.91 s
+audio time = 0.00 s
+prefill time = 0.91 s
+decode time = 1.56 s
+prefill speed = 266.13 tok/s
+decode speed = 44.96 tok/s
+##################################
+```
+
 This time, you should see an improvement in the benchmark. Below is an example table showing the uplift on three relevant metrics after enabling the KleidiAI kernels.
 
 | Benchmark | Without KleidiAI | With KleidiAI |
 |---------------------|------------------|---------------|
-| Vision Process Time | 5.45s | 5.43 s |
-| Prefill Speed | 132.35 tok/s | 148.30 tok/s |
-| Decode Speed | 21.61 tok/s | 33.26 tok/s |
+| Vision Process Time | 5.76 s | 2.91 s |
+| Prefill Speed | 192.28 tok/s | 266.13 tok/s |
+| Decode Speed | 34.73 tok/s | 44.96 tok/s |
 
 The prefill speed describes how fast the model processes the input prompt. The decode speed corresponds to the rate at which the model generates new tokens after the input is processed
 
````
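As a sanity check on the updated numbers: prefill and decode speed are just token counts divided by the corresponding times, and the KleidiAI uplift is the ratio of the two measurements. A small sketch using the figures from the benchmark output in this diff (the log rounds times to two decimals, so recomputed speeds differ slightly from the reported ones):

```python
# Benchmark figures from the commit's updated logs (tokens and seconds).
prompt_tokens = 243
decode_tokens = 70

# Without KleidiAI ("baseline") vs. with KleidiAI enabled.
prefill_time = {"baseline": 1.26, "kleidiai": 0.91}
decode_time = {"baseline": 2.02, "kleidiai": 1.56}

for variant in ("baseline", "kleidiai"):
    prefill_speed = prompt_tokens / prefill_time[variant]
    decode_speed = decode_tokens / decode_time[variant]
    print(f"{variant}: prefill ~{prefill_speed:.1f} tok/s, decode ~{decode_speed:.1f} tok/s")

# Uplift per the comparison table (vision time: lower is better, so old/new).
vision_speedup = 5.76 / 2.91
prefill_speedup = 266.13 / 192.28
decode_speedup = 44.96 / 34.73
print(f"vision ~{vision_speedup:.2f}x, prefill ~{prefill_speedup:.2f}x, decode ~{decode_speedup:.2f}x")
```

This also makes the size of the win visible at a glance: roughly a 2x reduction in vision processing time and about 1.3-1.4x on prefill and decode throughput.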

content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/background.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -12,7 +12,7 @@ MNN is a high-performance, lightweight deep learning framework designed for both
 
 **MNN-LLM** is a large language model (LLM) runtime solution built on the MNN engine, designed to enable local deployment of LLMs across diverse platforms, including mobile devices, PCs, and IoT systems. It supports leading models such as Qianwen, Baichuan, Zhipu, and Llama, ensuring efficient and accessible AI-powered experiences.
 
-KleidiAI, a collection of optimized AI micro-kernels, is integrated into the MNN framework, enhancing the inference performance of large language models (LLMs) within MNN. The Android app in this learning path demonstrates Vision Transformer inference using the MNN framework. You will use KleidiAI to speed up inference for the [Qwen Vision 2B]([https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct)) model.
+KleidiAI, a collection of optimized AI micro-kernels, is integrated into the MNN framework, enhancing the inference performance of large language models (LLMs) within MNN. The Android app in this learning path demonstrates Vision Transformer inference using the MNN framework. You will use KleidiAI to speed up inference for the [Qwen Vision 2B](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) model.
 
 ## Vision Transformer(ViT)
 The ViT is a deep learning model designed for image recognition tasks. Unlike traditional convolutional neural networks (CNNs), which process images using convolutional layers, ViT leverages the transformer architecture originally developed for natural language processing (NLP).
````
