Commit 17998ba

Merge pull request #1732 from annietllnd/vision-llm
Technical review of Vision LLM LP
2 parents 706d1ac + 4f2ef39
9 files changed: +311 additions, -201 deletions
assets/contributors.csv

Lines changed: 3 additions & 1 deletion

```diff
@@ -47,7 +47,7 @@ Alaaeddine Chakroun,Day Devs,Alaaeddine-Chakroun,alaaeddine-chakroun,,https://da
 Koki Mitsunami,Arm,,kmitsunami,,
 Chen Zhang,Zilliz,,,,
 Tianyu Li,Arm,,,,
-Georgios Mermigkis,VectorCamp,gMerm,georgios-mermigkis,,https://vectorcamp.gr/
+Georgios Mermigkis,VectorCamp,gMerm,georgios-mermigkis,,https://vectorcamp.gr/
 Ben Clark,Arm,,,,
 Han Yin,Arm,hanyin-arm,nacosiren,,
 Willen Yang,Arm,,,,
@@ -80,3 +80,5 @@ Tom Pilar,,,,,
 Cyril Rohr,,,,,
 Odin Shen,Arm,odincodeshen,odin-shen-lmshen,,
 Avin Zarlez,Arm,AvinZarlez,avinzarlez,,https://www.avinzarlez.com/
+Shuheng Deng,Arm,,,,
+Yiyang Fan,Arm,,,,
```
Lines changed: 90 additions & 0 deletions

---
title: Build the MNN Android Demo with GUI
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this section, you will set up a development environment by installing dependencies and preparing the Qwen vision model.
## Install required software

Install the Android NDK (Native Development Kit) and git-lfs. This learning path was tested with NDK version `28.0.12916984` and CMake version `4.0.0-rc1`.

On Ubuntu or Debian systems, you can install CMake and git-lfs with the following commands:

```bash
sudo apt update
sudo apt install cmake git-lfs -y
```

You can use Android Studio to obtain the NDK. Click **Tools > SDK Manager**, navigate to the **SDK Tools** tab, and select the **NDK (Side by side)** and **CMake** checkboxes, as shown below:

![Install NDK](./install_ndk.png)

Refer to [Install NDK and CMake](https://developer.android.com/studio/projects/install-ndk) for other installation methods.

Make sure Python and pip are installed by verifying that each of these commands prints a version:

```bash
python --version
pip --version
```

{{% notice Note %}}
The commands above can fail even when Python is installed, if Python 3.x is not the default version. In that case, run `python3 --version` and `pip3 --version` instead.
{{% /notice %}}
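If you want a single command that works either way, the short check below (an illustrative sketch, not part of the original instructions) falls back to `python3` when plain `python` is not on the `PATH`:

```bash
# Print the interpreter version, preferring "python" and falling back
# to "python3" when the plain command is not installed.
if command -v python >/dev/null 2>&1; then
  python --version
else
  python3 --version
fi
```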
## Set up phone connection

You will need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included in Android Studio, comes with Android Debug Bridge (ADB). You will use this tool to transfer files later on.

Connect your phone to the computer using a USB cable, then activate USB debugging on your phone: find the **Build Number** in your **Settings** app and tap it 7 times, then enable **USB debugging** in **Developer Options**.

You should now see your device listed when you run `adb devices`:

```output
List of devices attached
<DEVICE ID> device
```
## Download and convert the model

The following commands download the model from Hugging Face and clone a tool for exporting LLM models to the MNN framework:

```bash
cd $HOME
pip install -U huggingface_hub
huggingface-cli download Qwen/Qwen2-VL-2B-Instruct --local-dir ./Qwen2-VL-2B-Instruct/
git clone https://github.com/wangzhaode/llm-export
cd llm-export && pip install .
```

You can use the `llm-export` tool to quantize the model with the following options:

```bash
llmexport --path ../Qwen2-VL-2B-Instruct/ --export mnn --quant_bit 4 \
--quant_block 0 --dst_path Qwen2-VL-2B-Instruct-convert-4bit-per_channel --sym
```

The table below explains the different arguments:

| Parameter | Description | Explanation |
|------------------|-------------|--------------|
| `--quant_bit` | MNN quant bit, 4 or 8; default is 4 | `4` selects q4 quantization. |
| `--quant_block` | MNN quant block; default is 0 | `0` selects per-channel quantization; `128` selects per-block quantization with a block size of 128. |
| `--sym` | Symmetric quantization (without zero point); default is False | Enables symmetric quantization. |
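As a back-of-envelope sanity check (an illustrative sketch, not an exact figure), the weight storage for a 2-billion-parameter model at 4 bits per weight works out to roughly 1 GB:

```bash
# ~2e9 weights * 4 bits / 8 bits-per-byte = ~1e9 bytes, i.e. about 1 GB,
# which matches the expected size of the exported model directory.
awk 'BEGIN { printf "approx 4-bit weight size: %.2f GB\n", 2e9 * 4 / 8 / 1e9 }'
```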
To learn more about the parameters, refer to the [transformers README.md](https://github.com/alibaba/MNN/tree/master/transformers).

Verify the model was built correctly by checking the size of the resulting model. The `Qwen2-VL-2B-Instruct-convert-4bit-per_channel` directory should be at least 1 GB in size.
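One way to check is with `du` (a minimal sketch; it assumes the `--dst_path` directory name used in the export step above and a threshold of roughly 1 GB):

```bash
# Report the on-disk size of the converted model directory in KB
# and warn if it is smaller than ~1 GB.
MODEL_DIR="Qwen2-VL-2B-Instruct-convert-4bit-per_channel"
SIZE_KB=$(du -sk "$MODEL_DIR" | cut -f1)
if [ "$SIZE_KB" -ge 1000000 ]; then
  echo "model size looks OK: ${SIZE_KB} KB"
else
  echo "model smaller than expected: ${SIZE_KB} KB" >&2
fi
```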
Push the model onto the device:

```bash
adb shell mkdir /data/local/tmp/models/
adb push Qwen2-VL-2B-Instruct-convert-4bit-per_channel /data/local/tmp/models
```

With the model set up, it's time to use Android Studio to build and run an example application.
Lines changed: 48 additions & 0 deletions

---
title: Benchmark the Vision Transformer performance with KleidiAI
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this section, you will try the Qwen model in action through a demo application packaged as an Android Package Kit (APK).
## Clone MNN repo

A fork of the upstream MNN repository is set up to enable building the app as an Android Studio project. Run the following to clone the repository and check out the source tree:

```bash
cd $HOME
git clone https://github.com/HenryDen/MNN.git
cd MNN
git checkout origin/llm_android_demo
```
## Build the app using Android Studio

### Open project and build

Open Android Studio and go to **File > Open**. Navigate to the MNN repository you just cloned, expand the `transformers/llm/engine/` directories, select the `android` directory, and click **Open**.

This triggers a build of the project, and you should see output similar to the following on completion:

```output
BUILD SUCCESSFUL in 1m 42s
```

### Generate and run the APK

Navigate to **Build > Generate App Bundles or APKs** and select **Generate APKs**.

The build runs, and the app is then copied to and installed on the Android device.

After opening the app, you will see the splash screen:

![Loading screenshot](Loading_page.png)

Finally, you can use the UI to chat with the app. Try uploading an image and asking a question about it.

![Chat screenshot](chat2.png)

The final step is to examine how KleidiAI can improve the performance of the model. Continue to the next section to find out.
Lines changed: 140 additions & 0 deletions

---
title: Build the MNN Command-line ViT Demo
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this section, you will use the model to benchmark performance with and without KleidiAI kernels. You will need to compile library files to run the optimized inference.
## Prepare an example image

You will use an image to run a command-line prompt. In this learning path, the tiger below is used as an example. You can save this image or provide one of your own. Rename the image to `example.png` in order to use the commands in the following sections.

![example image](example.png)

Use ADB to load the image onto your phone:

```bash
adb push example.png /data/local/tmp/
```
## Build binaries for command-line inference

Navigate to the MNN project you cloned in the previous section. Create a build directory and run the build script. The first time, build the binaries with the `-DMNN_KLEIDIAI` flag set to `FALSE`:

```bash
cd $HOME/MNN/project/android
mkdir build_64 && cd build_64

../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=FALSE \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \
-DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \
-DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
```
{{% notice Note %}}
If your NDK toolchain isn't set up correctly, you may run into issues with the above script. Make a note of where the NDK was installed - this will be a directory named after the version you downloaded earlier. Try exporting the following environment variables before re-running `build_64.sh`:

```bash
export ANDROID_NDK_HOME=<path-to>/ndk/28.0.12916984
export CMAKE_TOOLCHAIN_FILE=$ANDROID_NDK_HOME/build/cmake/android.toolchain.cmake
export ANDROID_NDK=$ANDROID_NDK_HOME
```
{{% /notice %}}
Push the files to your mobile device, then enter a shell on the phone using ADB:

```bash
adb push *so llm_demo tools/cv/*so /data/local/tmp/
adb shell
```

The following commands should be run in the ADB shell. Navigate to the directory you pushed the files to, add executable permissions to the `llm_demo` file, and export an environment variable so that it can locate the shared libraries. After this, use the example image you transferred earlier to create a file containing the text content for the prompt:

```bash
cd /data/local/tmp/
chmod +x llm_demo
export LD_LIBRARY_PATH=$PWD
echo "<img>./example.png</img>Describe the content of the image." > prompt
```

Finally, run an inference on the model with the following command:

```bash
./llm_demo models/Qwen2-VL-2B-Instruct-convert-4bit-per_channel/config.json prompt
```

If the launch is successful, you should see the following output, with the performance benchmark at the end:

```output
config path is models/Qwen2-VL-2B-Instruct-convert-4bit-per_channel/config.json
tokenizer_type = 3
prompt file is prompt
The image features a tiger standing in a grassy field, with its front paws raised and its eyes fixed on something or someone behind it. The tiger's stripes are clearly visible against the golden-brown background of the grass. The tiger appears to be alert and ready for action, possibly indicating a moment of tension or anticipation in the scene.

#################################
prompt tokens num = 243
decode tokens num = 70
vision time = 5.76 s
audio time = 0.00 s
prefill time = 1.26 s
decode time = 2.02 s
prefill speed = 192.28 tok/s
decode speed = 34.73 tok/s
##################################
```
## Enable KleidiAI and re-run inference

The next step is to re-generate the binaries with KleidiAI activated. This is done by updating the `-DMNN_KLEIDIAI` flag to `TRUE`. From the `build_64` directory, run:

```bash
../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=TRUE \
-DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \
-DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \
-DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
```

The next step is to update the files on your phone. Start by removing the ones used in the previous step, then push the new ones with the same command as before:

```bash
adb shell "cd /data/local/tmp; rm -rf *so llm_demo tools/cv/*so"
adb push *so llm_demo tools/cv/*so /data/local/tmp/
adb shell
```
In the new ADB shell, perform the same steps as in the previous section:

```bash
cd /data/local/tmp/
chmod +x llm_demo
export LD_LIBRARY_PATH=$PWD
./llm_demo models/Qwen2-VL-2B-Instruct-convert-4bit-per_channel/config.json prompt
```

The same output should be displayed, with the benchmark printed at the end:

```output
#################################
prompt tokens num = 243
decode tokens num = 70
vision time = 2.91 s
audio time = 0.00 s
prefill time = 0.91 s
decode time = 1.56 s
prefill speed = 266.13 tok/s
decode speed = 44.96 tok/s
##################################
```
This time, you should see an improvement in the benchmark. The table below shows the uplift on three relevant metrics after enabling the KleidiAI kernels.

| Benchmark | Without KleidiAI | With KleidiAI |
|---------------------|------------------|---------------|
| Vision Process Time | 5.76 s | 2.91 s |
| Prefill Speed | 192.28 tok/s | 266.13 tok/s |
| Decode Speed | 34.73 tok/s | 44.96 tok/s |

The prefill speed describes how fast the model processes the input prompt. The decode speed is the rate at which the model generates new tokens after the input has been processed.
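These speeds follow directly from the token counts and times in the benchmark output (speed = tokens / time). The arithmetic below, a quick sketch using the reported numbers, approximately reproduces the without-KleidiAI speeds and computes the uplift ratios:

```bash
# Recompute speeds from raw counts/times and the uplift ratios between runs.
awk 'BEGIN {
  printf "prefill: %.1f tok/s (reported 192.28)\n", 243 / 1.26;
  printf "decode:  %.1f tok/s (reported 34.73)\n", 70 / 2.02;
  printf "vision speedup:  %.2fx\n", 5.76 / 2.91;
  printf "prefill uplift:  %.2fx\n", 266.13 / 192.28;
  printf "decode uplift:   %.2fx\n", 44.96 / 34.73;
}'
```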
This demonstrates the advantage of using Arm-optimized kernels for your ViT use cases.

content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/Benchmark_the_performance.md

Lines changed: 0 additions & 86 deletions
This file was deleted.
