Commit 9863ead

Merge pull request #1747 from madeline-underwood/Vision-LLM
Vision LLM_Andy to check
2 parents 4a63869 + 216ad34 commit 9863ead

5 files changed: +90 additions, −66 deletions


content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/1-devenv-and-model.md

Lines changed: 26 additions & 23 deletions
@@ -5,53 +5,57 @@ weight: 3
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
+## Install Required Software
 
-In this section, you will set up a development environment by installing dependencies and preparing the Qwen vision model.
+In this section, you'll set up your development environment by installing dependencies and preparing the Qwen vision model.
 
-## Install required software
+Install the Android NDK (Native Development Kit) and git-lfs. This Learning Path was tested with NDK version `28.0.12916984` and CMake version `4.0.0-rc1`.
 
-Install the Android NDK (Native Development Kit) and git-lfs. This learning path was tested with NDK version `28.0.12916984` and CMake version `4.0.0-rc1`.
-
-For Ubuntu or Debian systems, you can install CMake and git-lfs with the following command:
+For Ubuntu or Debian systems, install CMake and git-lfs with the following commands:
 
 ```bash
 sudo apt update
 sudo apt install cmake git-lfs -y
 ```
 
-You can use Android Studio to obtain the NDK. Click **Tools > SDK Manager**, and navigate to the the SDK Tools tab. Select the NDK (Side by side) and CMake checkboxes, as shown below:
+Alternatively, you can use Android Studio to obtain the NDK.
+
+Click **Tools > SDK Manager** and navigate to the **SDK Tools** tab.
+
+Select the **NDK (Side by side)** and **CMake** checkboxes, as shown below:
 
 ![Install NDK](./install_ndk.png)
 
-Refer to [Install NDK and CMake](https://developer.android.com/studio/projects/install-ndk) for other installation methods.
+See [Install NDK and CMake](https://developer.android.com/studio/projects/install-ndk) for other installation methods.
 
-Make sure Python and pip is installed by verifying a version is printed on running this command:
+Ensure that Python and pip are installed by verifying the version with these commands:
 
 ```bash
 python --version
 pip --version
 ```
 
 {{% notice Note %}}
-The above commands may fail when Python is installed if Python 3.x is not the default version. You can try running `python3 --version` and `pip3 --version` to be sure.
+If Python 3.x is not the default version, try running `python3 --version` and `pip3 --version`.
 {{% /notice %}}
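As a quick scripted alternative to the manual checks above, a minimal Python sketch can confirm the interpreter version directly. The 3.8 minimum shown here is an illustrative assumption, not a requirement stated in this Learning Path:

```python
import sys

def python_is_supported(min_major=3, min_minor=8):
    """Return True if the running interpreter meets the minimum version.

    The (3, 8) floor is an illustrative assumption for this sketch.
    """
    return sys.version_info >= (min_major, min_minor)

# Print the interpreter version and whether it passes the check.
print(sys.version.split()[0], python_is_supported())
```

Running this under your default `python` (or `python3`) tells you immediately which interpreter the later `pip install` steps will use.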
 
-## Set up phone connection
+## Set up Phone Connection
 
-You will need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included in Android Studio, comes with Android Debug Bridge (ADB). You will use this tool to transfer files later on.
+You need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included with Android Studio, provides Android Debug Bridge (ADB) for transferring files.
 
-Connect your phone to the computer using a USB cable. You will need to activate USB debugging on your phone. Find the **Build Number** in your **Settings** app and tap it 7 times. Then, enable **USB debugging** in **Developer Options**.
+Connect your phone to your computer using a USB cable, and enable USB debugging on your phone. To do this, tap the **Build Number** in your **Settings** app 7 times, then enable **USB debugging** in **Developer Options**.
 
-You should now see your device listed upon running `adb devices`:
+Verify the connection by running `adb devices`:
 
 ```output
 List of devices attached
 <DEVICE ID> device
 ```
+You should see your device listed.
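If you want to script the device check in the step above, the `adb devices` output can be parsed with a few lines of Python. This is a sketch assuming the standard header-plus-whitespace-separated format shown above; the sample serial numbers are made up:

```python
def parse_adb_devices(output: str) -> dict:
    """Map device serial -> state from `adb devices` output."""
    devices = {}
    # Skip the "List of devices attached" header line.
    for line in output.strip().splitlines()[1:]:
        parts = line.split()
        if len(parts) >= 2:
            devices[parts[0]] = parts[1]
    return devices

sample = """List of devices attached
emulator-5554\tdevice
R58M123ABC\tunauthorized
"""
print(parse_adb_devices(sample))
```

A device reported as `unauthorized` means the USB debugging prompt still needs to be accepted on the phone; only the `device` state is ready for `adb push`.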

-## Download and convert the model
+## Download and Convert the Model
 
-The following commands download the model from Hugging Face, and clones a tool for exporting LLM model to the MNN framework.
+The following commands download the model from Hugging Face, and clone a tool for exporting the LLM model to the MNN framework.
 
 ```bash
 cd $HOME
@@ -60,8 +64,7 @@ huggingface-cli download Qwen/Qwen2-VL-2B-Instruct --local-dir ./Qwen2-VL-2B-Ins
 git clone https://github.com/wangzhaode/llm-export
 cd llm-export && pip install .
 ```
-
-You can use the `llm-export` repository to quantize the model with the following options:
+Use the `llm-export` repository to quantize the model with these options:
 
 ```bash
 llmexport --path ../Qwen2-VL-2B-Instruct/ --export mnn --quant_bit 4 \
@@ -72,13 +75,13 @@ The table below gives you an explanation of the different arguments:
 
 | Parameter | Description | Explanation |
 |------------------|-------------|--------------|
-| `--quant_bit` | mnn quant bit, 4 or 8, default is 4 | `4` represents q4 quantization. |
-| `--quant_block` | mnn quant block, default is 0 | `0` represents per-channel quantization, `128` represents 128 per-block quantization. |
-| `--sym` | symmetric quantization (without zeropoint), defualt is False. | The quantization parameter that enables symmetrical quantization. |
+| `--quant_bit` | MNN quant bit, 4 or 8, default is 4. | `4` represents q4 quantization. |
+| `--quant_block` | MNN quant block, default is 0. | `0` represents per-channel quantization; `128` represents 128 per-block quantization. |
+| `--sym` | Symmetric quantization (without zeropoint); default is False. | The quantization parameter that enables symmetrical quantization. |
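To make the `--quant_block` distinction in the table concrete, here is an illustrative Python sketch of symmetric 4-bit scale computation: one scale per whole channel (`quant_block` 0) versus one scale per 128-weight block (`quant_block` 128). This is a simplified model of the idea, not the llm-export implementation:

```python
def quant_scales(weights, quant_block=0):
    """Return the symmetric 4-bit scale(s) for one weight channel.

    quant_block=0   -> one scale for the whole channel (per-channel).
    quant_block=128 -> one scale per block of 128 weights (per-block).
    Symmetric int4 maps to [-8, 7], so this sketch uses scale = max|w| / 7.
    """
    if quant_block == 0:
        blocks = [weights]
    else:
        blocks = [weights[i:i + quant_block] for i in range(0, len(weights), quant_block)]
    return [max(abs(w) for w in block) / 7 for block in blocks]

# A 256-weight channel with one outlier: per-channel quantization lets the
# outlier set the scale for everything, per-block isolates it to one block.
channel = [0.01] * 255 + [1.0]
print(len(quant_scales(channel, 0)))     # per-channel: one scale
print(len(quant_scales(channel, 128)))   # per-block: two scales
```

Smaller blocks track local weight ranges more closely (better accuracy around outliers) at the cost of storing more scales.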

-To learn more about the parameters, refer to the [transformers README.md](https://github.com/alibaba/MNN/tree/master/transformers).
+To learn more about the parameters, see the [transformers README.md](https://github.com/alibaba/MNN/tree/master/transformers).
 
-Verify the model is built correct by checking the size of the resulting model. The `Qwen2-VL-2B-Instruct-convert-4bit-per_channel` directory should be at least 1 GB in size.
+Verify that the model was built correctly by checking that the `Qwen2-VL-2B-Instruct-convert-4bit-per_channel` directory is at least 1 GB in size.
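The 1 GB expectation is consistent with a back-of-the-envelope estimate: roughly 2 billion parameters at 4 bits each is about 1 GB before quantization scales and other metadata. Illustrative arithmetic only:

```python
params = 2.0e9          # Qwen2-VL-2B has on the order of 2 billion parameters
bits_per_weight = 4     # --quant_bit 4
bytes_total = params * bits_per_weight / 8
print(f"{bytes_total / 1e9:.1f} GB")  # prints "1.0 GB" (weights only)
```

A directory much smaller than this usually means the export or quantization step failed partway through.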

 Push the model onto the device:
 
@@ -87,4 +90,4 @@ adb shell mkdir /data/local/tmp/models/
 adb push Qwen2-VL-2B-Instruct-convert-4bit-per_channel /data/local/tmp/models
 ```
 
-With the model set up, it's time to use Android Studio to build and run an example application.
+With the model set up, you're ready to use Android Studio to build and run an example application.

content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/2-generate-apk.md

Lines changed: 13 additions & 7 deletions
@@ -6,11 +6,13 @@ weight: 4
 layout: learningpathall
 ---
 
-In this section, you will try the Qwen model in action using a demo application using a Android Package Kit (APK)
-
 ## Clone MNN repo
 
-A fork of the upstream MNN repository is set up to enable building the app as an Android Studio project. Run the following to clone the repository and checkout the source tree:
+In this section, you will see the Qwen model in action in a demo application packaged as an Android Package Kit (APK).
+
+A fork of the upstream MNN repository is set up to enable building the app as an Android Studio project.
+
+Run the following commands to clone the repository and check out the source tree:
 
 ```bash
 cd $HOME
@@ -19,19 +21,23 @@ cd MNN
 git checkout origin/llm_android_demo
 ```
 
-## Build the app using Android Studio
+## Build the App Using Android Studio
 
 ### Open project and build
 
-Open Android Studio. Go to **File > Open**. Navigate to the MNN repository you just cloned. Expand the `transformers/llm/engine/` directories, select the `android` one and click `Open`.
+Open Android Studio.
+
+Go to **File > Open**.
+
+Navigate to the cloned MNN repository, expand the `transformers/llm/engine/` directories, select the `android` directory, and click **Open**.
 
-This will trigger a build of the project, and you should see a similar output on completion:
+This triggers a build of the project, and you should see output similar to the following on completion:
 
 ```output
 BUILD SUCCESSFUL in 1m 42s
 ```
 
-### Generate and run the APK
+### Generate and Run the APK
 
 Navigate to **Build > Generate App Bundles or APKs**. Select **Generate APKs**.
content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/3-benchmark.md

Lines changed: 32 additions & 18 deletions
@@ -5,12 +5,13 @@ weight: 5
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
+## Prepare an Example Image
 
-In this section, you will use the model to benchmark performance with and without KleidiAI kernels. You will need to compile library files to run the optimized inference.
+In this section, you'll benchmark model performance with and without KleidiAI kernels. To run optimized inference, you'll first need to compile the required library files. You'll also need an example image to run command-line prompts.
 
-## Prepare an example image
+You can use the image of the tiger below, which this Learning Path uses as an example, or choose your own.
 
-You will use an image to run a command-line prompt. In this learning path, the tiger below will be used as an example. You can save this image or provide one of your own. Re-name the image to `example.png` in order to use the commands in the following sections.
+Whichever you select, rename the image to `example.png` to use the commands in the following sections.
 
 ![example image](example.png)
 
@@ -20,9 +21,13 @@ Use ADB to load the image onto your phone:
 adb push example.png /data/local/tmp/
 ```
 
-## Build binaries for command-line inference
+## Build Binaries for Command-line Inference
 
-Navigate to the MNN project you cloned in the previous section. Create a build directory and run the script. The first time, you will build the binaries with the `-DMNN_KLEIDIAI` flag set to `FALSE`.
+Navigate to the MNN project that you cloned in the previous section.
+
+Create a build directory and run the build script.
+
+The first time that you do this, build the binaries with the `-DMNN_KLEIDIAI` flag set to `FALSE`.
 
 ```bash
 cd $HOME/MNN/project/android
@@ -34,7 +39,7 @@ mkdir build_64 && cd build_64
 -DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
 ```
 {{% notice Note %}}
-If your NDK toolchain isn't set up correctly, you may run into issues with the above script. Make note of where the NDK was installed - this will be a directory named after the version you downloaded earlier. Try exporting the following environment variables before re-running `build_64.sh`.
+If your NDK toolchain isn't set up correctly, you might run into issues with the above script. Make a note of where the NDK was installed; this is a directory named after the version you downloaded earlier. Try exporting the following environment variables before re-running `build_64.sh`:
 
 ```bash
 export ANDROID_NDK_HOME=<path-to>/ndk/28.0.12916984
@@ -44,14 +49,16 @@ export ANDROID_NDK=$ANDROID_NDK_HOME
 ```
 {{% /notice %}}
 
-Push the files to your mobile device. Then, enter a shell on the phone using ADB.
+## Push Files and Run Inference via ADB
+
+Push the required files to your Android device, then enter a shell on the device using ADB:
 
 ```bash
 adb push *so llm_demo tools/cv/*so /data/local/tmp/
 adb shell
 ```
 
-The following commands should be run in the ADB shell. Navigate to the directory you pushed the files to, add executable permissions to the `llm_demo` file and export an environment variable for it to run properly. After this, use the example image you transferred earlier to create a file containing the text content for the prompt.
+Run the following commands in the ADB shell. Navigate to the directory you pushed the files to, add executable permissions to the `llm_demo` file, and export an environment variable so that it runs properly. Then use the example image you transferred earlier to create a file containing the text of the prompt.
 
 ```bash
 cd /data/local/tmp/
@@ -60,13 +67,13 @@ export LD_LIBRARY_PATH=$PWD
 echo "<img>./example.png</img>Describe the content of the image." > prompt
 ```
 
-Finally, run an inference on the model with the following command.
+Finally, run an inference on the model with the following command:
 
 ```bash
 ./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
 ```
 
-If the launch is successful, you should see the following output, with the performance benchmark at the end.
+If the launch is successful, you should see the following output, with the performance benchmark at the end:
 
 ```output
 config path is models/Qwen-VL-2B-convert-4bit-per_channel/config.json
@@ -86,34 +93,39 @@ prefill speed = 192.28 tok/s
 ##################################
 ```
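If you want to collect these numbers across runs, the benchmark block can be scraped with a small regex. This is a sketch assuming speed lines of the form `prefill speed = 192.28 tok/s`, as in the output above; the sample text here is abridged:

```python
import re

def parse_speeds(output: str) -> dict:
    """Extract `<name> speed = <value> tok/s` lines from llm_demo output."""
    pattern = re.compile(r"(\w+) speed = ([\d.]+) tok/s")
    return {name: float(value) for name, value in pattern.findall(output)}

sample = """prompt tokens num = 243
prefill speed = 192.28 tok/s
decode speed = 34.73 tok/s
"""
print(parse_speeds(sample))
```

Capturing both runs this way makes it easy to build the comparison table shown later without copying numbers by hand.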

-## Enable KleidiAI and re-run inference
+## Enable KleidiAI and Re-run Inference
 
-The next step is to re-generate the binaries with KleidiAI activated. This is done by updating the flag `-DMNN_KLEIDIAI` to `TRUE`. From the `build_64` directory, run:
+The next step is to re-generate the binaries with KleidiAI activated. This is done by updating the `-DMNN_KLEIDIAI` flag to `TRUE`.
+
+From the `build_64` directory, run:
 ```bash
 ../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=TRUE \
 -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \
 -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \
 -DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
 ```
+## Update Files on the Device
 
-The next step is to update the files on your phone. Start by removing the ones used in the previous step. Then, push the new ones with the same command as before.
+First, remove the existing binaries from your Android device, then push the updated files:
 
 ```bash
 adb shell "cd /data/local/tmp; rm -rf *so llm_demo tools/cv/*so"
 adb push *so llm_demo tools/cv/*so /data/local/tmp/
 adb shell
 ```
 
-In the new ADB shell, preform the same steps as in the previous section.
+In the new ADB shell, run the following commands:
 
 ```bash
 cd /data/local/tmp/
 chmod +x llm_demo
 export LD_LIBRARY_PATH=$PWD
 ./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
 ```
+## Benchmark Results
+
+After running with KleidiAI enabled, you should see improved benchmarks. Here are some example results:
 
-The same output should be displayed, with the benchmark printed at the end:
 ```output
 #################################
 prompt tokens num = 243
@@ -127,14 +139,16 @@ prefill speed = 266.13 tok/s
 ##################################
 ```
 
-This time, you should see an improvement in the benchmark. Below is an example table showing the uplift on three relevant metrics after enabling the KleidiAI kernels.
+This time, you should see an improvement in the benchmark. Below is an example table showing the uplift on three relevant metrics after enabling the KleidiAI kernels:
 
 | Benchmark | Without KleidiAI | With KleidiAI |
 |---------------------|------------------|---------------|
 | Vision Process Time | 5.76 s | 2.91 s |
 | Prefill Speed | 192.28 tok/s | 266.13 tok/s |
 | Decode Speed | 34.73 tok/s | 44.96 tok/s |
 
-The prefill speed describes how fast the model processes the input prompt. The decode speed corresponds to the rate at which the model generates new tokens after the input is processed
+**Prefill speed** describes how fast the model processes the input prompt.
+
+**Decode speed** indicates how quickly the model generates new tokens after the input is processed.
 
-This shows the advantages of using Arm optimized kernels for your ViT use-cases.
+These benchmarks clearly demonstrate the performance advantages of using Arm-optimized KleidiAI kernels for vision transformer (ViT) workloads.
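The uplift in the table above works out to roughly a 2x faster vision stage and about 29-38% higher token throughput. A quick Python check of that arithmetic, using the values from the table:

```python
# Vision process time is in seconds, so speedup factor = before / after.
before_s, after_s = 5.76, 2.91
print(f"Vision process time: {before_s / after_s:.2f}x faster")   # prints "1.98x faster"

# Prefill and decode speeds are rates (tok/s), so uplift = after / before - 1.
for name, before, after in [("Prefill", 192.28, 266.13), ("Decode", 34.73, 44.96)]:
    print(f"{name} speed: {100 * (after / before - 1):.0f}% higher")
# prints "Prefill speed: 38% higher" and "Decode speed: 29% higher"
```

Note the direction of each ratio: for a time, smaller is better, so the factor is before over after; for a throughput, larger is better, so it is after over before.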

content/learning-paths/mobile-graphics-and-gaming/Vision-LLM-inference-on-Android-with-KleidiAI-and-MNN/_index.md

Lines changed: 6 additions & 6 deletions
@@ -3,18 +3,18 @@ title: Vision LLM inference on Android with KleidiAI and MNN
 
 minutes_to_complete: 30
 
-who_is_this_for: This learning path is for developers who want to run Vision Transformers (ViT) efficiently on an Android device.
+who_is_this_for: This Learning Path is for developers who want to run Vision Transformers (ViT) efficiently on Android.
 
 learning_objectives:
-    - Download the a Vision Large Language Model (LLM) from Hugging Face.
+    - Download a Vision Large Language Model (LLM) from Hugging Face.
     - Convert the model to the Mobile Neural Network (MNN) framework.
-    - Install an Android demo application with the model to run an inference.
-    - Compare model inference performance with and without KleidiAI Arm optimized micro-kernels.
+    - Install an Android demo application using the model to run an inference.
+    - Compare inference performance with and without KleidiAI Arm-optimized micro-kernels.
 
 
 prerequisites:
     - A development machine with [Android Studio](https://developer.android.com/studio) installed.
-    - A 64-bit Arm powered smartphone running Android with `i8mm` and `dotprod` supported.
+    - A 64-bit Arm-powered smartphone running Android with support for `i8mm` and `dotprod`.
 
 author:
     - Shuheng Deng
@@ -36,7 +36,7 @@ operatingsystems:
 
 further_reading:
     - resource:
-        title: "MNN : A UNIVERSAL AND EFFICIENT INFERENCE ENGINE"
+        title: "MNN: A Universal and Efficient Inference Engine"
         link: https://arxiv.org/pdf/2002.12418
         type: documentation
     - resource:
