Commit 7ba390c

Merge pull request #1651 from HenryDen/ViT_MNN_submit
LearningPath - Vision LLM inference on Android with KleidiAI and MNN
2 parents 6da345f + 45c35b8 commit 7ba390c

File tree: 10 files changed, +265 -0 lines changed

Lines changed: 101 additions & 0 deletions
---
title: Build the MNN Android Demo with GUI
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Set up development environment
In this Learning Path, you will learn how to build and deploy a Vision Transformer (ViT) chat app to an Android device using MNN-LLM. You will learn how to build MNN-LLM and how to run the Qwen model in the Android application.

The first step is to prepare a development environment with the required software:

- Android Studio (latest version recommended)
- Android NDK (tested with version 28.0.12916984)
- CMake (4.0.0-rc1)
- Python3 (optional)
- Git
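
Before you continue, you can optionally confirm from a terminal that the command-line tools are available. This is only a quick sanity check, assuming adb, cmake, git, and python3 are already on your PATH:

```shell
# Quick sanity check that the command-line tools are available
$ adb --version      # Android platform tools, installed with Android Studio
$ cmake --version    # expect 4.0.0-rc1 or a compatible version
$ git --version
$ python3 --version  # optional, only needed for model conversion
```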

## Clone the MNN repo
Open a Windows PowerShell or Git Bash terminal and check out the source tree:

```shell
cd C:\Users\$env:USERNAME
git clone https://github.com/HenryDen/MNN.git
cd MNN
git checkout 83b650fc8888d7ccd38dbc68330a87d048b9fe7a
```

{{% notice Note %}}
The app code is not yet merged into the upstream MNN repository. The repository above is a fork of MNN that contains the app code.
{{% /notice %}}

## Build the app using Android Studio

Create a signing.gradle file in android/app with the following template:

```groovy
ext {
    signingConfigs = [
            release: [
                    storeFile: file('PATH_TO_jks_file'),
                    storePassword: "****",
                    keyAlias: "****",
                    keyPassword: "****"
            ]
    ]
}
```

If you don't need to compile a release version of the app, you can skip the signing-file steps below and fill signing.gradle with placeholder values.

To create a signing key and fill in the template:

- Navigate to **Build -> Generate Signed App Bundle or APK**.
- Select **APK** and click **Next**.
- Press **Create new** and fill in the required information.
- Fill in the template above with the details of the newly generated JKS file.

Open the MNN/transformers/llm/engine/android directory in Android Studio and wait for the Gradle project sync to finish.

## Prepare the model
You can download the model from ModelScope: https://www.modelscope.cn/models/qwen/qwen2-vl-2b-instruct

Or from Hugging Face: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct

If you want to test other vision transformer models, you can download them from https://modelscope.cn/organization/qwen?tab=model and convert them to MNN format.

```shell
# make sure git lfs is installed
$ git lfs install
$ git clone https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
# install llm-export
$ git clone https://github.com/wangzhaode/llm-export && cd llm-export/
$ pip install .
# convert the model to MNN format
$ llmexport --path /path/to/mnn-llm/Qwen2-VL-2B-Instruct/ --export mnn --quant_bit 4 --quant_block 0 --dst_path Qwen2-VL-2B-Instruct-convert-4bit-per_channel --sym
```

- --quant_bit: the quantization bit width; for example, 4 selects q4 quantization.
- --quant_block: the quantization block size; 0 selects per-channel quantization, and 128 selects per-block quantization with a block size of 128.
- --sym: use symmetric quantization.
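
As an illustration of these flags, the following sketch converts the same model using a block size of 128 instead of per-channel quantization; the output directory name is only an example:

```shell
# Example variant: 4-bit symmetric quantization with a block size of 128
$ llmexport --path /path/to/mnn-llm/Qwen2-VL-2B-Instruct/ --export mnn --quant_bit 4 --quant_block 128 --dst_path Qwen2-VL-2B-Instruct-convert-4bit-block128 --sym
```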

## Build and run the app
Before launching the app, you need to push the model to the device manually:

```shell
$ adb shell mkdir /data/local/tmp/models/
$ adb push <path to the model folder> /data/local/tmp/models
```
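
If you want to verify the transfer, you can list the directory on the device; the model folder name you see should match the folder you pushed:

```shell
# Confirm the model files are present on the device
$ adb shell ls /data/local/tmp/models/
```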

When you select **Run**, the project is built, and the app is then copied to and installed on the Android device.
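
If you prefer to install from the command line instead, you can use adb to install the APK produced by the build. The path below is the usual Gradle output location for a debug build and may differ in your setup:

```shell
# Install (or reinstall) the APK on the connected device
$ adb install -r app/build/outputs/apk/debug/app-debug.apk
```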

After opening the app, you will see:

![Loading screenshot](Loading_page.png)

After the model is loaded, you can chat with the app:

![Chat screenshot](chat2.png)
Lines changed: 75 additions & 0 deletions
---
title: Build the MNN Command-line ViT Demo
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Set up development environment
In this Learning Path, you will learn how to build and deploy a Vision Transformer (ViT) chat command-line demo to an Android device using MNN-LLM. You will learn how to cross-compile MNN-LLM and how to run the Qwen model on an Android device.

The first step is to prepare a development environment with the required software:

- Linux Ubuntu (20.04 or higher)
- Android NDK (tested with version 28.0.12916984)
- CMake (4.0.0-rc1)
- Python3 (optional)
- Git

## Build and run the command-line demo

Push the model to the device; how to obtain the model is described on the previous page.

```shell
$ adb shell mkdir /data/local/tmp/models/
$ adb push <path to the model folder> /data/local/tmp/models
```

```shell
# Download an NDK archive from https://developer.android.com/ndk/downloads/
$ unzip android-ndk-r27d-linux.zip
$ export ANDROID_NDK=./android-ndk-r27d-linux/

$ git clone https://github.com/alibaba/MNN.git
$ cd MNN/project/android
$ mkdir build_64 && cd build_64
$ ../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true -DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
$ adb push *.so llm_demo tools/cv/*.so /data/local/tmp/
$ adb shell
```
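
The prompt used below refers to an image at ./example.png on the device. If you have not pushed one yet, a minimal sketch (run from the host, assuming example.png is in your current directory) is:

```shell
# Push the test image referenced by the prompt to the device
$ adb push example.png /data/local/tmp/
```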

The adb shell command above opens a shell on the device; run the following commands inside that shell.

```shell
$ cd /data/local/tmp/
$ chmod +x llm_demo
$ export LD_LIBRARY_PATH=./
# <img>./example.png</img> refers to the image on the device that you want to describe
$ echo " <img>./example.png</img>Describe the content of the image." >prompt
$ ./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
```

Here is an example image:

![example image](example.png)

If the launch is successful, you see output similar to the following:

```shell
config path is models/Qwen-VL-2B-convert-4bit-per_channel/config.json
tokenizer_type = 3
prompt file is prompt
The image features a tiger standing in a grassy field, with its front paws raised and its eyes fixed on something or someone behind it. The tiger's stripes are clearly visible against the golden-brown background of the grass. The tiger appears to be alert and ready for action, possibly indicating a moment of tension or anticipation in the scene.

#################################
prompt tokens num = 243
decode tokens num = 70
vision time = 5.96 s
audio time = 0.00 s
prefill time = 1.80 s
decode time = 2.09 s
prefill speed = 135.29 tok/s
decode speed = 33.53 tok/s
##################################
```
Lines changed: 53 additions & 0 deletions
---
title: Vision LLM inference on Android with KleidiAI and MNN

minutes_to_complete: 30

who_is_this_for: This is an advanced topic for Android developers who want to run Vision Transformer (ViT) models efficiently on Android devices.

learning_objectives:
    - Run Vision Transformer inference on an Android device with the Qwen Vision 2B model using the MNN inference framework.
    - Download and convert a Qwen Vision model from Hugging Face.

prerequisites:
    - An x86_64 development machine with Android Studio installed.
    - A 64-bit Arm powered smartphone running Android that supports the i8mm and dotprod features.

author: Shuheng Deng, Arm

### Tags
skilllevels: Introductory
subjects: ML
armips:
    - Cortex-A
    - Cortex-X
tools_software_languages:
    - Android Studio
    - KleidiAI
operatingsystems:
    - Android

further_reading:
    - resource:
        title: "MNN: A Universal and Efficient Inference Engine"
        link: https://arxiv.org/pdf/2002.12418
        type: documentation
    - resource:
        title: MNN documentation
        link: https://mnn-docs.readthedocs.io/en/latest/
        type: documentation
    - resource:
        title: Vision transformer
        link: https://en.wikipedia.org/wiki/Vision_transformer
        type: website

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
Lines changed: 28 additions & 0 deletions
---
title: Background
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## MNN Introduction
MNN is a highly efficient and lightweight deep learning framework. It supports inference and training of deep learning models and has industry-leading performance for on-device inference and training. At present, MNN is integrated into more than 30 apps from Alibaba Inc., such as Taobao, Tmall, Youku, DingTalk, and Xianyu, covering more than 70 usage scenarios such as live broadcast, short video capture, search recommendation, searching for products by image, interactive marketing, equity distribution, and security risk control. MNN is also used on embedded devices, such as IoT devices.

MNN-LLM is a large language model runtime solution built on top of the MNN engine. Its mission is to deploy LLMs locally on everyone's platforms (mobile phones, PCs, and IoT devices). It supports popular large language models such as Qianwen (Qwen), Baichuan, Zhipu, LLaMA, and others.

KleidiAI is integrated into the MNN framework, enhancing the inference performance of large language models (LLMs) within MNN. The Android app in this Learning Path demonstrates Vision Transformer inference using the MNN framework, accelerated by KleidiAI.

## Vision Transformer (ViT)
The Vision Transformer (ViT) is a deep learning model designed for image recognition tasks. Unlike traditional convolutional neural networks (CNNs), which process images using convolutional layers, ViT leverages the transformer architecture originally developed for natural language processing (NLP).
The ViT workflow consists of the following steps:

- **Image Patching**: the input image is divided into fixed-size patches, similar to how text is tokenized in NLP tasks.
- **Linear Embedding**: each image patch is flattened and linearly embedded into a vector.
- **Position Encoding**: positional information is added to the patch embeddings to retain spatial information.
- **Transformer Encoder**: the embedded patches are fed into a standard transformer encoder, which uses self-attention mechanisms to process the patches and capture the relationships between them.
- **Classification**: the output of the transformer encoder is used for image classification or other vision tasks.

ViT has shown competitive performance on various image classification benchmarks and has been widely adopted in computer vision research.