
Commit f6eb493

Merge pull request #2322 from amalaugustinejose/vision-llm-inference-on-android-with-kleidiai-and-mnn
Update to ALP: Vision LLM inference on Android with KleidiAI and MNN
2 parents 2254ab3 + 29720ca commit f6eb493

File tree

3 files changed: +39 −12 lines


content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/1-devenv-and-model.md

Lines changed: 37 additions & 10 deletions
@@ -46,6 +46,13 @@ pip 24.0 from /usr/lib/python3/dist-packages/pip (python 3.12)
 If Python 3.x is not the default version, try running `python3 --version` and `pip3 --version`.
 {{% /notice %}}
 
+It is recommended to use a Python virtual environment:
+
+```bash
+python3.12 -m venv vision_llm
+source vision_llm/bin/activate
+```
+
 ## Set up Phone Connection
 
 You need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included with Android Studio, provides Android Debug Bridge (ADB) for transferring files.
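The venv step added above can be sanity-checked before installing packages. This is an illustrative snippet, not part of the commit; it relies only on the documented behavior that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment:

```python
import sys

def in_venv() -> bool:
    """Return True when the interpreter is running inside a virtual environment.

    In a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix

print("virtual environment active:", in_venv())
```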
@@ -72,7 +79,7 @@ The pre-quantized model is available in Hugging Face, you can download with the
 ```bash
 git lfs install
 git clone https://huggingface.co/taobao-mnn/Qwen2.5-VL-3B-Instruct-MNN
-git checkout 9057334b3f85a7f106826c2fa8e57c1aee727b53
+git checkout a4622194b3c518139e2cb8099e147e3d71975f7a
 ```
 
 ## (Optional) Download and Convert the Model
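A common failure mode with `git lfs` clones is ending up with pointer stubs instead of the real weight files; a stub begins with the literal line `version https://git-lfs.github.com/spec/v1`. A small sketch (the directory name is taken from the clone step; the helper names are illustrative) that flags any files still left as stubs:

```python
from pathlib import Path

# First bytes of every git-lfs pointer file, per the LFS pointer spec.
LFS_MAGIC = b"version https://git-lfs.github.com/spec/v1"

def is_lfs_stub(path: Path) -> bool:
    """True if the file is an un-downloaded git-lfs pointer, not real data."""
    try:
        with path.open("rb") as f:
            return f.read(len(LFS_MAGIC)) == LFS_MAGIC
    except OSError:
        return False

def find_stubs(repo_dir: str) -> list[Path]:
    """List files under repo_dir that are still LFS pointer stubs."""
    return [p for p in Path(repo_dir).rglob("*") if p.is_file() and is_lfs_stub(p)]

# Example (directory created by the git clone above):
# print(find_stubs("Qwen2.5-VL-3B-Instruct-MNN"))
```

If this prints any paths, re-run `git lfs pull` inside the repository before pushing the model to the device.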
@@ -81,28 +88,48 @@ If you need to quantize the model with customized parameter, the following comma
 ```bash
 cd $HOME
 pip install -U huggingface_hub
-huggingface-cli download Qwen/Qwen2-VL-2B-Instruct --local-dir ./Qwen2-VL-2B-Instruct/
-git clone https://github.com/wangzhaode/llm-export
-cd llm-export && pip install .
+hf download Qwen/Qwen2.5-VL-3B-Instruct --local-dir ./Qwen2.5-VL-3B-Instruct/
+pip install llmexport
 ```
-Use the `llm-export` repository to quantize the model with these options:
+Use `llmexport` to quantize the model with these options:
 
 ```bash
-llmexport --path ../Qwen2-VL-2B-Instruct/ --export mnn --quant_bit 4 \
-  --quant_block 0 --dst_path Qwen2-VL-2B-Instruct-convert-4bit-per_channel --sym
+llmexport --path ../Qwen2.5-VL-3B-Instruct/ --export mnn --quant_bit 4 \
+  --quant_block 64 --dst_path Qwen2.5-VL-3B-Instruct-convert-4bit-64qblock
 ```
 
+{{% notice Note %}}
+If you run into issues where llmexport is not able to access utils, try the following:
+```bash
+# From your project dir (inside the venv)
+cat > llmexport_fixed.py <<'PY'
+import sys, importlib
+# make "utils" resolve to "llmexport.utils"
+sys.modules.setdefault("utils", importlib.import_module("llmexport.utils"))
+
+from llmexport.__main__ import main
+if __name__ == "__main__":
+    main()
+PY
+
+# Use this instead of the entrypoint:
+python llmexport_fixed.py \
+  --path Qwen2.5-VL-3B-Instruct \
+  --export mnn --quant_bit 4 --quant_block 64 \
+  --dst_path Qwen2.5-VL-3B-Instruct-convert-4bit-64qblock
+```
+{{% /notice %}}
+
 The table below gives you an explanation of the different arguments:
 
 | Parameter | Description | Explanation |
 |------------------|-------------|--------------|
 | `--quant_bit` | MNN quant bit, 4 or 8, default is 4. | `4` represents q4 quantization. |
-| `--quant_block` | MNN quant block, default is 0. | `0` represents per-channel quantization; `128` represents 128 per-block quantization. |
-| `--sym` | Symmetric quantization (without zeropoint); default is False. | The quantization parameter that enables symmetrical quantization. |
+| `--quant_block` | MNN quant block, default is 0. | `0` represents per-channel quantization; `64` represents 64 per-block quantization. |
 
 To learn more about the parameters, see the [transformers README.md](https://github.com/alibaba/MNN/tree/master/transformers).
 
-Verify that the model was built correctly by checking that the `Qwen2-VL-2B-Instruct-convert-4bit-per_channel` directory is at least 1 GB in size.
+Verify that the model was built correctly by checking that the `Qwen2.5-VL-3B-Instruct-convert-4bit-64qblock` directory is at least 2 GB in size.
 
 ## Push the model to Android device
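The `--quant_bit 4 --quant_block 64` combination in the hunk above means each run of 64 weights shares a single scale factor. A minimal pure-Python sketch of symmetric 4-bit quantization of one such block, purely illustrative and not MNN's actual implementation:

```python
def quantize_block(weights, bits=4):
    """Symmetrically quantize one block of floats to signed integers.

    With 4 bits, values map to the signed range [-7, 7]; one code point
    is left unused so the grid stays symmetric around zero (no zeropoint).
    """
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]          # integer codes
    return q, scale

def dequantize_block(q, scale):
    """Recover approximate float weights from codes and the shared scale."""
    return [v * scale for v in q]

block = [0.8, -0.4, 0.05, -0.7]                      # a toy "block" of weights
q, scale = quantize_block(block)
restored = dequantize_block(q, scale)
```

With `--quant_block 0` there is instead one scale per output channel; smaller blocks such as 64 track local weight ranges more closely, at the cost of storing more scales.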

content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-benchmark.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ Run the following commands to clone the MNN repository and checkout the source t
 cd $HOME
 git clone https://github.com/alibaba/MNN.git
 cd MNN
-git checkout 282cebeb785118865b9c903decc4b5cd98d5025e
+git checkout a739ea5870a4a45680f0e36ba9662ca39f2f4eec
 ```
 
 Create a build directory and run the build script.

content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/background.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ MNN is a high-performance, lightweight deep learning framework designed for both
 
 **MNN-LLM** is a large language model (LLM) runtime solution built on the MNN engine. It enables local deployment of LLMs across diverse platforms, including mobile devices, PCs, and IoT systems, and supports leading models such as Qianwen, Baichuan, Zhipu, and Llama for efficient, accessible AI-powered experiences.
 
-KleidiAI, a collection of optimized AI micro-kernels, is integrated into the MNN framework to enhance the inference performance of LLMs. In this Learning Path, the Android app demonstrates Vision Transformer inference using the MNN framework. You will use KleidiAI to speed up inference for the [Qwen Vision 2B](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) model.
+KleidiAI, a collection of optimized AI micro-kernels, is integrated into the MNN framework to enhance the inference performance of LLMs. In this Learning Path, the Android app demonstrates Vision Transformer inference using the MNN framework. You will use KleidiAI to speed up inference for the [Qwen2.5 Vision 3B](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) model.
 
 ## Vision Transformer (ViT)
 The Vision Transformer (ViT) is a deep learning model designed for image recognition tasks. Unlike traditional convolutional neural networks (CNNs) that use convolutional layers, ViT leverages the transformer architecture originally developed for natural language processing (NLP).
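ViT's patch-based tokenization can be made concrete with a little arithmetic: the image is cut into non-overlapping square patches, and each patch becomes one input token. The numbers below are the common ViT defaults (224×224 input, 16×16 patches), not values specific to Qwen2.5-VL:

```python
def num_patches(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping patch tokens a ViT produces for one image."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)

# 224x224 image, 16x16 patches -> 14 * 14 = 196 tokens
print(num_patches(224, 224, 16))  # → 196
```

The token count grows quadratically with resolution, which is why vision-LLM prompt processing is far more compute-hungry than text-only prefill and benefits directly from KleidiAI's optimized matrix kernels.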
