
Commit 7549347

Author: Amal Augustine Jose (committed)
Update to ALP: Vision LLM inference on Android with KleidiAI and MNN
1. Fixed the inconsistency of using both Qwen2.5-VL-3B and Qwen2-VL-2B, which could confuse the user.
2. Fixed the broken optional section on how to quantize a model locally. Original issues:
   a. It used Qwen2-VL-2B while the pre-quantized model was Qwen2.5-VL-3B.
   b. Even with the wrong model, the steps resulted in a model that produced gibberish.
   c. The following changes were made to get local quantization working:
      i. Switched from symmetric quantization to asymmetric quantization.
      ii. Switched from per-channel quantization to block quantization (block size = 64).
   d. Updated the deprecated `huggingface-cli download` to `hf download`.
3. Updated to use `llmexport` from PyPI rather than the Git repo (which has since been updated).
4. Added a recommendation to use a Python virtual environment.
5. Added a wrapper script as a workaround for an issue with `llmexport`.
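The symmetric-to-asymmetric switch above can be illustrated with a toy example: on an all-positive weight slice, a symmetric grid (zero-point fixed at 0) wastes its entire negative half-range, while an asymmetric grid fits the data exactly. This is an illustration only, not MNN's implementation.

```python
# Toy comparison of symmetric vs asymmetric 4-bit quantization on an
# all-positive (skewed) weight slice -- not MNN's actual quantizer.

def max_quant_error(vals, symmetric, bits=4):
    """Quantize, dequantize, and return the worst-case absolute error."""
    if symmetric:
        # Symmetric: zero-point fixed at 0, signed grid [-(2^(b-1)-1), 2^(b-1)-1].
        levels = 2 ** (bits - 1) - 1          # 7 for 4-bit
        scale = max(abs(v) for v in vals) / levels
        deq = [round(v / scale) * scale for v in vals]
    else:
        # Asymmetric: zero-point at min(vals), unsigned grid [0, 2^b - 1].
        levels = 2 ** bits - 1                # 15 for 4-bit
        zero = min(vals)
        scale = (max(vals) - zero) / levels
        deq = [round((v - zero) / scale) * scale + zero for v in vals]
    return max(abs(a - b) for a, b in zip(vals, deq))

# All-positive weights: symmetric mode wastes the negative half of its range.
weights = [0.10 + 0.05 * i for i in range(16)]          # 0.10 .. 0.85

sym_err = max_quant_error(weights, symmetric=True)
asym_err = max_quant_error(weights, symmetric=False)
print(f"symmetric max error:  {sym_err:.4f}")
print(f"asymmetric max error: {asym_err:.4f}")
```

On this deliberately skewed input the asymmetric grid reconstructs the weights almost exactly, while the symmetric grid leaves a visible error on every value.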
1 parent 2254ab3 commit 7549347

File tree

3 files changed: +38, -12 lines


content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/1-devenv-and-model.md

Lines changed: 36 additions & 10 deletions
@@ -46,6 +46,12 @@ pip 24.0 from /usr/lib/python3/dist-packages/pip (python 3.12)
 If Python 3.x is not the default version, try running `python3 --version` and `pip3 --version`.
 {{% /notice %}}
 
+It's recommended to make these changes in a Python virtual environment:
+
+```bash
+python3.12 -m venv vision_llm
+source vision_llm/bin/activate
+```
 
 ## Set up Phone Connection
 
 You need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included with Android Studio, provides Android Debug Bridge (ADB) for transferring files.
@@ -72,7 +78,7 @@ The pre-quantized model is available in Hugging Face, you can download with the
 ```bash
 git lfs install
 git clone https://huggingface.co/taobao-mnn/Qwen2.5-VL-3B-Instruct-MNN
-git checkout 9057334b3f85a7f106826c2fa8e57c1aee727b53
+git checkout a4622194b3c518139e2cb8099e147e3d71975f7a
 ```
 
 ## (Optional) Download and Convert the Model
@@ -81,28 +87,48 @@ If you need to quantize the model with customized parameter, the following comma
 ```bash
 cd $HOME
 pip install -U huggingface_hub
-huggingface-cli download Qwen/Qwen2-VL-2B-Instruct --local-dir ./Qwen2-VL-2B-Instruct/
-git clone https://github.com/wangzhaode/llm-export
-cd llm-export && pip install .
+hf download Qwen/Qwen2.5-VL-3B-Instruct --local-dir ./Qwen2.5-VL-3B-Instruct/
+pip install llmexport
+```
+
+Use `llmexport` to quantize the model with these options:
+
+```bash
+llmexport --path ../Qwen2.5-VL-3B-Instruct/ --export mnn --quant_bit 4 \
+  --quant_block 64 --dst_path Qwen2.5-VL-3B-Instruct-convert-4bit-64qblock
 ```
-Use the `llm-export` repository to quantize the model with these options:
 
+{{% notice Note %}}
+If you run into issues where `llmexport` is not able to access `utils`, try the following:
 ```bash
-llmexport --path ../Qwen2-VL-2B-Instruct/ --export mnn --quant_bit 4 \
-  --quant_block 0 --dst_path Qwen2-VL-2B-Instruct-convert-4bit-per_channel --sym
+# From your project directory (inside the venv)
+cat > llmexport_fixed.py <<'PY'
+import sys, importlib
+# Make "utils" resolve to "llmexport.utils"
+sys.modules.setdefault("utils", importlib.import_module("llmexport.utils"))

+from llmexport.__main__ import main
+if __name__ == "__main__":
+    main()
+PY
+
+# Use this instead of the entry point:
+python llmexport_fixed.py \
+  --path Qwen2.5-VL-3B-Instruct \
+  --export mnn --quant_bit 4 --quant_block 64 \
+  --dst_path Qwen2.5-VL-3B-Instruct-convert-4bit-64qblock
 ```
+{{% /notice %}}
 
 The table below gives you an explanation of the different arguments:
 
 | Parameter | Description | Explanation |
 |------------------|-------------|--------------|
 | `--quant_bit` | MNN quant bit, 4 or 8, default is 4. | `4` represents q4 quantization. |
-| `--quant_block` | MNN quant block, default is 0. | `0` represents per-channel quantization; `128` represents 128 per-block quantization. |
-| `--sym` | Symmetric quantization (without zeropoint); default is False. | The quantization parameter that enables symmetrical quantization. |
+| `--quant_block` | MNN quant block, default is 0. | `0` represents per-channel quantization; `64` represents 64 per-block quantization. |
 
 To learn more about the parameters, see the [transformers README.md](https://github.com/alibaba/MNN/tree/master/transformers).
 
-Verify that the model was built correctly by checking that the `Qwen2-VL-2B-Instruct-convert-4bit-per_channel` directory is at least 1 GB in size.
+Verify that the model was built correctly by checking that the `Qwen2.5-VL-3B-Instruct-convert-4bit-64qblock` directory is at least 2 GB in size.
 
 ## Push the model to Android device

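The asymmetric block quantization scheme the diff adopts (`--quant_bit 4 --quant_block 64`) can be sketched as a toy Python illustration: each block of 64 weights gets its own scale and zero-point, mapping the block's range onto 16 codes. This is an illustration only, not MNN's actual quantization kernel.

```python
# Toy sketch of asymmetric 4-bit block quantization with block size 64.
# Illustration only -- not MNN's actual quantization kernel.
import random

BITS, BLOCK = 4, 64
LEVELS = 2 ** BITS - 1   # 15: highest of the 16 codes per block

def quantize_block(block):
    """Asymmetric: each block gets its own scale and zero-point."""
    zero = min(block)
    scale = (max(block) - zero) / LEVELS or 1.0
    return [round((x - zero) / scale) for x in block], scale, zero

def dequantize_block(codes, scale, zero):
    return [c * scale + zero for c in codes]

random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(256)]  # one fake channel

# Quantize the channel in independent blocks of 64 values.
recon = []
for i in range(0, len(weights), BLOCK):
    codes, scale, zero = quantize_block(weights[i:i + BLOCK])
    assert all(0 <= c <= LEVELS for c in codes)   # every code fits in 4 bits
    recon.extend(dequantize_block(codes, scale, zero))

max_err = max(abs(a - b) for a, b in zip(weights, recon))
print(f"max reconstruction error: {max_err:.4f}")
```

Because each block's grid is fitted to that block's own min/max, outliers in one block no longer inflate the quantization error of the whole channel, which is the advantage block quantization has over per-channel quantization.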
content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-benchmark.md

Lines changed: 1 addition & 1 deletion
@@ -29,7 +29,7 @@ Run the following commands to clone the MNN repository and checkout the source t
 cd $HOME
 git clone https://github.com/alibaba/MNN.git
 cd MNN
-git checkout 282cebeb785118865b9c903decc4b5cd98d5025e
+git checkout a739ea5870a4a45680f0e36ba9662ca39f2f4eec
 ```
 
 Create a build directory and run the build script.
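The clone-and-checkout step above pins the build to an exact MNN revision. The general pattern is sketched below on a throwaway local repository so it runs offline; the repo name and commit message are placeholders.

```shell
set -e
# Create a throwaway repo standing in for a cloned project.
tmp=$(mktemp -d)
cd "$tmp"
git init -q demo
cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"

# Record the exact revision, then check it out (detached HEAD),
# just as the Learning Path checks out a pinned MNN commit.
sha=$(git rev-parse HEAD)
git checkout -q "$sha"
echo "pinned at $(git rev-parse --short HEAD)"
```

Pinning to a full commit SHA, rather than a branch, keeps the build reproducible even as the upstream repository moves.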

content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/background.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ MNN is a high-performance, lightweight deep learning framework designed for both
 
 **MNN-LLM** is a large language model (LLM) runtime solution built on the MNN engine. It enables local deployment of LLMs across diverse platforms, including mobile devices, PCs, and IoT systems, and supports leading models such as Qianwen, Baichuan, Zhipu, and Llama for efficient, accessible AI-powered experiences.
 
-KleidiAI, a collection of optimized AI micro-kernels, is integrated into the MNN framework to enhance the inference performance of LLMs. In this Learning Path, the Android app demonstrates Vision Transformer inference using the MNN framework. You will use KleidiAI to speed up inference for the [Qwen Vision 2B](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct) model.
+KleidiAI, a collection of optimized AI micro-kernels, is integrated into the MNN framework to enhance the inference performance of LLMs. In this Learning Path, the Android app demonstrates Vision Transformer inference using the MNN framework. You will use KleidiAI to speed up inference for the [Qwen2.5 Vision 3B](https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct) model.
 
 ## Vision Transformer (ViT)
 The Vision Transformer (ViT) is a deep learning model designed for image recognition tasks. Unlike traditional convolutional neural networks (CNNs) that use convolutional layers, ViT leverages the transformer architecture originally developed for natural language processing (NLP).
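ViT's tokenization can be sketched with back-of-the-envelope arithmetic: the image is cut into fixed-size patches, and each flattened patch becomes one input token for the transformer. The numbers below are the common ViT-Base defaults, used here only for illustration.

```python
# Back-of-the-envelope view of ViT's patch tokenization.
# ViT-Base defaults assumed: 224x224 RGB input, 16x16 patches.
image_size = 224        # input resolution (224x224)
patch_size = 16         # each patch is 16x16 pixels
channels = 3            # RGB

patches_per_side = image_size // patch_size        # 14 patches per row/column
num_tokens = patches_per_side ** 2                 # total patch tokens
token_dim = patch_size * patch_size * channels     # values per flattened patch

print(f"{num_tokens} tokens of dimension {token_dim}")
```

Each flattened patch is then linearly projected to the model's embedding size, after which the sequence is processed exactly like a sequence of word embeddings in NLP.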
