Merge pull request #1961 from HenryDen/main

pareenaverma · web-flow · commit 06b35ededd0d · 2025-05-15T08:05:34.000-05:00
Update the Vision llm
diff --git a/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/1-devenv-and-model.md b/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/1-devenv-and-model.md
@@ -1,5 +1,5 @@
 ---
-title: Build the MNN Android Demo with GUI
+title: Set up the environment and prepare the model
 weight: 3
 
 ### FIXED, DO NOT MODIFY
@@ -9,7 +9,7 @@ layout: learningpathall
 
 In this section, you'll set up your development environment by installing dependencies and preparing the Qwen vision model.
 
-Install the Android NDK (Native Development Kit) and git-lfs. This Learning Path was tested with NDK version `28.0.12916984` and CMake version `3.31.6`.
+Install the Android NDK (Native Development Kit) and git-lfs. This Learning Path was tested with NDK version `28.0.12916984` and CMake version `4.0.0-rc1`.
 
 For Ubuntu or Debian systems, install CMake and git-lfs with the following commands:
 
@@ -18,9 +18,9 @@ sudo apt update
 sudo apt install cmake git-lfs -y
 ```
 
-You can use Android Studio to obtain the NDK.
+You can use Android Studio to obtain the NDK. 
 
-Click **Tools > SDK Manager** and navigate to the **SDK Tools** tab.
+Click **Tools > SDK Manager** and navigate to the **SDK Tools** tab. 
 
 Select the **NDK (Side by side)** and **CMake** checkboxes, as shown below:
 
@@ -48,7 +48,7 @@ If Python 3.x is not the default version, try running `python3 --version` and `p
 
 ## Set up Phone Connection
 
-You need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included with Android Studio, provides Android Debug Bridge (ADB) for transferring files.
+You need to set up an authorized connection with your phone. The Android SDK Platform Tools package, included with Android Studio, provides Android Debug Bridge (ADB) for transferring files. 
 
 Connect your phone to your computer using a USB cable, and enable USB debugging on your phone. To do this, tap the **Build Number** in your **Settings** app 7 times, then enable **USB debugging** in **Developer Options**.
 
@@ -65,9 +65,18 @@ List of devices attached
 <DEVICE ID>     device
 ```
 
-## Download and Convert the Model
+## Download the Quantized Model
 
-The following commands download the model from Hugging Face, and clone a tool for exporting the LLM model to the MNN framework.
+The pre-quantized model is available on Hugging Face. You can download it with the following commands:
+
+```bash
+git lfs install
+git clone https://huggingface.co/taobao-mnn/Qwen2.5-VL-3B-Instruct-MNN
+git -C Qwen2.5-VL-3B-Instruct-MNN checkout 9057334b3f85a7f106826c2fa8e57c1aee727b53
+```
+
+## (Optional) Download and Convert the Model
+If you need to quantize the model with custom parameters, the following commands download the model from Hugging Face and clone a tool for exporting the LLM to the MNN framework.
 
 ```bash
 cd $HOME
@@ -95,11 +104,13 @@ To learn more about the parameters, see the [transformers README.md](https://git
 
 Verify that the model was built correctly by checking that the `Qwen2-VL-2B-Instruct-convert-4bit-per_channel` directory is at least 1 GB in size.
 
+## Push the Model to the Android Device
+
 Push the model onto the device:
 
 ```shell
 adb shell mkdir /data/local/tmp/models/
-adb push Qwen2-VL-2B-Instruct-convert-4bit-per_channel /data/local/tmp/models
+adb push Qwen2.5-VL-3B-Instruct-MNN /data/local/tmp/models
 ```
 
-With the model set up, you're ready to use Android Studio to build and run an example application.
+With the model set up, you're ready to build and run an example application.
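
Before pushing the model, it can help to confirm that the phone is attached and authorized, because `adb push` fails if it is not. The following is a minimal sketch, not part of the Learning Path itself. The helper name `count_ready_devices` is hypothetical, and it assumes the usual `adb devices` layout of a header line followed by one `<DEVICE ID>` and state pair per line.

```bash
# Hypothetical helper: reads `adb devices` output on stdin and prints how many
# entries are in the "device" (authorized) state. Entries in other states,
# such as "unauthorized" or "offline", are not counted.
count_ready_devices() {
  awk 'NR > 1 && $2 == "device" { n++ } END { print n + 0 }'
}

# Demonstration on sample text, so the snippet runs without a phone attached.
printf 'List of devices attached\nABC123\tdevice\nDEF456\tunauthorized\n\n' \
  | count_ready_devices
```

With a real phone connected, run `adb devices | count_ready_devices`. A result of `0` suggests that USB debugging has not been authorized on the device yet.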
diff --git a/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-benchmark.md b/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-benchmark.md
@@ -1,15 +1,15 @@
 ---
 title: Build the MNN Command-line ViT Demo
-weight: 5
+weight: 4
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 ## Prepare an Example Image
 
-In this section, you'll benchmark model performance with and without KleidiAI kernels. To run optimized inference, you'll first need to compile the required library files. You'll also need an example image to run command-line prompts.
+In this section, you'll benchmark model performance with and without KleidiAI kernels. To run optimized inference, you'll first need to compile the required library files. You'll also need an example image to run command-line prompts. 
 
-You can use the provided image of the tiger below that this Learning Path uses, or choose your own.
+You can use the provided image of the tiger below that this Learning Path uses, or choose your own. 
 
 Whichever you select, rename the image to `example.png` to use the commands in the following sections.
 
@@ -23,24 +23,30 @@ adb push example.png /data/local/tmp/
 
 ## Build Binaries for Command-line Inference
 
-Navigate to the Vision Language Models project that you cloned in the previous section.
+Run the following commands to clone the MNN repository and check out the source tree:
+
+```bash
+cd $HOME
+git clone https://github.com/alibaba/MNN.git
+cd MNN
+git checkout 282cebeb785118865b9c903decc4b5cd98d5025e
+```
+
+Create a build directory and run the build script. 
 
 The first time that you do this, build the binaries with the `-DMNN_KLEIDIAI` flag set to `FALSE`.
 
 ```bash
-cmake  ./vit/ -B build \
--DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
--DCMAKE_BUILD_TYPE=Release \
--DANDROID_ABI="arm64-v8a" \
--DANDROID_STL=c++_static  \
--DANDROID_NATIVE_API_LEVEL=android-21  \
--DMNN_BUILD_OPENCV=true \
--DMNN_IMGCODECS=true \
--DMNN_KLEIDIAI=false
-cmake --build ./build
+cd $HOME/MNN/project/android
+mkdir build_64 && cd build_64
+
+../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=FALSE  \
+  -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \
+  -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \
+  -DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
 ```
 {{% notice Note %}}
-If your NDK toolchain isn't set up correctly, you might run into issues with the above script. Make a note of where the NDK was installed - this will be a directory named after the version you downloaded earlier. Try exporting the following environment variables before re-running above commands:
+If your NDK toolchain isn't set up correctly, you might run into issues with the above script. Make a note of where the NDK was installed - this will be a directory named after the version you downloaded earlier. Try exporting the following environment variables before re-running `build_64.sh`:
 
 ```bash
 export ANDROID_NDK_HOME=<path-to>/ndk/28.0.12916984
@@ -55,23 +61,23 @@ export ANDROID_NDK=$ANDROID_NDK_HOME
 Push the required files to your Android device, then enter a shell on the device using ADB:
 
 ```bash
-adb push build/bin/vision_llm build/lib/*.so /data/local/tmp
+adb push *so llm_demo tools/cv/*so /data/local/tmp/
 adb shell
 ```
 
 Run the following commands in the ADB shell. Navigate to the directory you pushed the files to, add executable permissions to the `llm_demo` file and export an environment variable for it to run properly. After this, use the example image you transferred earlier to create a file containing the text content for the prompt.
 
 ```bash
 cd /data/local/tmp/
-chmod +x vision_llm
+chmod +x llm_demo
 export LD_LIBRARY_PATH=$PWD
 echo "<img>./example.png</img>Describe the content of the image." > prompt
 ```
 
 Finally, run an inference on the model with the following command:
 
 ```bash
-./vision_llm models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
+./llm_demo models/Qwen2.5-VL-3B-Instruct-MNN/config.json prompt
 ```
 
 If the launch is successful, you should see the following output, with the performance benchmark at the end:
@@ -96,28 +102,22 @@ prefill speed = 192.28 tok/s
 
 ## Enable KleidiAI and Re-run Inference
 
-The next step is to re-generate the binaries with KleidiAI activated. This is done by updating the flag `-DMNN_KLEIDIAI` to `TRUE`.
+The next step is to re-generate the binaries with KleidiAI activated. This is done by updating the flag `-DMNN_KLEIDIAI` to `TRUE`. 
 
-From the `build` directory, run:
+From the `build_64` directory, run:
 ```bash
-cmake  ./vit/ -B build \
--DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
--DCMAKE_BUILD_TYPE=Release \
--DANDROID_ABI="arm64-v8a" \
--DANDROID_STL=c++_static  \
--DANDROID_NATIVE_API_LEVEL=android-21  \
--DMNN_BUILD_OPENCV=true \
--DMNN_IMGCODECS=true \
--DMNN_KLEIDIAI=false
-cmake --build ./build
+../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=TRUE \
+-DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true \
+-DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true \
+-DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
 ```
 ## Update Files on the Device
 
 First, remove existing binaries from your Android device, then push the updated files:
 
 ```bash
-adb shell "cd /data/local/tmp; rm -rf *so vision_llm"
-adb push build/bin/vision_llm build/lib/*.so /data/local/tmp
+adb shell "cd /data/local/tmp; rm -rf *so llm_demo tools/cv/*so"
+adb push *so llm_demo tools/cv/*so /data/local/tmp/
 adb shell
 ```
 
@@ -127,7 +127,7 @@ With the new ADB shell, run the following commands:
 cd /data/local/tmp/
 chmod +x llm_demo
 export LD_LIBRARY_PATH=$PWD
-./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
+./llm_demo models/Qwen2.5-VL-3B-Instruct-MNN/config.json prompt
 ```
 ## Benchmark Results
 
@@ -154,7 +154,7 @@ This time, you should see an improvement in the benchmark. Below is an example t
 | Prefill Speed       | 192.28 tok/s     | 266.13 tok/s  |
 | Decode Speed        | 34.73 tok/s      | 44.96 tok/s   |
 
-**Prefill speed** describes how fast the model processes the input prompt.
+**Prefill speed** describes how fast the model processes the input prompt. 
 
 **Decode Speed** indicates how quickly the model generates new tokens after the input is processed.
 
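
To put the two benchmark columns side by side, the relative speedup can be computed from the reported tokens-per-second figures. This is a small sketch using the example numbers from the table above. The helper name `speedup` is hypothetical, and your own measurements will differ depending on the device.

```bash
# Hypothetical helper: speedup BASELINE OPTIMIZED
# Prints the ratio OPTIMIZED / BASELINE rounded to two decimal places.
speedup() {
  awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.2f\n", opt / base }'
}

speedup 192.28 266.13   # prefill speed, KleidiAI off vs. on
speedup 34.73 44.96     # decode speed, KleidiAI off vs. on
```

For the example figures, this prints `1.38` for prefill and `1.29` for decode, meaning roughly 38% and 29% higher throughput with KleidiAI enabled.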
diff --git a/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-generate-apk.md b/content/learning-paths/mobile-graphics-and-gaming/vision-llm-inference-on-android-with-kleidiai-and-mnn/2-generate-apk.md