Commit 7ba390c

Merge pull request #1651 from HenryDen/ViT_MNN_submit
LearningPath - Vision LLM inference on Android with KleidiAI and MNN
2 parents 6da345f + 45c35b8 commit 7ba390c

File tree: 10 files changed, +265 -0 lines changed

Lines changed: 101 additions & 0 deletions
---
title: Build the MNN Android Demo with GUI
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Set up development environment
In this Learning Path, you will learn how to build and deploy a Vision Transformer (ViT) chat app to an Android device using MNN-LLM. You will learn how to build MNN-LLM and how to run the Qwen model in the Android application.

The first step is to prepare a development environment with the required software:

- Android Studio (latest version recommended)
- Android NDK (tested with version 28.0.12916984)
- CMake (4.0.0-rc1)
- Python3 (optional)
- Git
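
Before you continue, you can optionally confirm from a terminal that the command-line tools are available. This is only a quick sanity check, assuming adb, cmake, git, and python3 are already on your PATH:

```shell
# Quick sanity check that the command-line tools are available
$ adb --version      # Android platform tools, installed with Android Studio
$ cmake --version    # expect 4.0.0-rc1 or a compatible version
$ git --version
$ python3 --version  # optional, only needed for model conversion
```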

## Clone the MNN repo
Open a Windows PowerShell or Git Bash terminal and check out the source tree:

```shell
cd C:\Users\$env:USERNAME
git clone https://github.com/HenryDen/MNN.git
cd MNN
git checkout 83b650fc8888d7ccd38dbc68330a87d048b9fe7a
```

{{% notice Note %}}
The app code is not yet merged into the upstream MNN repository. The repository above is a fork of MNN that contains the app code.
{{% /notice %}}

## Build the app using Android Studio

Create a signing.gradle file in android/app with the following template:

```groovy
ext {
    signingConfigs = [
            release: [
                    storeFile: file('PATH_TO_jks_file'),
                    storePassword: "****",
                    keyAlias: "****",
                    keyPassword: "****"
            ]
    ]
}
```

If you don't need to compile a release version of the app, you can skip the signing-file steps below and fill signing.gradle with placeholder values.

To create a signing key and fill in the template:

- Navigate to **Build -> Generate Signed App Bundle or APK**.
- Select **APK** and click **Next**.
- Press **Create new** and fill in the required information.
- Fill in the template above with the details of the newly generated JKS file.

Open the MNN/transformers/llm/engine/android directory in Android Studio and wait for the Gradle project sync to finish.

## Prepare the model
You can download the model from ModelScope: https://www.modelscope.cn/models/qwen/qwen2-vl-2b-instruct

Or from Hugging Face: https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct

If you want to test other vision transformer models, you can download them from https://modelscope.cn/organization/qwen?tab=model and convert them to MNN format.

```shell
# make sure git lfs is installed
$ git lfs install
$ git clone https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
# install llm-export
$ git clone https://github.com/wangzhaode/llm-export && cd llm-export/
$ pip install .
# convert the model to MNN format
$ llmexport --path /path/to/mnn-llm/Qwen2-VL-2B-Instruct/ --export mnn --quant_bit 4 --quant_block 0 --dst_path Qwen2-VL-2B-Instruct-convert-4bit-per_channel --sym
```

- --quant_bit: the quantization bit width; for example, 4 selects q4 quantization.
- --quant_block: the quantization block size; 0 selects per-channel quantization, and 128 selects per-block quantization with a block size of 128.
- --sym: use symmetric quantization.
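
As an illustration of these flags, the following sketch converts the same model using a block size of 128 instead of per-channel quantization; the output directory name is only an example:

```shell
# Example variant: 4-bit symmetric quantization with a block size of 128
$ llmexport --path /path/to/mnn-llm/Qwen2-VL-2B-Instruct/ --export mnn --quant_bit 4 --quant_block 128 --dst_path Qwen2-VL-2B-Instruct-convert-4bit-block128 --sym
```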

## Build and run the app
Before launching the app, you need to push the model to the device manually:

```shell
$ adb shell mkdir /data/local/tmp/models/
$ adb push <path to the model folder> /data/local/tmp/models
```
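
If you want to verify the transfer, you can list the directory on the device; the model folder name you see should match the folder you pushed:

```shell
# Confirm the model files are present on the device
$ adb shell ls /data/local/tmp/models/
```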

When you select **Run**, the project is built, and the app is then copied to and installed on the Android device.
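
If you prefer to install from the command line instead, you can use adb to install the APK produced by the build. The path below is the usual Gradle output location for a debug build and may differ in your setup:

```shell
# Install (or reinstall) the APK on the connected device
$ adb install -r app/build/outputs/apk/debug/app-debug.apk
```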

After opening the app, you will see:

![Loading screenshot](Loading_page.png)

After the model is loaded, you can chat with the app:

![Chat screenshot](chat2.png)
Lines changed: 75 additions & 0 deletions
---
title: Build the MNN Command-line ViT Demo
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Set up development environment
In this Learning Path, you will learn how to build and deploy a Vision Transformer (ViT) chat command-line demo to an Android device using MNN-LLM. You will learn how to cross-compile MNN-LLM and how to run the Qwen model on an Android device.

The first step is to prepare a development environment with the required software:

- Linux Ubuntu (20.04 or higher)
- Android NDK (tested with version 28.0.12916984)
- CMake (4.0.0-rc1)
- Python3 (optional)
- Git

## Build and run the command-line demo

Push the model to the device; how to obtain the model is described on the previous page.

```shell
$ adb shell mkdir /data/local/tmp/models/
$ adb push <path to the model folder> /data/local/tmp/models
```

```shell
# Download an NDK archive from https://developer.android.com/ndk/downloads/
$ unzip android-ndk-r27d-linux.zip
$ export ANDROID_NDK=./android-ndk-r27d-linux/

$ git clone https://github.com/alibaba/MNN.git
$ cd MNN/project/android
$ mkdir build_64 && cd build_64
$ ../build_64.sh "-DMNN_LOW_MEMORY=true -DLLM_SUPPORT_VISION=true -DMNN_KLEIDIAI=true -DMNN_CPU_WEIGHT_DEQUANT_GEMM=true -DMNN_BUILD_LLM=true -DMNN_SUPPORT_TRANSFORMER_FUSE=true -DMNN_ARM82=true -DMNN_OPENCL=true -DMNN_USE_LOGCAT=true -DMNN_IMGCODECS=true -DMNN_BUILD_OPENCV=true"
$ adb push *.so llm_demo tools/cv/*.so /data/local/tmp/
$ adb shell
```
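
The prompt used below refers to an image at ./example.png on the device. If you have not pushed one yet, a minimal sketch (run from the host, assuming example.png is in your current directory) is:

```shell
# Push the test image referenced by the prompt to the device
$ adb push example.png /data/local/tmp/
```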

The adb shell command above opens a shell on the device; run the following commands inside that shell.

```shell
$ cd /data/local/tmp/
$ chmod +x llm_demo
$ export LD_LIBRARY_PATH=./
# <img>./example.png</img> refers to the image on the device that you want to describe
$ echo " <img>./example.png</img>Describe the content of the image." >prompt
$ ./llm_demo models/Qwen-VL-2B-convert-4bit-per_channel/config.json prompt
```

Here is an example image:

![example image](example.png)

If the launch is successful, you see output similar to the following:

```shell
config path is models/Qwen-VL-2B-convert-4bit-per_channel/config.json
tokenizer_type = 3
prompt file is prompt
The image features a tiger standing in a grassy field, with its front paws raised and its eyes fixed on something or someone behind it. The tiger's stripes are clearly visible against the golden-brown background of the grass. The tiger appears to be alert and ready for action, possibly indicating a moment of tension or anticipation in the scene.

#################################
prompt tokens num = 243
decode tokens num = 70
vision time = 5.96 s
audio time = 0.00 s
prefill time = 1.80 s
decode time = 2.09 s
prefill speed = 135.29 tok/s
decode speed = 33.53 tok/s
##################################
```
Lines changed: 53 additions & 0 deletions
---
title: Vision LLM inference on Android with KleidiAI and MNN

minutes_to_complete: 30

who_is_this_for: This is an advanced topic for Android developers who want to run Vision Transformer (ViT) models efficiently on Android devices.

learning_objectives:
    - Run Vision Transformer inference on an Android device with the Qwen Vision 2B model using the MNN inference framework.
    - Download and convert a Qwen Vision model from Hugging Face.

prerequisites:
    - An x86_64 development machine with Android Studio installed.
    - A 64-bit Arm powered smartphone running Android that supports the i8mm and dotprod features.

author: Shuheng Deng, Arm

### Tags
skilllevels: Introductory
subjects: ML
armips:
    - Cortex-A
    - Cortex-X
tools_software_languages:
    - Android Studio
    - KleidiAI
operatingsystems:
    - Android

further_reading:
    - resource:
        title: "MNN: A Universal and Efficient Inference Engine"
        link: https://arxiv.org/pdf/2002.12418
        type: documentation
    - resource:
        title: MNN documentation
        link: https://mnn-docs.readthedocs.io/en/latest/
        type: documentation
    - resource:
        title: Vision transformer
        link: https://en.wikipedia.org/wiki/Vision_transformer
        type: website

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
Lines changed: 28 additions & 0 deletions
---
title: Background
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## MNN Introduction
MNN is a highly efficient and lightweight deep learning framework. It supports inference and training of deep learning models and has industry-leading performance for on-device inference and training. At present, MNN is integrated into more than 30 apps from Alibaba Inc., such as Taobao, Tmall, Youku, DingTalk, and Xianyu, covering more than 70 usage scenarios such as live broadcast, short video capture, search recommendation, searching for products by image, interactive marketing, equity distribution, and security risk control. MNN is also used on embedded devices, such as IoT devices.

MNN-LLM is a large language model runtime solution built on top of the MNN engine. Its mission is to deploy LLMs locally on everyone's platforms (mobile phones, PCs, and IoT devices). It supports popular large language models such as Qianwen (Qwen), Baichuan, Zhipu, LLaMA, and others.

KleidiAI is integrated into the MNN framework, enhancing the inference performance of large language models (LLMs) within MNN. The Android app in this Learning Path demonstrates Vision Transformer inference using the MNN framework, accelerated by KleidiAI.

## Vision Transformer (ViT)
The Vision Transformer (ViT) is a deep learning model designed for image recognition tasks. Unlike traditional convolutional neural networks (CNNs), which process images using convolutional layers, ViT leverages the transformer architecture originally developed for natural language processing (NLP).
The ViT workflow consists of the following steps:

- **Image Patching**: the input image is divided into fixed-size patches, similar to how text is tokenized in NLP tasks.
- **Linear Embedding**: each image patch is flattened and linearly embedded into a vector.
- **Position Encoding**: positional information is added to the patch embeddings to retain spatial information.
- **Transformer Encoder**: the embedded patches are fed into a standard transformer encoder, which uses self-attention mechanisms to process the patches and capture the relationships between them.
- **Classification**: the output of the transformer encoder is used for image classification or other vision tasks.

ViT has shown competitive performance on various image classification benchmarks and has been widely adopted in computer vision research.