Commit c3173a8

Merge pull request #1331 from mitsunami/koki/android-onnxruntime
Build an Android chat app with ONNX Runtime Learning Path
2 parents 8541b93 + 595187f commit c3173a8

File tree: 9 files changed, +414 −0 lines changed

Lines changed: 63 additions & 0 deletions

---
title: Create a development environment
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Set up your development environment

In this Learning Path, you will learn how to build and deploy a simple LLM-based chat app to an Android device using ONNX Runtime. You will learn how to build ONNX Runtime and the ONNX Runtime generate() API, and how to run the Phi-3 model in an Android application.

The first step is to prepare a development environment with the required software:

- Android Studio (latest version recommended)
- Android NDK (tested with version 27.0.12077973)
- Python 3.11
- CMake (tested with version 3.28.1)
- Ninja (tested with version 1.11.1)

The instructions were tested on an x86 Windows machine with at least 16GB of RAM.
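Before installing anything, you can sanity-check which of these tools are already on your PATH. The sketch below is an illustrative helper, not part of the Learning Path tooling; the executable names are assumptions based on the list above:

```python
import shutil

# Executable names assumed from the tool list above (hypothetical helper).
REQUIRED_TOOLS = ["cmake", "ninja", "python", "adb"]

def missing_tools(tools):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]

if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Install these before continuing:", ", ".join(missing))
    else:
        print("All required tools found.")
```

A check like this only confirms the tools are reachable from a shell; it does not verify the tested versions listed above.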
## Install Android Studio and Android NDK

Follow these steps to install and configure Android Studio:

1. Download and install the latest version of [Android Studio](https://developer.android.com/studio/).

2. Navigate to `Tools -> SDK Manager`.

3. In the `SDK Platforms` tab, check `Android 14.0 ("UpsideDownCake")`.

4. In the `SDK Tools` tab, check `NDK (Side by side)`.

5. Click OK and Apply.

## Install Python 3.11

Download and install [Python version 3.11](https://www.python.org/downloads/release/python-3110/).

## Install CMake

CMake is an open-source tool that automates the build process for software projects, helping to generate platform-specific build configurations.

[Download and install CMake](https://cmake.org/download/).

{{% notice Note %}}
The instructions were tested with version 3.28.1.
{{% /notice %}}

## Install Ninja

Ninja is a minimalistic build system designed to efficiently handle incremental builds, particularly in large-scale software projects, by focusing on speed and simplicity.

The Ninja generator is required to build for Android on Windows.

[Download and install Ninja](https://github.com/ninja-build/ninja/releases).

{{% notice Note %}}
The instructions were tested with version 1.11.1.
{{% /notice %}}

You now have the required development tools installed to follow this Learning Path.
Lines changed: 57 additions & 0 deletions

---
title: Build ONNX Runtime
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Cross-compile ONNX Runtime for Android CPU

Now that you have your environment set up correctly, you can build the ONNX Runtime inference engine.

ONNX Runtime is an open-source inference engine designed to accelerate the deployment of machine learning models, particularly those in the Open Neural Network Exchange (ONNX) format. ONNX Runtime is optimized for high performance and low latency, making it popular for production deployment of AI models. You can learn more by reading the [ONNX Runtime Overview](https://onnxruntime.ai/).

### Clone the onnxruntime repository

Open a Windows PowerShell prompt and check out the source tree:

```bash
cd C:\Users\$env:USERNAME
git clone --recursive https://github.com/Microsoft/onnxruntime.git
cd onnxruntime
git checkout 9b37b3ea4467b3aab9110e0d259d0cf27478697d
```

{{% notice Note %}}
You might be able to use a later commit. These steps have been tested with the commit `9b37b3ea4467b3aab9110e0d259d0cf27478697d`.
{{% /notice %}}

### Build for Android CPU

The Ninja generator needs to be used to build on Windows. First, set `JAVA_HOME` to the path of your JDK installation. You can point to the JDK bundled with Android Studio, or to a standalone JDK installation.

```bash
$env:JAVA_HOME="C:\Program Files\Android\Android Studio\jbr"
```

Now run the following command:

```bash
./build.bat --config Release --build_shared_lib --android --android_sdk_path C:\Users\$env:USERNAME\AppData\Local\Android\Sdk --android_ndk_path C:\Users\$env:USERNAME\AppData\Local\Android\Sdk\ndk\27.0.12077973 --android_abi arm64-v8a --android_api 27 --cmake_generator Ninja --build_java
```

Because the command includes `--build_java`, it also generates an Android Archive (AAR) file, which can be imported directly into Android Studio.

When the build is complete, confirm the shared library and the AAR file have been created:

```bash
ls build\Windows\Release\libonnxruntime.so
ls build\Windows\Release\java\build\android\outputs\aar\onnxruntime-release.aar
```
Lines changed: 41 additions & 0 deletions

---
title: Build ONNX Runtime Generate() API
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Cross-compile the ONNX Runtime generate() API for Android CPU

The generate() API in ONNX Runtime is designed for text generation tasks using models like Phi-3. It implements the generative AI loop for ONNX models, including pre- and post-processing, inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. You can learn more by reading the [ONNX Runtime generate() API page](https://onnxruntime.ai/docs/genai/).
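To make the generative AI loop concrete, here is a minimal, framework-free sketch of greedy decoding in Python. It is illustrative only: `step_fn` stands in for a real model invocation through ONNX Runtime, and the KV cache is reduced to a placeholder dictionary:

```python
def greedy_generate(step_fn, prompt_tokens, eos_id, max_new_tokens):
    """Greedy decoding loop: run the model, take the argmax token,
    append it, and repeat until EOS or the token budget is reached.

    step_fn(tokens, kv_cache) -> list of logits over the vocabulary.
    A real runtime reuses cached per-layer key/value tensors; here
    kv_cache is just a placeholder dict.
    """
    tokens = list(prompt_tokens)
    kv_cache = {}
    for _ in range(max_new_tokens):
        logits = step_fn(tokens, kv_cache)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens

# Toy "model" for demonstration: always prefers the token after the
# last one, modulo a vocabulary of 5.
def toy_step(tokens, kv_cache):
    logits = [0.0] * 5
    logits[(tokens[-1] + 1) % 5] = 1.0
    return logits
```

Real search strategies in the generate() API also include beam search and sampling; greedy argmax is the simplest case of the same loop.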
### Clone the onnxruntime-genai repository

In your Windows PowerShell prompt, check out the source repository:

```bash
cd C:\Users\$env:USERNAME
git clone https://github.com/microsoft/onnxruntime-genai
cd onnxruntime-genai
git checkout 1e4d289502a61265c3b07efb17d8796225bb0b7f
```

{{% notice Note %}}
You might be able to use later commits. These steps have been tested with the commit `1e4d289502a61265c3b07efb17d8796225bb0b7f`.
{{% /notice %}}

### Build for Android CPU

The Ninja generator needs to be used to build on Windows for Android. Make sure `JAVA_HOME` is set before running the following commands:

```bash
python -m pip install requests
python3.11 build.py --build_java --android --android_home C:\Users\$env:USERNAME\AppData\Local\Android\Sdk --android_ndk_path C:\Users\$env:USERNAME\AppData\Local\Android\Sdk\ndk\27.0.12077973 --android_abi arm64-v8a --config Release
```

When the build is complete, confirm the shared library has been created:

```bash
ls build\Android\Release\libonnxruntime-genai.so
```
Lines changed: 88 additions & 0 deletions

---
title: Run Benchmark on Android phone
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Run example code for Phi-3-mini

### Build the model runner

You will now cross-compile the model runner to run on Android using the commands below:

```bash
cd onnxruntime-genai
copy src\ort_genai.h examples\c\include\
copy src\ort_genai_c.h examples\c\include\
cd examples\c
mkdir build
cd build
```

Run the cmake command as shown:

```bash
cmake -DCMAKE_TOOLCHAIN_FILE=C:\Users\$env:USERNAME\AppData\Local\Android\Sdk\ndk\27.0.12077973\build\cmake\android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-27 -DCMAKE_BUILD_TYPE=Release -G "Ninja" ..
ninja
```

After a successful build, a binary program called `phi3` is created.

### Prepare the Phi-3-mini model

Phi-3 ONNX models are hosted on Hugging Face. You can download the Phi-3-mini model with the huggingface-cli command:

```bash
pip install huggingface-hub[cli]
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .
```

This command downloads the model into a folder called `cpu_and_mobile`.

The Phi-3-mini (3B) model has a short (4k) context version and a long (128k) context version. The long context version can accept much longer prompts and produce longer output text, but it consumes more memory. In this Learning Path, you will use the short context version, which is quantized to 4 bits.
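The model folder name encodes the quantization scheme: int4 round-to-nearest (RTN) with a block size of 32. As a rough illustration of what block-wise RTN means, here is a hypothetical sketch; the real quantizer operates on tensors and stores packed 4-bit values, and the `acc-level-4` part of the folder name is a separate ONNX Runtime setting not modeled here:

```python
def quantize_rtn(weights, block_size=32, bits=4):
    """Block-wise round-to-nearest quantization sketch.

    Each block of `block_size` weights shares one scale and zero point;
    every weight is rounded to the nearest of 2**bits levels.
    """
    qmax = (1 << bits) - 1  # 15 for int4
    blocks, scales, zero_points = [], [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        lo, hi = min(block), max(block)
        scale = (hi - lo) / qmax if hi > lo else 1.0
        blocks.append([round((w - lo) / scale) for w in block])
        scales.append(scale)
        zero_points.append(lo)
    return blocks, scales, zero_points

def dequantize(blocks, scales, zero_points):
    """Reconstruct approximate weights from quantized blocks."""
    out = []
    for block, scale, zero in zip(blocks, scales, zero_points):
        out.extend(q * scale + zero for q in block)
    return out
```

Storing one scale per 32-weight block and 4 bits per weight is what shrinks the model enough to fit in a phone's memory, at a small accuracy cost compared to the full-precision weights.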
### Run on Android via adb shell

#### Connect your Android phone

Connect your phone to your computer using a USB cable.

You need to enable USB debugging on your Android device. You can follow [Configure on-device developer options](https://developer.android.com/studio/debug/dev-options) to enable USB debugging.

Once you have enabled USB debugging and connected via USB, run:

```bash
adb devices
```

You should see your device listed to confirm it is connected.

#### Copy the runner binary and the model files to the phone

```bash
adb push cpu-int4-rtn-block-32-acc-level-4 /data/local/tmp
adb push .\phi3 /data/local/tmp
adb push onnxruntime-genai\build\Android\Release\libonnxruntime-genai.so /data/local/tmp
adb push onnxruntime\build\Windows\Release\libonnxruntime.so /data/local/tmp
```

#### Run the model

Use the runner to execute the model on the phone with the `adb` command:

```bash
adb shell
cd /data/local/tmp
chmod 777 phi3
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp
./phi3 cpu-int4-rtn-block-32-acc-level-4
```

The runner program loads the model and then prompts you to enter a text prompt. After you enter your prompt, the text output by the model is displayed. On completion, performance metrics similar to the following are displayed:

```output
Prompt length: 64, New tokens: 931, Time to first: 1.79s, Prompt tokens per second: 35.74 tps, New tokens per second: 6.34 tps
```
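The reported metrics are related to one another: 64 prompt tokens processed in a 1.79s time-to-first-token gives 64 / 1.79 ≈ 35.75 tps of prefill throughput. As a sketch of how such figures can be derived from raw timestamps (this is an assumption about the arithmetic, not the runner's actual source code):

```python
def decode_metrics(prompt_len, new_tokens, t_start, t_first_token, t_end):
    """Derive prefill and decode throughput from wall-clock timestamps.

    prompt_len    -- tokens in the input prompt (prefill phase)
    new_tokens    -- tokens generated after the prompt (decode phase)
    t_start       -- when generation was requested
    t_first_token -- when the first new token appeared
    t_end         -- when generation finished
    """
    time_to_first = t_first_token - t_start
    prompt_tps = prompt_len / time_to_first          # prefill throughput
    new_tps = new_tokens / (t_end - t_first_token)   # decode throughput
    return time_to_first, prompt_tps, new_tps
```

Plugging in the sample output above, decoding 931 new tokens at 6.34 tps implies the decode phase took roughly 147 seconds.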
You have successfully run the Phi-3 model on your Android smartphone powered by Arm.
Lines changed: 54 additions & 0 deletions

---
title: Build and Run Android chat app
weight: 6

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Build the Android chat app

Another way to run the model is to use an Android GUI app.
You can use the Android demo application included in the [onnxruntime-inference-examples repository](https://github.com/microsoft/onnxruntime-inference-examples) to demonstrate local inference.

### Clone the repository

```bash
git clone https://github.com/microsoft/onnxruntime-inference-examples
cd onnxruntime-inference-examples
git checkout 009920df0136d7dfa53944d06af01002fb63e2f5
```

{{% notice Note %}}
You might be able to use a later commit. These steps have been tested with the commit `009920df0136d7dfa53944d06af01002fb63e2f5`.
{{% /notice %}}

### Build the app using Android Studio

Open the `mobile\examples\phi-3\android` directory with Android Studio.

#### (Optional) Use the ONNX Runtime AAR you built

If you want to use the ONNX Runtime AAR you built earlier, copy it into the app's `libs` directory:

```bash
copy onnxruntime\build\Windows\Release\java\build\android\outputs\aar\onnxruntime-release.aar mobile\examples\phi-3\android\app\libs
```

Update `build.gradle.kts (:app)` as follows:

```kotlin
// ONNX Runtime with GenAI
//implementation("com.microsoft.onnxruntime:onnxruntime-android:latest.release")
implementation(files("libs/onnxruntime-release.aar"))
```

After that, click `File` -> `Sync Project with Gradle Files`.

#### Build and run the app

When you press Run, the app is built, then copied and installed on the Android device. The app automatically downloads the Phi-3-mini model during the first run. After the download completes, you can enter a prompt in the text box and execute it to run the model.

You should now see a running app on your phone that looks like this:

![App screenshot](screenshot.png)
Lines changed: 40 additions & 0 deletions

---
title: Build an Android chat application with ONNX Runtime API

minutes_to_complete: 60

who_is_this_for: This is an advanced topic for software developers interested in learning how to build an Android chat app with ONNX Runtime and the ONNX Runtime generate() API.

learning_objectives:
- Build ONNX Runtime and the ONNX Runtime generate() API for Android.
- Run the Phi-3 model using ONNX Runtime on an Arm-based smartphone.

prerequisites:
- A Windows x86_64 development machine with at least 16GB of RAM. You should also be able to use Linux or macOS for the build, but the instructions for them have not been included in this Learning Path.
- An Android phone with at least 8GB of RAM. This Learning Path was tested on a Samsung Galaxy S24.

author_primary: Koki Mitsunami

### Tags
skilllevels: Advanced
subjects: ML
armips:
- Cortex-A
- Cortex-X
tools_software_languages:
- Kotlin
- C++
- ONNX Runtime
- Android
- Mobile
operatingsystems:
- Windows
- Android

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 27 additions & 0 deletions

---
next_step_guidance: Now that you are familiar with building LLM applications with ONNX Runtime, you are ready to incorporate LLMs into your Android applications. You can learn how to further accelerate the performance of your LLMs using KleidiAI.

recommended_path: /learning-paths/cross-platform/kleidiai-explainer/

further_reading:
    - resource:
        title: ONNX Runtime
        link: https://onnxruntime.ai/docs/
        type: documentation
    - resource:
        title: ONNX Runtime generate() API
        link: https://onnxruntime.ai/docs/genai/
        type: documentation
    - resource:
        title: Accelerating AI Developer Innovation Everywhere with New Arm Kleidi
        link: https://newsroom.arm.com/blog/arm-kleidi
        type: blog

# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 21 # set to always be larger than the content in this path, and one more than 'review'
title: "Next Steps" # Always the same
layout: "learningpathall" # All files under learning paths have this same wrapper
---
