
Commit 2005900

Merge pull request #1356 from NickSample/Android-chat-app-ONNX-Runtime
ONXX_Runtime_API_LP_KB to review
2 parents d46fafd + 60a5f60 commit 2005900

File tree: 7 files changed, +43 -38 lines


content/learning-paths/smartphones-and-mobile/build-android-chat-app-using-onnxruntime/1-dev-env-setup.md

Lines changed: 8 additions & 10 deletions
@@ -8,31 +8,31 @@ layout: learningpathall
 
 ## Set up your development environment
 
-In this Learning Path, you will learn how to build and deploy a simple LLM-based chat app to an Android device using ONNX Runtime. You will learn how to build the ONNX runtime and ONNX Runtime generate() API and how to run the Phi-3 model for the Android application.
+In this learning path, you will learn how to build and deploy a simple LLM-based chat app to an Android device using ONNX Runtime. You will learn how to build the ONNX Runtime and ONNX Runtime generate() API and how to run the Phi-3 model for the Android application.
 
-The first step is to prepare a development environment with the required software:
+Your first task is to prepare a development environment with the required software:
 
 - Android Studio (latest version recommended)
 - Android NDK (tested with version 27.0.12077973)
 - Python 3.11
 - CMake (tested with version 3.28.1)
 - Ninja (tested with version 1.11.1)
 
-The instructions were tested on an x86 Windows machine with at least 16GB of RAM.
+The following instructions were tested on an x86 Windows machine with at least 16GB of RAM.
 
 ## Install Android Studio and Android NDK
 
 Follow these steps to install and configure Android Studio:
 
 1. Download and install the latest version of [Android Studio](https://developer.android.com/studio/).
 
-2. Navigate to `Tools -> SDK Manager`.
+2. Navigate to **Tools > SDK Manager**.
 
-3. In the `SDK Platforms` tab, check `Android 14.0 ("UpsideDownCake")`.
+3. In the **SDK Platforms** tab, check **Android 14.0 ("UpsideDownCake")**.
 
-4. In the `SDK Tools` tab, check `NDK (Side by side)`.
+4. In the **SDK Tools** tab, check **NDK (Side by side)**.
 
-5. Click Ok and Apply.
+5. Click **OK** and **Apply**.
 
 ## Install Python 3.11
 
@@ -50,9 +50,7 @@ The instructions were tested with version 3.28.1
 
 ## Install Ninja
 
-Ninja is a minimalistic build system designed to efficiently handle incremental builds, particularly in large-scale software projects, by focusing on speed and simplicity.
-
-The Ninja generator needs to be used to build on Windows for Android.
+Ninja is a minimalistic build system designed to efficiently handle incremental builds, particularly in large-scale software projects, by focusing on speed and simplicity. The Ninja generator is used to build on Windows for Android.
 
 [Download and install Ninja]( https://github.com/ninja-build/ninja/releases)
 
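Before moving on, you can confirm each tool is installed and on your PATH with a quick version check. This is a minimal sketch using only the tools listed above; the version numbers in the comments are the tested ones from this page:

```bash
# Quick sanity check: each command prints its version if the tool
# is installed and on the PATH.
python --version   # tested with Python 3.11
cmake --version    # tested with 3.28.1
ninja --version    # tested with 1.11.1
```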

content/learning-paths/smartphones-and-mobile/build-android-chat-app-using-onnxruntime/2-build-onnxruntime.md

Lines changed: 2 additions & 2 deletions
@@ -30,7 +30,7 @@ You might be able to use a later commit. These steps have been tested with the c
 
 ### Build for Android CPU
 
-The Ninja generator needs to be used to build on Windows. First, set JAVA_HOME to the path to your JDK install. You can point to the JDK from Android Studio, or a standalone JDK install.
+You use the Ninja generator to build on Windows for Android. First, set JAVA_HOME to the path to your JDK install. You can point to the JDK from Android Studio, or a standalone JDK install.
 
 ```bash
 $env:JAVA_HOME="C:\Program Files\Android\Android Studio\jbr"
@@ -44,7 +44,7 @@ Now run the following command:
 
 ```
 
-Android Archive (AAR) files, which can be imported directly in Android Studio, will be generated by using the above command with `--build_java`
+An Android Archive (AAR) file, which can be imported directly in Android Studio, will be generated by using the above command with `--build_java`
 
 When the build is complete, confirm the shared library and the AAR file have been created:
 
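The full build command sits outside the lines shown in this hunk. As a hedged sketch rather than the commit's exact command, an ONNX Runtime Android build with `--build_java` looks roughly like the following; the SDK and NDK paths are assumptions to replace with your own:

```bash
# Illustrative sketch (PowerShell, from the onnxruntime repo root):
# substitute your own SDK/NDK paths and API level.
# --build_java produces the AAR file mentioned above.
.\build.bat --config Release `
  --android `
  --android_sdk_path C:\Users\$env:USERNAME\AppData\Local\Android\Sdk `
  --android_ndk_path C:\Users\$env:USERNAME\AppData\Local\Android\Sdk\ndk\27.0.12077973 `
  --android_abi arm64-v8a --android_api 27 `
  --build_java --cmake_generator Ninja
```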

content/learning-paths/smartphones-and-mobile/build-android-chat-app-using-onnxruntime/3-build-onnxruntime-generate-api.md

Lines changed: 9 additions & 3 deletions
@@ -6,9 +6,15 @@ weight: 4
 layout: learningpathall
 ---
 
-## Cross-compile the ONNX Runtime generate() API for Android CPU
+## Cross-compile the ONNX Runtime Generate() API for Android CPU
 
-The Generate() API in ONNX Runtime is designed for text generation tasks using models like Phi-3. It implements the generative AI loop for ONNX models, including pre and post processing, inference with ONNX Runtime, logits processing, search and sampling, and KV cache management. You can learn more by reading the [ONNX Runtime generate() API page](https://onnxruntime.ai/docs/genai/).
+The Generate() API in ONNX Runtime is designed for text generation tasks using models like Phi-3. It implements the generative AI loop for ONNX models, including:
+- pre- and post-processing
+- inference with ONNX Runtime
+- logits processing
+- search and sampling
+- KV cache management.
+You can learn more by reading the [ONNX Runtime generate() API page](https://onnxruntime.ai/docs/genai/).
 
 
 ### Clone onnxruntime-genai repo
@@ -27,7 +33,7 @@ You might be able to use later commits. These steps have been tested with the co
 
 ### Build for Android CPU
 
-The Ninja generator needs to be used to build on Windows for Android. Make sure JAVA_HOME is set before running the following command:
+Ninja generator is used to build on Windows for Android. Make sure you have set JAVA_HOME before running the following command:
 
 ```bash
 python -m pip install requests
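The rest of the build invocation is cut off at the hunk boundary. Purely as a sketch, the onnxruntime-genai build script takes Android options along these lines; the option names and paths are assumptions, so verify them with `python build.py --help` in your checkout:

```bash
# Illustrative sketch only: check `python build.py --help` for the exact
# option names in your onnxruntime-genai checkout, and substitute your paths.
python build.py --config Release `
  --android `
  --android_home C:\Users\$env:USERNAME\AppData\Local\Android\Sdk `
  --android_ndk_path C:\Users\$env:USERNAME\AppData\Local\Android\Sdk\ndk\27.0.12077973 `
  --android_abi arm64-v8a --android_api 27
```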

content/learning-paths/smartphones-and-mobile/build-android-chat-app-using-onnxruntime/4-run-benchmark-on-android.md

Lines changed: 12 additions & 11 deletions
@@ -1,17 +1,18 @@
 ---
-title: Run Benchmark on Android phone
+title: Run a benchmark on an Android phone
 weight: 5
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Run example code for running Phi-3-mini
+## Run a Phi-3 model on your Android phone
 
+You can now prepare and run a Phi-3-mini model on your Android smartphone, and view performance metrics:
 
 ### Build model runner
 
-You will now cross-compile the model runner to run on Android using the commands below:
+First, cross-compile the model runner to run on Android using the commands below:
 
 ``` bash
 cd onnxruntime-genai
@@ -21,7 +22,7 @@ cd examples\c
 mkdir build
 cd build
 ```
-Run the cmake command as shown:
+Run the `cmake` command as shown:
 
 ```bash
 cmake -DCMAKE_TOOLCHAIN_FILE=C:\Users\$env:USERNAME\AppData\Local\Android\Sdk\ndk\27.0.12077973\build\cmake\android.toolchain.cmake -DANDROID_ABI=arm64-v8a -DANDROID_PLATFORM=android-27 -DCMAKE_BUILD_TYPE=Release -G "Ninja" ..
@@ -30,25 +31,25 @@ ninja
 
 After successful build, a binary program called `phi3` will be created.
 
-### Prepare phi-3-mini model
+### Prepare Phi-3-mini model
 
-Phi-3 ONNX models are hosted on HuggingFace. You can download the Phi-3-mini model with huggingface-cli command:
+Phi-3 ONNX models are hosted on HuggingFace. You can download the Phi-3-mini model by using the `huggingface-cli` command:
 
 ``` bash
 pip install huggingface-hub[cli]
 huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .
 ```
-This command downloads the model into a folder called cpu_and_mobile.
+This command downloads the model into a folder called `cpu_and_mobile`.
 
-The phi-3-mini (3B) model has a short (4k) context version and a long (128k) context version. The long context version can accept much longer prompts and produce longer output text, but it does consume more memory. In this learning path, you will use the short context version, which is quantized to 4-bits.
+The Phi-3-mini (3B) model has a short (4k) context version and a long (128k) context version. The long context version can accept much longer prompts and produce longer output text, but it does consume more memory. In this learning path, you will use the short context version, which is quantized to 4-bits.
 
 
 ### Run on Android via adb shell
 
-#### Connect your android phone
+#### Connect your Android phone
 Connect your phone to your computer using a USB cable.
 
-You need to enable USB debugging on your Android device. You can follow [Configure on-device developer options](https://developer.android.com/studio/debug/dev-options) to enable USB debugging.
+You need to enable USB debugging on your Android device. You can follow [Configure on-device developer options](https://developer.android.com/studio/debug/dev-options) to do this.
 
 Once you have enabled USB debugging and connected via USB, run:
 
@@ -79,7 +80,7 @@ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/data/local/tmp
 ./phi3 cpu-int4-rtn-block-32-acc-level-4
 ```
 
-This will allow the runner program to load the model, and then it will prompt you to input the text prompt to the model. After you enter yout input prompt, the text output by the model will be displayed. On completion, the performance metrics similar to what is shown below should be displayed:
+This will allow the runner program to load the model. It will then prompt you to input the text prompt to the model. After you enter your input prompt, the text output by the model will be displayed. On completion, performance metrics similar to those shown below should be displayed:
 
 ```
 Prompt length: 64, New tokens: 931, Time to first: 1.79s, Prompt tokens per second: 35.74 tps, New tokens per second: 6.34 tps
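The step that copies the binary, libraries, and model to the phone falls between the hunks shown above. Assuming everything lands in `/data/local/tmp`, the directory referenced by the `LD_LIBRARY_PATH` line, the transfer is a series of `adb push` commands; the library file names here are assumptions that depend on your build:

```bash
# Sketch: copy the runner, its shared libraries, and the model folder
# to the device directory used in the run step above.
adb devices                                # confirm the phone is listed
adb push phi3 /data/local/tmp
adb push libonnxruntime.so /data/local/tmp
adb push libonnxruntime-genai.so /data/local/tmp
adb push cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4 /data/local/tmp
adb shell                                  # then: cd /data/local/tmp
```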

content/learning-paths/smartphones-and-mobile/build-android-chat-app-using-onnxruntime/5-build-android-chat-app.md

Lines changed: 7 additions & 7 deletions
@@ -1,12 +1,12 @@
 ---
-title: Build and Run Android chat app
+title: Build and run an Android chat app
 weight: 6
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Build Android chat app
+## Build an Android chat app
 
 Another way to run the model is to use an Android GUI app.
 You can use the Android demo application included in the [onnxruntime-inference-examples repository](https://github.com/microsoft/onnxruntime-inference-examples) to demonstrate local inference.
@@ -27,9 +27,9 @@ You could probably use a later commit but these steps have been tested with the
 
 Open the `mobile\examples\phi-3\android` directory with Android Studio.
 
-#### (Optional) In case you want to use ONNX Runtime AAR you built
+#### (Optional) In case you want to use the ONNX Runtime AAR you built
 
-Copy ONNX Runtime AAR you built before if needed:
+Copy ONNX Runtime AAR you built earlier in this learning path:
 
 ```bash
 Copy onnxruntime\build\Windows\Release\java\build\android\outputs\aar\onnxruntime-release.aar mobile\examples\phi-3\android\app\libs
@@ -43,12 +43,12 @@ Update `build.gradle.kts (:app)` as below:
 implementation(files("libs/onnxruntime-release.aar"))
 ```
 
-After that, click `File`->`Sync Project with Gradle`
+Finally, click **File > Sync Project with Gradle**
 
 #### Build and run the app
 
-When you press Run, the build will be executed, and then the app will be copied and installed on the Android device. This app will automatically download the Phi-3-mini model during the first run. After the download, you can input the prompt in the text box and execute it to run the model.
+When you select **Run**, the build will be executed, and then the app will be copied and installed on the Android device. This app will automatically download the Phi-3-mini model during the first run. After the download, you can input the prompt in the text box and execute it to run the model.
 
-You should now see a running app on your phone that looks like this:
+You should now see a running app on your phone, which looks like this:
 
 ![App screenshot](screenshot.png)

content/learning-paths/smartphones-and-mobile/build-android-chat-app-using-onnxruntime/_index.md

Lines changed: 2 additions & 2 deletions
@@ -7,10 +7,10 @@ who_is_this_for: This is an advanced topic for software developers interested in
 
 learning_objectives:
 - Build ONNX Runtime and ONNX Runtime generate() API for Android.
-- Run the Phi-3 model using ONNX Runtime on an Arm-based smartphone.
+- Run a Phi-3 model using ONNX Runtime on an Arm-based smartphone.
 
 prerequisites:
-- A Windows x86_64 development machine with at least 16GB of RAM. You should also be able to use Linux or MacOS for the build, but the instructions for it have not been included in this learning path.
+- A Windows x86_64 development machine with at least 16GB of RAM.
 - An Android phone with at least 8GB of RAM. This learning path was tested on Samsung Galaxy S24.
 
 author_primary: Koki Mitsunami

content/learning-paths/smartphones-and-mobile/build-android-chat-app-using-onnxruntime/_review.md

Lines changed: 3 additions & 3 deletions
@@ -9,7 +9,7 @@ review:
 - A cloud-based data storage service for deep learning models.
 correct_answer: 1
 explanation: >
-ONNX Runtime is a cross-platform inference engine designed to to run machine-learning models in the ONNX format. It optimizes model performance across various hardware environments, including CPUs, GPUs, and specialized accelerators.
+ONNX Runtime is a cross-platform inference engine designed to to run machine learning models in the ONNX format. It optimizes model performance across various hardware environments, including CPUs, GPUs and specialized accelerators.
 
 - questions:
 question: >
@@ -20,7 +20,7 @@ review:
 - A toolkit for converting machine learning models to ONNX format.
 correct_answer: 2
 explanation: >
-Phi models are a series of large language models developed to perform natural language processing tasks such as text generation, completion, and comprehension.
+Phi models are a series of Large Language Models developed to perform natural language processing tasks such as text generation, completion and comprehension.
 
 - questions:
 question: >
@@ -31,7 +31,7 @@ review:
 - It allows models to be exchanged between different frameworks, such as PyTorch and TensorFlow.
 correct_answer: 3
 explanation: >
-The ONNX (Open Neural Network Exchange) format is an open-source standard designed to enable the sharing and use of machine learning models across different frameworks such as PyTorch, TensorFlow, and others. It allows models to be exported in a unified format, making them interoperable and ensuring they can run on various platforms or hardware.
+The ONNX (Open Neural Network Exchange) format is an open-source standard designed to enable the sharing and use of machine learning models across different frameworks such as PyTorch and TensorFlow. It allows models to be exported in a unified format, making them interoperable and ensuring they can run on various platforms or hardware.
 