
Commit 92407a4

Merge pull request #1872 from pareenaverma/uvision-review
Tech review of ONNXRuntime Phi-3 on WoA LP
2 parents 0414b56 + 19cc194

6 files changed: +55 −40 lines changed


assets/contributors.csv

Lines changed: 2 additions & 0 deletions
@@ -84,3 +84,5 @@ Shuheng Deng,Arm,,,,
 Yiyang Fan,Arm,,,,
 Julien Jayat,Arm,,,,
 Geremy Cohen,Arm,geremyCohen,geremyinanutshell,,
+Barbara Corriero,Arm,,,,
+

content/learning-paths/laptops-and-desktops/win_on_arm_build_onnxruntime/1-dev-env-setup.md

Lines changed: 9 additions & 9 deletions
@@ -1,5 +1,5 @@
 ---
-title: Create a development environment
+title: Development environment
 weight: 2
 
 ### FIXED, DO NOT MODIFY
@@ -8,15 +8,15 @@ layout: learningpathall
 
 ## Set up your development environment
 
-In this learning path, you will learn how to build and deploy a simple LLM-based tutorial on a Windows-on-ARM (WoA) laptop using ONNX Runtime for inference.
+In this learning path, you will learn how to build and deploy an LLM on a Windows on Arm (WoA) laptop using ONNX Runtime for inference.
 
-You will first learn how to build the ONNX Runtime and ONNX Runtime Generate() API library and then how to download the Phi-3 model and run the tutorial. This tutorial runs the short context (4k) mini (3.3B) variant of Phi 3 model. The short context version accepts a shorter (4K) prompts and produces shorter output text compared to the long (128K) context version. The short version will consume less memory.
+You will first learn how to build the ONNX Runtime and ONNX Runtime Generate() API library, and then how to download the Phi-3 model and run inference. You will run the short context (4k) mini (3.3B) variant of the Phi-3 model. The short context version accepts shorter (4K) prompts and produces shorter output text compared to the long (128K) context version. The short version also consumes less memory.
 
 Your first task is to prepare a development environment with the required software:
 
 - Visual Studio 2022 IDE (latest version recommended)
-- Python 3.10+ (tested with version 3.11.9)
-- CMake 3.28 or higher (tested with version 3.30.5)
+- Python 3.10 or higher
+- CMake 3.28 or higher
 
 The following instructions were tested on a WoA 64-bit Windows machine with at least 16GB of RAM.

@@ -34,17 +34,17 @@ Follow these steps to install and configure Visual Studio 2022 IDE:
 
 5. Once "Downloaded" and "Installed" complete, select your workloads. As a minimum you should select **Desktop Development with C++**. This will install the **Microsoft Visual Studio Compiler** or **MSVC**.
 
-## Install Python 3.10+ (Tested with version 3.11.9)
+## Install Python
 
-Download and install [Python 3.110+](https://www.python.org/downloads/)
+Download and install [Python for Windows on Arm](/install-guides/py-woa)
 
-Tested version [Python 3.11.9](https://www.python.org/downloads/release/python-3119/)
+You will need Python version 3.10 or higher. This learning path was tested with version 3.11.9.
 
 ## Install CMake
 
 CMake is an open-source tool that automates the build process for software projects, helping to generate platform-specific build configurations.
 
-[Download and install CMake](https://cmake.org/download/)
+[Download and install CMake](/install-guides/cmake)
 
 {{% notice Note %}}
 The instructions were tested with version 3.30.5
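
A quick way to confirm these prerequisites are met is to query the installed tool versions. The snippet below is a minimal sketch (it assumes `python` and `cmake` are already on your PATH; the version thresholds are the ones listed in this page):

```python
# check_env.py - sanity-check the Python and CMake versions required by this learning path
# (illustrative sketch; assumes cmake.exe is on PATH)
import subprocess
import sys

# Python 3.10 or higher is required (3.11.9 was used for testing)
assert sys.version_info >= (3, 10), f"Python 3.10+ required, found {sys.version.split()[0]}"
print("Python OK:", sys.version.split()[0])

# CMake 3.28 or higher is required; "cmake --version" prints e.g. "cmake version 3.30.5"
result = subprocess.run(["cmake", "--version"], capture_output=True, text=True, check=True)
print(result.stdout.splitlines()[0])
```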

content/learning-paths/laptops-and-desktops/win_on_arm_build_onnxruntime/2-build-onnxruntime.md

Lines changed: 5 additions & 6 deletions
@@ -6,7 +6,7 @@ weight: 3
 layout: learningpathall
 ---
 
-## Compile ONNX Runtime for Windows ARM64 CPU
+## Compile ONNX Runtime for Windows on Arm
 Now that you have your environment set up correctly, you can build the ONNX Runtime inference engine.
 
 ONNX Runtime is an open-source inference engine designed to accelerate the deployment of machine learning models, particularly those in the Open Neural Network Exchange (ONNX) format. ONNX Runtime is optimized for high performance and low latency, making it popular for production deployment of AI models. You can learn more by reading the [ONNX Runtime Overview](https://onnxruntime.ai/).
@@ -28,26 +28,25 @@ git checkout 4eeefd7260b7fa42a71dd1a08b423d5e7c722050
 You might be able to use a later commit. These steps have been tested with the commit `4eeefd7260b7fa42a71dd1a08b423d5e7c722050`.
 {{% /notice %}}
 
-### Build for Windows CPU
+### Build for Windows
 
-You can build "Release" for a build type that aims to provide an
-a build optimized for performance but without debug information.
+You can build the "Release" configuration for a build optimized for performance but without debug information.
 
 
 ```bash
 .\build.bat --config Release --build_shared_lib --parallel --compile_no_warning_as_error --skip_submodule_sync --skip_tests
 ```
 
 
-As an alternative, you can build "RelWithDebInfo" for a build type that aims to provide a release-optimized build with debug information.
+As an alternative, you can build the "RelWithDebInfo" configuration for a release-optimized build with debug information.
 
 ```bash
 .\build.bat --config RelWithDebInfo --build_shared_lib --parallel --compile_no_warning_as_error --skip_submodule_sync --skip_tests
 ```
 
 
 ### Resulting Dynamic Link Library
-When the build is complete, onnxruntime.dll dynamic linked library can be found in:
+When the build is complete, the `onnxruntime.dll` dynamic linked library can be found in:
 
 ```
 dir .\build\Windows\Release\Release\onnxruntime.dll
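
Once the build finishes, one simple way to confirm the resulting library is usable is to load it and resolve the C API entry point. This is a minimal sketch, assuming the Release output path shown above; `OrtGetApiBase` is the entry point exported by the ONNX Runtime C API:

```python
# load_check.py - confirm the freshly built onnxruntime.dll loads and exports the C API entry point
# (minimal sketch; adjust the path if you built the RelWithDebInfo configuration instead)
import ctypes
import os

dll_path = os.path.join("build", "Windows", "Release", "Release", "onnxruntime.dll")
ort = ctypes.WinDLL(dll_path)  # raises OSError if the DLL or one of its dependencies is missing
print("Loaded:", dll_path)
print("OrtGetApiBase exported:", hasattr(ort, "OrtGetApiBase"))  # ONNX Runtime C API entry point
```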

content/learning-paths/laptops-and-desktops/win_on_arm_build_onnxruntime/3-build-onnxruntime-generate-api.md

Lines changed: 7 additions & 5 deletions
@@ -6,17 +6,17 @@ weight: 4
 layout: learningpathall
 ---
 
-## Compile the ONNX Runtime Generate() API for Windows ARM64 CPU
+## Compile the ONNX Runtime Generate() API for Windows on Arm
 
 The Generate() API in ONNX Runtime is designed for text generation tasks using models like Phi-3. It implements the generative AI loop for ONNX models, including:
 - pre- and post-processing
 - inference with ONNX Runtime
 - logits processing
 - search and sampling
-- KV cache management.
+- KV cache management
 
 You can learn more by reading the [ONNX Runtime Generate() API page](https://onnxruntime.ai/docs/genai/).
 
-In this page you will learn how to build the Generate API() from source (C/C++ build).
+In this section you will learn how to build the Generate() API from source.
 
 
 ### Clone onnxruntime-genai Repo
@@ -34,14 +34,16 @@ git checkout b2e8176c99473afb726d364454dc827d2181cbb2
 You might be able to use later commits. These steps have been tested with the commit `b2e8176c99473afb726d364454dc827d2181cbb2`.
 {{% /notice %}}
 
-### Build for Windows ARM64 CPU
+### Build for Windows on Arm
 The build command below has a --config argument, which takes the following options:
 - ```Release``` builds release binaries
 - ```Debug``` builds binaries with debug symbols
 - ```RelWithDebInfo``` builds release binaries with debug info
 
-Below are the instruction to build ```Release```:
+You will build the `Release` variant of the ONNX Runtime Generate() API:
+
 ```bash
+pip install requests
 python build.py --config Release --skip_tests
 ```
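
The bullet list earlier in this file describes the loop that the Generate() API runs for you. As a rough mental model only (plain Python with a stand-in model, not the onnxruntime-genai API), greedy generation looks like this:

```python
# generation_loop.py - schematic of the loop that the Generate() API automates
# (conceptual sketch only; model_forward is a stand-in, NOT the onnxruntime-genai API)
import numpy as np

VOCAB_SIZE = 32064  # Phi-3-mini vocabulary size, used here purely for illustration

def model_forward(token_ids, kv_cache):
    """Stand-in for one ONNX Runtime inference step: returns dummy logits and an updated cache."""
    rng = np.random.default_rng(len(token_ids))
    return rng.standard_normal(VOCAB_SIZE), kv_cache  # a real run would update and reuse the KV cache

def generate(prompt_ids, max_new_tokens=8, eos_id=0):
    tokens, kv_cache = list(prompt_ids), None              # pre-processing: prompt already tokenized
    for _ in range(max_new_tokens):
        logits, kv_cache = model_forward(tokens, kv_cache)  # inference + logits processing
        next_id = int(np.argmax(logits))                    # search/sampling (greedy here)
        tokens.append(next_id)
        if next_id == eos_id:                               # stop at the end-of-sequence id (placeholder value)
            break
    return tokens                                           # post-processing would detokenize these ids

print(generate([1, 1212, 318]))
```

The real API performs these steps in native code against the ONNX model you download in the next section, with proper tokenization, sampling options, and KV cache reuse.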

content/learning-paths/laptops-and-desktops/win_on_arm_build_onnxruntime/4-run-benchmark-on-WoA.md

Lines changed: 25 additions & 13 deletions
@@ -1,25 +1,25 @@
 ---
-title: Run Phi3 model on an ARM Windows Device
+title: Run Phi3 model on a Windows on Arm machine
 weight: 5
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Run a Phi-3 model on your ARM Windows Device
+## Run a Phi-3 model on your Windows on Arm machine
 
-In this section you will learn how to obtain and run on your ARM Windows device (or virtual device) the Phi3-mini model. To do so you will be using a simple model runner program which provides performance metrics.
+In this section, you will learn how to download the Phi-3-mini model and run it on your Windows on Arm machine (physical or virtual). You will use a simple model runner program, which provides performance metrics.
 
-The Phi-3-mini (3.3B) model has a short (4k) context version and a long (128k) context version. The long context version can accept much longer prompts and produces longer output text, but it does consume more memory.
+The Phi-3-mini (3.3B) model has a short (4k) context version and a long (128k) context version. The long context version can accept much longer prompts and produces longer output text, but it consumes more memory.
 In this learning path, you will use the short context version, which is quantized to 4-bits.
 
 The Phi-3-mini model used here is in an ONNX format.
 
 ### Setup
 
-Phi-3 ONNX models are hosted on HuggingFace.
+[Phi-3 ONNX models](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx) are hosted on Hugging Face.
 Hugging Face uses Git for version control and to download ONNX model files, which can be quite large.
-You will first need to get and install the Git Large File Storage (LFS) extension.
+You will first need to install the Git Large File Storage (LFS) extension.
 
 ``` bash
 winget install -e --id GitHub.GitLFS
@@ -34,7 +34,7 @@ You then need to install the ``HuggingFace CLI``.
 pip install huggingface-hub[cli]
 ```
 
-### Download the Phi-3-mini (4k) model for CPU and Mobile
+### Download the Phi-3-mini (4k) model
 
 ``` bash
 cd C:\Users\%USERNAME%
@@ -56,7 +56,7 @@ copy src\ort_genai.h examples\c\include\
 copy src\ort_genai_c.h examples\c\include\
 ```
 
-you can now build the model runner executable in the ''onnxruntime-genai'' folder using the commands below:
+You can now build the model runner executable in the ''onnxruntime-genai'' folder using the commands below:
 
 ``` bash
 cd examples/c
@@ -65,9 +65,9 @@ cd build
 cmake --build . --config Release
 ```
 
-After a successful build, a binary program called `phi3` will be created in the ''onnxruntime-genai'' folder.
+After a successful build, a binary program called `phi3` will be created in the ''onnxruntime-genai'' folder:
 ```output
-dir examples\c\build\Release\phi3.exe
+dir Release\phi3.exe
 ```
 
 #### Run the model
@@ -80,10 +80,22 @@ cd repos\lp
 .\onnxruntime-genai\examples\c\build\Release\phi3.exe .\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\ cpu
 ```
 
-This will allow the runner program to load the model. It will then prompt you to input the text prompt to the model. After you enter your input prompt, the text output by the model will be displayed. On completion, performance metrics similar to those shown below should be displayed:
+This will allow the runner program to load the model. It will then prompt you to enter a text prompt, as shown:
+
+```output
+-------------
+Hello, Phi-3!
+-------------
+C++ API
+Creating config...
+Creating model...
+Creating tokenizer...
+Prompt: (Use quit() to exit) Or (To terminate current output generation, press Ctrl+C)
+```
+
+After you enter your input prompt, the text output by the model will be displayed. On completion, performance metrics similar to those shown below should be displayed:
 
 ```
 Prompt length: 64, New tokens: 931, Time to first: 1.79s, Prompt tokens per second: 35.74 tps, New tokens per second: 6.34 tps
 ```
-
-You have successfully run the Phi-3 model on your Windows device powered by ARM.
+You have successfully run the Phi-3 model on your Windows device powered by Arm.
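
The metrics line added above reports prompt length, number of generated tokens, time to first token, and throughput for prompt processing and generation. If you want to compare runs (for example, a Release build against a RelWithDebInfo build), the numbers can be pulled out of the runner's output; the sketch below simply parses the line format shown in this file:

```python
# parse_metrics.py - extract the performance numbers from the phi3 runner's summary line
# (illustrative sketch; the regex matches the example line format shown above)
import re

line = ("Prompt length: 64, New tokens: 931, Time to first: 1.79s, "
        "Prompt tokens per second: 35.74 tps, New tokens per second: 6.34 tps")

pattern = (r"Prompt length: (\d+), New tokens: (\d+), Time to first: ([\d.]+)s, "
           r"Prompt tokens per second: ([\d.]+) tps, New tokens per second: ([\d.]+) tps")
match = re.search(pattern, line)
prompt_len, new_tokens, ttft_s, prompt_tps, gen_tps = match.groups()
print(f"time to first token: {ttft_s}s, generation throughput: {gen_tps} tokens/s")
```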

content/learning-paths/laptops-and-desktops/win_on_arm_build_onnxruntime/_index.md

Lines changed: 7 additions & 7 deletions
@@ -1,17 +1,17 @@
 ---
-title: Powering Phi-3 on Arm PC with ONNX Runtime on Windows
+title: Run Phi-3 on a Windows on Arm machine with ONNX Runtime
 
 draft: true
 cascade:
   draft: true
 
 minutes_to_complete: 60
 
-who_is_this_for: A deep-dive for advanced developers looking to build ONNX Runtime on Windows ARM (WoA) and leverage the Generate() API to run Phi-3 inference with KleidiAI acceleration.
+who_is_this_for: A deep-dive for advanced developers looking to build ONNX Runtime on Windows on Arm (WoA) and leverage the Generate() API to run Phi-3 inference with KleidiAI acceleration.
 
 learning_objectives:
-    - Build ONNX Runtime and ONNX Runtime Generate() API for Windows on ARM.
-    - Run a Phi-3 model using ONNX Runtime on an Arm-based Windows laptop.
+    - Build ONNX Runtime and ONNX Runtime Generate() API for Windows on Arm.
+    - Run a Phi-3 model using ONNX Runtime on a Windows on Arm laptop.
 
 prerequisites:
     - A Windows on Arm computer such as the Lenovo Thinkpad X13 running Windows 11 or a Windows on Arm [virtual machine](https://learn.arm.com/learning-paths/cross-platform/woa_azure/)
@@ -25,11 +25,11 @@ armips:
 - Cortex-A
 - Cortex-X
 tools_software_languages:
-- Visual Studio IDE - 2022+ Community Version
+- Visual Studio
 - C++
-- Python 3.10+
+- Python
 - Git
-- CMake-3.28 or higher
+- cmake
 operatingsystems:
 - Windows
