Commit a09388c

Merge pull request #1809 from BarbaraCorrieroArm/feat_woa_onnxruntime
[CSEREQ-1257] Learning Path Request to Showcase performance uplift of Kleidi for WoA ONNX Runtime
2 parents 137278a + 3d1e62c commit a09388c

6 files changed: +317 -0 lines changed
Lines changed: 53 additions & 0 deletions
---
title: Create a development environment
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Set up your development environment

In this learning path, you will learn how to build and deploy a simple LLM-based application on a Windows on Arm (WoA) laptop, using ONNX Runtime for inference.

You will first learn how to build ONNX Runtime and the ONNX Runtime Generate() API library, and then how to download the Phi-3 model and run the application. This tutorial runs the short context (4K), mini (3.3B) variant of the Phi-3 model. The short context version accepts shorter (4K) prompts and produces shorter output text than the long context (128K) version, and it consumes less memory.
Your first task is to prepare a development environment with the required software:

- Visual Studio 2022 IDE (latest version recommended)
- Python 3.10+ (tested with version 3.11.9)
- CMake 3.28 or higher (tested with version 3.30.5)

The following instructions were tested on a 64-bit WoA machine with at least 16GB of RAM.
## Install Visual Studio 2022 IDE

Follow these steps to install and configure the Visual Studio 2022 IDE:

1. Download and install the latest version of [Visual Studio IDE](https://visualstudio.microsoft.com/downloads/).

2. Select the **Community Version**. An installer called *VisualStudioSetup.exe* will be downloaded.

3. From your Downloads folder, double-click the installer to start the installation.

4. Follow the prompts and acknowledge the **License Terms** and **Privacy Statement**.

5. Once the **Downloaded** and **Installed** stages complete, select your workloads. As a minimum, select **Desktop Development with C++**. This installs the Microsoft Visual Studio compiler, **MSVC**.
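As an optional check that MSVC is installed, open a Developer Command Prompt for VS 2022 and run the compiler driver with no arguments; it should print a banner identifying the compiler version and target architecture:

```bash
cl
```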
## Install Python 3.10+

Download and install [Python 3.10+](https://www.python.org/downloads/).

The tested version is [Python 3.11.9](https://www.python.org/downloads/release/python-3119/).

## Install CMake

CMake is an open-source tool that automates the build process for software projects, helping to generate platform-specific build configurations.

[Download and install CMake](https://cmake.org/download/).

{{% notice Note %}}
The instructions were tested with version 3.30.5.
{{% /notice %}}
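As an optional sanity check, you can confirm that both tools are available on your `PATH` from a new command prompt. The exact versions reported may differ from the ones listed above:

```bash
python --version
cmake --version
```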
You now have the required development tools installed to follow this learning path.
Lines changed: 60 additions & 0 deletions
---
title: Build ONNX Runtime
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Compile ONNX Runtime for Windows ARM64 CPU

Now that you have your environment set up correctly, you can build the ONNX Runtime inference engine.

ONNX Runtime is an open-source inference engine designed to accelerate the deployment of machine learning models, particularly those in the Open Neural Network Exchange (ONNX) format. ONNX Runtime is optimized for high performance and low latency, making it popular for production deployment of AI models. You can learn more by reading the [ONNX Runtime Overview](https://onnxruntime.ai/).
### Clone the ONNX Runtime repository

Open a Developer Command Prompt for Visual Studio so that the environment is set up properly, including the paths to the compiler, linker, utilities, and header files. Create your workspace and check out the source tree:

```bash
cd C:\Users\%USERNAME%
mkdir repos\lp
cd repos\lp
git clone --recursive https://github.com/Microsoft/onnxruntime.git
cd onnxruntime
git checkout 4eeefd7260b7fa42a71dd1a08b423d5e7c722050
```

{{% notice Note %}}
You might be able to use a later commit. These steps have been tested with the commit `4eeefd7260b7fa42a71dd1a08b423d5e7c722050`.
{{% /notice %}}
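If you want to double-check which commit you have checked out, `git log` prints it:

```bash
git log -1 --format=%H
```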
### Build for Windows CPU

You can build with the "Release" build type, which produces a build optimized for performance but without debug information:

```bash
.\build.bat --config Release --build_shared_lib --parallel --compile_no_warning_as_error --skip_submodule_sync --skip_tests
```

As an alternative, you can build with the "RelWithDebInfo" build type, which produces a release-optimized build that retains debug information:

```bash
.\build.bat --config RelWithDebInfo --build_shared_lib --parallel --compile_no_warning_as_error --skip_submodule_sync --skip_tests
```
### Resulting Dynamic Link Library
50+
When the build is complete, onnxruntime.dll dynamic linked library can be found in:
51+
52+
```
53+
dir .\build\Windows\Release\Release\onnxruntime.dll
54+
```
55+
56+
or if you build with debug information it can be found in:
57+
58+
```
59+
dir .\build\Windows\RelWithDebInfo\RelWithDebInfo\onnxruntime.dll
60+
```
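Optionally, you can confirm that the library was built for the right architecture with the `dumpbin` utility available in the Developer Command Prompt; an Arm64 binary reports the `AA64` machine type:

```bash
dumpbin /headers .\build\Windows\Release\Release\onnxruntime.dll | findstr machine
```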
Lines changed: 52 additions & 0 deletions
---
title: Build ONNX Runtime Generate() API
weight: 4

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Compile the ONNX Runtime Generate() API for Windows ARM64 CPU

The Generate() API in ONNX Runtime is designed for text generation tasks using models like Phi-3. It implements the generative AI loop for ONNX models, including:

- pre- and post-processing
- inference with ONNX Runtime
- logits processing
- search and sampling
- KV cache management

You can learn more by reading the [ONNX Runtime Generate() API page](https://onnxruntime.ai/docs/genai/).

On this page, you will learn how to build the Generate() API from source (a C/C++ build).
### Clone the onnxruntime-genai repository

Within your Developer Command Prompt for Visual Studio, check out the source repository:

```bash
cd C:\Users\%USERNAME%
cd repos\lp
git clone https://github.com/microsoft/onnxruntime-genai
cd onnxruntime-genai
git checkout b2e8176c99473afb726d364454dc827d2181cbb2
```

{{% notice Note %}}
You might be able to use later commits. These steps have been tested with the commit `b2e8176c99473afb726d364454dc827d2181cbb2`.
{{% /notice %}}
### Build for Windows ARM64 CPU

The build command below has a --config argument, which takes the following options:

- ```Release``` builds release binaries
- ```Debug``` builds binaries with debug symbols
- ```RelWithDebInfo``` builds release binaries with debug information

Below are the instructions to build ```Release```:

```bash
python build.py --config Release --skip_tests
```
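If you also want debug information, the same command accepts the other configuration values listed above, for example:

```bash
python build.py --config RelWithDebInfo --skip_tests
```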
When the build is complete, confirm that the ONNX Runtime Generate() API dynamic link library has been created:

```bash
dir build\Windows\Release\Release\onnxruntime-genai.dll
```
Lines changed: 89 additions & 0 deletions
---
title: Run a Phi-3 model on an ARM Windows device
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Run a Phi-3 model on your ARM Windows device

In this section, you will learn how to obtain the Phi-3-mini model and run it on your ARM Windows device (or virtual device). To do so, you will use a simple model runner program that reports performance metrics.

The Phi-3-mini (3.3B) model has a short (4K) context version and a long (128K) context version. The long context version can accept much longer prompts and produces longer output text, but it consumes more memory.
In this learning path, you will use the short context version, which is quantized to 4 bits.

The Phi-3-mini model used here is in ONNX format.
### Setup

Phi-3 ONNX models are hosted on Hugging Face.
Hugging Face uses Git for version control, and the ONNX model files can be quite large, so you first need to install the Git Large File Storage (LFS) extension:

```bash
winget install -e --id GitHub.GitLFS
git lfs install
```

If you don't have winget, download and run the installer from the [official source](https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage?platform=windows).
If the extension is already installed, running the ``git lfs install`` command prints ``Git LFS initialized``.

You then need to install the Hugging Face CLI:

```bash
pip install huggingface-hub[cli]
```
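You can check that the CLI is available by asking for its help text; this is only a sanity check, and the exact output depends on the installed version:

```bash
huggingface-cli --help
```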
### Download the Phi-3-mini (4K) model for CPU and Mobile

```bash
cd C:\Users\%USERNAME%
cd repos\lp
huggingface-cli download microsoft/Phi-3-mini-4k-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .
```

This command downloads the model into a folder called `cpu_and_mobile`.
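You can list the downloaded files to confirm the model is in place; you should see the quantized ONNX model alongside its tokenizer and configuration files:

```bash
dir cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4
```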
### Build the model runner (ONNX Runtime GenAI C example)

In the previous section, you built the ONNX Runtime Generate() API from source.
The headers and dynamic link libraries you built need to be copied into the appropriate folders (``lib`` and ``include``) of the C example.
Building from source is better practice because the examples are usually updated to run with the latest changes.

```bash
copy onnxruntime\build\Windows\Release\Release\onnxruntime.* onnxruntime-genai\examples\c\lib
cd onnxruntime-genai
copy build\Windows\Release\Release\onnxruntime-genai.* examples\c\lib
copy src\ort_genai.h examples\c\include\
copy src\ort_genai_c.h examples\c\include\
```

You can now build the model runner executable in the ``onnxruntime-genai`` folder using the commands below. The `-A arm64` option selects the ARM64 target architecture, `-S` and `-B` set the source and build directories, and `-DPHI3=ON` enables the Phi-3 example:

```bash
cd examples\c
cmake -A arm64 -S . -B build -DPHI3=ON
cd build
cmake --build . --config Release
```

After a successful build, a binary program called `phi3.exe` will be created. Confirm that it exists from the ``onnxruntime-genai`` folder:

```bash
dir examples\c\build\Release\phi3.exe
```
#### Run the model

Use the runner you just built to execute the model with the following commands:

```bash
cd C:\Users\%USERNAME%
cd repos\lp
.\onnxruntime-genai\examples\c\build\Release\phi3.exe .\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\ cpu
```

The runner program loads the model and then asks you for an input text prompt. After you enter your prompt, the text output by the model is displayed. On completion, performance metrics similar to those shown below should be displayed, where "Time to first" is the time to the first generated token, and the two "tokens per second" figures measure prompt processing and token generation throughput:

```output
Prompt length: 64, New tokens: 931, Time to first: 1.79s, Prompt tokens per second: 35.74 tps, New tokens per second: 6.34 tps
```

You have successfully run the Phi-3 model on your Windows device powered by Arm.
Lines changed: 55 additions & 0 deletions
---
title: Powering Phi-3 on Arm PC with ONNX Runtime on Windows

draft: true
cascade:
    draft: true

minutes_to_complete: 60

who_is_this_for: A deep-dive for advanced developers looking to build ONNX Runtime on Windows on ARM (WoA) and leverage the Generate() API to run Phi-3 inference with KleidiAI acceleration.

learning_objectives:
    - Build ONNX Runtime and the ONNX Runtime Generate() API for Windows on ARM.
    - Run a Phi-3 model using ONNX Runtime on an Arm-based Windows laptop.

prerequisites:
    - A Windows on Arm computer such as the Lenovo ThinkPad X13 running Windows 11, or a Windows on Arm [virtual machine](https://learn.arm.com/learning-paths/cross-platform/woa_azure/)

author: Barbara Corriero

### Tags
skilllevels: Advanced
subjects: ML
armips:
    - Cortex-A
    - Cortex-X
tools_software_languages:
    - Visual Studio IDE - 2022+ Community Version
    - C++
    - Python 3.10+
    - Git
    - CMake 3.28 or higher
operatingsystems:
    - Windows

further_reading:
    - resource:
        title: ONNX Runtime
        link: https://onnxruntime.ai/docs/
        type: documentation
    - resource:
        title: ONNX Runtime generate() API
        link: https://onnxruntime.ai/docs/genai/
        type: documentation
    - resource:
        title: Accelerating AI Developer Innovation Everywhere with New Arm Kleidi
        link: https://newsroom.arm.com/blog/arm-kleidi
        type: blog

### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1                       # _index.md always has weight of 1 to order correctly
layout: "learningpathall"       # All files under learning paths have this same wrapper
learning_path_main_page: "yes"  # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21                   # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps"          # Always the same, html page title.
layout: "learningpathall"    # All files under learning paths have this same wrapper for Hugo processing.
---
