
Commit 92407a4

Merge pull request #1872 from pareenaverma/uvision-review
Tech review of ONNXRuntime Phi-3 on WoA LP
2 parents 0414b56 + 19cc194

6 files changed: +55 −40 lines changed


assets/contributors.csv

Lines changed: 2 additions & 0 deletions
@@ -84,3 +84,5 @@ Shuheng Deng,Arm,,,,
 Yiyang Fan,Arm,,,,
 Julien Jayat,Arm,,,,
 Geremy Cohen,Arm,geremyCohen,geremyinanutshell,,
+Barbara Corriero,Arm,,,,
+

content/learning-paths/laptops-and-desktops/win_on_arm_build_onnxruntime/1-dev-env-setup.md

Lines changed: 9 additions & 9 deletions
@@ -1,5 +1,5 @@
 ---
-title: Create a development environment
+title: Development environment
 weight: 2
 
 ### FIXED, DO NOT MODIFY
@@ -8,15 +8,15 @@ layout: learningpathall
 
 ## Set up your development environment
 
-In this learning path, you will learn how to build and deploy a simple LLM-based tutorial on a Windows-on-ARM (WoA) laptop using ONNX Runtime for inference.
+In this learning path, you will learn how to build and deploy an LLM on a Windows on Arm (WoA) laptop using ONNX Runtime for inference.
 
-You will first learn how to build the ONNX Runtime and ONNX Runtime Generate() API library and then how to download the Phi-3 model and run the tutorial. This tutorial runs the short context (4k) mini (3.3B) variant of Phi 3 model. The short context version accepts a shorter (4K) prompts and produces shorter output text compared to the long (128K) context version. The short version will consume less memory.
+You will first learn how to build the ONNX Runtime and ONNX Runtime Generate() API library, and then how to download the Phi-3 model and run inference. You will run the short context (4k) mini (3.3B) variant of the Phi-3 model. The short context version accepts shorter (4K) prompts and produces shorter output text compared to the long (128K) context version. The short version also consumes less memory.
 
 Your first task is to prepare a development environment with the required software:
 
 - Visual Studio 2022 IDE (latest version recommended)
-- Python 3.10+ (tested with version 3.11.9)
-- CMake 3.28 or higher (tested with version 3.30.5)
+- Python 3.10 or higher
+- CMake 3.28 or higher
 
 The following instructions were tested on a WoA 64-bit Windows machine with at least 16GB of RAM.

@@ -34,17 +34,17 @@ Follow these steps to install and configure Visual Studio 2022 IDE:
 
 5. Once "Downloaded" and "Installed" complete, select your workloads. As a minimum you should select **Desktop Development with C++**. This will install the **Microsoft Visual Studio Compiler** or **MSVC**.
 
-## Install Python 3.10+ (Tested with version 3.11.9)
+## Install Python
 
-Download and install [Python 3.110+](https://www.python.org/downloads/)
+Download and install [Python for Windows on Arm](/install-guides/py-woa)
 
-Tested version [Python 3.11.9](https://www.python.org/downloads/release/python-3119/)
+You will need Python version 3.10 or higher. This learning path was tested with version 3.11.9.
 
 ## Install CMake
 
 CMake is an open-source tool that automates the build process for software projects, helping to generate platform-specific build configurations.
 
-[Download and install CMake](https://cmake.org/download/)
+[Download and install CMake](/install-guides/cmake)
 
 {{% notice Note %}}
 The instructions were tested with version 3.30.5
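
A quick way to confirm these prerequisites are met is to query the installed tool versions. The snippet below is a minimal sketch (it assumes `python` and `cmake` are already on your PATH; the version thresholds are the ones listed in this page):

```python
# check_env.py - sanity-check the Python and CMake versions required by this learning path
# (illustrative sketch; assumes cmake.exe is on PATH)
import subprocess
import sys

# Python 3.10 or higher is required (3.11.9 was used for testing)
assert sys.version_info >= (3, 10), f"Python 3.10+ required, found {sys.version.split()[0]}"
print("Python OK:", sys.version.split()[0])

# CMake 3.28 or higher is required; "cmake --version" prints e.g. "cmake version 3.30.5"
result = subprocess.run(["cmake", "--version"], capture_output=True, text=True, check=True)
print(result.stdout.splitlines()[0])
```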

content/learning-paths/laptops-and-desktops/win_on_arm_build_onnxruntime/2-build-onnxruntime.md

Lines changed: 5 additions & 6 deletions
@@ -6,7 +6,7 @@ weight: 3
 layout: learningpathall
 ---
 
-## Compile ONNX Runtime for Windows ARM64 CPU
+## Compile ONNX Runtime for Windows on Arm
 Now that you have your environment set up correctly, you can build the ONNX Runtime inference engine.
 
 ONNX Runtime is an open-source inference engine designed to accelerate the deployment of machine learning models, particularly those in the Open Neural Network Exchange (ONNX) format. ONNX Runtime is optimized for high performance and low latency, making it popular for production deployment of AI models. You can learn more by reading the [ONNX Runtime Overview](https://onnxruntime.ai/).
@@ -28,26 +28,25 @@ git checkout 4eeefd7260b7fa42a71dd1a08b423d5e7c722050
 You might be able to use a later commit. These steps have been tested with the commit `4eeefd7260b7fa42a71dd1a08b423d5e7c722050`.
 {{% /notice %}}
 
-### Build for Windows CPU
+### Build for Windows
 
-You can build "Release" for a build type that aims to provide an
-a build optimized for performance but without debug information.
+You can build the "Release" configuration for a build optimized for performance but without debug information.
 
 
 ```bash
 .\build.bat --config Release --build_shared_lib --parallel --compile_no_warning_as_error --skip_submodule_sync --skip_tests
 ```
 
 
-As an alternative, you can build "RelWithDebInfo" for a build type that aims to provide a release-optimized build with debug information.
+As an alternative, you can build the "RelWithDebInfo" configuration for a release-optimized build with debug information.
 
 ```bash
 .\build.bat --config RelWithDebInfo --build_shared_lib --parallel --compile_no_warning_as_error --skip_submodule_sync --skip_tests
 ```
 
 
 ### Resulting Dynamic Link Library
-When the build is complete, onnxruntime.dll dynamic linked library can be found in:
+When the build is complete, the `onnxruntime.dll` dynamic linked library can be found in:
 
 ```
 dir .\build\Windows\Release\Release\onnxruntime.dll
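
Once the build finishes, one simple way to confirm the resulting library is usable is to load it and resolve the C API entry point. This is a minimal sketch, assuming the Release output path shown above; `OrtGetApiBase` is the entry point exported by the ONNX Runtime C API:

```python
# load_check.py - confirm the freshly built onnxruntime.dll loads and exports the C API entry point
# (minimal sketch; adjust the path if you built the RelWithDebInfo configuration instead)
import ctypes
import os

dll_path = os.path.join("build", "Windows", "Release", "Release", "onnxruntime.dll")
ort = ctypes.WinDLL(dll_path)  # raises OSError if the DLL or one of its dependencies is missing
print("Loaded:", dll_path)
print("OrtGetApiBase exported:", hasattr(ort, "OrtGetApiBase"))  # ONNX Runtime C API entry point
```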

content/learning-paths/laptops-and-desktops/win_on_arm_build_onnxruntime/3-build-onnxruntime-generate-api.md

Lines changed: 7 additions & 5 deletions
@@ -6,17 +6,17 @@ weight: 4
 layout: learningpathall
 ---
 
-## Compile the ONNX Runtime Generate() API for Windows ARM64 CPU
+## Compile the ONNX Runtime Generate() API for Windows on Arm
 
 The Generate() API in ONNX Runtime is designed for text generation tasks using models like Phi-3. It implements the generative AI loop for ONNX models, including:
 - pre- and post-processing
 - inference with ONNX Runtime
 - logits processing
 - search and sampling
-- KV cache management.
+- KV cache management
 
 You can learn more by reading the [ONNX Runtime Generate() API page](https://onnxruntime.ai/docs/genai/).
 
-In this page you will learn how to build the Generate API() from source (C/C++ build).
+In this section you will learn how to build the Generate() API from source.
 
 
 ### Clone onnxruntime-genai Repo
@@ -34,14 +34,16 @@ git checkout b2e8176c99473afb726d364454dc827d2181cbb2
 You might be able to use later commits. These steps have been tested with the commit `b2e8176c99473afb726d364454dc827d2181cbb2`.
 {{% /notice %}}
 
-### Build for Windows ARM64 CPU
+### Build for Windows on Arm
 The build command below has a --config argument, which takes the following options:
 - ```Release``` builds release binaries
 - ```Debug``` builds binaries with debug symbols
 - ```RelWithDebInfo``` builds release binaries with debug info
 
-Below are the instruction to build ```Release```:
+You will build the `Release` variant of the ONNX Runtime Generate() API:
+
 ```bash
+pip install requests
 python build.py --config Release --skip_tests
 ```
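
The bullet list earlier in this file describes the loop that the Generate() API runs for you. As a rough mental model only (plain Python with a stand-in model, not the onnxruntime-genai API), greedy generation looks like this:

```python
# generation_loop.py - schematic of the loop that the Generate() API automates
# (conceptual sketch only; model_forward is a stand-in, NOT the onnxruntime-genai API)
import numpy as np

VOCAB_SIZE = 32064  # Phi-3-mini vocabulary size, used here purely for illustration

def model_forward(token_ids, kv_cache):
    """Stand-in for one ONNX Runtime inference step: returns dummy logits and an updated cache."""
    rng = np.random.default_rng(len(token_ids))
    return rng.standard_normal(VOCAB_SIZE), kv_cache  # a real run would update and reuse the KV cache

def generate(prompt_ids, max_new_tokens=8, eos_id=0):
    tokens, kv_cache = list(prompt_ids), None              # pre-processing: prompt already tokenized
    for _ in range(max_new_tokens):
        logits, kv_cache = model_forward(tokens, kv_cache)  # inference + logits processing
        next_id = int(np.argmax(logits))                    # search/sampling (greedy here)
        tokens.append(next_id)
        if next_id == eos_id:                               # stop at the end-of-sequence id (placeholder value)
            break
    return tokens                                           # post-processing would detokenize these ids

print(generate([1, 1212, 318]))
```

The real API performs these steps in native code against the ONNX model you download in the next section, with proper tokenization, sampling options, and KV cache reuse.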

content/learning-paths/laptops-and-desktops/win_on_arm_build_onnxruntime/4-run-benchmark-on-WoA.md

Lines changed: 25 additions & 13 deletions
@@ -1,25 +1,25 @@
 ---
-title: Run Phi3 model on an ARM Windows Device
+title: Run Phi3 model on a Windows on Arm machine
 weight: 5
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Run a Phi-3 model on your ARM Windows Device
+## Run a Phi-3 model on your Windows on Arm machine
 
-In this section you will learn how to obtain and run on your ARM Windows device (or virtual device) the Phi3-mini model. To do so you will be using a simple model runner program which provides performance metrics.
+In this section, you will learn how to download the Phi-3-mini model and run it on your Windows on Arm machine (physical or virtual). You will use a simple model runner program, which provides performance metrics.
 
-The Phi-3-mini (3.3B) model has a short (4k) context version and a long (128k) context version. The long context version can accept much longer prompts and produces longer output text, but it does consume more memory.
+The Phi-3-mini (3.3B) model has a short (4k) context version and a long (128k) context version. The long context version can accept much longer prompts and produces longer output text, but it consumes more memory.
 In this learning path, you will use the short context version, which is quantized to 4-bits.
 
 The Phi-3-mini model used here is in an ONNX format.
 
 ### Setup
 
-Phi-3 ONNX models are hosted on HuggingFace.
+[Phi-3 ONNX models](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx) are hosted on Hugging Face.
 Hugging Face uses Git for version control and to download ONNX model files, which can be quite large.
-You will first need to get and install the Git Large File Storage (LFS) extension.
+You will first need to install the Git Large File Storage (LFS) extension.
 
 ``` bash
 winget install -e --id GitHub.GitLFS
@@ -34,7 +34,7 @@ You then need to install the ``HuggingFace CLI``.
 pip install huggingface-hub[cli]
 ```
 
-### Download the Phi-3-mini (4k) model for CPU and Mobile
+### Download the Phi-3-mini (4k) model
 
 ``` bash
 cd C:\Users\%USERNAME%
@@ -56,7 +56,7 @@ copy src\ort_genai.h examples\c\include\
 copy src\ort_genai_c.h examples\c\include\
 ```
 
-you can now build the model runner executable in the ''onnxruntime-genai'' folder using the commands below:
+You can now build the model runner executable in the ''onnxruntime-genai'' folder using the commands below:
 
 ``` bash
 cd examples/c
@@ -65,9 +65,9 @@ cd build
 cmake --build . --config Release
 ```
 
-After a successful build, a binary program called `phi3` will be created in the ''onnxruntime-genai'' folder.
+After a successful build, a binary program called `phi3` will be created in the ''onnxruntime-genai'' folder:
 ```output
-dir examples\c\build\Release\phi3.exe
+dir Release\phi3.exe
 ```
 
 #### Run the model
@@ -80,10 +80,22 @@ cd repos\lp
 .\onnxruntime-genai\examples\c\build\Release\phi3.exe .\cpu_and_mobile\cpu-int4-rtn-block-32-acc-level-4\ cpu
 ```
 
-This will allow the runner program to load the model. It will then prompt you to input the text prompt to the model. After you enter your input prompt, the text output by the model will be displayed. On completion, performance metrics similar to those shown below should be displayed:
+This will allow the runner program to load the model. It will then prompt you to enter a text prompt, as shown:
+
+```output
+-------------
+Hello, Phi-3!
+-------------
+C++ API
+Creating config...
+Creating model...
+Creating tokenizer...
+Prompt: (Use quit() to exit) Or (To terminate current output generation, press Ctrl+C)
+```
+
+After you enter your input prompt, the text output by the model will be displayed. On completion, performance metrics similar to those shown below should be displayed:
 
 ```
 Prompt length: 64, New tokens: 931, Time to first: 1.79s, Prompt tokens per second: 35.74 tps, New tokens per second: 6.34 tps
 ```
-
-You have successfully run the Phi-3 model on your Windows device powered by ARM.
+You have successfully run the Phi-3 model on your Windows device powered by Arm.
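
The metrics line added above reports prompt length, number of generated tokens, time to first token, and throughput for prompt processing and generation. If you want to compare runs (for example, a Release build against a RelWithDebInfo build), the numbers can be pulled out of the runner's output; the sketch below simply parses the line format shown in this file:

```python
# parse_metrics.py - extract the performance numbers from the phi3 runner's summary line
# (illustrative sketch; the regex matches the example line format shown above)
import re

line = ("Prompt length: 64, New tokens: 931, Time to first: 1.79s, "
        "Prompt tokens per second: 35.74 tps, New tokens per second: 6.34 tps")

pattern = (r"Prompt length: (\d+), New tokens: (\d+), Time to first: ([\d.]+)s, "
           r"Prompt tokens per second: ([\d.]+) tps, New tokens per second: ([\d.]+) tps")
match = re.search(pattern, line)
prompt_len, new_tokens, ttft_s, prompt_tps, gen_tps = match.groups()
print(f"time to first token: {ttft_s}s, generation throughput: {gen_tps} tokens/s")
```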

content/learning-paths/laptops-and-desktops/win_on_arm_build_onnxruntime/_index.md

Lines changed: 7 additions & 7 deletions
@@ -1,17 +1,17 @@
 ---
-title: Powering Phi-3 on Arm PC with ONNX Runtime on Windows
+title: Run Phi-3 on a Windows on Arm machine with ONNX Runtime
 
 draft: true
 cascade:
   draft: true
 
 minutes_to_complete: 60
 
-who_is_this_for: A deep-dive for advanced developers looking to build ONNX Runtime on Windows ARM (WoA) and leverage the Generate() API to run Phi-3 inference with KleidiAI acceleration.
+who_is_this_for: A deep-dive for advanced developers looking to build ONNX Runtime on Windows on Arm (WoA) and leverage the Generate() API to run Phi-3 inference with KleidiAI acceleration.
 
 learning_objectives:
-    - Build ONNX Runtime and ONNX Runtime Generate() API for Windows on ARM.
-    - Run a Phi-3 model using ONNX Runtime on an Arm-based Windows laptop.
+    - Build ONNX Runtime and ONNX Runtime Generate() API for Windows on Arm.
+    - Run a Phi-3 model using ONNX Runtime on a Windows on Arm laptop.
 
 prerequisites:
     - A Windows on Arm computer such as the Lenovo Thinkpad X13 running Windows 11 or a Windows on Arm [virtual machine](https://learn.arm.com/learning-paths/cross-platform/woa_azure/)
@@ -25,11 +25,11 @@ armips:
 - Cortex-A
 - Cortex-X
 tools_software_languages:
-- Visual Studio IDE - 2022+ Community Version
+- Visual Studio
 - C++
-- Python 3.10+
+- Python
 - Git
-- CMake-3.28 or higher
+- cmake
 operatingsystems:
 - Windows
