Commit 3aac54a

Merge pull request #1932 from madeline-underwood/onnx
ONNX_JA to review
2 parents 53b8207 + c7e8d84 commit 3aac54a

4 files changed: +51 −34 lines changed


content/learning-paths/servers-and-cloud-computing/onnx/_index.md

Lines changed: 8 additions & 7 deletions
@@ -1,24 +1,25 @@
 ---
-title: Run Phi-3.5 Vision Model with ONNX Runtime on Microsoft Azure Cobalt 100 VMs
+title: Deploy Phi-3.5 Vision with ONNX Runtime on Azure Cobalt 100 on Arm
+
+
 
 draft: true
 cascade:
     draft: true
 
 minutes_to_complete: 30
 
-who_is_this_for: This is an advanced topic for software developers, ML engineers, and cloud practitioners looking to deploy Microsoft's Phi Models on Arm-based servers using ONNX Runtime.
+who_is_this_for: This is an advanced topic for developers, ML engineers, and cloud practitioners looking to deploy Microsoft's Phi Models on Arm-based servers using ONNX Runtime.
 
 learning_objectives:
-    - Install ONNX Runtime, download and quantize the Phi-3.5 vision model.
-    - Run the Phi-3.5 model with ONNX Runtime on Azure.
+    - Quantize and run the Phi-3.5 vision model with ONNX Runtime on Azure.
     - Analyze performance on Arm Neoverse-N2 based Azure Cobalt 100 VMs.
 
 prerequisites:
-    - An [Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) from an appropriate cloud service provider. This Learning Path has been tested on a Microsoft Azure Cobalt 100 virtual machine with 32 cores, 8GB of RAM, and 32GB of disk space.
+    - An [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from an appropriate cloud service provider. This Learning Path has been tested on a Microsoft Azure Cobalt 100 virtual machine with 32 cores, 8GB of RAM, and 32GB of disk space.
     - Basic understanding of Python and machine learning concepts.
     - Familiarity with ONNX Runtime and Azure cloud services.
-    - Knowledge of LLM (Large Language Model) fundamentals.
+    - Knowledge of Large Language Model (LLM) fundamentals.
 
 
 author: Nobel Chowdary Mandepudi
@@ -34,7 +35,7 @@ operatingsystems:
 tools_software_languages:
     - Python
     - ONNX Runtime
-    - Microsoft Azure
+
 
 further_reading:
     - resource:
Lines changed: 8 additions & 6 deletions
@@ -1,11 +1,11 @@
 ---
-title: Run the Phi 3.5 Chatbot
+title: Interact with the Phi-3.5 Chatbot
 weight: 4
 
 layout: learningpathall
 ---
 
-## Input a Prompt
+## Try a text-only prompt
 
 To begin, skip the image prompt and input the text prompt as shown in the example below:
 ![output](output.png)
@@ -17,15 +17,17 @@ Next, download a sample image from the internet using the following `wget` command:
 wget https://cdn.pixabay.com/photo/2020/06/30/22/34/dog-5357794__340.jpg
 ```
 
-After downloading the image, input the image prompt along with the image name, and enter the text prompt as demonstrated in the example below:
+## Try an image + text prompt
+
+After downloading the image, provide the image file name when prompted, followed by the text prompt, as demonstrated in the example below:
 ![image_output](image_output.png)
 
-## Observe Performance Metrics
+## Observe performance metrics
 
 As shown in the example above, the LLM Chatbot performs inference at a speed of **44 tokens/second**, with the time to first token being approximately **1 second**. This highlights the efficiency and responsiveness of the LLM Chatbot in processing queries and generating outputs.
 
-## Further Interaction and Custom Applications
+## Further interaction and custom applications
 
 You can continue interacting with the chatbot by asking follow-up prompts and observing the performance metrics displayed in the terminal.
 
-This setup demonstrates how to build and configure applications using the Phi 3.5 model for text generation with both text and image inputs. It also showcases the optimized performance of running Phi models on Arm CPUs, emphasizing the significant performance gains achieved through this workflow.
+This setup shows how to build applications using the Phi-3.5 model for multimodal generation from text and image inputs. It also highlights the performance benefits of running Phi models on Arm CPUs.
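
The **44 tokens/second** and one-second time-to-first-token figures quoted above come from the timing instrumentation in `phi3v.py` (the `first_token_duration` and `token_count` variables visible in the next file's diff). As a rough sketch of how such metrics can be derived from per-token timestamps, assuming an illustrative `report_metrics` helper that is not part of this PR:

```python
import time

def report_metrics(token_times, start_time):
    # Illustrative only: derive chatbot metrics from per-token timestamps.
    first_token_duration = token_times[0] - start_time  # time to first token
    token_count = len(token_times)
    # Decode-phase throughput, excluding the wait for the first token.
    decode_rate = (token_count - 1) / (token_times[-1] - token_times[0])
    print(f"Time to first token: {first_token_duration:.2f} s")
    print(f"Throughput: {decode_rate:.1f} tokens/second")

# Usage: simulate 100 tokens arriving at 44 tokens/s after a 1 s first-token delay.
start = time.time()
token_times = [start + 1.0 + i / 44 for i in range(100)]
report_metrics(token_times, start)  # prints ~1.00 s and ~44.0 tokens/second
```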

content/learning-paths/servers-and-cloud-computing/onnx/chatbot.md

Lines changed: 9 additions & 7 deletions
@@ -5,9 +5,11 @@ weight: 3
 layout: learningpathall
 ---
 
-## Script for ONNX Runtime based LLM Server
-Now create a python script `phi3v.py` with the following content. This script runs the Phi3.5 vision model with ONNX Runtime.
-```
+## Create the chatbot server script
+
+Create a Python script called `phi3v.py` with the following content.
+
+This script launches a chatbot server using the Phi-3.5 vision model and ONNX Runtime.
 
 ```python
 # Copyright (c) Microsoft Corporation. All rights reserved.
@@ -94,7 +96,7 @@ def run(args: argparse.Namespace):
     params.set_inputs(inputs)
     params.set_search_options(max_length=7680)
     generator = og.Generator(model, params)
-    #start_time = time.time()
+    #start_time = time.time() # commented out and redundant
     first_token_duration = None
     token_count = 0
     while not generator.is_done():
@@ -141,13 +143,13 @@ if __name__ == "__main__":
     run(args)
 ```
 
-## Run the Server
+## Run the server
 
-You are now ready to run the server to enable chatbot.
+You're now ready to run the chatbot server.
 
 Use the following command in a terminal to start the server:
 
-```python
+```bash
 python3 phi3v.py -m cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4 -e cpu
 ```
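
The hunks above show only fragments of `phi3v.py`. For orientation, here is a minimal, hypothetical sketch of an `onnxruntime-genai` generation loop assembled from the same calls that appear in the diff (`params.set_inputs`, `params.set_search_options(max_length=7680)`, `og.Generator`, `generator.is_done()`). The model path follows the run command above; the prompt template and streaming-decode details are assumptions rather than code from this PR:

```python
import onnxruntime_genai as og

# Assumed model path, matching the command used to start the server above.
model = og.Model("cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4")
processor = model.create_multimodal_processor()  # vision models accept image + text input
tokenizer_stream = processor.create_stream()

# Text-only prompt in the Phi-3.5 chat template (assumed; the full script also
# accepts an image and references it as <|image_1|> inside the prompt).
prompt = "<|user|>\nWhat is an Arm Neoverse CPU?<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=None)

params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=7680)  # same limit as in the diff above

generator = og.Generator(model, params)
while not generator.is_done():  # loop structure mirrors the script's while loop
    generator.generate_next_token()
    token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(token), end="", flush=True)
print()
```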

content/learning-paths/servers-and-cloud-computing/onnx/setup.md

Lines changed: 26 additions & 14 deletions
@@ -1,27 +1,35 @@
 ---
 # User change
-title: "Build ONNX Runtime and setup Phi-3.5 vision model"
+title: "Build ONNX Runtime and set up the Phi-3.5 Vision Model"
 
 weight: 2
 
 # Do not modify these elements
 layout: "learningpathall"
 ---
+## Overview
 
-In this Learning Path you will learn how to run quantized Phi models using ONNX Runtime on Microsoft Azure Cobalt 100 servers using ONNX Runtime. Specifically, you will deploy the Phi 3.5 vision model on Arm-based servers running Ubuntu 24.04 LTS. The instructions have been tested on an Azure `Dpls_v6` 32 core instance.
+In this Learning Path, you'll run quantized Phi models with ONNX Runtime on Microsoft Azure Cobalt 100 servers.
+
+Specifically, you'll deploy the Phi-3.5 vision model on Arm-based servers running Ubuntu 24.04 LTS.
+
+
+{{% notice Note %}}
+These instructions have been tested on a 32-core Azure `Dpls_v6` instance.
+{{% /notice %}}
 
-## Overview
 
 You will learn how to build and configure ONNX Runtime to enable efficient LLM inference on Arm CPUs.
 
-The tutorial covers the following steps:
-- Building ONNX Runtime, quantizing and converting the Phi 3.5 vision model to the ONNX format.
-- Running the model using a Python script with ONNX Runtime to perform LLM inference on the CPU.
-- Analyzing the performance.
+This Learning Path walks you through the following tasks:
+- Build ONNX Runtime.
+- Quantize and convert the Phi-3.5 vision model to ONNX format.
+- Run the model using a Python script with ONNX Runtime for CPU-based LLM inference.
+- Analyze performance on Arm CPUs.
 
 ## Install dependencies
 
-Install the following packages on your Arm-based server instance:
+On your Arm-based server, install the following packages:
 
 ```bash
 sudo apt update
@@ -30,18 +38,17 @@ Install the following packages on your Arm-based server instance:
 
 ## Create a requirements file
 
-Use a file editor of your choice and create a `requirements.txt` file will the python packages shown below:
+Use a file editor of your choice and create a `requirements.txt` file with the Python packages shown below:
 
 ```python
 requests
 torch
 transformers
 accelerate
 huggingface-hub
-pyreadline3
 ```
 
-## Install Python Dependencies
+## Install Python dependencies
 
 Create a virtual environment:
 ```bash
@@ -68,13 +75,18 @@ Clone and build the `onnxruntime-genai` repository, which includes the Kleidi AI
 cd build/Linux/Release/wheel/
 pip install onnxruntime_genai-0.9.0.dev0-cp312-cp312-linux_aarch64.whl
 ```
+{{% notice Note %}}
+Ensure you're using Python 3.12 to match the cp312 wheel format.
+{{% /notice %}}
+
+This build includes optimizations from Kleidi AI for efficient inference on Arm CPUs.
 
-## Download and Quantize the Model
+## Download and quantize the model
 
-Navigate to the home directory, download the quantized model using `huggingface-cli`:
+Navigate to your home directory. Now download the quantized model using `huggingface-cli`:
 ```bash
 cd ~
 huggingface-cli download microsoft/Phi-3.5-vision-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .
 ```
 
-The Phi 3.5 vision model has now been successfully quantized into the ONNX format. The next step is to run the model using ONNX Runtime.
+The Phi-3.5 vision model is now downloaded in ONNX format with INT4 quantization and is ready to run with ONNX Runtime.
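
As an optional sanity check, not part of the Learning Path itself, you can confirm that the wheel built earlier loads the downloaded INT4 model before starting the chatbot. The snippet below is a hypothetical verification step using the same `onnxruntime_genai` API the server script relies on:

```python
import onnxruntime_genai as og

# Load the INT4 model from the directory created by the huggingface-cli command above.
model = og.Model("cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4")
processor = model.create_multimodal_processor()  # Phi-3.5 vision uses a multimodal processor
print("Phi-3.5 vision ONNX model loaded and ready for inference")
```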
