Commit 3aac54a

Merge pull request #1932 from madeline-underwood/onnx
ONNX_JA to review
2 parents 53b8207 + c7e8d84 commit 3aac54a

4 files changed: +51 −34 lines changed


content/learning-paths/servers-and-cloud-computing/onnx/_index.md

Lines changed: 8 additions & 7 deletions
@@ -1,24 +1,25 @@
 ---
-title: Run Phi-3.5 Vision Model with ONNX Runtime on Microsoft Azure Cobalt 100 VMs
+title: Deploy Phi-3.5 Vision with ONNX Runtime on Azure Cobalt 100 on Arm
+
+
 
 draft: true
 cascade:
     draft: true
 
 minutes_to_complete: 30
 
-who_is_this_for: This is an advanced topic for software developers, ML engineers, and cloud practitioners looking to deploy Microsoft's Phi Models on Arm-based servers using ONNX Runtime.
+who_is_this_for: This is an advanced topic for developers, ML engineers, and cloud practitioners looking to deploy Microsoft's Phi Models on Arm-based servers using ONNX Runtime.
 
 learning_objectives:
-    - Install ONNX Runtime, download and quantize the Phi-3.5 vision model.
-    - Run the Phi-3.5 model with ONNX Runtime on Azure.
+    - Quantize and run the Phi-3.5 vision model with ONNX Runtime on Azure.
     - Analyze performance on Arm Neoverse-N2 based Azure Cobalt 100 VMs.
 
 prerequisites:
-    - An [Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) from an appropriate cloud service provider. This Learning Path has been tested on a Microsoft Azure Cobalt 100 virtual machine with 32 cores, 8GB of RAM, and 32GB of disk space.
+    - An [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from an appropriate cloud service provider. This Learning Path has been tested on a Microsoft Azure Cobalt 100 virtual machine with 32 cores, 8GB of RAM, and 32GB of disk space.
     - Basic understanding of Python and machine learning concepts.
     - Familiarity with ONNX Runtime and Azure cloud services.
-    - Knowledge of LLM (Large Language Model) fundamentals.
+    - Knowledge of Large Language Model (LLM) fundamentals.
 
 
 author: Nobel Chowdary Mandepudi
@@ -34,7 +35,7 @@ operatingsystems:
 tools_software_languages:
     - Python
     - ONNX Runtime
-    - Microsoft Azure
+
 
 further_reading:
     - resource:
Lines changed: 8 additions & 6 deletions
@@ -1,11 +1,11 @@
 ---
-title: Run the Phi 3.5 Chatbot
+title: Interact with the Phi-3.5 Chatbot
 weight: 4
 
 layout: learningpathall
 ---
 
-## Input a Prompt
+## Try a text-only prompt
 
 To begin, skip the image prompt and input the text prompt as shown in the example below:
 ![output](output.png)
@@ -17,15 +17,17 @@ Next, download a sample image from the internet using the following `wget` command:
 wget https://cdn.pixabay.com/photo/2020/06/30/22/34/dog-5357794__340.jpg
 ```
 
-After downloading the image, input the image prompt along with the image name, and enter the text prompt as demonstrated in the example below:
+## Try an image + text prompt
+
+After downloading the image, provide the image file name when prompted, followed by the text prompt, as demonstrated in the example below:
 ![image_output](image_output.png)
 
-## Observe Performance Metrics
+## Observe performance metrics
 
 As shown in the example above, the LLM Chatbot performs inference at a speed of **44 tokens/second**, with the time to first token being approximately **1 second**. This highlights the efficiency and responsiveness of the LLM Chatbot in processing queries and generating outputs.
 
-## Further Interaction and Custom Applications
+## Further interaction and custom applications
 
 You can continue interacting with the chatbot by asking follow-up prompts and observing the performance metrics displayed in the terminal.
 
-This setup demonstrates how to build and configure applications using the Phi 3.5 model for text generation with both text and image inputs. It also showcases the optimized performance of running Phi models on Arm CPUs, emphasizing the significant performance gains achieved through this workflow.
+This setup shows how to build applications using the Phi-3.5 model for multimodal generation from text and image inputs. It also highlights the performance benefits of running Phi models on Arm CPUs.
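
The **44 tokens/second** and one-second time-to-first-token figures quoted above come from the timing instrumentation in `phi3v.py` (the `first_token_duration` and `token_count` variables visible in the next file's diff). As a rough sketch of how such metrics can be derived from per-token timestamps, assuming an illustrative `report_metrics` helper that is not part of this PR:

```python
import time

def report_metrics(token_times, start_time):
    # Illustrative only: derive chatbot metrics from per-token timestamps.
    first_token_duration = token_times[0] - start_time  # time to first token
    token_count = len(token_times)
    # Decode-phase throughput, excluding the wait for the first token.
    decode_rate = (token_count - 1) / (token_times[-1] - token_times[0])
    print(f"Time to first token: {first_token_duration:.2f} s")
    print(f"Throughput: {decode_rate:.1f} tokens/second")

# Usage: simulate 100 tokens arriving at 44 tokens/s after a 1 s first-token delay.
start = time.time()
token_times = [start + 1.0 + i / 44 for i in range(100)]
report_metrics(token_times, start)  # prints ~1.00 s and ~44.0 tokens/second
```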

content/learning-paths/servers-and-cloud-computing/onnx/chatbot.md

Lines changed: 9 additions & 7 deletions
@@ -5,9 +5,11 @@ weight: 3
 layout: learningpathall
 ---
 
-## Script for ONNX Runtime based LLM Server
-Now create a python script `phi3v.py` with the following content. This script runs the Phi3.5 vision model with ONNX Runtime.
-```
+## Create the chatbot server script
+
+Create a Python script called `phi3v.py` with the following content.
+
+This script launches a chatbot server using the Phi-3.5 vision model and ONNX Runtime.
 
 ```python
 # Copyright (c) Microsoft Corporation. All rights reserved.
@@ -94,7 +96,7 @@ def run(args: argparse.Namespace):
     params.set_inputs(inputs)
     params.set_search_options(max_length=7680)
     generator = og.Generator(model, params)
-    #start_time = time.time()
+    #start_time = time.time() # commented out and redundant
     first_token_duration = None
     token_count = 0
     while not generator.is_done():
@@ -141,13 +143,13 @@ if __name__ == "__main__":
     run(args)
 ```
 
-## Run the Server
+## Run the server
 
-You are now ready to run the server to enable chatbot.
+You're now ready to run the chatbot server.
 
 Use the following command in a terminal to start the server:
 
-```python
+```bash
 python3 phi3v.py -m cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4 -e cpu
 ```
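
The hunks above show only fragments of `phi3v.py`. For orientation, here is a minimal, hypothetical sketch of an `onnxruntime-genai` generation loop assembled from the same calls that appear in the diff (`params.set_inputs`, `params.set_search_options(max_length=7680)`, `og.Generator`, `generator.is_done()`). The model path follows the run command above; the prompt template and streaming-decode details are assumptions rather than code from this PR:

```python
import onnxruntime_genai as og

# Assumed model path, matching the command used to start the server above.
model = og.Model("cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4")
processor = model.create_multimodal_processor()  # vision models accept image + text input
tokenizer_stream = processor.create_stream()

# Text-only prompt in the Phi-3.5 chat template (assumed; the full script also
# accepts an image and references it as <|image_1|> inside the prompt).
prompt = "<|user|>\nWhat is an Arm Neoverse CPU?<|end|>\n<|assistant|>\n"
inputs = processor(prompt, images=None)

params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=7680)  # same limit as in the diff above

generator = og.Generator(model, params)
while not generator.is_done():  # loop structure mirrors the script's while loop
    generator.generate_next_token()
    token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(token), end="", flush=True)
print()
```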

content/learning-paths/servers-and-cloud-computing/onnx/setup.md

Lines changed: 26 additions & 14 deletions
@@ -1,27 +1,35 @@
 ---
 # User change
-title: "Build ONNX Runtime and setup Phi-3.5 vision model"
+title: "Build ONNX Runtime and set up the Phi-3.5 Vision Model"
 
 weight: 2
 
 # Do not modify these elements
 layout: "learningpathall"
 ---
+## Overview
 
-In this Learning Path you will learn how to run quantized Phi models using ONNX Runtime on Microsoft Azure Cobalt 100 servers using ONNX Runtime. Specifically, you will deploy the Phi 3.5 vision model on Arm-based servers running Ubuntu 24.04 LTS. The instructions have been tested on an Azure `Dpls_v6` 32 core instance.
+In this Learning Path, you'll run quantized Phi models with ONNX Runtime on Microsoft Azure Cobalt 100 servers.
+
+Specifically, you'll deploy the Phi-3.5 vision model on Arm-based servers running Ubuntu 24.04 LTS.
+
+
+{{% notice Note %}}
+These instructions have been tested on a 32-core Azure `Dpls_v6` instance.
+{{% /notice %}}
 
-## Overview
 
 You will learn how to build and configure ONNX Runtime to enable efficient LLM inference on Arm CPUs.
 
-The tutorial covers the following steps:
-- Building ONNX Runtime, quantizing and converting the Phi 3.5 vision model to the ONNX format.
-- Running the model using a Python script with ONNX Runtime to perform LLM inference on the CPU.
-- Analyzing the performance.
+This Learning Path walks you through the following tasks:
+- Build ONNX Runtime.
+- Quantize and convert the Phi-3.5 vision model to ONNX format.
+- Run the model using a Python script with ONNX Runtime for CPU-based LLM inference.
+- Analyze performance on Arm CPUs.
 
 ## Install dependencies
 
-Install the following packages on your Arm-based server instance:
+On your Arm-based server, install the following packages:
 
 ```bash
 sudo apt update
@@ -30,18 +38,17 @@ Install the following packages on your Arm-based server instance:
 
 ## Create a requirements file
 
-Use a file editor of your choice and create a `requirements.txt` file will the python packages shown below:
+Use a file editor of your choice and create a `requirements.txt` file with the Python packages shown below:
 
 ```python
 requests
 torch
 transformers
 accelerate
 huggingface-hub
-pyreadline3
 ```
 
-## Install Python Dependencies
+## Install Python dependencies
 
 Create a virtual environment:
 ```bash
@@ -68,13 +75,18 @@ Clone and build the `onnxruntime-genai` repository, which includes the Kleidi AI
 cd build/Linux/Release/wheel/
 pip install onnxruntime_genai-0.9.0.dev0-cp312-cp312-linux_aarch64.whl
 ```
+{{% notice Note %}}
+Ensure you're using Python 3.12 to match the cp312 wheel format.
+{{% /notice %}}
+
+This build includes optimizations from Kleidi AI for efficient inference on Arm CPUs.
 
-## Download and Quantize the Model
+## Download and quantize the model
 
-Navigate to the home directory, download the quantized model using `huggingface-cli`:
+Navigate to your home directory. Now download the quantized model using `huggingface-cli`:
 ```bash
 cd ~
 huggingface-cli download microsoft/Phi-3.5-vision-instruct-onnx --include cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/* --local-dir .
 ```
 
-The Phi 3.5 vision model has now been successfully quantized into the ONNX format. The next step is to run the model using ONNX Runtime.
+The Phi-3.5 vision model is now downloaded in ONNX format with INT4 quantization and is ready to run with ONNX Runtime.
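
As an optional sanity check, not part of the Learning Path itself, you can confirm that the wheel built earlier loads the downloaded INT4 model before starting the chatbot. The snippet below is a hypothetical verification step using the same `onnxruntime_genai` API the server script relies on:

```python
import onnxruntime_genai as og

# Load the INT4 model from the directory created by the huggingface-cli command above.
model = og.Model("cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4")
processor = model.create_multimodal_processor()  # Phi-3.5 vision uses a multimodal processor
print("Phi-3.5 vision ONNX model loaded and ready for inference")
```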
