
Commit 3ada49b

Merge pull request #2386 from pareenaverma/main
ONNX on Azure Tech Review
2 parents 4a96c91 + 3cd1125 commit 3ada49b

File tree: 6 files changed (+88, −83 lines)


content/learning-paths/servers-and-cloud-computing/onnx-on-azure/_index.md

Lines changed: 4 additions & 4 deletions
@@ -7,22 +7,22 @@ cascade:
 
 minutes_to_complete: 60
 
-who_is_this_for: This Learning Path introduces ONNX deployment on Microsoft Azure Cobalt 100 (Arm-based) virtual machines. It is designed for developers migrating ONNX-based applications from x86_64 to Arm with minimal or no changes.
+who_is_this_for: This Learning Path introduces ONNX deployment on Microsoft Azure Cobalt 100 (Arm-based) virtual machines. It is designed for developers deploying ONNX-based applications on Arm-based machines.
 
 learning_objectives:
 - Provision an Azure Arm64 virtual machine using Azure console, with Ubuntu Pro 24.04 LTS as the base image.
 - Deploy ONNX on the Ubuntu Pro virtual machine.
-- Perform ONNX baseline testing and benchmarking on both x86_64 and Arm64 virtual machines.
+- Perform ONNX baseline testing and benchmarking on Arm64 virtual machines.
 
 prerequisites:
 - A [Microsoft Azure](https://azure.microsoft.com/) account with access to Cobalt 100 based instances (Dpsv6).
 - Basic understanding of Python and machine learning concepts.
 - Familiarity with [ONNX Runtime](https://onnxruntime.ai/docs/) and Azure cloud services.
 
-author: Jason Andrews
+author: Pareena Verma
 
 ### Tags
-skilllevels: Advanced
+skilllevels: Introductory
 subjects: ML
 cloud_service_providers: Microsoft Azure

content/learning-paths/servers-and-cloud-computing/onnx-on-azure/background.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ weight: 2
 layout: "learningpathall"
 ---
 
-## Cobalt 100 Arm-based processor
+## Azure Cobalt 100 Arm-based processor
 
 Azure’s Cobalt 100 is built on Microsoft's first-generation, in-house Arm-based processor: the Cobalt 100. Designed entirely by Microsoft and based on Arm’s Neoverse N2 architecture, this 64-bit CPU delivers improved performance and energy efficiency across a broad spectrum of cloud-native, scale-out Linux workloads. These include web and application servers, data analytics, open-source databases, caching systems, and more. Running at 3.4 GHz, the Cobalt 100 processor allocates a dedicated physical core for each vCPU, ensuring consistent and predictable performance.

@@ -16,6 +16,6 @@ To learn more about Cobalt 100, refer to the blog [Announcing the preview of new
 ONNX (Open Neural Network Exchange) is an open-source format designed for representing machine learning models.
 It provides interoperability between different deep learning frameworks, enabling models trained in one framework (such as PyTorch or TensorFlow) to be deployed and run in another.
 
-ONNX models are serialized into a standardized format that can be executed by the **ONNX Runtime**, a high-performance inference engine optimized for CPU, GPU, and specialized hardware accelerators. This separation of model training and inference allows developers to build flexible, portable, and production-ready AI workflows.
+ONNX models are serialized into a standardized format that can be executed by the ONNX Runtime, a high-performance inference engine optimized for CPU, GPU, and specialized hardware accelerators. This separation of model training and inference allows developers to build flexible, portable, and production-ready AI workflows.
 
 ONNX is widely used in cloud, edge, and mobile environments to deliver efficient and scalable inference for deep learning models. Learn more from the [ONNX official website](https://onnx.ai/) and the [ONNX Runtime documentation](https://onnxruntime.ai/docs/).
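
To make the interoperability described above concrete, here is a minimal sketch of exporting a small PyTorch model to ONNX and running it with ONNX Runtime. It assumes `torch`, `numpy`, and `onnxruntime` are installed; the model, file name, and tensor shapes are illustrative only and are not part of the committed files.

```python
import numpy as np
import onnxruntime as ort
import torch

# Build a trivial PyTorch model and export it to the ONNX format
model = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 4)
torch.onnx.export(model, dummy_input, "tiny_model.onnx",
                  input_names=["input"], output_names=["output"])

# Load the exported model with ONNX Runtime and run one inference on CPU
session = ort.InferenceSession("tiny_model.onnx", providers=["CPUExecutionProvider"])
outputs = session.run(None, {"input": np.random.rand(1, 4).astype(np.float32)})
print("Output shape:", outputs[0].shape)
```

The same loading and inference pattern applies to any serialized ONNX model, regardless of the framework that produced it.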

content/learning-paths/servers-and-cloud-computing/onnx-on-azure/baseline.md

Lines changed: 11 additions & 9 deletions
@@ -7,12 +7,11 @@ layout: learningpathall
 ---
 
 
-## Baseline testing using ONNX Runtime:
+## Baseline Testing using ONNX Runtime:
 
-This test measures the inference latency of the ONNX Runtime by timing how long it takes to process a single input using the `squeezenet-int8.onnx model`. It helps evaluate how efficiently the model runs on the target hardware.
-
-Create a **baseline.py** file with the below code for baseline test of ONNX:
+The purpose of this test is to measure the inference latency of ONNX Runtime on your Azure Cobalt 100 VM. By timing how long it takes to process a single input through the SqueezeNet INT8 model, you can validate that ONNX Runtime is functioning correctly and get a baseline performance measurement for your target hardware.
 
+Create a file named `baseline.py` with the following code:
 ```python
 import onnxruntime as ort
 import numpy as np
2928
print("Inference time:", end - start)
3029
```
3130

32-
Run the baseline test:
31+
Run the baseline script to measure inference time:
3332

3433
```console
3534
python3 baseline.py
3635
```
37-
You should see an output similar to:
36+
You should see output similar to:
3837
```output
3938
Inference time: 0.0026061534881591797
4039
```
@@ -45,8 +44,11 @@ input tensor of shape (1, 3, 224, 224):
4544
- 224 x 224: image resolution (common for models like SqueezeNet)
4645
{{% /notice %}}
4746

47+
This indicates the model successfully executed a single forward pass through the SqueezeNet INT8 ONNX model and returned results.
48+
4849
#### Output summary:
4950

50-
- Single inference latency: ~2.60 milliseconds (0.00260 sec)
51-
- This shows the initial (cold-start) inference performance of ONNX Runtime on CPU using an optimized int8 quantized model.
52-
- This demonstrates that the setup is fully working, and ONNX Runtime efficiently executes quantized models on Arm64.
51+
Single inference latency(0.00260 sec): This is the time required for the model to process one input image and produce an output. The first run includes graph loading, memory allocation, and model initialization overhead.
52+
Subsequent inferences are usually faster due to caching and optimized execution.
53+
54+
This demonstrates that the setup is fully working, and ONNX Runtime efficiently executes quantized models on Arm64.

content/learning-paths/servers-and-cloud-computing/onnx-on-azure/benchmarking.md

Lines changed: 36 additions & 50 deletions
@@ -6,59 +6,63 @@ weight: 6
 layout: learningpathall
 ---
 
-Now that you’ve set up and run the ONNX model (e.g., SqueezeNet), you can use it to benchmark inference performance using Python-based timing or tools like **onnxruntime_perf_test**. This helps evaluate the ONNX Runtime efficiency on Azure Arm64-based Cobalt 100 instances.
-
-You can also compare the inference time between Cobalt 100 (Arm64) and similar D-series x86_64-based virtual machine on Azure.
+Now that you have validated ONNX Runtime with Python-based timing (e.g., SqueezeNet baseline test), you can move to using a dedicated benchmarking utility called `onnxruntime_perf_test`. This tool is designed for systematic performance evaluation of ONNX models, allowing you to capture more detailed statistics than simple Python timing.
+This helps evaluate ONNX Runtime efficiency on Azure Arm64-based Cobalt 100 instances as well as on other architectures, such as x86_64.
 
 ## Run the performance tests using onnxruntime_perf_test
-The **onnxruntime_perf_test** is a performance benchmarking tool included in the ONNX Runtime source code. It is used to measure the inference performance of ONNX models under various runtime conditions (like CPU, GPU, or other execution providers).
+The `onnxruntime_perf_test` is a performance benchmarking tool included in the ONNX Runtime source code. It is used to measure the inference performance of ONNX models and supports multiple execution providers (CPU, GPU, or other backends). On Arm64 VMs, CPU execution is the focus.
 
 ### Install Required Build Tools
+Before building or running `onnxruntime_perf_test`, you will need to install a set of development tools and libraries. These packages are required for compiling ONNX Runtime and handling model serialization via Protocol Buffers.
 
 ```console
 sudo apt update
 sudo apt install -y build-essential cmake git unzip pkg-config
 sudo apt install -y protobuf-compiler libprotobuf-dev libprotoc-dev git
 ```
-Then verify:
+Then verify the protobuf installation:
 ```console
 protoc --version
 ```
-You should see an output similar to:
+You should see output similar to:
 
 ```output
 libprotoc 3.21.12
 ```
 ### Build ONNX Runtime from Source:
 
-The benchmarking tool, **onnxruntime_perf_test**, isn’t available as a pre-built binary artifact for any platform. So, you have to build it from the source, which is expected to take around 40-50 minutes.
+The benchmarking tool `onnxruntime_perf_test` isn’t available as a pre-built binary for any platform, so you will have to build it from source, which is expected to take around 40 minutes.
 
-Clone onnxruntime:
+Clone the onnxruntime repo:
 ```console
 git clone --recursive https://github.com/microsoft/onnxruntime
 cd onnxruntime
 ```
-Now, build the benchmark as below:
+Now, build the benchmark tool:
 
 ```console
 ./build.sh --config Release --build_dir build/Linux --build_shared_lib --parallel --build --update --skip_tests
 ```
-This will build the benchmark tool inside ./build/Linux/Release/onnxruntime_perf_test.
+You should see the executable at:
+```output
+./build/Linux/Release/onnxruntime_perf_test
+```
 
 ### Run the benchmark
-Now that the benchmarking tool has been built, you can benchmark the **squeezenet-int8.onnx** model, as below:
+Now that you have built the benchmarking tool, you can run inference benchmarks on the SqueezeNet INT8 model:
 
 ```console
-./build/Linux/Release/onnxruntime_perf_test -e cpu -r 100 -m times -s -Z -I <path-to-squeezenet-int8.onnx>
+./build/Linux/Release/onnxruntime_perf_test -e cpu -r 100 -m times -s -Z -I ../squeezenet-int8.onnx
 ```
-- **e cpu**: Use the CPU execution provider (not GPU or any other backend).
-- **r 100**: Run 100 inferences.
-- **m times**: Use "repeat N times" mode.
-- **s**: Show detailed statistics.
-- **Z**: Disable intra-op thread spinning (reduces CPU usage when idle between runs).
-- **I**: Input the ONNX model path without using input/output test data.
+Breakdown of the flags:
+-e cpu → Use the CPU execution provider.
+-r 100 → Run 100 inference passes for statistical reliability.
+-m times → Run in “repeat N times” mode. Useful for latency-focused measurement.
+-s → Show detailed per-run statistics (latency distribution).
+-Z → Disable intra-op thread spinning. Reduces CPU waste when idle between runs, especially on high-core systems like Cobalt 100.
+-I → Input the ONNX model path directly, skipping pre-generated test data.
 
-You should see an output similar to:
+You should see output similar to:
 
 ```output
 Disabling intra-op thread spinning between runs
@@ -84,12 +88,12 @@ P999 Latency: 0.00190312 s
 ```
 ### Benchmark Metrics Explained
 
-- **Average Inference Time**: The mean time taken to process a single inference request across all runs. Lower values indicate faster model execution.
-- **Throughput**: The number of inference requests processed per second. Higher throughput reflects the model’s ability to handle larger workloads efficiently.
-- **CPU Utilization**: The percentage of CPU resources used during inference. A value close to 100% indicates full CPU usage, which is expected during performance benchmarking.
-- **Peak Memory Usage**: The maximum amount of system memory (RAM) consumed during inference. Lower memory usage is beneficial for resource-constrained environments.
-- **P50 Latency (Median Latency)**: The time below which 50% of inference requests complete. Represents typical latency under normal load.
-- **Latency Consistency**: Describes the stability of latency values across all runs. "Consistent" indicates predictable inference performance with minimal jitter.
+* Average Inference Time: The mean time taken to process a single inference request across all runs. Lower values indicate faster model execution.
+* Throughput: The number of inference requests processed per second. Higher throughput reflects the model’s ability to handle larger workloads efficiently.
+* CPU Utilization: The percentage of CPU resources used during inference. A value close to 100% indicates full CPU usage, which is expected during performance benchmarking.
+* Peak Memory Usage: The maximum amount of system memory (RAM) consumed during inference. Lower memory usage is beneficial for resource-constrained environments.
+* P50 Latency (Median Latency): The time below which 50% of inference requests complete. Represents typical latency under normal load.
+* Latency Consistency: Describes the stability of latency values across all runs. "Consistent" indicates predictable inference performance with minimal jitter.
 
 ### Benchmark summary on Arm64:
 Here is a summary of benchmark results collected on an Arm64 **D4ps_v6 Ubuntu Pro 24.04 LTS virtual machine**.
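
To connect these metrics back to raw measurements, the sketch below shows how percentile latencies and throughput can be derived from a list of per-run timings collected in Python. It is a hedged illustration using `numpy` and the same timing approach as `baseline.py`; it is not the implementation used by `onnxruntime_perf_test`.

```python
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("squeezenet-int8.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Collect one latency sample per inference run
latencies = []
for _ in range(100):
    start = time.perf_counter()
    session.run(None, {input_name: data})
    latencies.append(time.perf_counter() - start)

latencies = np.array(latencies)
print("Average inference time:", latencies.mean(), "s")
print("Throughput:", 1.0 / latencies.mean(), "inferences/sec")
for p in (50, 90, 95, 99):
    print(f"P{p} latency:", np.percentile(latencies, p), "s")
```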
@@ -109,30 +113,12 @@ Here is a summary of benchmark results collected on an Arm64 **D4ps_v6 Ubuntu Pr
 | **Latency Consistency** | Consistent |
 
 
-### Benchmark summary on x86
-Here is a summary of benchmark results collected on x86 **D4s_v6 Ubuntu Pro 24.04 LTS virtual machine**.
-
-| **Metric** | **Value on Virtual Machine** |
-|----------------------------|-------------------------------|
-| **Average Inference Time** | 1.413 ms |
-| **Throughput** | 707.48 inferences/sec |
-| **CPU Utilization** | 100% |
-| **Peak Memory Usage** | 38.80 MB |
-| **P50 Latency** | 1.396 ms |
-| **P90 Latency** | 1.501 ms |
-| **P95 Latency** | 1.520 ms |
-| **P99 Latency** | 1.794 ms |
-| **P999 Latency** | 1.794 ms |
-| **Max Latency** | 1.794 ms |
-| **Latency Consistency** | Consistent |
-
-
-### Highlights from Ubuntu Pro 24.04 Arm64 Benchmarking
+### Highlights from Benchmarking on Azure Cobalt 100 Arm64 VMs
 
-When comparing the results on Arm64 vs x86_64 virtual machines:
-- **Low-Latency Inference:** Achieved consistent average inference times of ~1.86 ms on Arm64.
-- **Strong and Stable Throughput:** Sustained throughput of over 538 inferences/sec using the `squeezenet-int8.onnx` model on D4ps_v6 instances.
-- **Lightweight Resource Footprint:** Peak memory usage stayed below 37 MB, with CPU utilization around 96%, ideal for efficient edge or cloud inference.
-- **Consistent Performance:** P50, P95, and Max latency remained tightly bound, showcasing reliable performance on Azure Cobalt 100 Arm-based infrastructure.
+The results on Arm64 virtual machines demonstrate:
+- Low-Latency Inference: Achieved consistent average inference times of ~1.86 ms on Arm64.
+- Strong and Stable Throughput: Sustained throughput of over 538 inferences/sec using the `squeezenet-int8.onnx` model on D4ps_v6 instances.
+- Lightweight Resource Footprint: Peak memory usage stayed below 37 MB, with CPU utilization around 96%, ideal for efficient edge or cloud inference.
+- Consistent Performance: P50, P95, and Max latency remained tightly bound, showcasing reliable performance on Azure Cobalt 100 Arm-based infrastructure.
 
-You have now benchmarked ONNX on an Azure Cobalt 100 Arm64 virtual machine and compared results with x86_64.
+You have now successfully benchmarked inference time of ONNX models on an Azure Cobalt 100 Arm64 virtual machine.
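
As a quick sanity check on the summary above, average inference time and throughput should be roughly reciprocal for a single-stream benchmark; the one-line check below is illustrative only.

```python
# ~1.86 ms average latency implies roughly 1 / 0.00186 ≈ 538 inferences/sec
print(1 / 0.00186)  # ≈ 537.6, consistent with the reported throughput
```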

content/learning-paths/servers-and-cloud-computing/onnx-on-azure/create-instance.md

Lines changed: 11 additions & 5 deletions
@@ -1,18 +1,24 @@
 ---
-title: Create an Arm based cloud virtual machine using Microsoft Cobalt 100 CPU
+title: Create an Arm-based Azure VM with Cobalt 100
 weight: 3
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-## Introduction
+## Set up your development environment
 
-There are several ways to create an Arm-based Cobalt 100 virtual machine : the Microsoft Azure console, the Azure CLI tool, or using your choice of IaC (Infrastructure as Code). This guide will use the Azure console to create a virtual machine with Arm-based Cobalt 100 Processor.
+There is more than one way to create an Arm-based Cobalt 100 virtual machine:
 
-This learning path focuses on the general-purpose virtual machine of the D series. Please read the guide on [Dpsv6 size series](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/dpsv6-series) offered by Microsoft Azure.
+- The Microsoft Azure portal
+- The Azure CLI
+- Your preferred infrastructure as code (IaC) tool
 
-If you have never used the Microsoft Cloud Platform before, please review the microsoft [guide to Create a Linux virtual machine in the Azure portal](https://learn.microsoft.com/en-us/azure/virtual-machines/linux/quick-create-portal?tabs=ubuntu).
+In this Learning Path, you will use the Azure portal to create a virtual machine with the Arm-based Azure Cobalt 100 processor.
+
+You will focus on the general-purpose virtual machines in the D-series. For further information, see the Microsoft Azure guide for the [Dpsv6 size series](https://learn.microsoft.com/en-us/azure/virtual-machines/sizes/general-purpose/dpsv6-series).
+
+While the steps to create this instance are included here for convenience, for further information on setting up Cobalt on Azure, see [Deploy a Cobalt 100 virtual machine on Azure Learning Path](/learning-paths/servers-and-cloud-computing/cobalt/).
 
 #### Create an Arm-based Azure Virtual Machine

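Before creating the VM in the portal, you can optionally confirm that Dpsv6 (Cobalt 100) sizes are available in your target region. This is a hedged sketch using the Azure SDK for Python (`azure-identity` and `azure-mgmt-compute`); the region and subscription ID are placeholders, and this check is not part of the committed Learning Path steps.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Replace with your own subscription ID and preferred region
client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")
for size in client.virtual_machine_sizes.list(location="eastus"):
    if "ps_v6" in size.name.lower():  # Dpsv6 family, e.g. Standard_D4ps_v6
        print(size.name, size.number_of_cores, "vCPUs", size.memory_in_mb, "MB")
```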