Commit 34ac528

Refine documentation for ONNX deployment on Azure Cobalt 100, enhancing clarity and structure across multiple sections.
1 parent 91d9a52 commit 34ac528

File tree: 6 files changed, +58 −53 lines changed

content/learning-paths/servers-and-cloud-computing/onnx-on-azure/_index.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -7,8 +7,8 @@ minutes_to_complete: 60
 who_is_this_for: This Learning Path is for developers deploying ONNX-based applications on Arm-based machines.
 
 learning_objectives:
-- Provision an Azure Arm64 virtual machine using Azure console, with Ubuntu Pro 24.04 LTS as the base image.
-- Perform ONNX baseline testing and benchmarking on Arm64 virtual machines.
+- Provision an Azure Arm64 virtual machine using Azure console, with Ubuntu Pro 24.04 LTS as the base image
+- Perform ONNX baseline testing and benchmarking on Arm64 virtual machines
 
 prerequisites:
 - A [Microsoft Azure](https://azure.microsoft.com/) account with access to Cobalt 100 based instances (Dpsv6)
````

content/learning-paths/servers-and-cloud-computing/onnx-on-azure/background.md

Lines changed: 8 additions & 6 deletions
````diff
@@ -9,27 +9,29 @@ layout: "learningpathall"
 ## Azure Cobalt 100 Arm-based processor
 
 
-Azure’s Cobalt 100 is built on Microsoft's first-generation, in-house Arm-based processor, the Cobalt 100. Designed entirely by Microsoft and based on Arm’s Neoverse N2 architecture, this 64-bit CPU delivers improved performance and energy efficiency across a broad spectrum of cloud-native, scale-out Linux workloads. You can use Cobalt 100 for:
+Azure’s Cobalt 100 is built on Microsoft's first-generation, in-house Arm-based processor, the Cobalt 100. Designed entirely by Microsoft and based on Arm’s Neoverse N2 architecture, this 64-bit CPU delivers improved performance and energy efficiency across a broad spectrum of cloud-native, scale-out Linux workloads.
+
+You can use Cobalt 100 for:
 
 - Web and application servers
 - Data analytics
 - Open-source databases
 - Caching systems
 - Many other scale-out workloads
 
-Running at 3.4 GHz, the Cobalt 100 processor allocates a dedicated physical core for each vCPU, ensuring consistent and predictable performance.
-
-You can learn more about Cobalt 100 in the blog [Announcing the preview of new Azure virtual machine based on the Azure Cobalt 100 processor](https://techcommunity.microsoft.com/blog/azurecompute/announcing-the-preview-of-new-azure-vms-based-on-the-azure-cobalt-100-processor/4146353).
+Running at 3.4 GHz, the Cobalt 100 processor allocates a dedicated physical core for each vCPU, ensuring consistent and predictable performance. You can learn more about Cobalt 100 in the Microsoft blog [Announcing the preview of new Azure virtual machine based on the Azure Cobalt 100 processor](https://techcommunity.microsoft.com/blog/azurecompute/announcing-the-preview-of-new-azure-vms-based-on-the-azure-cobalt-100-processor/4146353).
 
 ## ONNX
 
-ONNX (Open Neural Network Exchange) is an open-source format designed for representing machine learning models. You can use ONNX to:
+ONNX (Open Neural Network Exchange) is an open-source format designed for representing machine learning models.
+
+You can use ONNX to:
 
 - Move models between different deep learning frameworks, such as PyTorch and TensorFlow
 - Deploy models trained in one framework to run in another
 - Build flexible, portable, and production-ready AI workflows
 
-ONNX models are serialized into a standardized format that you can execute with ONNX Runtimea high-performance inference engine optimized for CPU, GPU, and specialized hardware accelerators. This separation of model training and inference lets you deploy models efficiently across cloud, edge, and mobile environments.
+ONNX models are serialized into a standardized format that you can execute with ONNX Runtime - a high-performance inference engine optimized for CPU, GPU, and specialized hardware accelerators. This separation of model training and inference lets you deploy models efficiently across cloud, edge, and mobile environments.
 
 To learn more, see the [ONNX official website](https://onnx.ai/) and the [ONNX Runtime documentation](https://onnxruntime.ai/docs/).
````

content/learning-paths/servers-and-cloud-computing/onnx-on-azure/baseline.md

Lines changed: 7 additions & 5 deletions
````diff
@@ -37,16 +37,18 @@ You should see output similar to:
 ```output
 Inference time: 0.0026061534881591797
 ```
-{{% notice Note %}}Inference time is the amount of time it takes for a trained machine learning model to make a prediction (i.e., produce output) after receiving input data.
-input tensor of shape (1, 3, 224, 224):
-- 1: batch size
-- 3: color channels (RGB)
+{{% notice Note %}}
+Inference time is the amount of time it takes for a trained machine learning model to make a prediction (produce output) after receiving input data.
+
+The input tensor has the shape `(1, 3, 224, 224)`, which means:
+- 1: batch size (number of images processed at once)
+- 3: color channels (RGB)
 - 224 x 224: image resolution (common for models like SqueezeNet)
 {{% /notice %}}
 
 This indicates the model successfully executed a single forward pass through the SqueezeNet INT8 ONNX model and returned results.
 
-#### Output summary:
+## Output summary:
 
 Single inference latency(0.00260 sec): This is the time required for the model to process one input image and produce an output. The first run includes graph loading, memory allocation, and model initialization overhead.
 Subsequent inferences are usually faster due to caching and optimized execution.
````
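To make the note above concrete, here is a short Python sketch of the tensor shape and the timing pattern the baseline test describes. The `fake_inference` function is a hypothetical stand-in for a real `onnxruntime.InferenceSession(...).run(...)` call, so the elapsed time only demonstrates the measurement pattern, not real model latency:

```python
import time
import numpy as np

# Hypothetical stand-in for onnxruntime.InferenceSession(...).run(...);
# it only demonstrates the shape and timing pattern, not real latency.
def fake_inference(x):
    return x.mean(axis=(2, 3))  # collapse H and W, leaving (batch, channels)

# Input tensor shaped as described in the note:
# (batch=1, channels=3, height=224, width=224)
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

start = time.perf_counter()
out = fake_inference(x)
elapsed = time.perf_counter() - start

print(x.shape)    # (1, 3, 224, 224)
print(out.shape)  # (1, 3)
print(f"Inference time: {elapsed:.6f} s")
```

As with the real model, the first timed call includes setup overhead, so repeated runs give a more representative latency.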

content/learning-paths/servers-and-cloud-computing/onnx-on-azure/benchmarking.md

Lines changed: 16 additions & 19 deletions
````diff
@@ -1,12 +1,12 @@
 ---
-title: Benchmarking via onnxruntime_perf_test
+title: Benchmark ONNX runtime performance with onnxruntime_perf_test
 weight: 6
 
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
 
-
+## Benchmark ONNX model inference on Azure Cobalt 100
 Now that you have validated ONNX Runtime with Python-based timing (for example, the SqueezeNet baseline test), you can move to using a dedicated benchmarking utility called `onnxruntime_perf_test`. This tool is designed for systematic performance evaluation of ONNX models, allowing you to capture more detailed statistics than simple Python timing.
 
 This approach helps you evaluate ONNX Runtime efficiency on Azure Arm64-based Cobalt 100 instances and compare results with other architectures if needed.
@@ -18,7 +18,7 @@ You are ready to run benchmarks — a key skill for optimizing real-world deploym
 The `onnxruntime_perf_test` tool is included in the ONNX Runtime source code. You can use it to measure the inference performance of ONNX models and compare different execution providers (such as CPU or GPU). On Arm64 VMs, CPU execution is the focus.
 
 
-### Install required build tools
+## Install required build tools
 Before building or running `onnxruntime_perf_test`, you need to install a set of development tools and libraries. These packages are required for compiling ONNX Runtime and handling model serialization via Protocol Buffers.
 
 ```console
@@ -56,7 +56,7 @@ If the build completes successfully, you should see the executable at:
 ```
 
 
-### Run the benchmark
+## Run the benchmark
 Now that you have built the benchmarking tool, you can run inference benchmarks on the SqueezeNet INT8 model:
 
 ```console
@@ -105,17 +105,17 @@ P95 Latency: 0.00187393 s
 P99 Latency: 0.00190312 s
 P999 Latency: 0.00190312 s
 ```
-### Benchmark Metrics Explained
+## Benchmark Metrics Explained
 
-* Average Inference Time: The mean time taken to process a single inference request across all runs. Lower values indicate faster model execution.
-* Throughput: The number of inference requests processed per second. Higher throughput reflects the model’s ability to handle larger workloads efficiently.
-* CPU Utilization: The percentage of CPU resources used during inference. A value close to 100% indicates full CPU usage, which is expected during performance benchmarking.
-* Peak Memory Usage: The maximum amount of system memory (RAM) consumed during inference. Lower memory usage is beneficial for resource-constrained environments.
-* P50 Latency (Median Latency): The time below which 50% of inference requests complete. Represents typical latency under normal load.
-* Latency Consistency: Describes the stability of latency values across all runs. "Consistent" indicates predictable inference performance with minimal jitter.
+* Average inference time: the mean time taken to process a single inference request across all runs. Lower values indicate faster model execution.
+* Throughput: the number of inference requests processed per second. Higher throughput reflects the model’s ability to handle larger workloads efficiently.
+* CPU utilization: the percentage of CPU resources used during inference. A value close to 100% indicates full CPU usage, which is expected during performance benchmarking.
+* Peak Memory Usage: the maximum amount of system memory (RAM) consumed during inference. Lower memory usage is beneficial for resource-constrained environments.
+* P50 Latency (Median Latency): the time below which 50% of inference requests complete. Represents typical latency under normal load.
+* Latency Consistency: describes the stability of latency values across all runs. "Consistent" indicates predictable inference performance with minimal jitter.
 
-### Benchmark summary on Arm64:
-Here is a summary of benchmark results collected on an Arm64 **D4ps_v6 Ubuntu Pro 24.04 LTS virtual machine**.
+## Benchmark summary on Arm64:
+Here is a summary of benchmark results collected on an Arm64 D4ps_v6 Ubuntu Pro 24.04 LTS virtual machine.
 
 | **Metric**                 | **Value**                     |
 |----------------------------|-------------------------------|
@@ -132,12 +132,9 @@ Here is a summary of benchmark results collected on an Arm64 **D4ps_v6 Ubuntu Pr
 | **Latency Consistency**    | Consistent                    |
 
 
-### Highlights from Benchmarking on Azure Cobalt 100 Arm64 VMs
+## Highlights from Benchmarking on Azure Cobalt 100 Arm64 VMs
+
 
-The results on Arm64 virtual machines demonstrate:
-- Low-Latency Inference: Achieved consistent average inference times of ~1.86 ms on Arm64.
-- Strong and Stable Throughput: Sustained throughput of over 538 inferences/sec using the `squeezenet-int8.onnx` model on D4ps_v6 instances.
-- Lightweight Resource Footprint: Peak memory usage stayed below 37 MB, with CPU utilization around 96%, ideal for efficient edge or cloud inference.
-- Consistent Performance: P50, P95, and Max latency remained tightly bound, showcasing reliable performance on Azure Cobalt 100 Arm-based infrastructure.
+These results on Arm64 virtual machines demonstrate low-latency inference, with consistent average inference times of approximately 1.86 ms. Throughput remains strong and stable, sustaining over 538 inferences per second using the `squeezenet-int8.onnx` model on D4ps_v6 instances. The resource footprint is lightweight, as peak memory usage stays below 37 MB and CPU utilization is around 96%, making this setup ideal for efficient edge or cloud inference. Performance is also consistent, with P50, P95, and maximum latency values tightly grouped, showcasing reliable results on Azure Cobalt 100 Arm-based infrastructure.
 
 You have now successfully benchmarked inference time of ONNX models on an Azure Cobalt 100 Arm64 virtual machine.
````
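The percentile metrics that the benchmarking page explains can be made concrete with a small sketch. This nearest-rank implementation is purely illustrative (`onnxruntime_perf_test` computes its statistics internally), and `sample_latencies` is fabricated data:

```python
def percentile(samples, p):
    """Nearest-rank percentile: the value below which p% of samples fall (integer p)."""
    s = sorted(samples)
    k = max((p * len(s) + 99) // 100 - 1, 0)  # ceil(p*n/100) - 1, clamped at 0
    return s[k]

# Fabricated latencies of 1 ms to 100 ms, only to show how P50/P95 are read off.
sample_latencies = [i / 1000 for i in range(1, 101)]
p50 = percentile(sample_latencies, 50)
p95 = percentile(sample_latencies, 95)
print(f"P50: {p50:.3f} s, P95: {p95:.3f} s")  # P50: 0.050 s, P95: 0.095 s
```

The gap between P50 and P95/P99 is what the "Latency Consistency" metric summarizes: the smaller the gap, the less jitter.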

content/learning-paths/servers-and-cloud-computing/onnx-on-azure/create-instance.md

Lines changed: 16 additions & 14 deletions
````diff
@@ -20,34 +20,36 @@ You will focus on the general-purpose virtual machines in the D-series. For furt
 While the steps to create this instance are included here for convenience, for further information on setting up Cobalt on Azure, see [Deploy a Cobalt 100 virtual machine on Azure Learning Path](/learning-paths/servers-and-cloud-computing/cobalt/).
 
-#### Create an Arm-based Azure Virtual Machine
+## Create an Arm-based Azure Virtual Machine
 
-Creating a virtual machine based on Azure Cobalt 100 is no different from creating any other virtual machine in Azure. To create an Azure virtual machine, launch the Azure portal and navigate to "Virtual Machines".
-1. Select "Create", and click on "Virtual Machine" from the drop-down list.
-2. Inside the "Basic" tab, fill in the Instance details such as "Virtual machine name" and "Region".
-3. Choose the image for your virtual machine (for example, Ubuntu Pro 24.04 LTS) and select “Arm64” as the VM architecture.
-4. In the “Size” field, click on “See all sizes” and select the D-Series v6 family of virtual machines. Select “D4ps_v6” from the list.
+
+Creating a virtual machine based on Azure Cobalt 100 is no different from creating any other virtual machine in Azure. To create an Azure virtual machine, launch the Azure portal and navigate to **Virtual Machines**.
+
+- Select **Create**, and click on **Virtual Machine** from the drop-down list.
+- Inside the **Basic** tab, fill in the Instance details such as **Virtual machine name** and **Region**.
+- Choose the image for your virtual machine (for example, Ubuntu Pro 24.04 LTS) and select **Arm64** as the VM architecture.
+- In the **Size** field, click on **See all sizes** and select the D-Series v6 family of virtual machines. Select **D4ps_v6** from the list.
 
 ![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/instance.png "Select the D-Series v6 family of virtual machines")
 
-5. Select "SSH public key" as an Authentication type. Azure will automatically generate an SSH key pair for you and allow you to store it for future use. It is a fast, simple, and secure way to connect to your virtual machine.
-6. Fill in the Administrator username for your VM.
-7. Select "Generate new key pair", and select "RSA SSH Format" as the SSH Key Type. RSA could offer better security with keys longer than 3072 bits. Give a Key pair name to your SSH key.
-8. In the "Inbound port rules", select HTTP (80) and SSH (22) as the inbound ports.
+- Select **SSH public key** as an Authentication type. Azure will automatically generate an SSH key pair for you and allow you to store it for future use. It is a fast, simple, and secure way to connect to your virtual machine.
+- Fill in the **Administrator username** for your VM.
+- Select **Generate new key pair**, and select **RSA SSH Format** as the SSH Key Type. RSA could offer better security with keys longer than 3072 bits. Give a **Key pair name** to your SSH key.
+- In the **Inbound port rules**, select **HTTP (80)** and **SSH (22)** as the inbound ports.
 
 ![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/instance1.png "Allow inbound port rules")
 
-9. Click on the "Review + Create" tab and review the configuration for your virtual machine. It should look like the following:
+- Click on the **Review + Create** tab and review the configuration for your virtual machine. It should look like the following:
 
 ![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/ubuntu-pro.png "Review and Create an Azure Cobalt 100 Arm64 VM")
 
-10. Finally, when you are confident about your selection, click on the "Create" button, and click on the "Download Private key and Create Resources" button.
+- When you are confident about your selection, click on the **Create** button, and click on the **Download Private key and Create Resources** button.
 
 ![Azure portal VM creation — Azure Cobalt 100 Arm64 virtual machine (D4ps_v6) alt-text#center](images/instance4.png "Download Private key and Create Resources")
 
-11. Your virtual machine should be ready and running within a few minutes. You can SSH into the virtual machine using the private key, along with the Public IP details.
+- Your virtual machine should be ready and running within a few minutes. You can SSH into the virtual machine using the private key, along with the Public IP details.
 
-You should see your VM listed as "Running" in the Azure portal. If you have trouble connecting, double-check your SSH key and ensure the correct ports are open. If the VM creation fails, check your Azure quota, region availability, or try a different VM size.
+You should see your VM listed as **Running** in the Azure portal. If you have trouble connecting, double-check your SSH key and ensure the correct ports are open. If the VM creation fails, check your Azure quota, region availability, or try a different VM size.
 
 Nice work! You have successfully provisioned an Arm-based Azure Cobalt 100 virtual machine. This environment is now ready for ONNX Runtime installation and benchmarking in the next steps.
````

content/learning-paths/servers-and-cloud-computing/onnx-on-azure/deploy.md

Lines changed: 9 additions & 7 deletions
````diff
@@ -8,14 +8,16 @@ layout: learningpathall
 
 
 ## ONNX Installation on Azure Ubuntu Pro 24.04 LTS
-To work with ONNX models on Azure, you will need a clean Python environment with the required packages. The following steps install Python, set up a virtual environment, and prepare for ONNX model execution using ONNX Runtime.
+To work with ONNX models on Azure, you will need a clean Python environment with the required packages. The following steps show you how to install Python, set up a virtual environment, and prepare for ONNX model execution using ONNX Runtime.
 
 
-### Install Python and virtual environment
+## Install Python and virtual environment
+
+To get started, update your package list and install Python 3 along with the tools needed to create a virtual environment:
 
 ```console
 sudo apt update
-sudo apt install -y python3 python3-pip python3-virtualenv python3-venv
+sudo apt install -y python3 python3-pip python3-venv
 ```
 
 Create and activate a virtual environment:
@@ -29,7 +31,7 @@ source onnx-env/bin/activate
 Once your environment is active, you're ready to install the required libraries.
 
 
-### Install ONNX and required libraries
+## Install ONNX and required libraries
 
 Upgrade pip and install ONNX with its runtime and supporting libraries:
 ```console
@@ -43,7 +45,7 @@ If you encounter errors during installation, check your internet connection and
 After installation, you're ready to validate your setup.
 
 
-### Validate ONNX and ONNX Runtime
+## Validate ONNX and ONNX Runtime
 Once the libraries are installed, verify that both ONNX and ONNX Runtime are correctly set up on your VM.
 
 Create a file named `version.py` with the following code:
@@ -68,15 +70,15 @@ If you see version numbers for both ONNX and ONNX Runtime, your environment is r
 Great job! You have confirmed that ONNX and ONNX Runtime are installed and ready on your Azure Cobalt 100 VM. This is the foundation for running inference workloads and serving ONNX models.
 
 
-### Download and validate ONNX model: SqueezeNet
+## Download and validate ONNX model: SqueezeNet
 SqueezeNet is a lightweight convolutional neural network (CNN) architecture designed to provide accuracy close to AlexNet while using 50x fewer parameters and a much smaller model size. This makes it well-suited for benchmarking ONNX Runtime.
 
 Now that your environment is set up and validated, you're ready to download and test the SqueezeNet model in the next step.
 Download the quantized model:
 ```console
 wget https://github.com/onnx/models/raw/main/validated/vision/classification/squeezenet/model/squeezenet1.0-12-int8.onnx -O squeezenet-int8.onnx
 ```
-#### Validate the model:
+## Validate the model:
 
 After downloading the SqueezeNet ONNX model, the next step is to confirm that it is structurally valid and compliant with the ONNX specification. ONNX provides a built-in checker utility that verifies the graph, operators, and metadata.
 Create a file named `validation.py` with the following code:
````