
Commit 48dd634

Update baseline.md
1 parent 8c24f77 commit 48dd634

File tree

1 file changed (+10 lines, -8 lines)
  • content/learning-paths/servers-and-cloud-computing/onnx-on-azure


content/learning-paths/servers-and-cloud-computing/onnx-on-azure/baseline.md

Lines changed: 10 additions & 8 deletions
@@ -7,12 +7,11 @@ layout: learningpathall
 ---


-## Baseline testing using ONNX Runtime:
+## Baseline Testing using ONNX Runtime:

-This test measures the inference latency of the ONNX Runtime by timing how long it takes to process a single input using the `squeezenet-int8.onnx model`. It helps evaluate how efficiently the model runs on the target hardware.
-
-Create a **baseline.py** file with the below code for baseline test of ONNX:
+The purpose of this test is to measure the inference latency of ONNX Runtime on your Azure Cobalt 100 VM. By timing how long it takes to process a single input through the SqueezeNet INT8 model, you can validate that ONNX Runtime is functioning correctly and get a baseline performance measurement for your target hardware.

+Create a file named `baseline.py` with the following code:
 ```python
 import onnxruntime as ort
 import numpy as np
@@ -29,12 +28,12 @@ end = time.time()
 print("Inference time:", end - start)
 ```

-Run the baseline test:
+Run the baseline script to measure inference time:

 ```console
 python3 baseline.py
 ```
-You should see an output similar to:
+You should see output similar to:
 ```output
 Inference time: 0.0026061534881591797
 ```
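
The diff above only shows the first and last lines of `baseline.py`; the body of the script lies outside the changed hunks. For reference, here is a minimal sketch of what the complete script plausibly looks like, reconstructed from the visible fragments (the `squeezenet-int8.onnx` filename and the (1, 3, 224, 224) input shape come from the surrounding text; querying the input name from the session is an assumption, not taken from this commit):

```python
import onnxruntime as ort
import numpy as np
import time

# Load the INT8-quantized SqueezeNet model (filename assumed from the learning path).
session = ort.InferenceSession("squeezenet-int8.onnx")

# Build a random input matching the expected shape: batch 1, 3 channels, 224x224 image.
input_name = session.get_inputs()[0].name  # query the model rather than hard-coding the name
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Time a single inference (cold start: includes one-time warm-up costs).
start = time.time()
outputs = session.run(None, {input_name: input_data})
end = time.time()

print("Inference time:", end - start)
```

Passing `None` as the first argument to `session.run` asks ONNX Runtime to return all model outputs, so the timing code does not depend on the model's output names.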
@@ -45,8 +44,11 @@ input tensor of shape (1, 3, 224, 224):
 - 224 x 224: image resolution (common for models like SqueezeNet)
 {{% /notice %}}

+This indicates the model successfully executed a single forward pass through the SqueezeNet INT8 ONNX model and returned results.
+
 #### Output summary:

-- Single inference latency: ~2.60 milliseconds (0.00260 sec)
-- This shows the initial (cold-start) inference performance of ONNX Runtime on CPU using an optimized int8 quantized model.
+- Single inference latency (0.00260 sec): the time required for the model to process one input image and produce an output.
+- Cold-start performance: the first run includes graph loading, memory allocation, and model initialization overhead.
+- Subsequent inferences are usually faster due to caching and optimized execution paths.
 - This demonstrates that the setup is fully working, and ONNX Runtime efficiently executes quantized models on Arm64.
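
The updated output summary points out that the first run is a cold start and that subsequent inferences are usually faster. If you want to observe that difference, a small hypothetical extension of the baseline script (not part of this commit) could warm the session up and then average many timed runs:

```python
import onnxruntime as ort
import numpy as np
import time

# Same model and input as the baseline script (filename and shape assumed from the learning path).
session = ort.InferenceSession("squeezenet-int8.onnx")
input_name = session.get_inputs()[0].name
input_data = np.random.rand(1, 3, 224, 224).astype(np.float32)

# Warm-up runs: absorb graph loading, memory allocation, and other one-time costs.
for _ in range(5):
    session.run(None, {input_name: input_data})

# Timed runs: average over many iterations for a steadier latency figure.
runs = 100
start = time.time()
for _ in range(runs):
    session.run(None, {input_name: input_data})
end = time.time()

print("Average inference time over", runs, "runs:", (end - start) / runs)
```

Averaging over many runs smooths out scheduling noise and gives a more stable latency number than a single cold-start measurement.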
