content/learning-paths/servers-and-cloud-computing/torchbench/_index.md
---
title: Measure and accelerate PyTorch Inference on Arm servers
minutes_to_complete: 20
who_is_this_for: This is an introductory topic for software developers who want to learn how to measure and accelerate the performance of Natural Language Processing (NLP), vision, and recommender PyTorch models on Arm-based servers.
learning_objectives:
- Download and install the PyTorch Benchmarks suite.
- Evaluate PyTorch model inference performance on an Arm-based server using the PyTorch Benchmark suite.
- Compare the model inference performance using eager mode and `torch.compile` mode in PyTorch.
prerequisites:
- An [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider or an on-premise Arm server.
content/learning-paths/servers-and-cloud-computing/torchbench/pytorch-benchmark.md
---
## Before you begin
These instructions apply to any Arm server running Ubuntu 22.04 LTS. For this example, you need an Arm server instance with at least four cores and 8GB of RAM. The instructions have been tested on AWS Graviton3 (c7g.4xlarge) instances.
## Overview
PyTorch is a widely-used Machine Learning framework for Python. In this learning path, you will explore how to measure the inference time of PyTorch models running on your Arm-based server using [PyTorch Benchmarks](https://github.com/pytorch/benchmark), a collection of open-source benchmarks for evaluating PyTorch performance. Understanding inference latency is crucial for optimizing machine learning applications, especially in production environments where performance can significantly impact user experience and resource utilization.
You will learn how to install the PyTorch benchmark suite and compare inference performance using PyTorch's two modes of execution: eager mode and `torch.compile` mode.
To begin, set up your environment by installing the required dependencies and PyTorch. Follow these steps:
## Set up your environment
First, install Python and the required system packages:
If you are using Python 3.12, the install script might fail with the following error:
```output
AttributeError: module 'pkgutil' has no attribute 'ImpImporter'.
Did you mean: 'zipimporter'
```
This issue can occur because `requirements.txt` installs a version of `numpy` that is incompatible with Python 3.12. To resolve it, change the pinned `numpy` version in `requirements.txt`:
```
numpy~=1.26.4
```
{{% /notice %}}
If you don't specify a model list for `install.py`, the script downloads all the models included in the benchmark suite.
Before running the benchmarks, configure your AWS Graviton3 instance to leverage available optimizations for improved PyTorch inference performance.
This configuration includes settings to:
* Enable bfloat16 GEMM kernel support to accelerate fp32 inference.
* Set LRU cache capacity to an optimal value to avoid redundant primitive creation latency overhead.
* Enable Linux Transparent Huge Page (THP) allocations to reduce tensor memory allocation latency.
* Set the number of threads to use to match the number of cores on your system.
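The settings above are typically applied as environment variables before launching the benchmark. As an illustrative sketch only: the variable names below come from common AWS Graviton PyTorch tuning guidance and are assumptions here, not commands taken from this Learning Path.

```python
# Hedged sketch: apply the tuning settings listed above from Python.
# Set these before importing torch so the runtime picks them up.
# The exact variable names and values are assumptions based on common
# Graviton tuning guidance, not taken from this Learning Path.
import os

# Enable bfloat16 GEMM kernels to accelerate fp32 inference (oneDNN fast math).
os.environ["DNNL_DEFAULT_FPMATH_MODE"] = "BF16"
# Set the oneDNN primitive LRU cache capacity to avoid redundant
# primitive-creation latency.
os.environ["LRU_CACHE_CAPACITY"] = "1024"
# Enable Transparent Huge Page (THP) allocations for tensor memory.
os.environ["THP_MEM_ALLOC_ENABLE"] = "1"
# Match the number of OpenMP threads to the number of cores on the system.
os.environ["OMP_NUM_THREADS"] = str(os.cpu_count())
```

Setting these in the shell with `export` before running the benchmark scripts has the same effect.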
With the environment set up and models installed, you're ready to run the benchmarks to measure your model inference performance.
Starting with PyTorch 2.0, there are two main execution modes: eager mode and `torch.compile` mode. The default mode of execution in PyTorch is eager mode, where operations are executed immediately as they are defined. In contrast, `torch.compile` transforms PyTorch code into graphs that can be executed more efficiently. This mode can improve model inference performance, particularly for models with repetitive computations.
Using the scripts included in the PyTorch Benchmark suite, you will now measure the model inference latencies in both eager mode and `torch.compile` mode to compare their performance.
### Measure Eager Mode Performance
Run the following command to collect performance data in eager mode for the downloaded models:
```bash
python3 run_benchmark.py cpu --model alexnet,BERT_pytorch,dlrm,hf_Albert,hf_Bart,hf_Bert,hf_Bert_large,hf_BigBird,hf_DistilBert,hf_GPT2,hf_Longformer,hf_Reformer,hf_T5,mobilenet_v2,mobilenet_v3_large,resnet152,resnet18,resnet50,timm_vision_transformer \
--test eval --metrics="latencies"
```
The benchmark results for all the models run are stored in the `.userbenchmark/cpu/` directory. The `cpu` user benchmark creates a timestamped folder `cpu-YYmmddHHMMSS` for each test and aggregates all test results into a JSON file `metrics-YYmmddHHMMSS.json`, where `YYmmddHHMMSS` is the time you started the test. The metrics file shows the model inference latency, in milliseconds (ms), for each model you downloaded and ran.
The results with eager mode should appear as follows:
```output
{
```
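If you want to inspect the results programmatically rather than reading the JSON by hand, you can load the metrics file with a few lines of Python. The key layout assumed below, a top-level `metrics` object mapping test names to latency values, is an assumption based on the output shown above; adjust it if your file differs.

```python
# Hedged sketch: print per-model latencies from a cpu userbenchmark
# metrics file, e.g. .userbenchmark/cpu/cpu-YYmmddHHMMSS/metrics-YYmmddHHMMSS.json
import json

def print_latencies(metrics_path):
    """Print and return the latency entries from a metrics JSON file.

    Assumes a top-level "metrics" object mapping test names to latencies
    in milliseconds, matching the output shown above.
    """
    with open(metrics_path) as f:
        data = json.load(f)
    results = data.get("metrics", {})
    for name, latency in sorted(results.items()):
        print(f"{name}: {latency} ms")
    return results
```

This makes it easy to diff the eager-mode and `torch.compile`-mode runs side by side once you have collected both.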
### Measure torch.compile Mode Performance
In PyTorch, `torch.compile` uses Inductor as its default backend. For execution on the CPU, the Inductor backend leverages C++/OpenMP to generate highly optimized kernels for your model. Run the following command to collect performance data in `torch.compile` mode for the downloaded models:
```bash
python3 run_benchmark.py cpu --model alexnet,BERT_pytorch,dlrm,hf_Albert,hf_Bart,hf_Bert,hf_Bert_large,hf_BigBird,hf_DistilBert,hf_GPT2,hf_Longformer,hf_Reformer,hf_T5,mobilenet_v2,mobilenet_v3_large,resnet152,resnet18,resnet50,timm_vision_transformer \
```

```output
}
}
```
You will notice that most of these models achieve a performance improvement in model inference latency when run in `torch.compile` mode using the Inductor backend.
You have successfully run the PyTorch Benchmark suite on a variety of models. You can experiment with the two execution modes and different optimization settings, check the performance, and choose the right settings for your model and use case.