Commit c24b3b8

Merge pull request #1583 from annietllnd/torchbench-lp
Update TorchBench LP
2 parents: d9adb60 + fce26dd

File tree

2 files changed: +33, -15 lines


content/learning-paths/servers-and-cloud-computing/torchbench/_index.md

Lines changed: 5 additions & 5 deletions

@@ -3,12 +3,12 @@ title: Accelerate and measure PyTorch Inference on Arm servers
 
 minutes_to_complete: 20
 
-who_is_this_for: This is an introductory topic for software developers who want to learn how to measure and accelerate the performance of Natural Language Processing (NLP), vision and recommender PyTorch models on Arm-based servers.
+who_is_this_for: This is an introductory topic for software developers who want to learn how to measure and accelerate the performance of Natural Language Processing (NLP), vision and recommender PyTorch models on Arm-based servers.
 
 learning_objectives:
 - Download and install the PyTorch Benchmarks suite.
 - Evaluate the performance of PyTorch model inference running on your Arm based server using the PyTorch Benchmark suite.
-- Measure the performance of these models using eager and torch.compile modes in PyTorch.
+- Measure the performance of these models using eager and `torch.compile` modes in PyTorch.
 
 prerequisites:
 - An [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider or an on-premise Arm server.

@@ -19,13 +19,13 @@ author_primary: Pareena Verma
 skilllevels: Introductory
 subjects: ML
 armips:
-- Neoverse
+- Neoverse
 operatingsystems:
-- Linux
+- Linux
 tools_software_languages:
 - Python
 - PyTorch
-
+
 ### FIXED, DO NOT MODIFY
 # ================================================================================
 weight: 1 # _index.md always has weight of 1 to order correctly

content/learning-paths/servers-and-cloud-computing/torchbench/pytorch-benchmark.md

Lines changed: 28 additions & 10 deletions

@@ -46,19 +46,35 @@ git clone https://github.com/pytorch/benchmark.git
 cd benchmark
 git checkout 9a5e4137299741e1b6fb7aa7f5a6a853e5dd2295
 ```
-Install the PyTorch models you would like to benchmark. Lets install a variety of NLP, computer vision and recommender models:
+Install the PyTorch models you would like to benchmark. Let's install a variety of NLP, computer vision and recommender models:
 
 ```bash
-python3 install.py alexnet BERT_pytorch dlrm hf_Albert hf_Bart hf_Bert hf_Bert_large hf_BigBird hf_DistilBert hf_GPT2 hf_Longformer hf_Reformer hf_T5 mobilenet_v2 mobilenet_v3_large resnet152 resnet18 resnet50 timm_vision_transformer
+python3 install.py alexnet BERT_pytorch dlrm hf_Albert hf_Bart hf_Bert hf_Bert_large hf_BigBird \
+hf_DistilBert hf_GPT2 hf_Longformer hf_Reformer hf_T5 mobilenet_v2 mobilenet_v3_large resnet152 \
+resnet18 resnet50 timm_vision_transformer
 ```
 
+{{% notice Note %}}
+If you are using Python 3.12, the install script may fail with the following error:
+```output
+AttributeError: module 'pkgutil' has no attribute 'ImpImporter'.
+Did you mean: 'zipimporter'
+```
+
+This may be because the `requirements.txt` installs a version of `numpy` which is not compatible with Python 3.12. To fix the issue, change the pinned `numpy` version in `requirements.txt`:
+
+```
+numpy~=1.26.4
+```
+{{% /notice %}}
+
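The repin described in the note above can be applied mechanically. Below is a hypothetical sketch: the sample file contents and path are invented for illustration, since the diff does not show the original `numpy` pin in TorchBench's `requirements.txt`.

```python
from pathlib import Path
import re

# Hypothetical illustration only: create a sample requirements.txt and repin
# numpy to a Python 3.12-compatible release. The original pin shown here
# ("numpy==1.21.2") is made up; check your actual file before editing.
req = Path("/tmp/requirements.txt")
req.write_text("torch\nnumpy==1.21.2\n")

# Replace whatever numpy pin is present with the version from the note.
req.write_text(re.sub(r"(?m)^numpy\S*$", "numpy~=1.26.4", req.read_text()))
print(req.read_text())
```

A one-line `sed -i 's/^numpy.*/numpy~=1.26.4/' requirements.txt` in the shell achieves the same result.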
 If you don't provide a model list to `install.py`, the script will download all the models included in the benchmark suite.
 
 Before running the benchmarks, configure your running AWS Graviton3 instance to take advantage of the optimizations available to optimize PyTorch inference performance. This includes settings to:
-* Enable bfloat16 GEMM kernel support to accelerate fp32 inference.
-* Set LRU cache capacity to an optimal value to avoid redundant primitive creation latency overhead.
-* Enable Linux Transparent Huge Page (THP) allocations, reducing the latency for tensor memory allocation.
-* Set the number of threads to use to match the number of cores on your system
+* Enable bfloat16 GEMM kernel support to accelerate fp32 inference.
+* Set LRU cache capacity to an optimal value to avoid redundant primitive creation latency overhead.
+* Enable Linux Transparent Huge Page (THP) allocations, reducing the latency for tensor memory allocation.
+* Set the number of threads to use to match the number of cores on your system
 
 ```bash
 export DNNL_DEFAULT_FPMATH_MODE=BF16
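The hunk above truncates the settings block after its first line (the hunk header shows it also sets `OMP_NUM_THREADS=16`). As a hedged sketch, the four bullets typically map to the following environment variables on Graviton; the `LRU_CACHE_CAPACITY` and `THP_MEM_ALLOC_ENABLE` names are assumptions from common Graviton PyTorch guidance, not taken from this diff, so verify against the full file. They must be set before PyTorch initializes.

```python
import os

# Hedged sketch, not the file's verbatim contents: env-var names for the
# two middle bullets (LRU_CACHE_CAPACITY, THP_MEM_ALLOC_ENABLE) are assumed.
os.environ.setdefault("DNNL_DEFAULT_FPMATH_MODE", "BF16")  # bf16 GEMM kernels for fp32 inference
os.environ.setdefault("LRU_CACHE_CAPACITY", "1024")        # bound primitive cache to avoid re-creation overhead
os.environ.setdefault("THP_MEM_ALLOC_ENABLE", "1")         # transparent huge pages for tensor allocations
os.environ.setdefault("OMP_NUM_THREADS", str(os.cpu_count() or 1))  # one thread per core

print({k: os.environ[k] for k in ("DNNL_DEFAULT_FPMATH_MODE", "OMP_NUM_THREADS")})
```

The equivalent `export` lines in the shell, as the file itself does, have the same effect.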
@@ -69,7 +85,7 @@ export OMP_NUM_THREADS=16
 
 With the environment set up and models installed, you can now run the benchmarks to measure your model inference performance.
 
-Starting from PyTorch 2.0, there are 2 main execution modes - eager mode and `torch.compile` mode. The default mode of execution in PyTorch is eager mode. In this mode the operations are executed immediately as they are defined. With `torch.compile` the PyTorch code is transformed into graphs which can be executed more efficiently. This mode can offer improved model inferencing performance, especially for models with repetitive computations.
+Starting from PyTorch 2.0, there are 2 main execution modes - eager mode and `torch.compile` mode. The default mode of execution in PyTorch is eager mode. In this mode the operations are executed immediately as they are defined. With `torch.compile` the PyTorch code is transformed into graphs which can be executed more efficiently. This mode can offer improved model inferencing performance, especially for models with repetitive computations.
 
 Using the scripts included in the PyTorch Benchmark suite, you will now measure the model inference latencies with both eager and torch.compile modes to compare their performance.
 
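The two execution modes described above can be sketched in a few lines; this is a minimal illustration using a toy model (not part of the benchmark suite), showing that the compiled module is a drop-in replacement producing the same results.

```python
import torch
import torch.nn as nn

# Toy model for illustration only; the benchmarks use real NLP/vision models.
model = nn.Linear(8, 4).eval()
x = torch.randn(2, 8)

with torch.no_grad():
    eager_out = model(x)               # eager mode: ops execute as they are called
    compiled = torch.compile(model)    # graph mode: TorchDynamo captures, Inductor compiles
    compiled_out = compiled(x)         # first call triggers compilation, later calls reuse it

print(torch.allclose(eager_out, compiled_out, atol=1e-5))
```

The compilation cost is paid once up front, which is why `torch.compile` tends to help most for repeated inference on the same shapes.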

@@ -78,7 +94,8 @@ Using the scripts included in the PyTorch Benchmark suite, you will now measure
 Run the following command to collect performance data in eager mode for the suite of models you downloaded:
 
 ```bash
-python3 run_benchmark.py cpu --model alexnet,BERT_pytorch,dlrm,hf_Albert,hf_Bart,hf_Bert,hf_Bert_large,hf_BigBird,hf_DistilBert,hf_GPT2,hf_Longformer,hf_Reformer,hf_T5,mobilenet_v2,mobilenet_v3_large,resnet152,resnet18,resnet50,timm_vision_transformer --test eval --metrics="latencies"
+python3 run_benchmark.py cpu --model alexnet,BERT_pytorch,dlrm,hf_Albert,hf_Bart,hf_Bert,hf_Bert_large,hf_BigBird,hf_DistilBert,hf_GPT2,hf_Longformer,hf_Reformer,hf_T5,mobilenet_v2,mobilenet_v3_large,resnet152,resnet18,resnet50,timm_vision_transformer \
+--test eval --metrics="latencies"
 ```
 The results for all the models run will be stored in the `.userbenchmark/cpu/` directory. The `cpu` user benchmark creates a folder `cpu-YYmmddHHMMSS` for the test, and aggregates all test results into a JSON file `metrics-YYmmddHHMMSS.json`. `YYmmddHHMMSS` is the time you started the test. The metrics file shows the model inference latency, in milliseconds (msec), for each model you downloaded and ran. The results with eager mode should look like:
 
@@ -113,10 +130,11 @@ The results for all the models run will be stored in the `.userbenchmark/cpu/` d
 ```
 ### Measure torch.compile Mode Performance
 
-The `torch.compile` mode in PyTorch uses inductor as its default backend. For execution on the cpu, the inductor backend leverages C++/OpenMP to generate highly optimized kernels for your model. Run the following command to collect performance data in `torch.compile` mode for the suite of models you downloaded.
+The `torch.compile` mode in PyTorch uses inductor as its default backend. For execution on the cpu, the inductor backend leverages C++/OpenMP to generate highly optimized kernels for your model. Run the following command to collect performance data in `torch.compile` mode for the suite of models you downloaded.
 
 ```bash
-python3 run_benchmark.py cpu --model alexnet,BERT_pytorch,dlrm,hf_Albert,hf_Bart,hf_Bert,hf_Bert_large,hf_BigBird,hf_DistilBert,hf_GPT2,hf_Longformer,hf_Reformer,hf_T5,mobilenet_v2,mobilenet_v3_large,resnet152,resnet18,resnet50,timm_vision_transformer --test eval --torchdynamo inductor --metrics="latencies"
+python3 run_benchmark.py cpu --model alexnet,BERT_pytorch,dlrm,hf_Albert,hf_Bart,hf_Bert,hf_Bert_large,hf_BigBird,hf_DistilBert,hf_GPT2,hf_Longformer,hf_Reformer,hf_T5,mobilenet_v2,mobilenet_v3_large,resnet152,resnet18,resnet50,timm_vision_transformer \
+--test eval --torchdynamo inductor --metrics="latencies"
 ```
 
 The results for all the models run will be stored in the `.userbenchmark/cpu/` directory. The `cpu` user benchmark creates a folder `cpu-YYmmddHHMMSS` for the test, and aggregates all test results into a JSON file `metrics-YYmmddHHMMSS.json`. `YYmmddHHMMSS` is the time you started the test. The metrics file shows the model inference latency, in milliseconds (msec), for each model you downloaded and ran. The results with `torch.compile` mode should look like:
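Once you have one metrics file from each mode, comparing them reduces to dividing per-model latencies. A hypothetical helper (latency values below are invented for illustration; the key format is the same assumption as above):

```python
# Hypothetical comparison of eager vs torch.compile latencies; in practice the
# two dicts would come from the "metrics" objects of the two JSON files.
def speedups(eager_ms, compiled_ms):
    """Return eager/compiled latency ratio per model (>1.0 means compile won)."""
    out = {}
    for key, eager in eager_ms.items():
        if key in compiled_ms and compiled_ms[key] > 0:
            out[key.split("-")[0]] = eager / compiled_ms[key]
    return out

eager = {"alexnet-eval_latency": 30.0, "resnet50-eval_latency": 60.0}    # made up
comp = {"alexnet-eval_latency": 20.0, "resnet50-eval_latency": 40.0}     # made up
print(speedups(eager, comp))  # {'alexnet': 1.5, 'resnet50': 1.5}
```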
