`content/learning-paths/servers-and-cloud-computing/dlrm/1-overview.md` (8 additions, 2 deletions)
DLRM is a machine learning model designed for recommendation systems, like the ones used by streaming services and online stores. It predicts what a user might like by combining embedding layers, which turn categorical features into useful numerical representations, with multilayer perceptrons (MLPs), which process continuous features. The real magic happens in the feature interaction step, where DLRM figures out which factors matter most when making recommendations.
### Arm Neoverse
Arm Neoverse V2 is built for high-performance computing, making it a great fit for machine learning workloads. It is designed with energy efficiency and scalability in mind, which means it can handle AI tasks without consuming excessive power. It also includes advanced vector processing and memory optimizations, which help speed up AI model training and inference. There is a further advantage: many cloud providers, including AWS and GCP, now offer Arm-based instances, making it easier to deploy ML models at a lower cost. Whether you’re training a deep learning model or running large-scale inference workloads, Neoverse V2 is optimized to deliver solid performance while keeping costs under control.
Running MLPerf benchmarks on Arm’s Neoverse V2 platform assesses how well models like DLRM perform on this architecture.
### About the benchmark
The benchmark run in this learning path evaluates the performance of DLRM using the MLPerf Inference suite in the _Offline_ scenario. In the Offline scenario, large batches of data are processed all at once, rather than in real time. It simulates the large-scale, batch-style inference tasks commonly found in recommendation systems for e-commerce, streaming, and social platforms.
The test measures throughput (samples per second) and latency, providing insight into how efficiently the model runs on the target system. By using MLPerf’s standardized methodology, the results offer a reliable comparison point for evaluating performance across different hardware and software configurations, highlighting the system’s ability to handle real-world, data-intensive AI workloads.
## Configure developer environment
Before you can run the benchmark, you will need an Arm-based Cloud Service Provider (CSP) instance. See examples of instance types in the table below. These instructions have been tested on Ubuntu 22.04.
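
Before going further, you can sanity-check that the instance is Arm-based and sized appropriately. These are standard Linux utilities, not commands specific to this learning path:

```bash
uname -m    # should print "aarch64" on an Arm-based instance
nproc       # number of available vCPUs
free -g     # total and available RAM in GiB
df -h /     # free disk space on the root volume
```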
Run the commands below to download the data and model weights. This process takes 30 minutes or more depending on the internet connection in your cloud instance.
Once it finishes, you should see that the `model` and `data` directories are populated. Now that the data is in place, you can proceed to the next section to set up a Docker image which will be used to run the benchmark.
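
To confirm the downloads completed, you can check the size of the two directories. This assumes the default `$HOME/data` and `$HOME/model` locations used later by the benchmark script:

```bash
# Both directories should exist and contain sizable downloads
du -sh $HOME/data $HOME/model
ls $HOME/model
```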
In this section, you will run the benchmark and inspect the results.
## Build PyTorch
You will use a specific commit of the `Tool-Solutions` repository to set up a Docker container with PyTorch. It includes releases of PyTorch that enhance the performance of ML frameworks on Arm.
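
If you have not already cloned the repository, the checkout looks like the sketch below. The URL is Arm's public `Tool-Solutions` project on GitHub; `<commit-hash>` is a placeholder for the specific commit used by this learning path, which is not reproduced here:

```bash
# Clone Arm's Tool-Solutions repository and pin it to the required commit
git clone https://github.com/ARM-software/Tool-Solutions.git
cd Tool-Solutions
git checkout <commit-hash>   # placeholder: substitute the commit referenced by this learning path
```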
The `build.sh` script builds a Docker image containing a PyTorch wheel and its dependencies, then runs the MLPerf container that is used for the benchmark in the next section. The script takes around 20 minutes to finish.
```bash
cd ML-Frameworks/pytorch-aarch64/
./build.sh
```
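
Once `build.sh` completes, you can verify that the image was built and the container is running. `docker images` and `docker ps` are standard Docker commands; the exact image and container names depend on what the script assigns:

```bash
docker images   # the newly built PyTorch image should be listed
docker ps       # the running MLPerf container should appear here
```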
You now have everything set up to analyze the performance. Proceed to the next section to run the benchmark and inspect the results.
## Run the benchmark
A repository of helper scripts, `dlrm-mlperf-lp`, is set up for the next steps. This collection of scripts streamlines the process of building and running the DLRM (Deep Learning Recommendation Model) benchmark from the MLPerf suite inside a Docker container, tailored for Arm-based systems.
The main script is `run_dlrm_benchmark.sh`. At a glance, it automates the full workflow of executing the MLPerf DLRM benchmark by performing the following steps (a condensed view of the script's setup phase appears after the list):
* Initializes and configures MLPerf repositories within the container.
* Applies necessary patches (from `mlperf_patches/`) and compiles the MLPerf codebase inside the container.
* Converts pretrained weights into a usable model format.
* Performs INT8 calibration if needed.
* Executes the offline benchmark test, generating large-scale binary data during runtime.
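
For orientation, the setup phase of the script (argument parsing and host directory layout) is reproduced below; the container launch and MLPerf setup logic that follows it is omitted here:

```bash
#!/bin/bash

set -ex

yellow="\e[33m"
reset="\e[0m"

# Data type defaults to int8 when no argument is passed
data_type=${1:-"int8"}

echo -e "${yellow}Data type chosen for the setup is $data_type${reset}"

# Setup directories
data_dir=$HOME/data/
model_dir=$HOME/model/
results_dir=$HOME/results/
dlrm_container="benchmark_dlrm"

mkdir -p $results_dir/$data_type

###### Run the dlrm container and setup MLPerf #######

echo -e "${yellow}Checking if the container '$dlrm_container' exists...${reset}"
```

With the script ready, it's time to run the benchmark: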
```bash
cd dlrm-mlperf-lp
./run_dlrm_benchmark.sh int8
```
The script can take an hour or more to run.
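
Since the run is long, you might prefer to launch it in a way that survives an SSH disconnect. This is an optional alternative to the foreground invocation above, not part of the original instructions:

```bash
# Run in the background, immune to hangups, and follow the log
nohup ./run_dlrm_benchmark.sh int8 > benchmark.log 2>&1 &
tail -f benchmark.log
```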
{{% notice Note %}}
To run the `fp32` offline test, it's recommended to use the pre-generated binary data files from the `int8` test. You will need a CSP instance with enough RAM; for this purpose, the AWS `r8g.24xlarge` is recommended. After running the `int8` test, save the files in the `model` and `data` directories, and copy them to the instance intended for the `fp32` benchmark.
{{% /notice %}}
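
A sketch of that copy step, assuming SSH connectivity between the two instances; the user name and the `fp32-instance` host name are placeholders:

```bash
# Copy the int8-generated model and data directories to the fp32 instance
rsync -a --progress $HOME/model/ ubuntu@fp32-instance:~/model/
rsync -a --progress $HOME/data/ ubuntu@fp32-instance:~/data/
```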
## Understanding the results
As a final step, have a look at the results, which are generated in a text file. The end of the file looks like this:
```
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 204800
```
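
To pull the headline metrics out of the log, a filter along these lines works. The file name assumes MLPerf LoadGen's usual `mlperf_log_summary.txt`, and the path assumes the results directory created by the benchmark script; adjust both to match your run:

```bash
grep -E "Samples per second|Result is" $HOME/results/int8/mlperf_log_summary.txt
```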
By successfully running the benchmark, you’ve gained practical experience in evaluating large-scale AI recommendation systems in a reproducible and efficient manner, an essential skill for deploying and optimizing AI workloads on modern platforms.
`content/learning-paths/servers-and-cloud-computing/dlrm/_index.md` (2 additions, 2 deletions)
---
title: MLPerf Benchmarking on Arm Neoverse V2
minutes_to_complete: 90
who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will build and run the benchmark using MLPerf and PyTorch.
learning_objectives:
- run a modified, performant DLRMv2 benchmark and inspect the results
prerequisites:
- An Arm-based cloud instance with at least 400 GB of RAM and 800 GB of disk space