
Commit 0613871

Merge pull request #1657 from annietllnd/dlrm-grav4
Add DLRM with MLPerf LP
2 parents f23d616 + 740bdb0

File tree: 5 files changed, +311 -0 lines changed
Lines changed: 81 additions & 0 deletions
---
title: Overview and setup
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Overview

DLRM is a machine learning model designed for recommendation systems, like the ones used by streaming services or online stores. It predicts what a user might like using embedding layers that turn categories into useful numerical representations, and multilayer perceptrons (MLPs) that process continuous features. The real magic happens in the feature interaction step, where DLRM figures out which combinations of factors matter most when making recommendations.

### Arm Neoverse

Arm Neoverse V2 is built for high-performance computing, making it a great fit for machine learning workloads. It is designed with energy efficiency and scalability in mind, which means it can handle AI tasks without consuming excessive power. It also includes advanced vector processing and memory optimizations, which help speed up AI model training and inference. Another advantage is that many cloud providers, including AWS and GCP, now offer Arm-based instances, making it easier to deploy ML models at a lower cost. Whether you’re training a deep learning model or running large-scale inference workloads, Neoverse V2 is optimized to deliver solid performance while keeping costs under control.

### About the benchmark

The benchmark run in this learning path evaluates the performance of DLRM using the MLPerf Inference suite in the _Offline_ scenario. In the Offline scenario, large batches of data are processed all at once rather than in real time. It simulates the large-scale, batch-style inference tasks commonly found in recommendation systems for e-commerce, streaming, and social platforms.

The test measures throughput (samples per second) and latency, providing insight into how efficiently the model runs on the target system. Because MLPerf uses a standardized methodology, the results offer a reliable comparison point for evaluating performance across different hardware and software configurations, highlighting the system’s ability to handle real-world, data-intensive AI workloads.
## Configure developer environment

Before you can run the benchmark, you will need an Arm-based Cloud Service Provider (CSP) instance. See examples of instance types in the table below. These instructions have been tested on Ubuntu 22.04.

| CSP                   | Instance type  |
| --------------------- | -------------- |
| Google Cloud Platform | c4a-highmem-72 |
| Amazon Web Services   | r8g.16xlarge   |
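Once connected to the instance, you can confirm that it is Arm-based. This quick sanity check is an optional addition to these instructions:

```bash
uname -m
```

The expected output is `aarch64`.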
### Verify Python installation

Make sure Python is installed by running the following command and checking that a version number is printed:

```bash
python3 --version
```

```output
Python 3.12.6
```
## Install Docker

Start by adding the official Docker GPG key to your system’s APT keyrings directory:

```bash
sudo install -m 0755 -d /etc/apt/keyrings
sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
sudo chmod a+r /etc/apt/keyrings/docker.asc
```

Run the following command to add the official Docker repository to your system’s APT sources list:

```bash
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
```

After adding the repository, refresh the package index to ensure the latest versions are available:

```bash
sudo apt-get update
```

Now install the necessary dependencies, including Docker and related components:

```bash
sudo apt-get install ca-certificates curl docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin make -y
```

This ensures that Docker and its dependencies are correctly set up on your system.
{{% notice Note %}}
If you run into permission issues with Docker, try running the following:

```bash
sudo usermod -aG docker $USER
newgrp docker
```
{{% /notice %}}
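To verify that Docker works end to end, you can optionally run the standard `hello-world` image:

```bash
docker run --rm hello-world
```

If the installation succeeded, the container prints a confirmation message and exits. Prefix the command with `sudo` if your user is not in the `docker` group.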
With your development environment set up, you can move on to downloading the model.
Lines changed: 48 additions & 0 deletions
---
title: Download model weights and data
weight: 3

### FIXED, DO NOT MODIFY
layout: learningpathall
---

Before building the model, you need to obtain the data and model weights. Start by creating directories for the two in your cloud instance:

```bash
cd $HOME
mkdir data
mkdir model
```

## Install rclone
Install `rclone` using the install script:

```bash
curl https://rclone.org/install.sh | sudo bash
```

You should see similar output if the tool installed successfully:

```output
rclone v1.69.1 has successfully installed.
Now run "rclone config" for setup. Check https://rclone.org/docs/ for more details.
```
Configure the following credentials for rclone:

```bash
rclone config create mlc-inference s3 provider=Cloudflare \
    access_key_id=f65ba5eef400db161ea49967de89f47b \
    secret_access_key=fbea333914c292b854f14d3fe232bad6c5407bf0ab1bebf78833c2b359bdfd2b \
    endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
```
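To confirm that the remote was created, you can optionally list the configured remotes:

```bash
rclone listremotes
```

The output should include `mlc-inference:`.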
Run the commands below to download the data and model weights. This process takes 30 minutes or more, depending on the internet connection of your cloud instance.

```bash
rclone copy mlc-inference:mlcommons-inference-wg-public/dlrm_preprocessed $HOME/data -P
rclone copy mlc-inference:mlcommons-inference-wg-public/model_weights $HOME/model/model_weights -P
```
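As an optional check once the copy completes, you can inspect the directory sizes:

```bash
du -sh $HOME/data $HOME/model
```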
Once the transfer finishes, you should see that the `model` and `data` directories are populated. Now that the data is in place, you can proceed to the next section to set up the Docker image that will be used to run the benchmark.
Lines changed: 124 additions & 0 deletions
---
title: Run the benchmark
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this section, you will run the benchmark and inspect the results.

## Build PyTorch

You will use a pinned commit of the `Tool-Solutions` repository to set up a Docker container with PyTorch. It includes releases of PyTorch that enhance the performance of ML frameworks on Arm.
```bash
cd $HOME
git clone https://github.com/ARM-software/Tool-Solutions.git
cd $HOME/Tool-Solutions/
git checkout f606cb6276be38bbb264b5ea64809c34837959c4
```
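You can optionally confirm that the checkout succeeded by printing the current commit:

```bash
git rev-parse HEAD
```

The output should match the commit hash used above.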
The `build.sh` script builds a PyTorch wheel and a Docker image containing that wheel and its dependencies. It then runs the MLPerf container, which is used for the benchmark in the next section. This script takes around 20 minutes to finish.

```bash
cd ML-Frameworks/pytorch-aarch64/
./build.sh
```
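When `build.sh` completes, you can optionally check for the MLPerf container:

```bash
docker ps
```

If the container was left running, the output lists it along with the image the script built.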
You now have everything set up to analyze the performance. Proceed to the next section to run the benchmark and inspect the results.
## Run the benchmark

A repository is set up to run the next steps. This collection of scripts streamlines the process of building and running the DLRM (Deep Learning Recommendation Model) benchmark from the MLPerf suite inside a Docker container, tailored for Arm-based systems.

Start by cloning it:

```bash
cd $HOME
git clone https://github.com/ArmDeveloperEcosystem/dlrm-mlperf-lp.git
```
The main script is `run_dlrm_benchmark.sh`. At a glance, it automates the full workflow of executing the MLPerf DLRM benchmark by performing the following steps:

* Initializes and configures MLPerf repositories within the container.
* Applies necessary patches (from `mlperf_patches/`) and compiles the MLPerf codebase inside the container.
* Converts pretrained weights into a usable model format.
* Performs INT8 calibration if needed.
* Executes the offline benchmark test, generating large-scale binary data during runtime.

Run the script with `int8` precision:

```bash
cd dlrm-mlperf-lp
./run_dlrm_benchmark.sh int8
```

The script can take an hour or more to run.
{{% notice Note %}}
To run the `fp32` offline test, it's recommended to use the pre-generated binary data files from the `int8` tests. You will need a CSP instance with enough RAM; for this purpose, the AWS `r8g.24xlarge` is recommended. After running the `int8` test, save the files in the `model` and `data` directories, and copy them to the instance intended for the `fp32` benchmark, as sketched below.
{{% /notice %}}
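One way to copy the directories between instances is `rsync` over SSH. This is a minimal sketch; the hostname `fp32-instance` and user `ubuntu` are placeholders for your own instance details:

```bash
# Hostname and user are placeholders; substitute your fp32 instance details
rsync -a --progress $HOME/data/ ubuntu@fp32-instance:~/data/
rsync -a --progress $HOME/model/ ubuntu@fp32-instance:~/model/
```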
## Understanding the results

As a final step, have a look at the results, which are generated in a text file.

The DLRM model optimizes Click-Through Rate (CTR) prediction, a fundamental task in online advertising, recommendation systems, and search engines. Essentially, the model estimates the probability that a user will click on a given ad, product recommendation, or search result. The higher the predicted probability, the more likely the item is to be clicked. In a server context, the goal is to sustain a high throughput of these predictions.

```bash
cat $HOME/results/int8/mlperf_log_summary.txt
```

Your output should contain a `Samples per second` value, where each sample is a predicted probability that a user clicks a certain ad.
```output
================================================
MLPerf Results Summary
================================================
SUT name : PyFastSUT
Scenario : Offline
Mode : PerformanceOnly
Samples per second: 1434.8
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Yes
Early stopping satisfied: Yes

================================================
Additional Stats
================================================
Min latency (ns) : 124022373
Max latency (ns) : 883187615166
Mean latency (ns) : 442524059715
50.00 percentile latency (ns) : 442808926434
90.00 percentile latency (ns) : 794977004363
95.00 percentile latency (ns) : 839019402197
97.00 percentile latency (ns) : 856679847578
99.00 percentile latency (ns) : 874336993877
99.90 percentile latency (ns) : 882255616119

================================================
Test Parameters Used
================================================
samples_per_query : 1267200
target_qps : 1920
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 1
max_query_count : 0
qsl_rng_seed : 6023615788873153749
sample_index_rng_seed : 15036839855038426416
schedule_rng_seed : 9933818062894767841
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 204800
```
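To pull out just the headline throughput figure, you can optionally filter the summary file:

```bash
grep "Samples per second" $HOME/results/int8/mlperf_log_summary.txt
```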
By successfully running the benchmark, you’ve gained practical experience in evaluating large-scale AI recommendation systems in a reproducible and efficient manner, an essential skill for deploying and optimizing AI workloads on modern platforms.
Lines changed: 50 additions & 0 deletions
---
title: MLPerf Benchmarking on Arm Neoverse V2

minutes_to_complete: 90

who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will build and run the benchmark using MLPerf and PyTorch.

learning_objectives:
    - build the Deep Learning Recommendation Model (DLRM) using a Docker image
    - run a modified, performant DLRMv2 benchmark and inspect the results

prerequisites:
    - An Arm-based cloud instance with at least 400 GB of RAM and 800 GB of disk space

author: Annie Tallund

### Tags
skilllevels: Introductory
subjects: Performance and Architecture
armips:
    - Neoverse
tools_software_languages:
    - Docker
    - MLPerf
operatingsystems:
    - Linux
cloud_service_providers: AWS

further_reading:
    - resource:
        title: PLACEHOLDER MANUAL
        link: PLACEHOLDER MANUAL LINK
        type: documentation
    - resource:
        title: PLACEHOLDER BLOG
        link: PLACEHOLDER BLOG LINK
        type: blog
    - resource:
        title: PLACEHOLDER GENERAL WEBSITE
        link: PLACEHOLDER GENERAL WEBSITE LINK
        type: website


### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 8 additions & 0 deletions
---
# ================================================================================
# FIXED, DO NOT MODIFY THIS FILE
# ================================================================================
weight: 21 # Set to always be larger than the content in this path to be at the end of the navigation.
title: "Next Steps" # Always the same, html page title.
layout: "learningpathall" # All files under learning paths have this same wrapper for Hugo processing.
---
