
Commit 740bdb0: Update DLRM LP
1 parent f1029ae

7 files changed: +53 -1237 lines

content/learning-paths/servers-and-cloud-computing/dlrm/1-overview.md

Lines changed: 8 additions & 2 deletions
@@ -10,13 +10,19 @@ layout: learningpathall

DLRM is a machine learning model designed for recommendation systems, like the ones used by streaming services or online stores. It helps predict what a user might like using embedding layers that turn categories into useful numerical representations, and multilayer perceptrons (MLPs) that process continuous data. The real magic happens in the feature interaction step, where DLRM figures out which factors matter most when making recommendations.

+### Arm Neoverse
+
Arm Neoverse V2 is built for high-performance computing, making it a great fit for machine learning workloads. Unlike traditional CPUs, it's designed with energy efficiency and scalability in mind, which means it can handle AI tasks without consuming excessive power. It also includes advanced vector processing and memory optimizations, which help speed up AI model training and inference. Another advantage? Many cloud providers, like AWS and GCP, now offer Arm-based instances, making it easier to deploy ML models at a lower cost. Whether you’re training a deep learning model or running large-scale inference workloads, Neoverse V2 is optimized to deliver solid performance while keeping costs under control.

-Running MLPerf benchmarks on Arm’s Neoverse V2 platform assesses how well models like DLRM perform on this architecture.
+### About the benchmark
+
+The benchmark run in this learning path evaluates the performance of the DLRM model using the MLPerf Inference suite in the _Offline_ scenario. In the Offline scenario, large batches of data are processed all at once rather than in real time. It simulates large-scale, batch-style inference tasks commonly found in recommendation systems for e-commerce, streaming, and social platforms.
+
+The test measures throughput (samples per second) and latency, providing insights into how efficiently the model runs on the target system. By using MLPerf’s standardized methodology, the results offer a reliable comparison point for evaluating performance across different hardware and software configurations, highlighting the system’s ability to handle real-world, data-intensive AI workloads.
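For orientation, once a run has completed you can pull the headline Offline metrics straight out of the MLPerf summary log. A minimal sketch; the results path shown here follows the layout used by the benchmark script later in this learning path and is an assumption, so adjust it to wherever your run writes its output.

```bash
# Minimal sketch: extract the headline Offline metrics from the MLPerf summary log.
# The path is an assumption based on the benchmark script's results directory.
SUMMARY=$HOME/results/int8/mlperf_log_summary.txt

grep -E "Scenario|Result is|Samples per second" "$SUMMARY"
```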

## Configure developer environment

-Before you can run the benchmark, you will need an Arm-based Cloud Service Provider (CSP) instance. See examples in the table below. These instructions have been tested on Ubuntu 22.04.
+Before you can run the benchmark, you will need an Arm-based Cloud Service Provider (CSP) instance. See examples of instance types in the table below. These instructions have been tested on Ubuntu 22.04.

| CSP | Instance type |
| --------------------- | -------------- |

content/learning-paths/servers-and-cloud-computing/dlrm/2-download-model.md

Lines changed: 2 additions & 21 deletions
@@ -37,31 +37,12 @@ rclone config create mlc-inference s3 provider=Cloudflare \
endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
```

-You will now download the data and model weights. This process takes an hour or more depending on your internet connection.
+Run the commands below to download the data and model weights. This process takes 30 minutes or more depending on the internet connection in your cloud instance.

```bash
rclone copy mlc-inference:mlcommons-inference-wg-public/dlrm_preprocessed $HOME/data -P
rclone copy mlc-inference:mlcommons-inference-wg-public/model_weights $HOME/model/model_weights -P
```

-Once it finishes, you should see that the `model` and `data` directories are populated.
+Once it finishes, you should see that the `model` and `data` directories are populated. Now that the data is in place, you can proceed to the next section to set up a Docker image which will be used to run the benchmark.
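As a quick sanity check before moving on, you can confirm that both transfers completed. A minimal sketch assuming the default locations used above; the exact sizes depend on the dataset and model versions downloaded.

```bash
# Confirm both directories exist and hold data.
# Sizes vary with the dataset and model versions downloaded.
ls $HOME/data $HOME/model/model_weights
du -sh $HOME/data $HOME/model
```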

-## Build DLRM image
-
-You will use a branch of the the `Tool-Solutions` repository. This branch includes releases of PyTorch which enhance the performance of ML frameworks.
-
-```bash
-cd $HOME
-git clone https://github.com/ARM-software/Tool-Solutions.git
-cd $HOME/Tool-Solutions/
-git checkout ${1:-"pytorch-aarch64--r24.12"}
-```
-
-The `build.sh` script builds a wheel and a Docker image containing PyTorch and dependencies. It then runs the MLPerf container which is used for the benchmark in the next section.
-
-```bash
-cd ML-Frameworks/pytorch-aarch64/
-./build.sh
-```
-
-You now have everything set up to analyze the performance. Proceed to the next section to run the benchmark and inspect the results.

content/learning-paths/servers-and-cloud-computing/dlrm/3-run-benchmark.md

Lines changed: 41 additions & 111 deletions
@@ -6,131 +6,59 @@ weight: 5
layout: learningpathall
---

-The final step is to run the benchmark.
+In this section, you will run the benchmark and inspect the results.

-## Download patches
+## Build PyTorch

-Start by downloading the patches which will be applied during setup.
+You will use a specific commit of the `Tool-Solutions` repository to set up a Docker container with PyTorch. It includes releases of PyTorch which enhance the performance of ML frameworks on Arm.

```bash
-wget -r --no-parent https://github.com/ArmDeveloperEcosystem/arm-learning-paths/tree/main/content/learning-paths/servers-and-cloud-computing/dlrm/mlpef_patches $HOME/mlperf_patches
+cd $HOME
+git clone https://github.com/ARM-software/Tool-Solutions.git
+cd $HOME/Tool-Solutions/
+git checkout f606cb6276be38bbb264b5ea64809c34837959c4
```
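Optionally, confirm that the working tree is at the pinned commit before building. This is just a quick sanity check, not part of the learning path steps.

```bash
# Optional: verify the checked-out Tool-Solutions commit matches the pinned hash.
git -C $HOME/Tool-Solutions rev-parse HEAD
```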

-## Benchmark script
-
-You will now create a script that automates the setup, configuration, and execution of MLPerf benchmarking for the DLRM (Deep Learning Recommendation Model) inside a Docker container. It simplifies the process by handling dependency installation, model preparation, and benchmarking in a single run. Create a new file called `run_dlrm_benchmark.sh`. Paste the code below.
+The `build.sh` script builds a PyTorch wheel and a Docker image containing the wheel and its dependencies. It then runs the MLPerf container, which is used for the benchmark in the next section. This script takes around 20 minutes to finish.

```bash
-#!/bin/bash
-
-set -ex
-yellow="\e[33m"
-reset="\e[0m"
-
-data_type=${1:-"int8"}
-
-echo -e "${yellow}Data type chosen for the setup is $data_type${reset}"
-
-# Setup directories
-data_dir=$HOME/data/
-model_dir=$HOME/model/
-results_dir=$HOME/results/
-dlrm_container="benchmark_dlrm"
-
-mkdir -p $results_dir/$data_type
-
-###### Run the dlrm container and setup MLPerf #######
-
-echo -e "${yellow}Checking if the container '$dlrm_container' exists...${reset}"
-container_exists=$(docker ps -aqf "name=^$dlrm_container$")
-
-if [ -n "$container_exists" ]; then
-echo "${yellow}Container '$dlrm_container' already exists.${reset}"
-else
-echo "Creating a new '$dlrm_container' container..."
-docker run -td --shm-size=200G --privileged \
--v $data_dir:$data_dir \
--v $model_dir:$model_dir \
--v $results_dir:$results_dir \
--e DATA_DIR=$data_dir \
--e MODEL_DIR=$model_dir \
--e PATH=/opt/conda/bin:$PATH \
---name=$dlrm_container \
-toolsolutions-pytorch:latest
-fi
-
-echo -e "${yellow}Setting up MLPerf inside the container...${reset}"
-docker cp $HOME/mlperf_patches $dlrm_container:$HOME/
-docker exec -it $dlrm_container bash -c "
-set -ex
-sudo apt update && sudo apt install -y \
-software-properties-common lsb-release scons \
-build-essential libtool autoconf unzip git vim wget \
-numactl cmake gcc-12 g++-12 python3-pip python-is-python3
-sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 12 --slave /usr/bin/g++ g++ /usr/bin/g++-12
-
-if [ ! -d \"/opt/conda\" ]; then
-wget -O \"$HOME/miniconda.sh\" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
-chmod +x \"$HOME/miniconda.sh\"
-sudo bash \"$HOME/miniconda.sh\" -b -p /opt/conda
-rm \"$HOME/miniconda.sh\"
-fi
-export PATH=\"/opt/conda/bin:$PATH\"
-/opt/conda/bin/conda install -y python=3.10.12
-/opt/conda/bin/conda install -y -c conda-forge cmake gperftools numpy==1.23.0 ninja pyyaml setuptools
-
-git clone --recurse-submodules https://github.com/mlcommons/inference.git inference || (cd inference ; git pull)
-cd inference && git submodule update --init --recursive && cd loadgen
-CFLAGS=\"-std=c++14\" python setup.py bdist_wheel
-pip install dist/*.whl
-
-rm -rf inference_results_v4.0
-git clone https://github.com/mlcommons/inference_results_v4.0.git
-cd inference_results_v4.0 && git checkout ceef1ea
-
-if [ \"$data_type\" = \"fp32\" ]; then
-git apply $HOME/mlperf_patches/arm_fp32.patch
-else
-git apply $HOME/mlperf_patches/arm_int8.patch
-fi
-"
-
-echo -e "${yellow}Checking for dumped FP32 model...${reset}"
-dumped_fp32_model="dlrm-multihot-pytorch.pt"
-int8_model="aarch64_dlrm_int8.pt"
-dlrm_test_path="$HOME/inference_results_v4.0/closed/Intel/code/dlrm-v2-99.9/pytorch-cpu-int8"
-
-if [ ! -f "$HOME/model/$dumped_fp32_model" ]; then
-echo -e "${yellow}Dumping model weights...${reset}"
-docker exec -it "$dlrm_container" bash -c "
-pip install -r --extra-index-url https://download.pytorch.org/whl/nightly/cpu tensordict==0.1.2 torchsnapshot==0.1.0 fbgemm_gpu==2025.1.22+cpu torchrec==1.1.0.dev20250127+cpu
-"
-docker exec -it "$dlrm_container" bash -c "
-cd $dlrm_test_path && python python/dump_torch_model.py --model-path=$model_dir/model_weights --dataset-path=$data_dir
-"
-fi
-
-echo -e "${yellow}Checking if INT8 model calibration is required...${reset}"
-if [ "$data_type" == "int8" ] && [ ! -f "$HOME/model/$int8_model" ]; then
-echo -e "${yellow}Running INT8 calibration...${reset}"
-docker exec -it "$dlrm_container" bash -c "cd $dlrm_test_path && ./run_calibration.sh"
-fi
-
-echo -e "${yellow}Running offline test...${reset}"
-docker exec -it "$dlrm_container" bash -c "cd $dlrm_test_path && bash run_main.sh offline $data_type"
-
-echo -e "${yellow}Copying results to host...${reset}"
-docker exec -it "$dlrm_container" bash -c "cd $dlrm_test_path && cp -r output/pytorch-cpu/dlrm/Offline/performance/run_1/* $results_dir/$data_type/"
-
-cat $results_dir/$data_type/mlperf_log_summary.txt
+cd ML-Frameworks/pytorch-aarch64/
+./build.sh
```
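If you want to confirm the build succeeded before continuing, you can look for the resulting image and container. A minimal sketch; the image tag `toolsolutions-pytorch:latest` and the container name `benchmark_dlrm` are taken from the earlier version of the script shown above and may differ in the current scripts.

```bash
# Optional sanity check after build.sh.
# The image tag and container name are assumptions based on the previous
# version of the benchmark script; adjust them if your build uses other names.
docker images | grep -i toolsolutions-pytorch
docker ps --filter "name=benchmark_dlrm"
```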

-With the script ready, it's time to run the benchmark:
+You now have everything set up to analyze the performance. Next, run the benchmark and inspect the results.
+
+## Run the benchmark
+
+A repository is provided to run the next steps. This collection of scripts streamlines the process of building and running the DLRM (Deep Learning Recommendation Model) benchmark from the MLPerf suite inside a Docker container, tailored for Arm-based systems.
+
+Start by cloning it:
+
+```bash
+cd $HOME
+git clone https://github.com/ArmDeveloperEcosystem/dlrm-mlperf-lp.git
+```
+
+The main script is `run_dlrm_benchmark.sh`. At a glance, it automates the full workflow of executing the MLPerf DLRM benchmark by performing the following steps:
+
+* Initializes and configures MLPerf repositories within the container.
+* Applies necessary patches (from `mlperf_patches/`) and compiles the MLPerf codebase inside the container.
+* Converts pretrained weights into a usable model format.
+* Performs INT8 calibration if needed.
+* Executes the offline benchmark test, generating large-scale binary data during runtime (a quick resource check is shown below).
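Before launching the run, it is worth confirming the instance meets the stated prerequisites, since the test generates large binary files while it runs. A quick check, assuming the roughly 400 GB RAM and 800 GB disk guidance from this learning path's prerequisites.

```bash
# Quick resource check before launching the benchmark; the learning path
# prerequisites call for roughly 400 GB of RAM and 800 GB of disk space.
free -g          # total and available memory in GiB
df -h $HOME      # free space on the volume holding the data, model, and results
nproc            # CPU count, useful context when interpreting throughput
```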

```bash
-./run_dlrm_benchmark.sh
+cd dlrm-mlperf-lp
+./run_dlrm_benchmark.sh int8
```

+The script can take an hour or more to run.
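Because the run is long, consider launching it inside a terminal multiplexer so a dropped SSH connection does not interrupt it. A minimal sketch using `tmux`; this is a convenience suggestion, not part of the benchmark scripts.

```bash
# Optional: run the long benchmark inside tmux so it survives SSH disconnects.
sudo apt-get install -y tmux        # skip if tmux is already installed
tmux new-session -s dlrm            # opens a new session; run the benchmark inside it
# Inside the tmux session:
#   cd $HOME/dlrm-mlperf-lp && ./run_dlrm_benchmark.sh int8
# Detach with Ctrl+b then d; reattach later with:
#   tmux attach -t dlrm
```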
+
+{{% notice Note %}}
+
+To run the `fp32` offline test, it's recommended to use the pre-generated binary data files from the int8 tests. You will need a CSP instance with enough RAM. For this purpose, the AWS `r8g.24xlarge` is recommended. After running the `int8` test, save the files in the `model` and `data` directories, and copy them to the instance intended for the `fp32` benchmark.
+{{% /notice %}}
+
## Understanding the results

As a final step, have a look at the results generated in a text file.
@@ -192,3 +120,5 @@ performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 204800
```
+
+By running the benchmark successfully, you have gained practical experience in evaluating large-scale AI recommendation systems in a reproducible and efficient way, an essential skill for deploying and optimizing AI workloads on modern platforms.

content/learning-paths/servers-and-cloud-computing/dlrm/_index.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
---
title: MLPerf Benchmarking on Arm Neoverse V2

-minutes_to_complete: 10
+minutes_to_complete: 90

who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will build and run the benchmark using MLPerf and PyTorch.

@@ -10,7 +10,7 @@ learning_objectives:
- run a modified performant DLRMv2 benchmark and inspect the results

prerequisites:
-- An Arm-based cloud instance with at lest 250GB of RAM and 500 GB of disk space
+- An Arm-based cloud instance with at least 400 GB of RAM and 800 GB of disk space

author: Annie Tallund
