
Commit 4b5469e

Merge pull request #1782 from pareenaverma/content_review
Updated the DLRM LP to use nightly torch wheel
2 parents b5f9ae2 + a9acc33

File tree: 4 files changed, +45 -96 lines

content/learning-paths/servers-and-cloud-computing/dlrm/1-overview.md

Lines changed: 9 additions & 48 deletions
@@ -8,29 +8,29 @@ layout: learningpathall

## Overview

-DLRM is a machine learning model designed for recommendation systems, like the ones used by streaming services or online stores. It helps predict what a user might like using embedding layers that turn categories into useful numerical representations, and multilayer perceptrons (MLPs) that process continuous data. The real magic happens in the feature interaction step, where DLRM figures out which factors matter most when making recommendations.
+DLRM is a machine learning model designed for recommendation systems, like the ones used by streaming services or online stores. It helps predict what a user might like, using embedding layers that turn categories into useful numerical representations, and multilayer perceptrons (MLPs) that process continuous data. The real magic happens in the feature interaction step, where DLRM figures out which factors matter most when making recommendations.

-### Arm Neoverse
+### Arm Neoverse CPUs

-Arm Neoverse V2 is built for high-performance computing, making it a great fit for machine learning workloads. Unlike traditional CPUs, it's designed with energy efficiency and scalability in mind, which means it can handle AI tasks without consuming excessive power. It also includes advanced vector processing and memory optimizations, which help speed up AI model training and inference. Another advantage? Many cloud providers, like AWS and GCP, now offer Arm-based instances, making it easier to deploy ML models at a lower cost. Whether you’re training a deep learning model or running large-scale inference workloads, Neoverse V2 is optimized to deliver solid performance while keeping costs under control.
+The Arm Neoverse V2 CPU is built for high-performance computing, making it a great fit for machine learning workloads. Unlike traditional CPUs, it's designed with energy efficiency and scalability in mind, which means it can handle AI tasks without consuming excessive power. It also includes advanced vector processing and memory optimizations, which help speed up AI model training and inference. Another advantage? Many cloud providers, like AWS and GCP, offer Arm-based instances, making it easier to deploy ML models at a lower cost.

### About the benchmark

-The benchmark run in this learning path evaluates the performance of the DLRM using the MLPerf Inference suite in the _Offline_ scenario. The Offline scenario is a test scenario where large batches of data are processed all at once, rather than in real-time. It simulates large-scale, batch-style inference tasks commonly found in recommendation systems for e-commerce, streaming, and social platforms.
+In this Learning Path you will learn how to evaluate the performance of the [DLRM using the MLPerf Inference suite](https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch) in the _Offline_ scenario. The Offline scenario is a test scenario where large batches of data are processed all at once, rather than in real time. It simulates large-scale, batch-style inference tasks commonly found in recommendation systems for e-commerce, streaming, and social platforms.

-The test measures throughput (samples per second) and latency, providing insights into how efficiently the model runs on the target system. By using MLPerf’s standardized methodology, the results offer a reliable comparison point for evaluating performance across different hardware and software configurations—highlighting the system’s ability to handle real-world, data-intensive AI workloads.
+You will run tests that measure throughput (samples per second) and latency, providing insights into how efficiently the model runs on the target system. By using MLPerf’s standardized methodology, the results offer a reliable comparison point for evaluating performance across different hardware and software configurations—highlighting the system’s ability to handle real-world, data-intensive AI workloads.

-## Configure developer environment
+## Configure your environment

-Before you can run the benchmark, you will need an Arm-based Cloud Service Provider (CSP) instance. See examples of instance types in the table below. These instructions have been tested on Ubuntu 22.04.
+Before you can run the benchmark, you will need an Arm-based instance from a Cloud Service Provider (CSP). The instructions in this Learning Path have been tested on the two Arm-based instances listed below running Ubuntu 22.04.

| CSP                   | Instance type  |
| --------------------- | -------------- |
| Google Cloud Platform | c4a-highmem-72 |
| Amazon Web Services   | r8g.16xlarge   |

### Verify Python installation
-Make sure Python is installed by running the following and making sure a version is printed.
+On your running Arm-based instance, make sure Python is installed by running the following command and checking the version:

```bash
python3 --version
@@ -39,43 +39,4 @@ python3 --version
```output
Python 3.12.6
```
-
-## Install Docker
-
-Start by adding the official Docker GPG key to your system’s APT keyrings directory:
-
-```bash
-sudo install -m 0755 -d /etc/apt/keyrings
-sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
-sudo chmod a+r /etc/apt/keyrings/docker.asc
-```
-
-Run the following command to add the official Docker repository to your system’s APT sources list:
-
-```bash
-echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
-$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
-```
-
-After adding the repository, refresh the package index to ensure the latest versions are available:
-
-```bash
-sudo apt-get update
-```
-Now, install the necessary dependencies, including Docker and related components:
-```bash
-sudo apt-get install ca-certificates curl docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin make -y
-```
-
-This ensures that Docker and its dependencies are correctly set up on your system.
-
-{{% notice Note %}}
-If you run into permission issues with Docker, try running the following:
-
-```bash
-sudo usermod -aG docker $USER
-newgrp docker
-```
-{{% /notice %}}
-
-With your development environment set up, you can move on to download the model.
+With your development environment set up, you can move on to downloading and running the model.
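The revised page assumes `python3` is already present on the instance. If it is missing, a minimal sketch for Ubuntu 22.04, with the stock apt package names assumed:

```bash
# Assumes Ubuntu 22.04 with apt; installs the distro Python 3 and pip.
sudo apt-get update
sudo apt-get install -y python3 python3-pip
python3 --version   # should now print a version string
```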

content/learning-paths/servers-and-cloud-computing/dlrm/2-download-model.md

Lines changed: 5 additions & 4 deletions
@@ -8,13 +8,14 @@ layout: learningpathall

Before building the model, you need to obtain the data and model weights. Start by creating directories for the two in your cloud instance.

-## Install rclone and
-
```bash
cd $HOME
mkdir data
mkdir model
```
+## Install rclone
+
+You will use `rclone` to [download the data and model weights](https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch#download-preprocessed-dataset).

Install `rclone` using the bash script.
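The installer command itself sits in an unchanged part of the file, so it does not appear in this diff. For reference, a plausible form of that step is rclone's documented one-line installer; this is an assumption, not a quote from the file:

```bash
# rclone's documented install script (see https://rclone.org/install/).
# The elided step in this file may use a different method.
curl https://rclone.org/install.sh | sudo bash
```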

@@ -37,12 +38,12 @@ rclone config create mlc-inference s3 provider=Cloudflare \
endpoint=https://c2686074cb2caf5cbaf6d134bdba8b47.r2.cloudflarestorage.com
```

-Run the commands below to download the data and model weights. This process takes 30 minutes or more depending on the internet connection in your cloud instance.
+Run the commands below to download the data and model weights. This process can take 30 minutes or more depending on the internet connection in your cloud instance.

```bash
rclone copy mlc-inference:mlcommons-inference-wg-public/dlrm_preprocessed $HOME/data -P
rclone copy mlc-inference:mlcommons-inference-wg-public/model_weights $HOME/model/model_weights -P
```

-Once it finishes, you should see that the `model` and `data` directories are populated. Now that the data is in place, you can proceed to the next section to set up a Docker image which will be used to run the benchmark.
+Once it finishes, you should see that the `model` and `data` directories are populated. Now that the data is in place, you can proceed to run the benchmark in order to measure the performance of the downloaded DLRM model.
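A quick, optional sanity check that both directories are populated; the exact sizes are an assumption and vary with the dataset version, but the prerequisites budget 800 GB of disk:

```bash
# Both directories should be non-empty once rclone finishes.
du -sh $HOME/data $HOME/model
ls $HOME/model/model_weights | head
```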

content/learning-paths/servers-and-cloud-computing/dlrm/3-run-benchmark.md

Lines changed: 19 additions & 28 deletions
@@ -6,43 +6,32 @@ weight: 5
layout: learningpathall
---

-In this section, you will run the benchmark and inspect the results.
+In this section, you will run a modified version of the [MLPerf benchmark for DLRM](https://github.com/mlcommons/inference_results_v4.0/tree/main/closed/Intel/code/dlrm-v2-99.9/pytorch-cpu-int8) and inspect the results.

-## Build PyTorch
+You will use a nightly wheel for PyTorch which includes optimizations that improve the performance of recommendation models on Arm.
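For context, nightly CPU wheels like the one named later in this hunk are published on PyTorch's nightly package index. A pinned install would look roughly like this; dated nightlies age out of the index, so this exact version may no longer resolve:

```bash
# Illustrative only: the benchmark scripts install the wheel for you.
pip install torch==2.8.0.dev20250324+cpu \
  --index-url https://download.pytorch.org/whl/nightly/cpu
```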

-You will use a commit hash of the the `Tool-Solutions` repository to set up a Docker container with PyTorch. It will includes releases of PyTorch which enhance the performance of ML frameworks on Arm.
-
-```bash
-cd $HOME
-git clone https://github.com/ARM-software/Tool-Solutions.git
-cd $HOME/Tool-Solutions/
-git checkout f606cb6276be38bbb264b5ea64809c34837959c4
-```
-
-The `build.sh` script builds a wheel and a Docker image containing a PyTorch wheel and dependencies. It then runs the MLPerf container which is used for the benchmark in the next section. This script takes around 20 minutes to finish.
-
-```bash
-cd ML-Frameworks/pytorch-aarch64/
-./build.sh
-```
-
-You now have everything set up to analyze the performance. Proceed to the next section to run the benchmark and inspect the results.

## Run the benchmark

-A repository is set up to run the next steps. This collection of scripts streamlines the process of building and running the DLRM (Deep Learning Recommendation Model) benchmark from the MLPerf suite inside a Docker container, tailored for Arm-based systems.
+The scripts to set up and run the benchmark are included for your convenience in a repository. This collection of scripts streamlines the process of building and running the DLRM (Deep Learning Recommendation Model) benchmark from the MLPerf suite, tailored for Arm-based systems.

-Start by cloning it.
+Start by cloning the repository:

```bash
cd $HOME
git clone https://github.com/ArmDeveloperEcosystem/dlrm-mlperf-lp.git
```
+Set the environment variables to point to the downloaded data and model weights:
+```
+export DATA_DIR=$HOME/data
+export MODEL_DIR=$HOME/model
+```

-The main script is the `run_dlrm_benchmark.sh`. At a glance, it automates the full workflow of executing the MLPerf DLRM benchmark by performing the following steps:
+You can now run the main script `run_dlrm_benchmark.sh`. This script automates the full workflow of executing the MLPerf DLRM benchmark by performing the following steps:

-* Initializes and configures MLPerf repositories within the container.
-* Applies necessary patches (from `mlperf_patches/`) and compiles the MLPerf codebase inside the container.
+* Initializes and configures MLPerf repositories.
+* Applies necessary patches (from `mlperf_patches/`) and compiles the MLPerf codebase.
+* Uses the PyTorch nightly wheel `torch==2.8.0.dev20250324+cpu` with the Arm performance improvements.
* Converts pretrained weights into a usable model format.
* Performs INT8 calibration if needed.
* Executes the offline benchmark test, generating large-scale binary data during runtime.
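The exact command line is not shown in this diff; a hypothetical invocation, assuming the script takes the precision mode as an argument (the `int8` results path used below suggests that mode):

```bash
# Hypothetical usage -- the argument convention is an assumption;
# check the repository README for the exact command line.
cd $HOME/dlrm-mlperf-lp
./run_dlrm_benchmark.sh int8
```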
@@ -61,15 +50,17 @@ To run the `fp32` offline test, it's recommended to use the pre-generated binary

## Understanding the results

-As a final step, have a look at the results generated in a text file.
+As a final step, have a look at the results generated at the end of the script execution.

+The DLRM model optimizes the Click-Through Rate (CTR) prediction. It is a fundamental task in online advertising, recommendation systems, and search engines. Essentially, the model estimates the probability that a user will click on a given ad, product recommendation, or search result. The higher the predicted probability, the more likely the item is to be clicked. In a server context, the goal is to observe a high throughput of these probabilities.

-The DLRM model optimizes the Click-Through Rate (CTR) prediction. It is a fundamental task in online advertising, recommendation systems, and search engines. Essentially, the model estimates the probability that a user will click on a given ad, product recommendation, or search result. The higher the predicted probability, the more likely the item is to be clicked. In a server context, the goal is to observe a high through-put of these probabilities.
+The output is also saved in a log file.

```bash
cat $HOME/results/int8/mlperf_log_summary.txt
```

-Your output should contain a `Samples per second`, where each sample tells probability of the user clicking a certain ad.
+Your output should contain a `Samples per second` entry, where each sample tells the probability of the user clicking a certain ad.

```output
================================================
@@ -121,4 +112,4 @@ performance_issue_same_index : 0
performance_sample_count : 204800
```
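To pull just the headline throughput figure out of the summary, using the same log path as the `cat` command above:

```bash
# Extracts the headline metric from the MLPerf summary log.
grep "Samples per second" $HOME/results/int8/mlperf_log_summary.txt
```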

-On successfully running the benchmark, you’ve gained practical experience in evaluating large-scale AI recommendation systems in a reproducible and efficient manner—an essential skill for deploying and optimizing AI workloads on modern platforms.
+By successfully running the benchmark, you have gained practical experience in evaluating large-scale AI recommendation systems in a reproducible and efficient manner, an essential skill for deploying and optimizing AI workloads on modern platforms.

content/learning-paths/servers-and-cloud-computing/dlrm/_index.md

Lines changed: 12 additions & 16 deletions
@@ -1,22 +1,21 @@
---
title: MLPerf Benchmarking on Arm Neoverse V2

-draft: true
-cascade:
-  draft: true

minutes_to_complete: 90

-who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will build and run the benchmark using MLPerf and PyTorch.
+who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will build and run the Deep Learning Recommendation Model (DLRM) and benchmark its performance using MLPerf and PyTorch.

learning_objectives:
-- build the Deep Learning Recommendation Model (DLRM) using a Docker image
-- run a modified performant DLRMv2 benchmark and inspect the results
+- Build the Deep Learning Recommendation Model (DLRM)
+- Run a modified performant DLRMv2 benchmark and inspect the results

prerequisites:
-- An Arm-based cloud instance with at lest 400GB of RAM and 800 GB of disk space
+- Any [Arm based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider or an on-premise Arm server with at least 400 GB of RAM and 800 GB of disk space.

-author: Annie Tallund
+author:
+- Annie Tallund
+- Pareena Verma

### Tags
skilllevels: Introductory
@@ -26,22 +25,19 @@ armips:
tools_software_languages:
- Docker
- MLPerf
+- Google Cloud
operatingsystems:
- Linux
cloud_service_providers: AWS

further_reading:
- resource:
-    title: PLACEHOLDER MANUAL
-    link: PLACEHOLDER MANUAL LINK
+    title: MLPerf Inference Benchmarks for Recommendation
+    link: https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch
    type: documentation
- resource:
-    title: PLACEHOLDER BLOG
-    link: PLACEHOLDER BLOG LINK
-    type: blog
-- resource:
-    title: PLACEHOLDER GENERAL WEBSITE
-    link: PLACEHOLDER GENERAL WEBSITE LINK
+    title: MLPerf Inference Benchmark Suite
+    link: https://github.com/mlcommons/inference/tree/master
    type: website