**content/learning-paths/servers-and-cloud-computing/dlrm/1-overview.md** (9 additions, 48 deletions)

````diff
@@ -8,29 +8,29 @@ layout: learningpathall
 ## Overview
 
-DLRM is a machine learning model designed for recommendation systems, like the ones used by streaming services or online stores. It helps predict what a user might like using embedding layers that turn categories into useful numerical representations, and multilayer perceptrons (MLPs) that process continuous data. The real magic happens in the feature interaction step, where DLRM figures out which factors matter most when making recommendations.
+DLRM is a machine learning model designed for recommendation systems, like the ones used by streaming services or online stores. It helps predict what a user might like, using embedding layers that turn categories into useful numerical representations, and multilayer perceptrons (MLPs) that process continuous data. The real magic happens in the feature interaction step, where DLRM figures out which factors matter most when making recommendations.
 
-### Arm Neoverse
+### Arm Neoverse CPUs
 
-Arm Neoverse V2 is built for high-performance computing, making it a great fit for machine learning workloads. Unlike traditional CPUs, it's designed with energy efficiency and scalability in mind, which means it can handle AI tasks without consuming excessive power. It also includes advanced vector processing and memory optimizations, which help speed up AI model training and inference. Another advantage? Many cloud providers, like AWS and GCP, now offer Arm-based instances, making it easier to deploy ML models at a lower cost. Whether you’re training a deep learning model or running large-scale inference workloads, Neoverse V2 is optimized to deliver solid performance while keeping costs under control.
+The Arm Neoverse V2 CPU is built for high-performance computing, making it a great fit for machine learning workloads. Unlike traditional CPUs, it's designed with energy efficiency and scalability in mind, which means it can handle AI tasks without consuming excessive power. It also includes advanced vector processing and memory optimizations, which help speed up AI model training and inference. Another advantage? Many cloud providers, like AWS and GCP, offer Arm-based instances, making it easier to deploy ML models at a lower cost.
 
 ### About the benchmark
 
-The benchmark run in this learning path evaluates the performance of the DLRM using the MLPerf Inference suite in the _Offline_ scenario. The Offline scenario is a test scenario where large batches of data are processed all at once, rather than in real-time. It simulates large-scale, batch-style inference tasks commonly found in recommendation systems for e-commerce, streaming, and social platforms.
+In this Learning Path you will learn how to evaluate the performance of the [DLRM using the MLPerf Inference suite](https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch) in the _Offline_ scenario. The Offline scenario is a test scenario where large batches of data are processed all at once, rather than in real-time. It simulates large-scale, batch-style inference tasks commonly found in recommendation systems for e-commerce, streaming, and social platforms.
 
-The test measures throughput (samples per second) and latency, providing insights into how efficiently the model runs on the target system. By using MLPerf’s standardized methodology, the results offer a reliable comparison point for evaluating performance across different hardware and software configurations—highlighting the system’s ability to handle real-world, data-intensive AI workloads.
+You will run tests that measure throughput (samples per second) and latency, providing insights into how efficiently the model runs on the target system. By using MLPerf’s standardized methodology, the results offer a reliable comparison point for evaluating performance across different hardware and software configurations, highlighting the system’s ability to handle real-world, data-intensive AI workloads.
 
-## Configure developer environment
+## Configure your environment
 
-Before you can run the benchmark, you will need an Arm-based Cloud Service Provider (CSP) instance. See examples of instance types in the table below. These instructions have been tested on Ubuntu 22.04.
+Before you can run the benchmark, you will need an Arm-based instance from a Cloud Service Provider (CSP). The instructions in this Learning Path have been tested on the two Arm-based instances listed below, running Ubuntu 22.04.
 
 | CSP                   | Instance type  |
 | --------------------- | -------------- |
 | Google Cloud Platform | c4a-highmem-72 |
 | Amazon Web Services   | r8g.16xlarge   |
 
 ### Verify Python installation
-Make sure Python is installed by running the following and making sure a version is printed.
+On your running Arm-based instance, make sure Python is installed by running the following command and checking the version:
 
 ```bash
 python3 --version
@@ -39,43 +39,4 @@ python3 --version
 ```output
 Python 3.12.6
 ```
-
-
-## Install Docker
-
-Start by adding the official Docker GPG key to your system’s APT keyrings directory:
````
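The overview in `1-overview.md` describes DLRM's three building blocks: embedding tables that turn categories into vectors, MLPs that process continuous features, and a pairwise feature-interaction step. Below is a minimal PyTorch sketch of that structure, with made-up layer sizes; it is an illustration only, not the MLPerf DLRMv2 configuration.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the real DLRMv2 model is far larger.
NUM_CATEGORIES, EMB_DIM, NUM_SPARSE, NUM_DENSE = 1000, 16, 3, 4

class TinyDLRM(nn.Module):
    def __init__(self):
        super().__init__()
        # One embedding table per categorical (sparse) feature.
        self.tables = nn.ModuleList(
            nn.Embedding(NUM_CATEGORIES, EMB_DIM) for _ in range(NUM_SPARSE)
        )
        # Bottom MLP projects dense (continuous) features into the same space.
        self.bottom_mlp = nn.Sequential(nn.Linear(NUM_DENSE, EMB_DIM), nn.ReLU())
        # Top MLP scores the interacted features; the output is a click logit.
        n_vec = NUM_SPARSE + 1
        n_pairs = n_vec * (n_vec - 1) // 2
        self.top_mlp = nn.Linear(EMB_DIM + n_pairs, 1)

    def forward(self, dense, sparse):
        vecs = [self.bottom_mlp(dense)]                      # (B, EMB_DIM)
        vecs += [t(sparse[:, i]) for i, t in enumerate(self.tables)]
        stacked = torch.stack(vecs, dim=1)                   # (B, n_vec, EMB_DIM)
        # Feature interaction: dot products between every pair of vectors.
        dots = torch.bmm(stacked, stacked.transpose(1, 2))
        i, j = torch.triu_indices(len(vecs), len(vecs), offset=1)
        interactions = dots[:, i, j]                         # (B, n_pairs)
        return self.top_mlp(torch.cat([vecs[0], interactions], dim=1))

model = TinyDLRM()
dense = torch.randn(8, NUM_DENSE)
sparse = torch.randint(0, NUM_CATEGORIES, (8, NUM_SPARSE))
print(torch.sigmoid(model(dense, sparse)).shape)  # (8, 1) click probabilities
```

The `torch.bmm` line computes dot products between every pair of feature vectors, and keeping only the upper triangle of that matrix is what the overview calls the feature-interaction step.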
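The _Offline_ scenario described in the same file reduces to one question: how many samples per second can the system push through when per-sample latency does not matter? A toy sketch of that measurement idea follows; the real benchmark uses MLPerf's LoadGen harness, and the stand-in model and batch size here are arbitrary.

```python
import time
import torch

# Stand-in for a model under test; any callable over a batch works here.
model = torch.nn.Linear(64, 1)
batch, n_batches = torch.randn(4096, 64), 50

with torch.inference_mode():
    model(batch)  # warm-up run so one-time costs don't skew the timing
    start = time.perf_counter()
    for _ in range(n_batches):
        model(batch)
    elapsed = time.perf_counter() - start

print(f"Samples per second: {n_batches * batch.shape[0] / elapsed:.1f}")
```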

**content/learning-paths/servers-and-cloud-computing/dlrm/2-download-model.md** (5 additions, 4 deletions)

````diff
@@ -8,13 +8,14 @@ layout: learningpathall
 
 Before building the model, you need to obtain the data and model weights. Start by creating directories for the two in your cloud instance.
 
-## Install rclone and
-
 ```bash
 cd $HOME
 mkdir data
 mkdir model
 ```
+
+## Install rclone
+
+You will use `rclone` to [download the data and model weights](https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch#download-preprocessed-dataset).
 
-Run the commands below to download the data and model weights. This process takes 30 minutes or more depending on the internet connection in your cloud instance.
+Run the commands below to download the data and model weights. This process can take 30 minutes or more depending on the internet connection in your cloud instance.
 
-Once it finishes, you should see that the `model` and `data` directories are populated. Now that the data is in place, you can proceed to the next section to set up a Docker image which will be used to run the benchmark.
+Once it finishes, you should see that the `model` and `data` directories are populated. Now that the data is in place, you can proceed to run the benchmark to measure the performance of the downloaded DLRM model.
````
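Before launching a long benchmark run, it can be worth checking that the download actually populated both directories. Here is a small sketch in Python; the exact file names inside `data` and `model` depend on the MLPerf download, so this only reports entry counts and total size.

```python
import os

# These paths match the ~/data and ~/model directories created above.
for path in (os.path.expanduser("~/data"), os.path.expanduser("~/model")):
    entries = os.listdir(path)
    size_gb = sum(
        os.path.getsize(os.path.join(dirpath, f))
        for dirpath, _, files in os.walk(path)
        for f in files
    ) / 1e9
    print(f"{path}: {len(entries)} entries, {size_gb:.1f} GB")
```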

**content/learning-paths/servers-and-cloud-computing/dlrm/3-run-benchmark.md** (19 additions, 28 deletions)

````diff
@@ -6,43 +6,32 @@ weight: 5
 layout: learningpathall
 ---
 
-In this section, you will run the benchmark and inspect the results.
+In this section, you will run a modified version of the [MLPerf benchmark for DLRM](https://github.com/mlcommons/inference_results_v4.0/tree/main/closed/Intel/code/dlrm-v2-99.9/pytorch-cpu-int8) and inspect the results.
 
-## Build PyTorch
+You will use a nightly wheel for PyTorch which includes optimizations that improve the performance of recommendation models on Arm.
 
-You will use a commit hash of the the `Tool-Solutions` repository to set up a Docker container with PyTorch. It will includes releases of PyTorch which enhance the performance of ML frameworks on Arm.
-
-The `build.sh` script builds a wheel and a Docker image containing a PyTorch wheel and dependencies. It then runs the MLPerf container which is used for the benchmark in the next section. This script takes around 20 minutes to finish.
-
-```bash
-cd ML-Frameworks/pytorch-aarch64/
-./build.sh
-```
-
-You now have everything set up to analyze the performance. Proceed to the next section to run the benchmark and inspect the results.
 
 ## Run the benchmark
 
-A repository is set up to run the next steps. This collection of scripts streamlines the process of building and running the DLRM (Deep Learning Recommendation Model) benchmark from the MLPerf suite inside a Docker container, tailored for Arm-based systems.
+The scripts to set up and run the benchmark are included for your convenience in a repository. This collection of scripts streamlines the process of building and running the DLRM (Deep Learning Recommendation Model) benchmark from the MLPerf suite, tailored for Arm-based systems.
+
+Set the environment variables to point to the downloaded data and model weights:
+
+```bash
+export DATA_DIR=$HOME/data
+export MODEL_DIR=$HOME/model
+```
 
-The main script is the `run_dlrm_benchmark.sh`. At a glance, it automates the full workflow of executing the MLPerf DLRM benchmark by performing the following steps:
+You can now run the main script `run_dlrm_benchmark.sh`. This script automates the full workflow of executing the MLPerf DLRM benchmark by performing the following steps:
 
-* Initializes and configures MLPerf repositories within the container.
-* Applies necessary patches (from `mlperf_patches/`) and compiles the MLPerf codebase inside the container.
+* Initializes and configures MLPerf repositories.
+* Applies necessary patches (from `mlperf_patches/`) and compiles the MLPerf codebase.
+* Uses the PyTorch nightly wheel `torch==2.8.0.dev20250324+cpu` with the Arm performance improvements.
 * Converts pretrained weights into a usable model format.
 * Performs INT8 calibration if needed.
 * Executes the offline benchmark test, generating large-scale binary data during runtime.
@@ -61,15 +50,17 @@ To run the `fp32` offline test, it's recommended to use the pre-generated binary
 
 ## Understanding the results
 
-As a final step, have a look at the results generated in a text file.
+As a final step, have a look at the results generated at the end of the script execution.
+
+The DLRM model optimizes the Click-Through Rate (CTR) prediction. It is a fundamental task in online advertising, recommendation systems, and search engines. Essentially, the model estimates the probability that a user will click on a given ad, product recommendation, or search result. The higher the predicted probability, the more likely the item is to be clicked. In a server context, the goal is to observe a high throughput of these probabilities.
 
-The DLRM model optimizes the Click-Through Rate (CTR) prediction. It is a fundamental task in online advertising, recommendation systems, and search engines. Essentially, the model estimates the probability that a user will click on a given ad, product recommendation, or search result. The higher the predicted probability, the more likely the item is to be clicked. In a server context, the goal is to observe a high through-put of these probabilities.
+The output is also saved in a log file.
 
 ```bash
 cat $HOME/results/int8/mlperf_log_summary.txt
 ```
 
-Your output should contain a `Samples per second`, where each sample tells probability of the user clicking a certain ad.
+Your output should contain a `Samples per second` entry, where each sample gives the probability of the user clicking a certain ad.
 
-On successfully running the benchmark, you’ve gained practical experience in evaluating large-scale AI recommendation systems in a reproducible and efficient manner—an essential skill for deploying and optimizing AI workloads on modern platforms.
+By successfully running the benchmark, you have gained practical experience in evaluating large-scale AI recommendation systems in a reproducible and efficient manner, an essential skill for deploying and optimizing AI workloads on modern platforms.
````
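The CTR paragraph in `3-run-benchmark.md` maps directly onto the model output: DLRM emits one logit per sample, and a sigmoid turns that logit into a click probability. A hand-rolled illustration with made-up logits:

```python
import math

# Hypothetical raw model outputs (logits) for four candidate ads.
logits = [-2.1, 0.3, 1.7, 4.0]

for logit in logits:
    ctr = 1.0 / (1.0 + math.exp(-logit))  # sigmoid: logit -> probability
    print(f"logit {logit:+.1f} -> predicted click probability {ctr:.3f}")
```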
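To pull the headline number out of `mlperf_log_summary.txt` programmatically rather than reading the `cat` output by eye, a short sketch follows; it assumes the summary contains a `Samples per second:` line, as MLPerf summaries normally do.

```python
import os
import re

log_path = os.path.expanduser("~/results/int8/mlperf_log_summary.txt")
with open(log_path) as f:
    # MLPerf summaries report throughput as "Samples per second: <value>".
    match = re.search(r"Samples per second\s*:\s*([\d.]+)", f.read())

if match:
    print(f"Throughput: {float(match.group(1)):.2f} samples/s")
else:
    print("No 'Samples per second' line found; check that the run completed.")
```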

**content/learning-paths/servers-and-cloud-computing/dlrm/_index.md** (12 additions, 16 deletions)

````diff
@@ -1,22 +1,21 @@
 ---
 title: MLPerf Benchmarking on Arm Neoverse V2
 
-draft: true
-cascade:
-  draft: true
 
 minutes_to_complete: 90
 
-who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will build and run the benchmark using MLPerf and PyTorch.
+who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will build and run the Deep Learning Recommendation Model (DLRM) and benchmark its performance using MLPerf and PyTorch.
 
 learning_objectives:
-- build the Deep Learning Recommendation Model (DLRM) using a Docker image
-- run a modified performant DLRMv2 benchmark and inspect the results
+- Build the Deep Learning Recommendation Model (DLRM)
+- Run a modified performant DLRMv2 benchmark and inspect the results
 
 prerequisites:
-- An Arm-based cloud instance with at lest 400GB of RAM and 800 GB of disk space
+- Any [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider or an on-premise Arm server with at least 400 GB of RAM and 800 GB of disk space.
 
-author: Annie Tallund
+author:
+- Annie Tallund
+- Pareena Verma
 
 ### Tags
 skilllevels: Introductory
@@ -26,22 +25,19 @@ armips:
 tools_software_languages:
 - Docker
 - MLPerf
+- Google Cloud
 operatingsystems:
 - Linux
 cloud_service_providers: AWS
 
 further_reading:
 - resource:
-    title: PLACEHOLDER MANUAL
-    link: PLACEHOLDER MANUAL LINK
+    title: MLPerf Inference Benchmarks for Recommendation
````
0 commit comments