**content/learning-paths/servers-and-cloud-computing/dlrm/1-overview.md** (9 additions, 48 deletions)

````diff
@@ -8,29 +8,29 @@ layout: learningpathall
 ## Overview
 
-DLRM is a machine learning model designed for recommendation systems, like the ones used by streaming services or online stores. It helps predict what a user might like using embedding layers that turn categories into useful numerical representations, and multilayer perceptrons (MLPs) that process continuous data. The real magic happens in the feature interaction step, where DLRM figures out which factors matter most when making recommendations.
+DLRM is a machine learning model designed for recommendation systems, like the ones used by streaming services or online stores. It helps predict what a user might like, using embedding layers that turn categories into useful numerical representations, and multilayer perceptrons (MLPs) that process continuous data. The real magic happens in the feature interaction step, where DLRM figures out which factors matter most when making recommendations.
 
-### Arm Neoverse
+### Arm Neoverse CPUs
 
-Arm Neoverse V2 is built for high-performance computing, making it a great fit for machine learning workloads. Unlike traditional CPUs, it's designed with energy efficiency and scalability in mind, which means it can handle AI tasks without consuming excessive power. It also includes advanced vector processing and memory optimizations, which help speed up AI model training and inference. Another advantage? Many cloud providers, like AWS and GCP, now offer Arm-based instances, making it easier to deploy ML models at a lower cost. Whether you’re training a deep learning model or running large-scale inference workloads, Neoverse V2 is optimized to deliver solid performance while keeping costs under control.
+The Arm Neoverse V2 CPU is built for high-performance computing, making it a great fit for machine learning workloads. Unlike traditional CPUs, it's designed with energy efficiency and scalability in mind, which means it can handle AI tasks without consuming excessive power. It also includes advanced vector processing and memory optimizations, which help speed up AI model training and inference. Another advantage? Many cloud providers, like AWS and GCP, offer Arm-based instances, making it easier to deploy ML models at a lower cost.
 
 ### About the benchmark
 
-The benchmark run in this learning path evaluates the performance of the DLRM using the MLPerf Inference suite in the _Offline_ scenario. The Offline scenario is a test scenario where large batches of data are processed all at once, rather than in real-time. It simulates large-scale, batch-style inference tasks commonly found in recommendation systems for e-commerce, streaming, and social platforms.
+In this Learning Path you will learn how to evaluate the performance of the [DLRM using the MLPerf Inference suite](https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch) in the _Offline_ scenario. The Offline scenario is a test scenario where large batches of data are processed all at once, rather than in real-time. It simulates large-scale, batch-style inference tasks commonly found in recommendation systems for e-commerce, streaming, and social platforms.
 
-The test measures throughput (samples per second) and latency, providing insights into how efficiently the model runs on the target system. By using MLPerf’s standardized methodology, the results offer a reliable comparison point for evaluating performance across different hardware and software configurations—highlighting the system’s ability to handle real-world, data-intensive AI workloads.
+You will run tests that measure throughput (samples per second) and latency, providing insights into how efficiently the model runs on the target system. By using MLPerf’s standardized methodology, the results offer a reliable comparison point for evaluating performance across different hardware and software configurations, highlighting the system’s ability to handle real-world, data-intensive AI workloads.
 
-## Configure developer environment
+## Configure your environment
 
-Before you can run the benchmark, you will need an Arm-based Cloud Service Provider (CSP) instance. See examples of instance types in the table below. These instructions have been tested on Ubuntu 22.04.
+Before you can run the benchmark, you will need an Arm-based instance from a Cloud Service Provider (CSP). The instructions in this Learning Path have been tested on the two Arm-based instances listed below, running Ubuntu 22.04.
 
 | CSP                   | Instance type  |
 | --------------------- | -------------- |
 | Google Cloud Platform | c4a-highmem-72 |
 | Amazon Web Services   | r8g.16xlarge   |
 
 ### Verify Python installation
-Make sure Python is installed by running the following and making sure a version is printed.
+On your running Arm-based instance, make sure Python is installed by running the following command and checking the version:
 
 ```bash
 python3 --version
@@ -39,43 +39,4 @@ python3 --version
 ```output
 Python 3.12.6
 ```
-
-
-## Install Docker
-
-Start by adding the official Docker GPG key to your system’s APT keyrings directory:
````
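The overview in `1-overview.md` describes DLRM's three building blocks: embedding tables that turn categories into vectors, MLPs that process continuous features, and a pairwise feature-interaction step. Below is a minimal PyTorch sketch of that structure, with made-up layer sizes; it is an illustration only, not the MLPerf DLRMv2 configuration.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; the real DLRMv2 model is far larger.
NUM_CATEGORIES, EMB_DIM, NUM_SPARSE, NUM_DENSE = 1000, 16, 3, 4

class TinyDLRM(nn.Module):
    def __init__(self):
        super().__init__()
        # One embedding table per categorical (sparse) feature.
        self.tables = nn.ModuleList(
            nn.Embedding(NUM_CATEGORIES, EMB_DIM) for _ in range(NUM_SPARSE)
        )
        # Bottom MLP projects dense (continuous) features into the same space.
        self.bottom_mlp = nn.Sequential(nn.Linear(NUM_DENSE, EMB_DIM), nn.ReLU())
        # Top MLP scores the interacted features; the output is a click logit.
        n_vec = NUM_SPARSE + 1
        n_pairs = n_vec * (n_vec - 1) // 2
        self.top_mlp = nn.Linear(EMB_DIM + n_pairs, 1)

    def forward(self, dense, sparse):
        vecs = [self.bottom_mlp(dense)]                      # (B, EMB_DIM)
        vecs += [t(sparse[:, i]) for i, t in enumerate(self.tables)]
        stacked = torch.stack(vecs, dim=1)                   # (B, n_vec, EMB_DIM)
        # Feature interaction: dot products between every pair of vectors.
        dots = torch.bmm(stacked, stacked.transpose(1, 2))
        i, j = torch.triu_indices(len(vecs), len(vecs), offset=1)
        interactions = dots[:, i, j]                         # (B, n_pairs)
        return self.top_mlp(torch.cat([vecs[0], interactions], dim=1))

model = TinyDLRM()
dense = torch.randn(8, NUM_DENSE)
sparse = torch.randint(0, NUM_CATEGORIES, (8, NUM_SPARSE))
print(torch.sigmoid(model(dense, sparse)).shape)  # (8, 1) click probabilities
```

The `torch.bmm` line computes dot products between every pair of feature vectors, and keeping only the upper triangle of that matrix is what the overview calls the feature-interaction step.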
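The _Offline_ scenario described in the same file reduces to one question: how many samples per second can the system push through when per-sample latency does not matter? A toy sketch of that measurement idea follows; the real benchmark uses MLPerf's LoadGen harness, and the stand-in model and batch size here are arbitrary.

```python
import time
import torch

# Stand-in for a model under test; any callable over a batch works here.
model = torch.nn.Linear(64, 1)
batch, n_batches = torch.randn(4096, 64), 50

with torch.inference_mode():
    model(batch)  # warm-up run so one-time costs don't skew the timing
    start = time.perf_counter()
    for _ in range(n_batches):
        model(batch)
    elapsed = time.perf_counter() - start

print(f"Samples per second: {n_batches * batch.shape[0] / elapsed:.1f}")
```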

**content/learning-paths/servers-and-cloud-computing/dlrm/2-download-model.md** (5 additions, 4 deletions)

````diff
@@ -8,13 +8,14 @@ layout: learningpathall
 
 Before building the model, you need to obtain the data and model weights. Start by creating directories for the two in your cloud instance.
 
-## Install rclone and
-
 ```bash
 cd $HOME
 mkdir data
 mkdir model
 ```
+
+## Install rclone
+
+You will use `rclone` to [download the data and model weights](https://github.com/mlcommons/inference/tree/master/recommendation/dlrm_v2/pytorch#download-preprocessed-dataset).
 
-Run the commands below to download the data and model weights. This process takes 30 minutes or more depending on the internet connection in your cloud instance.
+Run the commands below to download the data and model weights. This process can take 30 minutes or more depending on the internet connection in your cloud instance.
 
-Once it finishes, you should see that the `model` and `data` directories are populated. Now that the data is in place, you can proceed to the next section to set up a Docker image which will be used to run the benchmark.
+Once it finishes, you should see that the `model` and `data` directories are populated. Now that the data is in place, you can proceed to run the benchmark to measure the performance of the downloaded DLRM model.
````
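Before launching a long benchmark run, it can be worth checking that the download actually populated both directories. Here is a small sketch in Python; the exact file names inside `data` and `model` depend on the MLPerf download, so this only reports entry counts and total size.

```python
import os

# These paths match the ~/data and ~/model directories created above.
for path in (os.path.expanduser("~/data"), os.path.expanduser("~/model")):
    entries = os.listdir(path)
    size_gb = sum(
        os.path.getsize(os.path.join(dirpath, f))
        for dirpath, _, files in os.walk(path)
        for f in files
    ) / 1e9
    print(f"{path}: {len(entries)} entries, {size_gb:.1f} GB")
```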

**content/learning-paths/servers-and-cloud-computing/dlrm/3-run-benchmark.md** (19 additions, 28 deletions)

````diff
@@ -6,43 +6,32 @@ weight: 5
 layout: learningpathall
 ---
 
-In this section, you will run the benchmark and inspect the results.
+In this section, you will run a modified version of the [MLPerf benchmark for DLRM](https://github.com/mlcommons/inference_results_v4.0/tree/main/closed/Intel/code/dlrm-v2-99.9/pytorch-cpu-int8) and inspect the results.
 
-## Build PyTorch
+You will use a nightly wheel for PyTorch which includes optimizations that improve the performance of recommendation models on Arm.
 
-You will use a commit hash of the the `Tool-Solutions` repository to set up a Docker container with PyTorch. It will includes releases of PyTorch which enhance the performance of ML frameworks on Arm.
-
-The `build.sh` script builds a wheel and a Docker image containing a PyTorch wheel and dependencies. It then runs the MLPerf container which is used for the benchmark in the next section. This script takes around 20 minutes to finish.
-
-```bash
-cd ML-Frameworks/pytorch-aarch64/
-./build.sh
-```
-
-You now have everything set up to analyze the performance. Proceed to the next section to run the benchmark and inspect the results.
 
 ## Run the benchmark
 
-A repository is set up to run the next steps. This collection of scripts streamlines the process of building and running the DLRM (Deep Learning Recommendation Model) benchmark from the MLPerf suite inside a Docker container, tailored for Arm-based systems.
+The scripts to set up and run the benchmark are included for your convenience in a repository. This collection of scripts streamlines the process of building and running the DLRM (Deep Learning Recommendation Model) benchmark from the MLPerf suite, tailored for Arm-based systems.
+
+Set the environment variables to point to the downloaded data and model weights:
+
+```bash
+export DATA_DIR=$HOME/data
+export MODEL_DIR=$HOME/model
+```
 
-The main script is the `run_dlrm_benchmark.sh`. At a glance, it automates the full workflow of executing the MLPerf DLRM benchmark by performing the following steps:
+You can now run the main script `run_dlrm_benchmark.sh`. This script automates the full workflow of executing the MLPerf DLRM benchmark by performing the following steps:
 
-* Initializes and configures MLPerf repositories within the container.
-* Applies necessary patches (from `mlperf_patches/`) and compiles the MLPerf codebase inside the container.
+* Initializes and configures MLPerf repositories.
+* Applies necessary patches (from `mlperf_patches/`) and compiles the MLPerf codebase.
+* Uses the PyTorch nightly wheel `torch==2.8.0.dev20250324+cpu` with the Arm performance improvements.
 * Converts pretrained weights into a usable model format.
 * Performs INT8 calibration if needed.
 * Executes the offline benchmark test, generating large-scale binary data during runtime.
@@ -61,15 +50,17 @@ To run the `fp32` offline test, it's recommended to use the pre-generated binary
 
 ## Understanding the results
 
-As a final step, have a look at the results generated in a text file.
+As a final step, have a look at the results generated at the end of the script execution.
+
+The DLRM model optimizes the Click-Through Rate (CTR) prediction. It is a fundamental task in online advertising, recommendation systems, and search engines. Essentially, the model estimates the probability that a user will click on a given ad, product recommendation, or search result. The higher the predicted probability, the more likely the item is to be clicked. In a server context, the goal is to observe a high throughput of these probabilities.
 
-The DLRM model optimizes the Click-Through Rate (CTR) prediction. It is a fundamental task in online advertising, recommendation systems, and search engines. Essentially, the model estimates the probability that a user will click on a given ad, product recommendation, or search result. The higher the predicted probability, the more likely the item is to be clicked. In a server context, the goal is to observe a high through-put of these probabilities.
+The output is also saved in a log file.
 
 ```bash
 cat $HOME/results/int8/mlperf_log_summary.txt
 ```
 
-Your output should contain a `Samples per second`, where each sample tells probability of the user clicking a certain ad.
+Your output should contain a `Samples per second` entry, where each sample gives the probability of the user clicking a certain ad.
 
-On successfully running the benchmark, you’ve gained practical experience in evaluating large-scale AI recommendation systems in a reproducible and efficient manner—an essential skill for deploying and optimizing AI workloads on modern platforms.
+By successfully running the benchmark, you have gained practical experience in evaluating large-scale AI recommendation systems in a reproducible and efficient manner, an essential skill for deploying and optimizing AI workloads on modern platforms.
````
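The CTR paragraph in `3-run-benchmark.md` maps directly onto the model output: DLRM emits one logit per sample, and a sigmoid turns that logit into a click probability. A hand-rolled illustration with made-up logits:

```python
import math

# Hypothetical raw model outputs (logits) for four candidate ads.
logits = [-2.1, 0.3, 1.7, 4.0]

for logit in logits:
    ctr = 1.0 / (1.0 + math.exp(-logit))  # sigmoid: logit -> probability
    print(f"logit {logit:+.1f} -> predicted click probability {ctr:.3f}")
```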
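To pull the headline number out of `mlperf_log_summary.txt` programmatically rather than reading the `cat` output by eye, a short sketch follows; it assumes the summary contains a `Samples per second:` line, as MLPerf summaries normally do.

```python
import os
import re

log_path = os.path.expanduser("~/results/int8/mlperf_log_summary.txt")
with open(log_path) as f:
    # MLPerf summaries report throughput as "Samples per second: <value>".
    match = re.search(r"Samples per second\s*:\s*([\d.]+)", f.read())

if match:
    print(f"Throughput: {float(match.group(1)):.2f} samples/s")
else:
    print("No 'Samples per second' line found; check that the run completed.")
```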

**content/learning-paths/servers-and-cloud-computing/dlrm/_index.md** (12 additions, 16 deletions)

````diff
@@ -1,22 +1,21 @@
 ---
 title: MLPerf Benchmarking on Arm Neoverse V2
 
-draft: true
-cascade:
-  draft: true
 
 minutes_to_complete: 90
 
-who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will build and run the benchmark using MLPerf and PyTorch.
+who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will build and run the Deep Learning Recommendation Model (DLRM) and benchmark its performance using MLPerf and PyTorch.
 
 learning_objectives:
-- build the Deep Learning Recommendation Model (DLRM) using a Docker image
-- run a modified performant DLRMv2 benchmark and inspect the results
+- Build the Deep Learning Recommendation Model (DLRM)
+- Run a modified performant DLRMv2 benchmark and inspect the results
 
 prerequisites:
-- An Arm-based cloud instance with at lest 400GB of RAM and 800 GB of disk space
+- Any [Arm-based instance](/learning-paths/servers-and-cloud-computing/csp/) from a cloud service provider or an on-premise Arm server with at least 400 GB of RAM and 800 GB of disk space.
 
-author: Annie Tallund
+author:
+- Annie Tallund
+- Pareena Verma
 
 ### Tags
 skilllevels: Introductory
@@ -26,22 +25,19 @@ armips:
 tools_software_languages:
 - Docker
 - MLPerf
+- Google Cloud
 operatingsystems:
 - Linux
 cloud_service_providers: AWS
 
 further_reading:
 - resource:
-    title: PLACEHOLDER MANUAL
-    link: PLACEHOLDER MANUAL LINK
+    title: MLPerf Inference Benchmarks for Recommendation
````
0 commit comments