---
title: Run the benchmark
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

In this section, you will run the benchmark and inspect the results.

## Build PyTorch

You will use a specific commit of the `Tool-Solutions` repository to set up a Docker container with PyTorch. The repository includes releases of PyTorch that enhance the performance of ML frameworks on Arm.

```bash
cd $HOME
git clone https://github.com/ARM-software/Tool-Solutions.git
cd $HOME/Tool-Solutions/
git checkout f606cb6276be38bbb264b5ea64809c34837959c4
```
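
You can optionally verify that you are on the expected commit before building; the printed hash should match the one checked out above:

```bash
# Print the full hash of the current HEAD commit
git rev-parse HEAD
```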

The `build.sh` script builds a PyTorch wheel and a Docker image containing the wheel and its dependencies. It then runs the MLPerf container used for the benchmark in the next section. The script takes around 20 minutes to finish.

```bash
cd ML-Frameworks/pytorch-aarch64/
./build.sh
```
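
When `build.sh` completes, you can use standard Docker commands as a sanity check that the image was built and that the MLPerf container is running. The exact image and container names depend on the `Tool-Solutions` version, so treat the listings as a rough check rather than an exact match:

```bash
# List local images; a PyTorch image produced by build.sh should appear
docker images

# List running containers; the MLPerf container started by build.sh should appear
docker ps
```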

You now have everything set up to analyze the performance. Proceed to the next section to run the benchmark and inspect the results.

## Run the benchmark

A repository has been set up for the next steps. This collection of scripts streamlines the process of building and running the DLRM (Deep Learning Recommendation Model) benchmark from the MLPerf suite inside a Docker container, tailored for Arm-based systems.

Start by cloning it:

```bash
cd $HOME
git clone https://github.com/ArmDeveloperEcosystem/dlrm-mlperf-lp.git
```

The main script is `run_dlrm_benchmark.sh`. At a glance, it automates the full workflow of the MLPerf DLRM benchmark by performing the following steps:

* Initializes and configures MLPerf repositories within the container.
* Applies necessary patches (from `mlperf_patches/`) and compiles the MLPerf codebase inside the container.
* Converts pretrained weights into a usable model format.
* Performs INT8 calibration if needed.
* Executes the offline benchmark test, generating large-scale binary data during runtime.
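
If you would like to review these steps before launching the run, you can skim the script locally (the path assumes the clone location used above):

```bash
# Page through the benchmark driver script
less $HOME/dlrm-mlperf-lp/run_dlrm_benchmark.sh
```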

```bash
cd dlrm-mlperf-lp
./run_dlrm_benchmark.sh int8
```

The script can take an hour or more to run.
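
Because the run is long, you might prefer to detach it from your terminal so that a dropped SSH connection does not interrupt it. A minimal sketch using `nohup` is shown below (the log file name is an arbitrary choice); use it instead of the foreground command above if you prefer:

```bash
cd dlrm-mlperf-lp
# Run the benchmark in the background and capture its output in a log file
nohup ./run_dlrm_benchmark.sh int8 > dlrm_benchmark.log 2>&1 &
# Follow progress; press Ctrl+C to stop watching (the benchmark keeps running)
tail -f dlrm_benchmark.log
```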

{{% notice Note %}}

To run the `fp32` offline test, it's recommended to use the pre-generated binary data files from the `int8` test. You will need a CSP instance with enough RAM; for this purpose, the AWS `r8g.24xlarge` is recommended. After running the `int8` test, save the files in the `model` and `data` directories, and copy them to the instance intended for the `fp32` benchmark.
{{% /notice %}}
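
As a hedged sketch of the copy step, you could transfer the saved directories to the `fp32` instance with `rsync` over SSH. The user name, host name, and paths below are placeholders; adjust them to wherever your `int8` run created the `model` and `data` directories and to your own target instance:

```bash
# Replace the source paths and user@fp32-instance with your own values
rsync -avP model data user@fp32-instance:~/dlrm-mlperf-lp/
```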

## Understanding the results

As a final step, take a look at the results, which are written to a text file.

The DLRM model predicts Click-Through Rate (CTR), a fundamental task in online advertising, recommendation systems, and search engines. Essentially, the model estimates the probability that a user will click on a given ad, product recommendation, or search result. The higher the predicted probability, the more likely the item is to be clicked. In a server context, the goal is to produce these predictions at high throughput.

```bash
cat $HOME/results/int8/mlperf_log_summary.txt
```

Your output should contain a `Samples per second` value, where each sample is the predicted probability of a user clicking a certain ad.

```output
================================================
MLPerf Results Summary
================================================
SUT name : PyFastSUT
Scenario : Offline
Mode : PerformanceOnly
Samples per second: 1434.8
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes
  Early stopping satisfied: Yes

================================================
Additional Stats
================================================
Min latency (ns) : 124022373
Max latency (ns) : 883187615166
Mean latency (ns) : 442524059715
50.00 percentile latency (ns) : 442808926434
90.00 percentile latency (ns) : 794977004363
95.00 percentile latency (ns) : 839019402197
97.00 percentile latency (ns) : 856679847578
99.00 percentile latency (ns) : 874336993877
99.90 percentile latency (ns) : 882255616119

================================================
Test Parameters Used
================================================
samples_per_query : 1267200
target_qps : 1920
target_latency (ns): 0
max_async_queries : 1
min_duration (ms): 600000
max_duration (ms): 0
min_query_count : 1
max_query_count : 0
qsl_rng_seed : 6023615788873153749
sample_index_rng_seed : 15036839855038426416
schedule_rng_seed : 9933818062894767841
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 204800
```
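
A quick cross-check of the numbers above: in the Offline scenario, all `samples_per_query` samples are issued at once, so the total run time is roughly `samples_per_query` divided by the measured throughput, 1267200 / 1434.8 ≈ 883 seconds, which lines up with the reported max latency of 883187615166 ns (about 883 s). If you only want the headline figures, `grep` pulls them out of the summary file:

```bash
# Print only the throughput and validity lines from the summary
grep -E "Samples per second|Result is" $HOME/results/int8/mlperf_log_summary.txt
```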

By successfully running the benchmark, you have gained practical experience in evaluating large-scale AI recommendation systems in a reproducible and efficient manner, an essential skill for deploying and optimizing AI workloads on modern platforms.