`content/learning-paths/servers-and-cloud-computing/dlrm/1-overview.md` (8 additions, 2 deletions)
DLRM is a machine learning model designed for recommendation systems, like the ones used by streaming services and online stores. It predicts what a user might like by combining embedding layers, which turn categorical features into useful numerical representations, with multilayer perceptrons (MLPs), which process continuous features. The real magic happens in the feature interaction step, where DLRM figures out which factors matter most when making recommendations.
### Arm Neoverse
Arm Neoverse V2 is built for high-performance computing, making it a great fit for machine learning workloads. It is designed with energy efficiency and scalability in mind, which means it can handle AI tasks without consuming excessive power. It also includes advanced vector processing and memory optimizations, which help speed up AI model training and inference. There is a further advantage: many cloud providers, including AWS and GCP, now offer Arm-based instances, making it easier to deploy ML models at a lower cost. Whether you’re training a deep learning model or running large-scale inference workloads, Neoverse V2 is optimized to deliver solid performance while keeping costs under control.
Running MLPerf benchmarks on Arm’s Neoverse V2 platform assesses how well models like DLRM perform on this architecture.
### About the benchmark
The benchmark run in this learning path evaluates the performance of DLRM using the MLPerf Inference suite in the _Offline_ scenario. In the Offline scenario, large batches of data are processed all at once, rather than in real time. It simulates the large-scale, batch-style inference tasks commonly found in recommendation systems for e-commerce, streaming, and social platforms.
The test measures throughput (samples per second) and latency, providing insight into how efficiently the model runs on the target system. By using MLPerf’s standardized methodology, the results offer a reliable comparison point for evaluating performance across different hardware and software configurations, highlighting the system’s ability to handle real-world, data-intensive AI workloads.
## Configure developer environment
Before you can run the benchmark, you will need an Arm-based Cloud Service Provider (CSP) instance. See examples of instance types in the table below. These instructions have been tested on Ubuntu 22.04.
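
Before going further, you can sanity-check that the instance is Arm-based and sized appropriately. These are standard Linux utilities, not commands specific to this learning path:

```bash
uname -m    # should print "aarch64" on an Arm-based instance
nproc       # number of available vCPUs
free -g     # total and available RAM in GiB
df -h /     # free disk space on the root volume
```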
Run the commands below to download the data and model weights. This process takes 30 minutes or more depending on the internet connection in your cloud instance.
Once it finishes, you should see that the `model` and `data` directories are populated. Now that the data is in place, you can proceed to the next section to set up a Docker image which will be used to run the benchmark.
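
To confirm the downloads completed, you can check the size of the two directories. This assumes the default `$HOME/data` and `$HOME/model` locations used later by the benchmark script:

```bash
# Both directories should exist and contain sizable downloads
du -sh $HOME/data $HOME/model
ls $HOME/model
```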
In this section, you will run the benchmark and inspect the results.
## Build PyTorch
You will use a specific commit of the `Tool-Solutions` repository to set up a Docker container with PyTorch. It includes releases of PyTorch that enhance the performance of ML frameworks on Arm.
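
If you have not already cloned the repository, the checkout looks like the sketch below. The URL is Arm's public `Tool-Solutions` project on GitHub; `<commit-hash>` is a placeholder for the specific commit used by this learning path, which is not reproduced here:

```bash
# Clone Arm's Tool-Solutions repository and pin it to the required commit
git clone https://github.com/ARM-software/Tool-Solutions.git
cd Tool-Solutions
git checkout <commit-hash>   # placeholder: substitute the commit referenced by this learning path
```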
The `build.sh` script builds a Docker image containing a PyTorch wheel and its dependencies, then runs the MLPerf container that is used for the benchmark in the next section. The script takes around 20 minutes to finish.
```bash
cd ML-Frameworks/pytorch-aarch64/
./build.sh
```
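
Once `build.sh` completes, you can verify that the image was built and the container is running. `docker images` and `docker ps` are standard Docker commands; the exact image and container names depend on what the script assigns:

```bash
docker images   # the newly built PyTorch image should be listed
docker ps       # the running MLPerf container should appear here
```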
You now have everything set up to analyze the performance. Proceed to the next section to run the benchmark and inspect the results.
## Run the benchmark
A repository of helper scripts, `dlrm-mlperf-lp`, is set up for the next steps. This collection of scripts streamlines the process of building and running the DLRM (Deep Learning Recommendation Model) benchmark from the MLPerf suite inside a Docker container, tailored for Arm-based systems.
The main script is `run_dlrm_benchmark.sh`. At a glance, it automates the full workflow of executing the MLPerf DLRM benchmark by performing the following steps (a condensed view of the script's setup phase appears after the list):
* Initializes and configures MLPerf repositories within the container.
* Applies necessary patches (from `mlperf_patches/`) and compiles the MLPerf codebase inside the container.
* Converts pretrained weights into a usable model format.
* Performs INT8 calibration if needed.
* Executes the offline benchmark test, generating large-scale binary data during runtime.
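
For orientation, the setup phase of the script (argument parsing and host directory layout) is reproduced below; the container launch and MLPerf setup logic that follows it is omitted here:

```bash
#!/bin/bash

set -ex

yellow="\e[33m"
reset="\e[0m"

# Data type defaults to int8 when no argument is passed
data_type=${1:-"int8"}

echo -e "${yellow}Data type chosen for the setup is $data_type${reset}"

# Setup directories
data_dir=$HOME/data/
model_dir=$HOME/model/
results_dir=$HOME/results/
dlrm_container="benchmark_dlrm"

mkdir -p $results_dir/$data_type

###### Run the dlrm container and setup MLPerf #######

echo -e "${yellow}Checking if the container '$dlrm_container' exists...${reset}"
```

With the script ready, it's time to run the benchmark: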
```bash
cd dlrm-mlperf-lp
./run_dlrm_benchmark.sh int8
```
The script can take an hour or more to run.
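
Since the run is long, you might prefer to launch it in a way that survives an SSH disconnect. This is an optional alternative to the foreground invocation above, not part of the original instructions:

```bash
# Run in the background, immune to hangups, and follow the log
nohup ./run_dlrm_benchmark.sh int8 > benchmark.log 2>&1 &
tail -f benchmark.log
```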
{{% notice Note %}}
To run the `fp32` offline test, it's recommended to use the pre-generated binary data files from the `int8` test. You will need a CSP instance with enough RAM; for this purpose, the AWS `r8g.24xlarge` is recommended. After running the `int8` test, save the files in the `model` and `data` directories, and copy them to the instance intended for the `fp32` benchmark.
{{% /notice %}}
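
A sketch of that copy step, assuming SSH connectivity between the two instances; the user name and the `fp32-instance` host name are placeholders:

```bash
# Copy the int8-generated model and data directories to the fp32 instance
rsync -a --progress $HOME/model/ ubuntu@fp32-instance:~/model/
rsync -a --progress $HOME/data/ ubuntu@fp32-instance:~/data/
```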
## Understanding the results
As a final step, have a look at the results, which are generated in a text file. The end of the file looks like this:
```
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 204800
```
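
To pull the headline metrics out of the log, a filter along these lines works. The file name assumes MLPerf LoadGen's usual `mlperf_log_summary.txt`, and the path assumes the results directory created by the benchmark script; adjust both to match your run:

```bash
grep -E "Samples per second|Result is" $HOME/results/int8/mlperf_log_summary.txt
```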
By successfully running the benchmark, you’ve gained practical experience in evaluating large-scale AI recommendation systems in a reproducible and efficient manner, an essential skill for deploying and optimizing AI workloads on modern platforms.
`content/learning-paths/servers-and-cloud-computing/dlrm/_index.md` (2 additions, 2 deletions)
---
title: MLPerf Benchmarking on Arm Neoverse V2
minutes_to_complete: 90
who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will build and run the benchmark using MLPerf and PyTorch.
learning_objectives:
- run a modified, performant DLRMv2 benchmark and inspect the results
prerequisites:
- An Arm-based cloud instance with at least 400 GB of RAM and 800 GB of disk space