
Commit c527664
committed

Update DLRM LP

- Remove copying files section
- Add patches for hosting
- Combine scripts
- Clarify metrics

1 parent ef04c3c

File tree

7 files changed: +1191 −109 lines changed


content/learning-paths/servers-and-cloud-computing/dlrm/1-overview.md

Lines changed: 15 additions & 8 deletions
@@ -22,7 +22,6 @@ Before you can run the benchmark, you will need an Arm-based Cloud Service Provi
 | --------------------- | -------------- |
 | Google Cloud Platform | c4a-highmem-72 |
 | Amazon Web Services | r8g.16xlarge |
-| Microsoft Azure | TODO |
 
 ### Verify Python installation
 Make sure Python is installed by running the following and making sure a version is printed.
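The verification command itself falls outside this hunk, but the `Python 3.12.6` context line in the next hunk suggests a plain version check. As a sketch (not part of the diff):

```bash
# Print the installed Python version; any printed version confirms the install
python3 --version
```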
@@ -37,27 +36,35 @@ Python 3.12.6
 
 ## Install Docker
 
-```bash
-sudo apt-get update
-sudo apt-get install ca-certificates curl docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin make -y
-```
+Start by adding the official Docker GPG key to your system’s APT keyrings directory:
 
 ```bash
 sudo install -m 0755 -d /etc/apt/keyrings
 sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
 sudo chmod a+r /etc/apt/keyrings/docker.asc
 ```
 
+Next, install Docker and some additional dependencies:
+
+```bash
+sudo apt-get update
+sudo apt-get install ca-certificates curl docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin make -y
+```
+
+Finally, add the Docker apt repository to finalize the installation:
+
 ```bash
 echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
 $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
 ```
 
-{{ % notice Note % }}
-If you run into permission issues with Docker, try running the following
+{{% notice Note %}}
+If you run into permission issues with Docker, try running the following:
 
 ```bash
 sudo usermod -aG docker $USER
 sudo chmod 666 /var/run/docker.sock
 ```
-{{ % /notice % }}
+{{% /notice %}}
+
+With your development environment set up, you can move on to download the model.
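As a quick aside (not part of this commit), you could sanity-check the Docker install before moving on; a minimal sketch:

```bash
# Confirm the client and daemon work end to end
docker --version              # client version string
docker run --rm hello-world   # pulls a tiny test image and runs it once
```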

content/learning-paths/servers-and-cloud-computing/dlrm/2-download-model.md

Lines changed: 3 additions & 6 deletions
@@ -28,7 +28,7 @@ rclone v1.69.1 has successfully installed.
 Now run "rclone config" for setup. Check https://rclone.org/docs/ for more details.
 ```
 
-Configure the credentials as instructed.
+Configure the following credentials for rclone:
 
 ```bash
 rclone config create mlc-inference s3 provider=Cloudflare \
@@ -38,17 +38,14 @@ rclone config create mlc-inference s3 provider=Cloudflare \
 ```
 
 You will now download the data and model weights. This process takes an hour or more depending on your internet connection.
+
 ```bash
 rclone copy mlc-inference:mlcommons-inference-wg-public/dlrm_preprocessed $HOME/data -P
 rclone copy mlc-inference:mlcommons-inference-wg-public/model_weights $HOME/model/model_weights -P
 ```
 
 Once it finishes, you should see that the `model` and `data` directories are populated.
 
-* Overview of Dataset Used in MLPerf DLRM
-* Steps to Download and Prepare the Data
-* Preprocessing Data for Training and Inference
-
 ## Build DLRM image
 
 You will use a branch of the `Tool-Solutions` repository. This branch includes releases of PyTorch which enhance the performance of ML frameworks.
@@ -60,7 +57,7 @@ cd $HOME/Tool-Solutions/
 git checkout ${1:-"pytorch-aarch64--r24.12"}
 ```
 
-A setup script runs which installs docker and builds a PyTorch image for a specific commit hash. Finally, it runs the MLPerf container which is used for the benchmark in the next section. This script takes around 20 minutes to finish.
+The `build.sh` script builds a wheel and a Docker image containing PyTorch and its dependencies. It then runs the MLPerf container, which is used for the benchmark in the next section.
 
 ```bash
 cd ML-Frameworks/pytorch-aarch64/
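As a hedged aside, once the rclone copies finish you could confirm the directories are populated before building the image; the exact file names depend on the MLCommons buckets, so treat this as a sketch:

```bash
# Rough size check: the preprocessed data and the weights are both large
du -sh $HOME/data $HOME/model/model_weights
# List a few entries to confirm the copies actually landed
ls $HOME/data | head
ls $HOME/model/model_weights | head
```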

content/learning-paths/servers-and-cloud-computing/dlrm/3-run-benchmark.md

Lines changed: 68 additions & 91 deletions
@@ -6,37 +6,46 @@ weight: 5
 layout: learningpathall
 ---
 
-The final step is to run the actual benchmark.
+The final step is to run the benchmark.
+
+## Download patches
+
+Start by downloading the patches, which will be applied during setup.
+
+```bash
+wget -r --no-parent https://github.com/ArmDeveloperEcosystem/arm-learning-paths/tree/main/content/learning-paths/servers-and-cloud-computing/dlrm/mlpef_patches -P $HOME/mlperf_patches
+```
 
 ## Benchmark script
 
-You will now create a script which uses the Docker container to run the benchmark. Create a new file called `run_dlrm_benchmark.sh`. Paste the code below.
+You will now create a script that automates the setup, configuration, and execution of the MLPerf benchmark for DLRM (the Deep Learning Recommendation Model) inside a Docker container. It simplifies the process by handling dependency installation, model preparation, and benchmarking in a single run. Create a new file called `run_dlrm_benchmark.sh` and paste in the code below.
 
 ```bash
 #!/bin/bash
 
 set -ex
 yellow="\e[33m"
 reset="\e[0m"
+
 data_type=${1:-"int8"}
+
 echo -e "${yellow}Data type chosen for the setup is $data_type${reset}"
 
-# setup environment variables for the dlrm container
+# Set up directories
 data_dir=$HOME/data/
 model_dir=$HOME/model/
 results_dir=$HOME/results/
 dlrm_container="benchmark_dlrm"
 
-# Create results directory
 mkdir -p $results_dir/$data_type
 
 ###### Run the dlrm container and setup MLPerf #######
-# Check if the container exists
+
 echo -e "${yellow}Checking if the container '$dlrm_container' exists...${reset}"
 container_exists=$(docker ps -aqf "name=^$dlrm_container$")
 
 if [ -n "$container_exists" ]; then
-    echo "${yellow}Container '$dlrm_container' already exists. Will not create a new one. ${reset}"
+    echo "${yellow}Container '$dlrm_container' already exists.${reset}"
 else
     echo "Creating a new '$dlrm_container' container..."
     docker run -td --shm-size=200G --privileged \
@@ -45,125 +54,94 @@ else
         -v $results_dir:$results_dir \
         -e DATA_DIR=$data_dir \
         -e MODEL_DIR=$model_dir \
-        -e CONDA_PREFIX=/opt/conda \
-        -e NUM_SOCKETS="1" \
-        -e CPUS_PER_SOCKET=$(nproc) \
-        -e CPUS_PER_PROCESS=$(nproc) \
-        -e CPUS_PER_INSTANCE="1" \
-        -e CPUS_FOR_LOADGEN="1" \
-        -e BATCH_SIZE="400" \
         -e PATH=/opt/conda/bin:$PATH \
         --name=$dlrm_container \
         toolsolutions-pytorch:latest
 fi
 
-###### Build MLPerf & Dependencies #######
-# Copy MLPerf build script to the benchmark_dlrm container
-docker cp ~/dlrm_docker_setup/build_mlperf.sh $dlrm_container:$HOME/
-
-# Copy the patches
-docker cp ~/dlrm_docker_setup/mlperf_patches $dlrm_container:$HOME/
-
-echo -e "${yellow}Setting up MLPerf benchmarking inside the container...${reset}"
-docker exec -it $dlrm_container bash -c ". $HOME/build_mlperf.sh $data_type"
-
-###### Dump the model #######
+echo -e "${yellow}Setting up MLPerf inside the container...${reset}"
+docker cp $HOME/mlperf_patches $dlrm_container:$HOME/
+docker exec -it $dlrm_container bash -c "
+    set -ex
+    sudo apt update && sudo apt install -y \
+        software-properties-common lsb-release scons \
+        build-essential libtool autoconf unzip git vim wget \
+        numactl cmake gcc-12 g++-12 python3-pip python-is-python3
+    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-12 12 --slave /usr/bin/g++ g++ /usr/bin/g++-12
+
+    if [ ! -d \"/opt/conda\" ]; then
+        wget -O \"$HOME/miniconda.sh\" https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh
+        chmod +x \"$HOME/miniconda.sh\"
+        sudo bash \"$HOME/miniconda.sh\" -b -p /opt/conda
+        rm \"$HOME/miniconda.sh\"
+    fi
+    export PATH=\"/opt/conda/bin:$PATH\"
+    /opt/conda/bin/conda install -y python=3.10.12
+    /opt/conda/bin/conda install -y -c conda-forge cmake gperftools numpy==1.23.0 ninja pyyaml setuptools
+
+    git clone --recurse-submodules https://github.com/mlcommons/inference.git inference || (cd inference ; git pull)
+    cd inference && git submodule update --init --recursive && cd loadgen
+    CFLAGS=\"-std=c++14\" python setup.py bdist_wheel
+    pip install dist/*.whl
+
+    rm -rf inference_results_v4.0
+    git clone https://github.com/mlcommons/inference_results_v4.0.git
+    cd inference_results_v4.0 && git checkout ceef1ea
+
+    if [ \"$data_type\" = \"fp32\" ]; then
+        git apply $HOME/mlperf_patches/arm_fp32.patch
+    else
+        git apply $HOME/mlperf_patches/arm_int8.patch
+    fi
+"
+
+echo -e "${yellow}Checking for dumped FP32 model...${reset}"
 dumped_fp32_model="dlrm-multihot-pytorch.pt"
 int8_model="aarch64_dlrm_int8.pt"
 dlrm_test_path="$HOME/inference_results_v4.0/closed/Intel/code/dlrm-v2-99.9/pytorch-cpu-int8"
 
-# Check if FP32 model is already dumped
-if [ -f "$HOME/model/$dumped_fp32_model" ]; then
-    echo -e "${yellow}File '$dumped_fp32_model' exists. Skipping model dumping step.${reset}"
-else
-    echo -e "${yellow}File '$dumped_fp32_model' does not exist. Dumping the model weights...${reset}"
-    docker cp $HOME/dlrm_docker_setup/requirements.txt $dlrm_container:$HOME
-    docker exec -it "$dlrm_container" bash -c "pip install -r requirements.txt ; cd $dlrm_test_path && python python/dump_torch_model.py --model-path=$model_dir/model_weights --dataset-path=$data_dir"
+if [ ! -f "$HOME/model/$dumped_fp32_model" ]; then
+    echo -e "${yellow}Dumping model weights...${reset}"
+    docker exec -it "$dlrm_container" bash -c "
+        pip install --extra-index-url https://download.pytorch.org/whl/nightly/cpu tensordict==0.1.2 torchsnapshot==0.1.0 fbgemm_gpu==2025.1.22+cpu torchrec==1.1.0.dev20250127+cpu
+    "
+    docker exec -it "$dlrm_container" bash -c "
+        cd $dlrm_test_path && python python/dump_torch_model.py --model-path=$model_dir/model_weights --dataset-path=$data_dir
+    "
 fi
 
-###### Calibrate the model #######
-# In the case of INT8, calibrate the model if not already calibrated.
 echo -e "${yellow}Checking if INT8 model calibration is required...${reset}"
-
 if [ "$data_type" == "int8" ] && [ ! -f "$HOME/model/$int8_model" ]; then
-    echo -e "${yellow}File '$int8_model' does not exist. Running calibration...${reset}"
-    # the calibration will create aarch64_dlrm_int8.pt in the $HOME/model directory.
+    echo -e "${yellow}Running INT8 calibration...${reset}"
     docker exec -it "$dlrm_container" bash -c "cd $dlrm_test_path && ./run_calibration.sh"
-else
-    echo -e "${yellow}Calibration step is not needed.${reset}"
 fi
 
-###### Run the test #######
-# Run the offline test
 echo -e "${yellow}Running offline test...${reset}"
 docker exec -it "$dlrm_container" bash -c "cd $dlrm_test_path && bash run_main.sh offline $data_type"
 
-# Copy results to the host machine
 echo -e "${yellow}Copying results to host...${reset}"
 docker exec -it "$dlrm_container" bash -c "cd $dlrm_test_path && cp -r output/pytorch-cpu/dlrm/Offline/performance/run_1/* $results_dir/$data_type/"
 
-# Display the MLPerf summary results
-echo -e "${yellow}Displaying MLPerf results...${reset}"
 cat $results_dir/$data_type/mlperf_log_summary.txt
-
 ```
 
-At a glance, these are the steps it goes through:
-
-- Sets up MLPerf repositories within the container.
-- Dumps the model from existing model weights if not already available.
-- Calibrates the INT8 model from the dumped model if it has not been previously generated.
-- Executes the offline benchmark test, generating terabyte-scale binary data files during the process.
-
-Run the offline test with the `int8` datatype. You can also specify the argument `fp32` to build for the floating point datatype.
+With the script ready, it's time to run the benchmark:
 
 ```bash
-cd $HOME/dlrm_docker_setup
-./run_dlrm_benchmark.sh int8
-```
-
-## Save output files
-
-You may want to save the final model and data files to run on smaller servers. You can use `scp` to achieve this.
-
-From your long-term storage machine, run the following command. You need to update the parameters before running.
-
-```
-scp -i <key-pair> <username>@<ipaddress>:/remote/path/to/file $HOME/model/int8/
-```
-where `key-pair` is the key-pair used for the larger instance, `username` and `ipaddress` the corresponding access points, and the two paths are the source and destination paths respectively.
-
-Save the following files for long-term storage.
-
-```console
-$HOME/model/aarch64_dlrm_int8.pt
-$HOME/model/dlrm-multihot-pytorch.pt
-$HOME/data/terabyte_processed_test_v2_dense.bin
-$HOME/data/terabyte_processed_test_v2_label_sparse.bin
-```
-
-To run the INT8 model, an instance with 250 GB of RAM and 500 GB of disk space is enough. For example, the following instance types:
-
-| CSP | Instance type |
-| --------------------- | -------------- |
-| Google Cloud Platform | c4a-highmem-32 |
-| Amazon Web Services | r8g.8xlarge |
-| Microsoft Azure | TODO |
-
-For example, you can re-run the offline `int8` benchmark by cloning the repository to the smaller instance and the following command.
-
-```bash
-./run_main.sh offline int8
+./run_dlrm_benchmark.sh
 ```
 
 ## Understanding the results
 
 As a final step, have a look at the results generated in a text file.
+
+The DLRM model optimizes Click-Through Rate (CTR) prediction, a fundamental task in online advertising, recommendation systems, and search engines. Essentially, the model estimates the probability that a user will click on a given ad, product recommendation, or search result. The higher the predicted probability, the more likely the item is to be clicked. In a server context, the goal is to sustain a high throughput of these predictions.
+
 ```bash
 cat $HOME/results/int8/mlperf_log_summary.txt
 ```
 
-It should look something like this. Note the ....
+Your output should contain a `Samples per second` value, where each sample gives the probability of a user clicking a certain ad.
 
 ```output
 ================================================
@@ -172,7 +150,7 @@ MLPerf Results Summary
 SUT name : PyFastSUT
 Scenario : Offline
 Mode : PerformanceOnly
-Samples per second: 1434.8 # Each sample tells probability of the user clicking a certain ad. Can be used by Amazon to pick the top 5 ads to recommend to a user
+Samples per second: 1434.8
 Result is : VALID
 Min duration satisfied : Yes
 Min queries satisfied : Yes
@@ -214,4 +192,3 @@ performance_issue_same : 0
 performance_issue_same_index : 0
 performance_sample_count : 204800
 ```
-
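Since the summary is plain text, a small sketch (not part of the commit) shows how the headline number could be pulled out for comparison across runs; the path assumes the `int8` run above:

```bash
# Extract the offline throughput from the MLPerf summary
grep "Samples per second" $HOME/results/int8/mlperf_log_summary.txt
# Confirm the run was valid before trusting the number
grep "Result is" $HOME/results/int8/mlperf_log_summary.txt
```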

content/learning-paths/servers-and-cloud-computing/dlrm/_index.md

Lines changed: 4 additions & 4 deletions
@@ -3,11 +3,10 @@ title: MLPerf Benchmarking on Arm Neoverse V2
 
 minutes_to_complete: 10
 
-who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will . Then, you’ll build and run the benchmark using MLPerf, analyzing key performance metrics along the way.
-
+who_is_this_for: This is an introductory topic for software developers who want to set up a pipeline in the cloud for recommendation models. You will build and run the benchmark using MLPerf and PyTorch.
 
 learning_objectives:
-    - build the Deep Learning Recommendation Model (DLRM)
+    - build the Deep Learning Recommendation Model (DLRM) using a Docker image
     - run a modified performant DLRMv2 benchmark and inspect the results
 
 prerequisites:
@@ -22,9 +21,10 @@ armips:
     - Neoverse
 tools_software_languages:
     - Docker
-    - TODO
+    - MLPerf
 operatingsystems:
     - Linux
+cloud_service_providers: AWS
 
 further_reading:
     - resource:
