
Add HPL benchmark for H4D clusters. #5377

Open
bk202 wants to merge 2 commits into GoogleCloudPlatform:develop from bk202:liujoh-dev

bk202 commented Mar 20, 2026

Add automation scripts for HPL benchmarking on H4D Slurm clusters

This change introduces a suite of scripts designed to build, run, and analyze High-Performance Linpack (HPL) workloads on AMD H4D compute nodes.

Key components included:

  • run-hpl-workload.sh: The core orchestrator script, which automates a three-job Slurm pipeline (Orchestrator -> Workload -> Analyzer). It dynamically generates Ramble configurations, isolates the Spack environment, compiles the HPL binary natively on a compute node, and submits the optimized HPL benchmark across the requested number of nodes. It supports two network providers: rxm, which uses the RDMA hardware for best performance, and tcp, for debugging.
  • install-hpl-dependencies.sh: A helper script that uses srun to parallelize the installation and compilation of Spack, Ramble, GCC 14, Intel MPI, and HPL across all available compute nodes.
  • README.md: Comprehensive instructions on pipeline phases, execution flags, and baseline performance expectations (Gflops) across various node scales.
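The three-job pipeline described above depends on Slurm job dependencies. A minimal sketch of that chaining follows, assuming each stage is a separate batch script; the script names in the usage line and the helper `submit_chain` are hypothetical, and this is not the PR's `run-hpl-workload.sh`. It uses `sbatch --parsable` to capture each job ID and `--dependency=afterok:<id>` so a stage starts only if the previous one exited successfully.

```bash
#!/bin/bash
# Hypothetical sketch: chain Slurm batch jobs so that each one starts only
# after the previous job completes with exit code 0. Not the PR's script.
set -euo pipefail

# submit_chain SCRIPT...
# Submits the scripts in order and prints one job ID per line. Every job
# after the first gets an afterok dependency on the job before it.
submit_chain() {
  local prev_id="" job_id script
  for script in "$@"; do
    if [[ -z "$prev_id" ]]; then
      job_id=$(sbatch --parsable "$script")
    else
      job_id=$(sbatch --parsable --dependency="afterok:${prev_id}" "$script")
    fi
    # --parsable prints "jobid" or "jobid;cluster"; keep only the ID.
    job_id=${job_id%%;*}
    echo "$job_id"
    prev_id=$job_id
  done
}
```

Usage, with placeholder file names: `submit_chain orchestrator.sbatch workload.sbatch analyzer.sbatch`. Using `afterok` rather than `afterany` means the analyzer never runs against a workload job that failed.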

@bk202 bk202 requested review from a team and samskillman as code owners March 20, 2026 21:52
gemini-code-assist (Contributor)

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust set of automation scripts designed to facilitate High-Performance Linpack (HPL) benchmarking on AMD H4D Slurm clusters. The changes enable users to easily deploy, execute, and analyze HPL workloads, ensuring consistent performance verification and efficient utilization of interconnect hardware.

Highlights

  • HPL Workload Orchestration: Introduced run-hpl-workload.sh, a core orchestrator script that automates a three-job Slurm pipeline (Orchestrator -> Workload -> Analyzer) for HPL benchmarking. It handles dynamic Ramble configuration, Spack environment isolation, native HPL binary compilation on compute nodes, and optimized HPL benchmark submission, supporting RDMA (rxm) and TCP for network tuning.
  • Dependency Installation Automation: Added install-hpl-dependencies.sh, a helper script that leverages srun to parallelize the installation and compilation of essential dependencies, including Spack, Ramble, GCC 14, Intel MPI, and HPL, across all available compute nodes.
  • Comprehensive Documentation: Included a README.md file that provides comprehensive instructions on the HPL pipeline phases, execution flags, and expected baseline performance (Gflops) across various node scales, ensuring ease of use and understanding.
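The dependency-installation fan-out can be sketched as running one copy of the installer per node. This is a hedged sketch, assuming the target nodes are the idle nodes of a single partition; the partition argument and the helper names are hypothetical, and the PR's own script may choose nodes differently. It mirrors the `--output="install_progress_%N.log"` pattern visible in the PR's diff, where `%N` expands to the node name so each node writes its own log.

```bash
#!/bin/bash
# Hypothetical sketch: run an install script once on every idle node of a
# partition, with one log file per node. Helper names are placeholders.
set -euo pipefail

# count_idle_nodes PARTITION: print the number of distinct idle nodes.
count_idle_nodes() {
  local count
  count=$(sinfo --noheader --partition="$1" --states=idle --Node --format=%N \
    | sort -u | wc -l)
  echo $(( count ))   # arithmetic expansion strips any padding from wc
}

# run_on_all_nodes PARTITION SCRIPT: one task per node; %N in --output
# becomes the node's short hostname, giving a separate log per node.
run_on_all_nodes() {
  local partition=$1 script=$2 nodes
  nodes=$(count_idle_nodes "$partition")
  if (( nodes == 0 )); then
    echo "No idle nodes in partition ${partition}" >&2
    return 1
  fi
  srun --partition="$partition" --nodes="$nodes" --ntasks-per-node=1 \
    --output="install_progress_%N.log" sudo "$script"
}
```

Pinning `--ntasks-per-node=1` matters here: without it, Slurm could place several copies of a compile-heavy installer on the same node.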

gemini-code-assist bot left a comment

Code Review

This pull request introduces automation scripts for HPL benchmarking on H4D Slurm clusters. The changes are well-structured and include comprehensive documentation and scripts for dependency installation and workload execution. My review focuses on improving documentation clarity, ensuring adherence to the repository's style guide for copyright years, and addressing a security concern related to file permissions in the installation script, aligning with established repository rules.

Comment on lines +20 to +21
mkdir -p ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}
chmod 777 ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}

Severity: high (security)

Setting world-writable permissions (777) on directories in /opt introduces a security risk, even if temporary. Since the script is executed with sudo, it already has sufficient privileges to create and write to these directories. Please remove the chmod 777 command to avoid this vulnerability. The directories will be created with secure default permissions for the root user.

Suggested change
- mkdir -p ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}
- chmod 777 ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}
+ mkdir -p ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}
References
  1. Creating directories with world-writable permissions (777) using sudo is a security risk, as it allows unauthorized modification. Directories should be created with restrictive permissions.
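A minimal sketch of the reviewer's suggestion follows, assuming the directories only need to be writable by the account running the installer (root under `sudo`). It creates them with an explicit `0755` mode and adds a post-install check that none is world-writable. The function names are hypothetical; `install -d -m` and `find -perm` are standard GNU coreutils/findutils features.

```bash
#!/bin/bash
# Sketch: create install directories without world-write permission and
# verify the result. Function names are hypothetical, not from the PR.
set -euo pipefail

# make_private_dirs DIR...: create each directory with mode 0755
# (owner rwx, group and others r-x) instead of world-writable 0777.
make_private_dirs() {
  local dir
  for dir in "$@"; do
    install -d -m 0755 "$dir"
  done
}

# find_world_writable DIR...: print any given directory that others can
# write to; empty output means the check passed.
find_world_writable() {
  find "$@" -maxdepth 0 -type d -perm -o+w
}
```

In the installer this would replace the `mkdir`/`chmod 777` pair, for example `make_private_dirs "${INSTALL_DIR}/spack" "${INSTALL_DIR}/ramble" "${SOURCE_MIRROR_DIR}"`.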

Comment on lines +65 to +68
| **RXM** | `rxm` | Uses RDMA via `ofi_rxm` |
: : : for highest GFLOPS and best latency. :
| **TCP** | `tcp` | (Default) Standard TCP/IP sockets. Useful for |
: : : debugging; lowest performance. :

Severity: medium

The Markdown table formatting for the providers appears to be broken. The current syntax, which splits descriptions across multiple lines using colons, is not standard and may not render correctly. Please consolidate the description for each provider onto a single line to ensure proper rendering.

Suggested change
- | **RXM** | `rxm` | Uses RDMA via `ofi_rxm` |
- : : : for highest GFLOPS and best latency. :
- | **TCP** | `tcp` | (Default) Standard TCP/IP sockets. Useful for |
- : : : debugging; lowest performance. :
+ | **RXM** | `rxm` | Uses RDMA via `ofi_rxm` for highest GFLOPS and best latency. |
+ | **TCP** | `tcp` | (Default) Standard TCP/IP sockets. Useful for debugging; lowest performance. |
References
  1. The repository requires consistent formatting, and this comment addresses a deviation in Markdown table rendering.
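GitHub-flavored Markdown tables have no syntax for multi-line cells, so the `: : :` continuation lines in the README snippet render as stray text. A rough lint sketch that could catch this before review is shown below; `check_table` is a hypothetical helper, it assumes POSIX awk, and it ignores escaped pipes (`\|`). It flags `:`-prefixed lines that follow a table row and rows whose pipe count differs from the first row of the table.

```bash
#!/bin/bash
# Hypothetical lint sketch for Markdown pipe tables. Simplistic: escaped
# pipes (\|) and pipes inside code spans are counted as cell separators.
set -euo pipefail

# check_table FILE: print "LINE: reason" for suspicious lines and return 1
# if any were found, 0 otherwise.
check_table() {
  awk '
    /^\|/ {
      n = gsub(/[|]/, "|")            # count pipe characters on this row
      if (cols == 0) cols = n         # first row sets the expected count
      else if (n != cols) {
        printf "%d: expected %d pipes, found %d\n", NR, cols, n
        bad = 1
      }
      next
    }
    /^:/ && cols > 0 {
      printf "%d: continuation line outside table syntax\n", NR
      bad = 1
      next
    }
    { cols = 0 }                      # any other line ends the table
    END { exit bad }
  ' "$1"
}
```

Usage: `check_table README.md` prints nothing and exits 0 for the single-line rows in the suggested change.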

@@ -0,0 +1,90 @@
#!/bin/bash
# Copyright 2025 Google LLC

Severity: medium

The copyright year should be updated to 2026 to align with the repository's style guide.

Suggested change
- # Copyright 2025 Google LLC
+ # Copyright 2026 Google LLC
References
  1. The style guide specifies that the current year for copyright headers is 2026. (link)
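Because the same header fix applies to both new scripts, it can be done in one pass. A small sketch, assuming GNU `sed` (for `-i`) and that the header line matches the exact text shown in the review; `update_copyright_year` is a hypothetical helper name.

```bash
#!/bin/bash
# Sketch: rewrite the "# Copyright <old> Google LLC" header line to the new
# year in each given file. Requires GNU sed for in-place editing (-i).
set -euo pipefail

# update_copyright_year OLD NEW FILE...
update_copyright_year() {
  local old=$1 new=$2 f
  shift 2
  for f in "$@"; do
    # Anchored on the whole line so other mentions of the year are untouched.
    sed -i "s/^# Copyright ${old} Google LLC\$/# Copyright ${new} Google LLC/" "$f"
  done
}
```

Usage: `update_copyright_year 2025 2026 install-hpl-dependencies.sh run-hpl-workload.sh`, run from the directory that holds the scripts.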

--output="install_progress_%N.log" \
sudo "$INSTALL_SCRIPT"

echo "Installation complete across all active compute nodes!"
\ No newline at end of file

Severity: medium

The file is missing a newline character at the end. It's a best practice to end all text files with a newline.

Suggested change
- echo "Installation complete across all active compute nodes!"
\ No newline at end of file
+ echo "Installation complete across all active compute nodes!"
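A missing final newline is easy to detect from the last byte of a file. A minimal sketch, assuming GNU or POSIX `tail`; `ensure_trailing_newline` is a hypothetical helper. It relies on command substitution stripping a trailing newline, so `$(tail -c 1 "$f")` is empty exactly when the file already ends with one.

```bash
#!/bin/bash
# Sketch: append a newline to any non-empty file whose last byte is not
# already a newline. Files that end correctly are left unchanged.
set -euo pipefail

ensure_trailing_newline() {
  local f
  for f in "$@"; do
    # $(...) drops a trailing newline, so a non-empty result means the
    # last byte is something else and a newline must be appended.
    if [[ -s "$f" && -n "$(tail -c 1 "$f")" ]]; then
      printf '\n' >> "$f"
    fi
  done
}
```

Running it twice on the same file is safe: the second call finds the newline already in place and does nothing.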

@@ -0,0 +1,295 @@
#!/bin/bash
# Copyright 2025 Google LLC

Severity: medium

The copyright year should be updated to 2026 to align with the repository's style guide.

Suggested change
- # Copyright 2025 Google LLC
+ # Copyright 2026 Google LLC
References
  1. The style guide specifies that the current year for copyright headers is 2026. (link)

aslam-quad changed the base branch from main to develop (March 23, 2026 03:50)
aslam-quad (Contributor)

/gcbrun

aslam-quad added the "external" label (PR from external contributor) on Mar 23, 2026
