Add HPL benchmark for H4D clusters. by bk202 · Pull Request #5377 · GoogleCloudPlatform/cluster-toolkit

bk202 · 2026-03-20T21:52:30Z

Add automation scripts for HPL benchmarking on H4D Slurm clusters

This change introduces a suite of scripts designed to build, run, and analyze High-Performance Linpack (HPL) workloads on AMD H4D compute nodes.

Key components included:

run-hpl-workload.sh: A core orchestrator script that automates a three-job Slurm pipeline (Orchestrator -> Workload -> Analyzer). It dynamically generates Ramble configurations, isolates the Spack environment, compiles the HPL binary natively on a compute node, and submits the optimized HPL benchmark across the specified number of nodes. It supports two network providers: rxm, which uses RDMA for best performance, and tcp, intended for debugging.
install-hpl-dependencies.sh: A helper script that uses srun to parallelize the installation and compilation of Spack, Ramble, GCC 14, Intel MPI, and HPL across all available compute nodes.
README.md: Comprehensive instructions on pipeline phases, execution flags, and baseline performance expectations (Gflops) across various node scales.
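One common way to chain three Slurm stages like Orchestrator -> Workload -> Analyzer is `sbatch --parsable` combined with `--dependency=afterok:<jobid>`, so each stage starts only if the previous one exits with status 0. The sketch below assumes that approach. The PR description does not say how run-hpl-workload.sh links its jobs (the orchestrator may well submit the workload itself), and the `.sbatch` file names are hypothetical. By default it performs a dry run that prints the commands and uses fake job IDs.

```shell
#!/bin/bash
# Sketch of a three-stage Slurm chain: orchestrator -> workload -> analyzer.
# The .sbatch file names are hypothetical placeholders, not files from this PR.
# DRY_RUN=1 (the default here) prints the sbatch commands with fake job IDs.
set -euo pipefail

DRY_RUN="${DRY_RUN:-1}"
LAST_JOB_ID=""
fake_id=1000

# submit <sbatch args...>: submit one job and store its ID in LAST_JOB_ID.
submit() {
  if [[ "${DRY_RUN}" == "1" ]]; then
    fake_id=$((fake_id + 1))
    echo "sbatch --parsable $*"
    LAST_JOB_ID="${fake_id}"
  else
    # --parsable prints only "jobid" or "jobid;cluster"; keep the job ID.
    LAST_JOB_ID="$(sbatch --parsable "$@" | cut -d';' -f1)"
  fi
}

submit orchestrator.sbatch
# ${LAST_JOB_ID} is expanded before submit runs, so each stage depends on
# the job submitted immediately before it.
submit --dependency="afterok:${LAST_JOB_ID}" workload.sbatch
submit --dependency="afterok:${LAST_JOB_ID}" analyzer.sbatch
```

Submitting every stage up front lets Slurm hold the later jobs. Unless the cluster sets `kill_invalid_depend`, a dependent job whose prerequisite fails stays pending with reason `DependencyNeverSatisfied` and has to be cancelled manually.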

gemini-code-assist · 2026-03-20T21:52:48Z

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust set of automation scripts designed to facilitate High-Performance Linpack (HPL) benchmarking on AMD H4D Slurm clusters. The changes enable users to easily deploy, execute, and analyze HPL workloads, ensuring consistent performance verification and efficient utilization of interconnect hardware.

Highlights

HPL Workload Orchestration: Introduced run-hpl-workload.sh, a core orchestrator script that automates a three-job Slurm pipeline (Orchestrator -> Workload -> Analyzer) for HPL benchmarking. It handles dynamic Ramble configuration, Spack environment isolation, native HPL binary compilation on compute nodes, and optimized HPL benchmark submission, supporting RDMA (rxm) and TCP for network tuning.
Dependency Installation Automation: Added install-hpl-dependencies.sh, a helper script that leverages srun to parallelize the installation and compilation of essential dependencies, including Spack, Ramble, GCC 14, Intel MPI, and HPL, across all available compute nodes.
Comprehensive Documentation: Included a README.md file that provides comprehensive instructions on the HPL pipeline phases, execution flags, and expected baseline performance (Gflops) across various node scales, ensuring ease of use and understanding.
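The srun fan-out described above can be sketched as follows. It reuses the `--output="install_progress_%N.log"` pattern visible in the reviewed hunk of install-hpl-dependencies.sh, where `%N` expands to each node's short hostname. The partition name, the per-node script path, and the use of `sinfo` to count idle nodes are assumptions for illustration, not taken from the PR. By default it only prints the srun command, using canned node counts in place of real `sinfo` output.

```shell
#!/bin/bash
# Sketch: run one copy of a per-node install script on every idle node in a
# partition, with one log file per node. PARTITION and INSTALL_SCRIPT are
# hypothetical placeholders. DRY_RUN=1 (the default) only prints the command.
set -euo pipefail

PARTITION="${PARTITION:-compute}"
INSTALL_SCRIPT="${INSTALL_SCRIPT:-/tmp/install-on-node.sh}"
DRY_RUN="${DRY_RUN:-1}"

# Sum the node counts that `sinfo -h -o %D` prints (one line per node group).
count_nodes() {
  awk '{ total += $1 } END { print total + 0 }'
}

run_install() {
  local nodes="$1"
  if (( nodes == 0 )); then
    echo "No idle nodes found in partition ${PARTITION}" >&2
    return 1
  fi
  # --ntasks-per-node=1 gives exactly one install task per node; %N in
  # --output expands to each node's short hostname.
  local cmd=(srun --partition="${PARTITION}" --nodes="${nodes}" --ntasks-per-node=1
             --output="install_progress_%N.log" sudo "${INSTALL_SCRIPT}")
  if [[ "${DRY_RUN}" == "1" ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

if [[ "${DRY_RUN}" == "1" ]]; then
  NUM_NODES="$(printf '4\n2\n' | count_nodes)"   # stand-in for sinfo output
else
  NUM_NODES="$(sinfo -h -p "${PARTITION}" -t idle -o '%D' | count_nodes)"
fi
run_install "${NUM_NODES}"
```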


gemini-code-assist

Code Review

This pull request introduces automation scripts for HPL benchmarking on H4D Slurm clusters. The changes are well-structured and include comprehensive documentation and scripts for dependency installation and workload execution. My review focuses on improving documentation clarity, ensuring adherence to the repository's style guide for copyright years, and addressing a security concern related to file permissions in the installation script, aligning with established repository rules.

gemini-code-assist · 2026-03-20T21:56:41Z

examples/hpc-slurm-h4d/system_benchmarks/install-hpl-dependencies.sh

+mkdir -p ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}
+chmod 777 ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}


Setting world-writable permissions (777) on directories in /opt introduces a security risk, even if temporary. Since the script is executed with sudo, it already has sufficient privileges to create and write to these directories. Please remove the chmod 777 command to avoid this vulnerability. The directories will be created with secure default permissions for the root user.

Suggested change

-mkdir -p ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}
-chmod 777 ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}
+mkdir -p ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}

References

Creating directories with world-writable permissions (777) using sudo is a security risk, as it allows unauthorized modification. Directories should be created with restrictive permissions.
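Following the suggestion above, the directories can be created with an explicit, non-world-writable mode instead of relying on the caller's umask. A minimal sketch using GNU coreutils `install -d`; the default paths are temporary placeholders, because the real `INSTALL_DIR` and `SOURCE_MIRROR_DIR` values are not shown in the reviewed hunk.

```shell
#!/bin/bash
# Sketch: create the install directories without world-writable permissions.
# INSTALL_DIR and SOURCE_MIRROR_DIR default to temporary placeholder paths;
# the real values are defined elsewhere in install-hpl-dependencies.sh.
set -euo pipefail

INSTALL_DIR="${INSTALL_DIR:-$(mktemp -d)}"
SOURCE_MIRROR_DIR="${SOURCE_MIRROR_DIR:-${INSTALL_DIR}/mirror}"

# install -d creates each listed directory (plus missing parents) and sets
# the given mode on it explicitly, independent of the umask.
install -d -m 0755 "${INSTALL_DIR}/spack" "${INSTALL_DIR}/ramble" "${SOURCE_MIRROR_DIR}"

# Print the resulting octal modes to confirm none of them is 777.
stat -c '%a %n' "${INSTALL_DIR}/spack" "${INSTALL_DIR}/ramble" "${SOURCE_MIRROR_DIR}"
```

If a specific non-root group genuinely needs write access, granting it to that group (for example mode 0775 plus `chgrp`) is still much narrower than 777.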

gemini-code-assist · 2026-03-20T21:56:41Z

examples/hpc-slurm-h4d/system_benchmarks/README.md

+| **RXM**  | `rxm`    | Uses RDMA via `ofi_rxm` |
+:          :          : for highest GFLOPS and best latency.                 :
+| **TCP**  | `tcp`    | (Default) Standard TCP/IP sockets. Useful for        |
+:          :          : debugging; lowest performance.                       :


The Markdown table formatting for the providers appears to be broken. The current syntax, which splits descriptions across multiple lines using colons, is not standard and may not render correctly. Please consolidate the description for each provider onto a single line to ensure proper rendering.

Suggested change

-| **RXM**  | `rxm`    | Uses RDMA via `ofi_rxm` |
-:          :          : for highest GFLOPS and best latency.                 :
-| **TCP**  | `tcp`    | (Default) Standard TCP/IP sockets. Useful for        |
-:          :          : debugging; lowest performance.                       :
+| **RXM** | `rxm` | Uses RDMA via `ofi_rxm` for highest GFLOPS and best latency. |
+| **TCP** | `tcp` | (Default) Standard TCP/IP sockets. Useful for debugging; lowest performance. |

References

The repository requires consistent formatting, and this comment addresses a deviation in Markdown table rendering.
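The `rxm`/`tcp` choice in this table ultimately selects a libfabric provider. As a hedged sketch, assuming the scripts pass the choice through libfabric's standard `FI_PROVIDER` environment variable and that RDMA on H4D goes through the `verbs` core provider with the `ofi_rxm` utility provider layered on top (neither is confirmed by the excerpt shown here), the mapping might look like this. `provider_env` and `PROVIDER` are hypothetical names.

```shell
#!/bin/bash
# Sketch: map the README's provider names to a libfabric FI_PROVIDER value.
# Assumption: rxm means ofi_rxm layered on the verbs provider; the PR's
# actual environment settings are not shown in this review.
set -euo pipefail

provider_env() {
  case "$1" in
    rxm) echo "verbs;ofi_rxm" ;;
    tcp) echo "tcp" ;;
    *)   echo "unknown provider: $1 (expected rxm or tcp)" >&2; return 1 ;;
  esac
}

PROVIDER="${PROVIDER:-tcp}"   # tcp is the README's default
FI_PROVIDER="$(provider_env "${PROVIDER}")"
export FI_PROVIDER
echo "FI_PROVIDER=${FI_PROVIDER}"
```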

gemini-code-assist · 2026-03-20T21:56:41Z

examples/hpc-slurm-h4d/system_benchmarks/install-hpl-dependencies.sh

@@ -0,0 +1,90 @@
+#!/bin/bash
+# Copyright 2025 Google LLC


The copyright year should be updated to 2026 to align with the repository's style guide.

Suggested change

-# Copyright 2025 Google LLC
+# Copyright 2026 Google LLC

References

The style guide specifies that the current year for copyright headers is 2026.

gemini-code-assist · 2026-03-20T21:56:41Z

examples/hpc-slurm-h4d/system_benchmarks/install-hpl-dependencies.sh

+     --output="install_progress_%N.log" \
+     sudo "$INSTALL_SCRIPT"
+
+echo "Installation complete across all active compute nodes!"


The file is missing a newline character at the end. It's a best practice to end all text files with a newline.

Suggested change

-echo "Installation complete across all active compute nodes!"
\ No newline at end of file
+echo "Installation complete across all active compute nodes!"
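A small check like the following can catch this class of issue before review by flagging files whose last byte is not a newline. It relies on command substitution stripping trailing newlines, so `$(tail -c 1 file)` is empty exactly when the file ends in `\n`. `FIX=1` is a switch invented for this sketch that appends the missing newline.

```shell
#!/bin/bash
# Sketch: list files that do not end with a newline; FIX=1 appends one.
set -euo pipefail

# True when the file is non-empty and its last byte is not '\n'.
# $(...) strips trailing newlines, so the substitution is empty exactly
# when the final byte is a newline.
missing_final_newline() {
  [[ -s "$1" && -n "$(tail -c 1 "$1")" ]]
}

for f in "$@"; do
  if missing_final_newline "$f"; then
    echo "missing final newline: $f"
    if [[ "${FIX:-0}" == "1" ]]; then
      printf '\n' >> "$f"
    fi
  fi
done
```

Usage, with a hypothetical file name for the script: `bash check-final-newline.sh examples/hpc-slurm-h4d/system_benchmarks/*.sh`.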

gemini-code-assist · 2026-03-20T21:56:42Z

examples/hpc-slurm-h4d/system_benchmarks/run-hpl-workload.sh

@@ -0,0 +1,295 @@
+#!/bin/bash
+# Copyright 2025 Google LLC


The copyright year should be updated to 2026 to align with the repository's style guide.

Suggested change

-# Copyright 2025 Google LLC
+# Copyright 2026 Google LLC

References

The style guide specifies that the current year for copyright headers is 2026.

aslam-quad · 2026-03-23T03:50:35Z

/gcbrun

Add HPL benchmark for H4D clusters.

9197b1b

bk202 requested review from a team and samskillman as code owners March 20, 2026 21:52

gemini-code-assist bot reviewed Mar 20, 2026


Remove mentioning of HCS and rocky/ubuntu related OS notes in readme.md

7b9c9f9

aslam-quad changed the base branch from main to develop March 23, 2026 03:50

aslam-quad added the external PR from external contributor label Mar 23, 2026


bk202 wants to merge 2 commits into GoogleCloudPlatform:develop from bk202:liujoh-dev

2 participants
