Add HPL benchmark for H4D clusters #5377

bk202 wants to merge 2 commits into GoogleCloudPlatform:develop from
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a robust set of automation scripts designed to facilitate High-Performance Linpack (HPL) benchmarking on AMD H4D Slurm clusters. The changes enable users to easily deploy, execute, and analyze HPL workloads, ensuring consistent performance verification and efficient utilization of interconnect hardware.

Highlights
Code Review
This pull request introduces automation scripts for HPL benchmarking on H4D Slurm clusters. The changes are well-structured and include comprehensive documentation and scripts for dependency installation and workload execution. My review focuses on improving documentation clarity, ensuring adherence to the repository's style guide for copyright years, and addressing a security concern related to file permissions in the installation script, aligning with established repository rules.
```sh
mkdir -p ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}
chmod 777 ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}
```
Setting world-writable permissions (777) on directories in /opt introduces a security risk, even if temporary. Since the script is executed with sudo, it already has sufficient privileges to create and write to these directories. Please remove the chmod 777 command to avoid this vulnerability. The directories will be created with secure default permissions for the root user.
Suggested change:

```diff
 mkdir -p ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}
-chmod 777 ${INSTALL_DIR}/spack ${INSTALL_DIR}/ramble ${SOURCE_MIRROR_DIR}
```
References
- Creating directories with world-writable permissions (777) using `sudo` is a security risk, as it allows unauthorized modification. Directories should be created with restrictive permissions.
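A minimal sketch of the suggested fix: create the directories with an explicit restrictive mode instead of `chmod 777`. Here `INSTALL_DIR` points at a scratch location for illustration; the real script targets `/opt` and runs under `sudo`, so root keeps write access and other users get read/execute only.

```shell
# Hypothetical sketch of the reviewer's suggestion (paths are illustrative,
# not taken from the PR): install -d creates the directories and sets the
# mode in one step, regardless of the caller's umask.
INSTALL_DIR="$(mktemp -d)/apps"
install -d -m 755 "${INSTALL_DIR}/spack" "${INSTALL_DIR}/ramble"
stat -c '%a' "${INSTALL_DIR}/spack"   # prints 755
```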
```markdown
| **RXM** | `rxm` | Uses RDMA via `ofi_rxm` |
: : : for highest GFLOPS and best latency. :
| **TCP** | `tcp` | (Default) Standard TCP/IP sockets. Useful for |
: : : debugging; lowest performance. :
```
The Markdown table formatting for the providers appears to be broken. The current syntax, which splits descriptions across multiple lines using colons, is not standard and may not render correctly. Please consolidate the description for each provider onto a single line to ensure proper rendering.
Suggested change:

```diff
-| **RXM** | `rxm` | Uses RDMA via `ofi_rxm` |
-: : : for highest GFLOPS and best latency. :
-| **TCP** | `tcp` | (Default) Standard TCP/IP sockets. Useful for |
-: : : debugging; lowest performance. :
+| **RXM** | `rxm` | Uses RDMA via `ofi_rxm` for highest GFLOPS and best latency. |
+| **TCP** | `tcp` | (Default) Standard TCP/IP sockets. Useful for debugging; lowest performance. |
```
References
- The repository requires consistent formatting, and this comment addresses a deviation in Markdown table rendering.
```diff
@@ -0,0 +1,90 @@
 #!/bin/bash
 # Copyright 2025 Google LLC
```
The copyright year should be updated to 2026 to align with the repository's style guide.
Suggested change:

```diff
-# Copyright 2025 Google LLC
+# Copyright 2026 Google LLC
```
References
- The style guide specifies that the current year for copyright headers is 2026. (link)
```sh
	--output="install_progress_%N.log" \
	sudo "$INSTALL_SCRIPT"

echo "Installation complete across all active compute nodes!"
```
```diff
@@ -0,0 +1,295 @@
 #!/bin/bash
 # Copyright 2025 Google LLC
```
The copyright year should be updated to 2026 to align with the repository's style guide.
Suggested change:

```diff
-# Copyright 2025 Google LLC
+# Copyright 2026 Google LLC
```
References
- The style guide specifies that the current year for copyright headers is 2026. (link)
/gcbrun |
Add automation scripts for HPL benchmarking on H4D Slurm clusters
This change introduces a suite of scripts designed to build, run, and analyze High-Performance Linpack (HPL) workloads on AMD H4D compute nodes.
Key components included:
- `run-hpl-workload.sh`: A core orchestrator script that automates a three-job Slurm pipeline (Orchestrator -> Workload -> Analyzer). It handles dynamic generation of Ramble configurations, isolates the Spack environment, compiles the HPL binary natively on a compute node, and submits the optimized HPL benchmark across the specified number of nodes. It natively supports tuning for RDMA hardware (via `rxm`) and debugging (via `tcp`).
- `install-hpl-dependencies.sh`: A helper script that uses `srun` to parallelize the installation and compilation of Spack, Ramble, GCC 14, Intel MPI, and HPL across all available compute nodes.
- `README.md`: Comprehensive instructions on pipeline phases, execution flags, and baseline performance expectations (Gflops) across various node scales.
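The three-job pipeline described above is the standard Slurm pattern of chaining dependent submissions: each `sbatch --parsable` call prints only the numeric job ID, which gates the next stage via `--dependency=afterok`. A minimal sketch, with the script names as assumptions and a `submit` stub standing in for `sbatch --parsable` so it runs without a cluster:

```shell
# Hypothetical sketch of the Orchestrator -> Workload -> Analyzer chain.
# submit() stubs `sbatch --parsable`, returning a fake job ID per script;
# the real call would be: sbatch --parsable $2 $1
submit() {
	case "$1" in
		orchestrator.sh) echo 1001 ;;
		workload.sh)     echo 1002 ;;
		analyzer.sh)     echo 1003 ;;
	esac
}

orch_id=$(submit orchestrator.sh)
work_id=$(submit workload.sh "--dependency=afterok:${orch_id}")
anlz_id=$(submit analyzer.sh "--dependency=afterok:${work_id}")
echo "pipeline: ${orch_id} -> ${work_id} -> ${anlz_id}"
# prints: pipeline: 1001 -> 1002 -> 1003
```

With `afterok`, each stage starts only if the previous job exits successfully, so a failed HPL compile or run stops the analyzer from consuming incomplete results.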