Continuous Benchmarking #936
base: master
Pull Request Overview
This PR introduces continuous benchmarking for performance-critical workflows, including a new test script, documentation updates, sample benchmark data, and a dedicated GitHub Actions workflow. It also removes existing CI workflows in favor of a single continuous benchmarking pipeline.
- Added `test-components.sh` to validate component execution and JSON conversions.
- Generated documentation `docs/documentation/cont-bench.md` with sample benchmark results.
- Introduced `bench.yaml`/`bench.json`, `bench-google.json`, and `.github/workflows/cont-bench.yml` to automate benchmark collection and Google Benchmark conversion.
- Note: an `.env` file with sensitive tokens was added, and multiple legacy workflows were removed.
Reviewed Changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 4 comments.
| File | Description | 
|---|---|
| test-components.sh | Script for testing component execution and JSON/YAML conversion | 
| docs/documentation/cont-bench.md | Documentation page for continuous benchmarking | 
| bench.yaml / bench.json / bench-google.json | Sample benchmark data | 
| .github/workflows/cont-bench.yml | New continuous benchmarking workflow | 
| .env | Environment file with sensitive tokens | 
| .github/workflows/*.yml (other CI workflows) | Legacy workflows deleted | 
Comments suppressed due to low confidence (4)
.github/workflows/cont-bench.yml:64
- The Python script reads 'bench.json' in the root, but the workflow converts and writes to 'pr/bench.json'. Update the path to match where the file is written.
```python
with open('bench.json', 'r') as f:
```
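A minimal sketch of the fix the comment asks for, assuming the converted results live at `pr/bench.json` as the comment states. `BENCH_PATH` and `load_bench` are hypothetical names for illustration, not code from the PR; failing loudly on a missing file makes a path mismatch obvious in the workflow log instead of surfacing later as a confusing error.

```python
import json
from pathlib import Path

# Assumption: the workflow writes the converted results to pr/bench.json.
BENCH_PATH = Path("pr") / "bench.json"


def load_bench(path: Path = BENCH_PATH) -> dict:
    """Load benchmark results from the same path the workflow wrote them to."""
    if not path.is_file():
        # Fail with a clear message if the read path and write path disagree.
        raise FileNotFoundError(f"expected benchmark output at {path}")
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)
```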
bench.yaml:5
- Avoid hard-coded absolute paths; use relative paths or environment variables to improve portability.
```yaml
path: /home/mohammed/Desktop/cont-bench/benchmarks/5eq_rk3_weno3_hllc/case.py
```
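If the entries in `bench.yaml` are produced by a script, one way to follow this suggestion is to build each case path from a configurable root. This is a sketch under assumptions: `MFC_BENCH_ROOT` is a hypothetical environment variable (not an existing MFC setting), and the fallback `benchmarks/<case>/case.py` mirrors the directory layout visible in the hard-coded path above.

```python
import os
from pathlib import Path


def case_path(case_name: str) -> Path:
    """Return the case.py path for a benchmark case without hard-coding a home directory.

    MFC_BENCH_ROOT is a hypothetical override; by default the path is relative
    to the current working directory (e.g. the repository checkout on CI).
    """
    root = Path(os.environ.get("MFC_BENCH_ROOT", "benchmarks"))
    return root / case_name / "case.py"
```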
.env:1
- The .env file commits personal access tokens, exposing secrets. Remove it from version control and use GitHub Secrets instead.
```
TOKEN=<redacted GitHub personal access token>
```
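A sketch of the alternative the comment recommends: the workflow maps a repository secret into the step's environment (for example `env: GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}`), and the script only reads it from there, so no token is ever committed. The helper name `github_token` is hypothetical.

```python
import os


def github_token(var: str = "GITHUB_TOKEN") -> str:
    """Read the token from the environment (populated from GitHub Secrets in the
    workflow) instead of from a committed .env file."""
    token = os.environ.get(var)
    if not token:
        raise RuntimeError(f"{var} is not set; configure it as a repository secret")
    return token
```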
Posted my GitHub account tokens; I will disable them right away.
You can put back your workflow files. You will want to mimic the setup of the new
Status Update: Appended … Note to self:
```
{
  "context": {
    "date": "2015/03/17-18:40:25",   # inferred on the CI via bash command
    "num_cpus": 40,                  # fixed to the desired number of CPU cores
    "mhz_per_cpu": 2801,
    "cpu_scaling_enabled": false,    # false as default
    "build_type": "debug"            # release
  },
  "benchmarks": [
    {
      "name": "5eq_rk3_weno3_hllc",
      "iterations": 94877,           # fixed number based on time steps
      "real_time": 29275,            # simulation grind time
      "cpu_time": 29836,             # same as above
      "bytes_per_second": 134066,    # still need to figure out
      "items_per_second": 33516      # still need to figure out
    },
    {
      "name": "hypo_hll",
      "iterations": 21609,
      "real_time": 32317,
      "cpu_time": 32429,
      "bytes_per_second": 986770,
      "items_per_second": 246693
    },
    ...
```
Experiment with renaming or adding more cases, and figure out what extra details to dump into a v5.0.6-gpu.yaml.
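The mapping in the note above (iterations from time steps, `real_time` and `cpu_time` from grind time) can be sketched as a small converter. The input shape `{case_name: {"steps": ..., "grind_time": ...}}` is a hypothetical stand-in for whatever `./mfc.sh bench` actually writes; `bytes_per_second` and `items_per_second` are left out because the note marks them as undecided, and `mhz_per_cpu` is omitted for brevity. Real Google Benchmark output also carries per-entry fields such as `time_unit`, which a consuming tool may expect.

```python
import json
from datetime import datetime


def to_google_benchmark(cases: dict, num_cpus: int, build_type: str = "release") -> dict:
    """Map per-case MFC results onto the Google Benchmark JSON layout.

    Assumed (hypothetical) input: {case_name: {"steps": int, "grind_time": float}}.
    """
    return {
        "context": {
            # Same date format as the sample: YYYY/MM/DD-HH:MM:SS
            "date": datetime.now().strftime("%Y/%m/%d-%H:%M:%S"),
            "num_cpus": num_cpus,
            "cpu_scaling_enabled": False,
            "build_type": build_type,
        },
        "benchmarks": [
            {
                "name": name,
                "iterations": int(r["steps"]),   # time steps
                "real_time": r["grind_time"],     # simulation grind time
                "cpu_time": r["grind_time"],      # same as real_time
            }
            for name, r in cases.items()
        ],
    }


if __name__ == "__main__":
    sample = {"hypo_hll": {"steps": 100, "grind_time": 2.5}}
    print(json.dumps(to_google_benchmark(sample, num_cpus=4), indent=2))
```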
Status Update: v4.8.3-v4.9.3 GPU binaries are corrupt; #543 fixes hypo with a single OpenACC directive. Nothing in v4.9.4 fixes them.
```
 9 0x0000000000029d90 __libc_init_first()  ???:0
10 0x0000000000029e40 __libc_start_main()  ???:0
11 0x0000000000404ae5 _start()  ???:0
=================================
[atl1-1-03-007-29-0:2610296] *** Process received signal ***
[atl1-1-03-007-29-0:2610296] Signal: Segmentation fault (11)
[atl1-1-03-007-29-0:2610296] Signal code:  (-6)
[atl1-1-03-007-29-0:2610296] Failing at address: 0x27d478
[atl1-1-03-007-29-0:2610296] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x155545842520]
[atl1-1-03-007-29-0:2610296] [ 1] /opt/nvidia/hpc_sdk/Linux_x86_64/23.11/comm_libs/12.3/hpcx/hpcx-2.16/ompi/lib/libmpi.so.40(ompi_file_close+0x18)[0x1555490457d8]
[atl1-1-03-007-29-0:2610296] [ 2] /opt/nvidia/hpc_sdk/Linux_x86_64/23.11/comm_libs/12.3/hpcx/hpcx-2.16/ompi/lib/libmpi.so.40(PMPI_File_close+0x16)[0x1555490665b6]
[atl1-1-03-007-29-0:2610296] [ 3] /opt/nvidia/hpc_sdk/Linux_x86_64/23.11/comm_libs/12.3/hpcx/hpcx-2.16/ompi/lib/libmpi_mpifh.so.40(PMPI_FILE_CLOSE+0x1f)[0x15554944693f]
[atl1-1-03-007-29-0:2610296] [ 4] /opt/MFC/build/install/33175f95af/bin/simulation[0x4865a9]
[atl1-1-03-007-29-0:2610296] [ 5] /opt/MFC/build/install/33175f95af/bin/simulation[0x5bf44e]
[atl1-1-03-007-29-0:2610296] [ 6] /opt/MFC/build/install/33175f95af/bin/simulation[0x61fd74]
[atl1-1-03-007-29-0:2610296] [ 7] /opt/MFC/build/install/33175f95af/bin/simulation[0x404bf1]
[atl1-1-03-007-29-0:2610296] [ 8] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x155545829d90]
[atl1-1-03-007-29-0:2610296] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x155545829e40]
[atl1-1-03-007-29-0:2610296] [10] /opt/MFC/build/install/33175f95af/bin/simulation[0x404ae5]
[atl1-1-03-007-29-0:2610296] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 2610297 on node atl1-1-03-007-29-0 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
        LANGUAGE = (unset),
        LC_ALL = (unset),
        LC_CTYPE = "C.UTF-8",
        LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
mfc: ERROR > :( /opt/MFC/build/install/33175f95af/bin/simulation failed with exit code 139.

Error: Submitting batch file for Interactive failed. It can be found here: /opt/MFC/benchmarks/5eq_rk3_weno3_hllc/MFC.sh. Please check the file for errors.
Terminated
mfc: ERROR > mfc.py finished with a 143 exit code.
mfc: (venv) Exiting the Python virtual environment.
[malmahrouqi3@login-phoenix-gnr-1 afb4]$
```
An incorrect build config, or something of that nature, is causing the sim runs to halt midway through. POC: https://malmahrouqi3.github.io/documentation/md_expectedPerformance.html
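The two exit codes in the log follow the usual POSIX shell convention that a process killed by signal N reports status 128 + N: 139 is SIGSEGV (11), matching the "Segmentation fault (11)" lines, and 143 is SIGTERM (15), consistent with the "Terminated" line. A small hypothetical helper for decoding such statuses in CI output might look like this (note that Python's own `subprocess` reports signal deaths as a negative `returncode` instead).

```python
import signal


def describe_exit(status: int) -> str:
    """Decode a shell-style exit status; values above 128 conventionally mean
    the process was killed by signal (status - 128)."""
    if status > 128:
        try:
            return f"killed by {signal.Signals(status - 128).name}"
        except ValueError:
            pass  # not a known signal number; report the raw status
    return f"exit status {status}"
```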
```html
<iframe src="bench/index.html" width="1000" height="1000" style="border:none;"></iframe>
```
User description
Description
Closes #462
Concerning #462, this PR is intended to keep track of benchmark results (`./mfc.sh bench`) for performance-critical improvements. Since there is no specific benchmark procedure, the results of the four existing MFC benchmark cases are reported. To ensure standardized performance with no hardware bias, all benchmarking runs on a GitHub runner until it is decided which resources/clusters/allocations/runners to use. Once the POC is finalized, the rest should be easy.
Debugging info:
Not much besides reviewing .md pages.
To-dos:
- Note to self: look into retrospectively recording the previous 10-50 base-repo commits to provide valuable data points.
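One way to approach the backfill to-do is to list recent commits on the base branch and benchmark an evenly spaced subset, which keeps 10-50 data points affordable on a shared runner. This is a hypothetical sketch: `recent_commits` shells out to `git rev-list --max-count=N <ref>` (it needs a git checkout), and `sample_commits` is an illustrative sampling choice, not part of the PR.

```python
import subprocess


def recent_commits(n: int, ref: str = "master") -> list[str]:
    """Return the last n commit SHAs on ref, newest first (requires a git checkout)."""
    out = subprocess.run(
        ["git", "rev-list", f"--max-count={n}", ref],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def sample_commits(shas: list[str], k: int) -> list[str]:
    """Pick k roughly evenly spaced commits from the list (all of them if k >= len)."""
    if k <= 0:
        return []
    if k >= len(shas):
        return list(shas)
    step = len(shas) / k
    return [shas[int(i * step)] for i in range(k)]
```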
PR Type
Enhancement
Description
- Implement continuous benchmarking with GitHub Action workflow
- Remove legacy cluster-specific benchmark scripts
- Add Google Benchmark format conversion for MFC results
- Create automated performance tracking and documentation
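To illustrate what "automated performance tracking" involves, here is a minimal regression check between two result sets in the Google Benchmark layout shown earlier. It is a sketch, not the PR's workflow logic: `find_regressions` is a hypothetical name, and the 1.10 threshold (flag a case whose `real_time` grew by more than 10%) is an arbitrary assumption.

```python
def find_regressions(baseline: dict, current: dict,
                     threshold: float = 1.10) -> list[tuple[str, float]]:
    """Compare two Google Benchmark-format dicts by benchmark name and return
    (name, ratio) for cases whose real_time ratio exceeds `threshold`."""
    base = {b["name"]: b["real_time"] for b in baseline["benchmarks"]}
    flagged = []
    for b in current["benchmarks"]:
        old = base.get(b["name"])
        if old:  # skip cases missing from the baseline (or with a zero baseline)
            ratio = b["real_time"] / old
            if ratio > threshold:
                flagged.append((b["name"], ratio))
    return flagged
```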
Changes walkthrough 📝
21 files:
- Add continuous benchmarking GitHub Action workflow
- Remove legacy benchmark workflow
- Remove Frontier cluster benchmark script
- Remove Frontier cluster build script
- Remove Frontier cluster benchmark submission script
- Remove Frontier cluster submission script
- Remove Frontier cluster test script
- Remove Phoenix cluster benchmark script
- Remove Phoenix cluster benchmark submission script
- Remove Phoenix cluster submission script
- Remove Phoenix cluster test script
- Remove code cleanliness workflow
- Remove coverage check workflow
- Remove documentation workflow
- Remove formatting workflow
- Remove line count workflow
- Remove source linting workflow
- Remove toolchain linting workflow
- Remove PMD source analysis workflow
- Remove spell check workflow
- Remove test suite workflow

5 files:
- Add component testing script for benchmarks
- Add benchmark results YAML file
- Add Google Benchmark format JSON results
- Add benchmark results JSON file
- Add benchmark results YAML file

1 file:
- Add environment configuration with GitHub tokens

1 file:
- Add continuous benchmarking documentation