Add tuo modules and template #1079
Base: master
Conversation
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
The pull request adds support for a new LLNL system named Tuolumne (abbreviated as 't') to the toolchain infrastructure. This includes adding the system to the bootstrap module selection interface, configuring associated modules with CPU and GPU variants, and introducing a new template for batch job submission via Flux.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~18 minutes
Pre-merge checks: ❌ 1 failed check (warning), ✅ 2 passed.
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:
```
# flux: --exclusive
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB
% if account:
# flux: --bank=${account}
% endif
% if partition:
# flux: --queue=${partition}
% endif
% if unified:
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB
% endif
```
Suggestion: Fix a typo in the `# flux:--setattr` directive by adding a space after the colon. Also, remove the duplicated `setattr` and `coral2-hugepages` directives, keeping them only within the `% if unified:` block. [possible issue, importance: 8]
Suggested change:
```
 # flux: --exclusive
-# flux:--setattr=thp=always
-# flux: --coral2-hugepages=512GB
 % if account:
 # flux: --bank=${account}
 % endif
 % if partition:
 # flux: --queue=${partition}
 % endif
 % if unified:
-# flux:--setattr=thp=always
-# flux: --coral2-hugepages=512GB
+# flux: --setattr=thp=always
+# flux: --coral2-hugepages=512GB
 % endif
```
```
f-all cray-fftw cray-hdf5 python cmake
f-gpu python craype-accel-amd-gfx90a rocprofiler-compute/3.0.0

t OLCF Tuolumne
```
Suggestion: Correct the inconsistent facility label for the 'Tuolumne' system. It is labeled OLCF in toolchain/modules but LLNL in toolchain/bootstrap/modules.sh; they should be consistent. [general, importance: 7]
Suggested change:
```
-t OLCF Tuolumne
+t LLNL Tuolumne
```
This needs some work. For some reason I get a memory access fault at runtime now.
```
t OLCF Tuolumne
t-all cpe/25.03 rocm/6.3.1
t-all cray-fftw cray-hdf5 cray-python cmake
t-gpu craype-accel-amd-gfx942
```
Suggestion: Typo in module name: craype-accel-amd-gfx942 is almost certainly misspelled and will not match the real module name (typically craype-accel-amd-gfx90a), causing module load failures on Tuolumne GPU nodes. [possible bug]
Severity Level: Critical 🚨
Suggested change:
```
-t-gpu craype-accel-amd-gfx942
+t-gpu craype-accel-amd-gfx90a
```
Why it matters? ⭐
This is a real, likely-typo bug. Other entries (e.g. f-gpu) use craype-accel-amd-gfx90a which matches known Cray module names; gfx942 does not follow that pattern and will likely fail to load on Tuolumne GPU nodes. Replacing with gfx90a aligns with the rest of the file and fixes a concrete runtime failure.
```
# flux: --error="${name}.err"
# flux: --time=${walltime}
# flux: --exclusive
# flux:--setattr=thp=always
```
Suggestion: Flux directive formatting error: the directive line is missing a space after the colon ("# flux:--setattr=...") so the Flux scheduler may not recognize it as a directive; add a space after "flux:" so it reads "# flux: --setattr=...". [possible bug]
Severity Level: Critical 🚨
Suggested change:
```
-# flux:--setattr=thp=always
+# flux: --setattr=thp=always
```
Why it matters? ⭐
Flux job-file directives are typically written as "# flux: --option=..." (a space after the colon).
Without the space the scheduler parser may not recognize the line as a directive, so adding the space fixes a probable real parsing bug rather than being purely cosmetic.
The PR already contains this exact line at L13; changing it to "# flux: --setattr=thp=always" is the correct single-line fix.
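The whitespace concern can be illustrated with a toy scanner. This is a simplified sketch, not Flux's actual directive parser; the rule it enforces (a directive must have whitespace between `flux:` and the option) is taken from the review's own premise:

```python
import re

# Toy directive scanner (NOT Flux's real parser): per the review's premise,
# a line counts as a batch directive only when "# flux:" is followed by
# whitespace and then an option.
DIRECTIVE = re.compile(r'^#\s*flux:\s+(--\S.*)$')

def parse(line):
    """Return the option string if the line parses as a directive, else None."""
    m = DIRECTIVE.match(line)
    return m.group(1) if m else None

print(parse('# flux: --exclusive'))           # -> '--exclusive'
print(parse('# flux:--setattr=thp=always'))   # -> None (no space after colon)
print(parse('# flux: --setattr=thp=always'))  # -> '--setattr=thp=always'
```

Under this assumption, the unfixed line silently drops out of the job's resource request rather than raising an error, which is exactly why the missing space is easy to miss.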
```
# flux: --queue=${partition}
% endif
% if unified:
# flux:--setattr=thp=always
```
Suggestion: Same formatting issue inside the unified block: the Flux attribute line is missing a space after the colon, which can cause the scheduler to ignore the directive; add the space to ensure the directive is parsed. [possible bug]
Severity Level: Critical 🚨
Suggested change:
```
-# flux:--setattr=thp=always
+# flux: --setattr=thp=always
```
Why it matters? ⭐
Same issue as the previous suggestion but in the unified block (L21-L24). The missing space can prevent Flux from parsing the directive.
Fixing it to "# flux: --setattr=thp=always" is correct and necessary for reliable scheduler behavior when the unified block is rendered.
CodeAnt AI finished reviewing your PR.
No issues found across 3 files
Pull request overview
This PR adds support for running MFC on LLNL's Tuolumne system by introducing a new batch template (tuo.mako) and corresponding module configurations. Tuolumne is configured similarly to ORNL's Frontier, using Cray Programming Environment with ROCm for AMD GPU support (specifically gfx942 architecture).
Key changes:
- New Flux-based batch template for Tuolumne with GPU/CPU mode support
- Module configuration for Tuolumne using CPE 25.03, ROCm 6.3.1, and Cray libraries
- Integration of Tuolumne into the bootstrap system selection menu
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| toolchain/templates/tuo.mako | New Flux batch system template for Tuolumne with MPICH GPU support and Flux job submission directives |
| toolchain/modules | Adds Tuolumne ('t') module definitions with Cray environment, ROCm, and AMD gfx942 GPU accelerator support |
| toolchain/bootstrap/modules.sh | Integrates Tuolumne into system selection menu under LLNL and improves alignment for Brown Oscar entry |
```
ok ":) Loading modules:\n"
cd "${MFC_ROOT_DIR}"
% if engine == 'batch':
. ./mfc.sh load -c t -m ${'g' if gpu else 'c'}
```
Copilot AI · Dec 5, 2025
Incorrect GPU check in module loading command. The gpu variable is a string ('no', 'acc', or 'mp'), not a boolean. Using if gpu will be True even when gpu == 'no' because 'no' is a truthy string. This should be ${'g' if gpu != 'no' else 'c'} to match the pattern used in other templates like frontier.mako (line 37).
Suggested change:
```
-. ./mfc.sh load -c t -m ${'g' if gpu else 'c'}
+. ./mfc.sh load -c t -m ${'g' if gpu != 'no' else 'c'}
```
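Since Mako `${...}` expressions evaluate as plain Python, the pitfall Copilot describes is easy to reproduce outside the template. A minimal sketch, assuming `gpu` takes the string values `'no'`, `'acc'`, or `'mp'` as the comment states:

```python
# A non-empty string like 'no' is truthy in Python, so the buggy form
# always selects GPU modules; the fixed form compares against 'no'.
def mode_buggy(gpu):
    return 'g' if gpu else 'c'

def mode_fixed(gpu):
    return 'g' if gpu != 'no' else 'c'

for gpu in ('no', 'acc', 'mp'):
    print(f"{gpu}: buggy={mode_buggy(gpu)} fixed={mode_fixed(gpu)}")
# no: buggy=g fixed=c   <- buggy form wrongly loads GPU modules for CPU runs
# acc: buggy=g fixed=g
# mp: buggy=g fixed=g
```

The same comparison applies to the `% if gpu:` blocks flagged below, which is why frontier.mako uses `gpu != 'no'` throughout.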
```
# flux: --error="${name}.err"
# flux: --time=${walltime}
# flux: --exclusive
# flux:--setattr=thp=always
```
Copilot AI · Dec 5, 2025
Missing space after flux: in the batch directive. Should be # flux: --setattr=thp=always for consistency with other flux directives in this file.
```
# flux: --queue=${partition}
% endif
% if unified:
# flux:--setattr=thp=always
```
Copilot AI · Dec 5, 2025
Missing space after flux: in the batch directive. Should be # flux: --setattr=thp=always for consistency with other flux directives in this file.
Suggested change:
```
-# flux:--setattr=thp=always
+# flux: --setattr=thp=always
```
```
% if gpu:
export MPICH_GPU_SUPPORT_ENABLED=1
% else:
export MPICH_GPU_SUPPORT_ENABLED=0
```
Copilot AI · Dec 5, 2025
Incorrect GPU check. The gpu variable is a string ('no', 'acc', or 'mp'), not a boolean. Using if gpu: will be True even when gpu == 'no' because 'no' is a truthy string. This should be if gpu != 'no': to match the pattern used in other templates like frontier.mako (line 42).
```
% if gpu:
--gpus-per-task 1 \
```
Copilot AI · Dec 5, 2025
Incorrect GPU check. The gpu variable is a string ('no', 'acc', or 'mp'), not a boolean. Using if gpu: will be True even when gpu == 'no' because 'no' is a truthy string. This should be if gpu != 'no': to match the pattern used in other templates like frontier.mako (line 69).
```
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB
```
Copilot AI · Dec 5, 2025
Duplicate directives: Lines 13-14 are repeated unconditionally here and then again conditionally at lines 22-23 when unified is true. If these directives are always needed, they should only appear once (lines 13-14). If they should only apply when unified is true, move them inside the if unified: block and remove lines 13-14. The current code will result in these directives appearing twice when unified is enabled.
Suggested change:
```
-# flux:--setattr=thp=always
-# flux: --coral2-hugepages=512GB
```
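The duplication is mechanical to demonstrate: with `unified` true, the template emits the unconditional pair (lines 13–14) and then the same pair again from the `unified` block (lines 22–23). A toy check, not part of the toolchain, over a hand-written rendering of that header:

```python
# Header lines the template would emit when unified is true: the
# unconditional pair followed by the same pair from the unified block.
rendered = [
    '# flux: --exclusive',
    '# flux:--setattr=thp=always',
    '# flux: --coral2-hugepages=512GB',
    '# flux:--setattr=thp=always',
    '# flux: --coral2-hugepages=512GB',
]

seen, dupes = set(), []
for line in rendered:
    if line in seen:
        dupes.append(line)   # directive already emitted earlier
    seen.add(line)

print(dupes)
# ['# flux:--setattr=thp=always', '# flux: --coral2-hugepages=512GB']
```

Whether duplicate directives are harmless depends on the scheduler's behavior, which is why the reviewers suggest keeping the pair in exactly one place.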
Actionable comments posted: 0
🧹 Nitpick comments (3)
toolchain/bootstrap/modules.sh (1)
42-48: Tuolumne is correctly wired into the selector; prompt string could be simplified (optional).

The new LLNL/Tuolumne entry and inclusion of `(t)` in the selection prompt look consistent with existing systems and will flow correctly into the `u_c` slug logic.

The `log_n` line's quoting pattern is quite dense and is what shellcheck flags (SC2027). It's functionally fine and matches the surrounding style, but if you care about shellcheck noise, consider simplifying it to use a single quoted string with `${...}` expansions (or adding a local `# shellcheck disable=SC2027` for that line).

toolchain/templates/tuo.mako (2)
5-25: Avoid duplicating THP/hugepage Flux directives when `unified` is true.

In the Flux header, `--setattr=thp=always` and `--coral2-hugepages=512GB` are emitted unconditionally (lines 13–14) and again inside the `if unified` block (lines 22–23). When `unified` is true this results in duplicate directives, which is probably harmless but redundant and a bit confusing.

Consider keeping these options either only unconditionally or only inside the `unified` block, and also normalizing the spacing to `# flux: --...` for consistency with other lines.

29-41: Confirm whether module loading should be restricted to the batch engine.

The template always prints a "Loading modules" message but only calls `. ./mfc.sh load -c t -m ${'g' if gpu else 'c'}` when `engine == 'batch'`. For Tuolumne batch jobs this works as intended; however, if this template is ever used with a non-batch `engine` (e.g., an interactive mode), it will silently skip module loading.

If interactive runs on Tuolumne are expected to rely on this template, consider dropping the `engine == 'batch'` guard around the `mfc.sh load` call so both paths get a consistent environment.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- toolchain/bootstrap/modules.sh (1 hunks)
- toolchain/modules (1 hunks)
- toolchain/templates/tuo.mako (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
toolchain/templates/tuo.mako (2)
- toolchain/mfc/run/run.py: __generate_job_script (93-113)
- toolchain/mfc/run/input.py: generate_fpp (60-95)
toolchain/bootstrap/modules.sh (1)
- toolchain/util.sh: log (11-11), log_n (12-12)
🪛 Shellcheck (0.11.0)
toolchain/bootstrap/modules.sh
[warning] 48-48: The surrounding quotes actually unquote this. Remove or escape them.
(SC2027)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: Coverage Test on CodeCov
- GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, true)
- GitHub Check: Github (ubuntu, mpi, no-debug, false)
- GitHub Check: Github (macos, mpi, no-debug, false)
- GitHub Check: Github (macos, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, debug, false)
- GitHub Check: Github (ubuntu, mpi, no-debug, true)
- GitHub Check: Code Cleanliness Check
- GitHub Check: Agent
- GitHub Check: cubic · AI code reviewer
- GitHub Check: Build & Publish
🔇 Additional comments (1)
toolchain/modules (1)
48-52: Tuolumne module slug and stacks look consistent with existing systems.

The `t` slug and its `t-all`/`t-gpu` entries mirror the Frontier setup and match how `modules.sh` composes `"$u_c-all"` and `"$u_c-$cg"`. CPU runs will pick up `t-all` only (there is no `t-cpu` line), and GPU runs add `craype-accel-amd-gfx942` plus `HSA_XNACK=1`, which fits the expected pattern.

I don't see any issues with this addition.
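The slug-plus-mode composition described above can be sketched in a few lines. This is a hypothetical Python model of the lookup, not the actual `modules.sh` shell code; the module names are copied from the diff earlier in this review:

```python
# Hypothetical model of the "$u_c-all" + "$u_c-$cg" composition the
# review describes; NOT the real modules.sh implementation.
MODULES = {
    't-all': ['cpe/25.03', 'rocm/6.3.1', 'cray-fftw', 'cray-hdf5',
              'cray-python', 'cmake'],
    't-gpu': ['craype-accel-amd-gfx942'],
    # no 't-cpu' entry: CPU runs fall back to the -all set only
}

def modules_for(slug, mode):
    """mode is 'g' (gpu) or 'c' (cpu); always load -all, then the mode set."""
    mods = list(MODULES.get(f'{slug}-all', []))
    mods += MODULES.get(f"{slug}-{'gpu' if mode == 'g' else 'cpu'}", [])
    return mods

print(modules_for('t', 'g'))  # -all stack plus the GPU accelerator module
print(modules_for('t', 'c'))  # -all stack only, since there is no t-cpu set
```

Under this model the absence of a `t-cpu` line is benign: the lookup simply finds nothing to append for CPU runs.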
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##           master    #1079   +/-   ##
=======================================
  Coverage   44.16%   44.16%
=======================================
  Files          71       71
  Lines       20197    20197
  Branches     1970     1970
=======================================
  Hits         8921     8921
  Misses      10148    10148
  Partials     1128     1128
```

☔ View full report in Codecov by Sentry.
User description
Description
Add modules and template for running on Tuolumne
Fixes #(issue) [optional]
Type of change
Please delete options that are not relevant.
Scope
How Has This Been Tested?
- Ran `examples/3D_performance_test` interactively and observed expected performance
- Ran `examples/3D_performance_test` in batch mode and observed expected performance

PR Type
Enhancement
Description
Add Tuolumne (OLCF) system support with modules and template
Configure GPU and CPU module variants for Tuolumne
Update bootstrap script to include Tuolumne in system selection
Add Flux batch job scheduler template for Tuolumne
Diagram Walkthrough
File Walkthrough

- modules.sh (toolchain/bootstrap/modules.sh) — Add Tuolumne to system selection menu
- tuo.mako (toolchain/templates/tuo.mako) — Add Tuolumne Flux batch job template: resource requests and module loading
- modules (toolchain/modules) — Configure Tuolumne module variants: cray-python, cmake, environment variable
CodeAnt-AI Description
Add Tuolumne (OLCF) system selection, modules and Flux batch template
What Changed
Impact
- ✅ Submit Flux jobs to Tuolumne with correct node, task and GPU allocation
- ✅ Automatic module loading for Tuolumne CPU and GPU runs
- ✅ Correct MPICH GPU support enabled/disabled for Tuolumne jobs