
Conversation

@wilfonba
Contributor

@wilfonba wilfonba commented Dec 5, 2025

User description

Description

Add modules and template for running on Tuolumne

Fixes #(issue) [optional]

Type of change

Please delete options that are not relevant.

  • Something else

Scope

  • This PR comprises a set of related changes with a common goal

How Has This Been Tested?

  • Build on a login node using the modules added
  • Ran examples/3D_performance_test interactively and observed expected performance
  • Ran examples/3D_performance_test in batch mode and observed expected performance

PR Type

Enhancement


Description

  • Add Tuolumne (OLCF) system support with modules and template

  • Configure GPU and CPU module variants for Tuolumne

  • Update bootstrap script to include Tuolumne in system selection

  • Add Flux batch job scheduler template for Tuolumne (see the header sketch below)
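
For orientation, here is a minimal sketch of the Flux directive header such a template emits, assembled from the directive lines quoted in the review threads below; the shebang and the concrete values (error file, walltime, bank, queue) are illustrative placeholders, not the template's actual output:

#!/usr/bin/env bash
# flux: --error="example.err"      # the template uses ${name}.err
# flux: --time=01:00:00            # the template uses ${walltime}
# flux: --exclusive
# flux: --setattr=thp=always       # with the space after "flux:" that reviewers request
# flux: --coral2-hugepages=512GB
# flux: --bank=my_bank             # emitted only when an account is given
# flux: --queue=pbatch             # emitted only when a partition is given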


Diagram Walkthrough

flowchart LR
  A["Bootstrap Script"] -->|"Add Tuolumne option"| B["System Selection Menu"]
  C["Module Configuration"] -->|"Define t/t-all/t-gpu variants"| D["Tuolumne Modules"]
  E["Batch Template"] -->|"Flux scheduler config"| F["Tuolumne Execution"]
  D --> F
  B --> F

File Walkthrough

Relevant files
Enhancement
modules.sh
Add Tuolumne to system selection menu                                       

toolchain/bootstrap/modules.sh

  • Add Tuolumne (t) to system selection menu with LLNL label
  • Update prompt to include 't' option in available choices
  • Improve formatting alignment for Brown/Oscar entry
+3/-2     
tuo.mako
Add Tuolumne Flux batch job template                                         

toolchain/templates/tuo.mako

  • Create new Flux batch scheduler template for Tuolumne
  • Configure job submission with node allocation, task distribution, and
    resource requests
  • Support both GPU and CPU execution modes with conditional module
    loading
  • Include GPU support environment variables and profiler integration
+63/-0   
Configuration changes
modules
Configure Tuolumne module variants                                             

toolchain/modules

  • Add Tuolumne system entry with OLCF label
  • Define t-all modules: cpe/25.03, rocm/6.3.1, cray-fftw, cray-hdf5,
    cray-python, cmake
  • Define t-gpu modules: craype-accel-amd-gfx942 and HSA_XNACK=1
    environment variable
+6/-0     
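
As a hedged usage sketch (based on the load command visible in the tuo.mako excerpts further down and the slug composition described in the CodeRabbit review), loading these stacks through the toolchain would look roughly like:

. ./mfc.sh load -c t -m g    # GPU mode: resolves the t-all plus t-gpu entries
. ./mfc.sh load -c t -m c    # CPU mode: resolves the t-all entries only

which, per the module lines above, should amount to approximately:

module load cpe/25.03 rocm/6.3.1 cray-fftw cray-hdf5 cray-python cmake
module load craype-accel-amd-gfx942   # GPU mode only
export HSA_XNACK=1                    # GPU mode only; later flipped to 0, see the author's follow-up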


CodeAnt-AI Description

Add Tuolumne (OLCF) system selection, modules and Flux batch template

What Changed

  • The bootstrap system-selection menu now includes "Tuolumne", letting users choose the OLCF Tuolumne system during setup.
  • Added Tuolumne module definitions with CPU and GPU variants so the environment loads the correct modules for CPU or GPU runs on Tuolumne.
  • Added a Flux batch job template for Tuolumne that loads Tuolumne modules automatically, sets MPICH GPU support appropriately, and submits jobs with the correct nodes, tasks, and GPU allocation.

Impact

✅ Submit Flux jobs to Tuolumne with correct node, task and GPU allocation
✅ Automatic module loading for Tuolumne CPU and GPU runs
✅ Correct MPICH GPU support enabled/disabled for Tuolumne jobs

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

Summary by CodeRabbit

  • New Features
    • Added support for a new system option in the toolchain selection menu.
    • Introduced new module configurations with GPU acceleration support for the new system.
    • Added batch job execution template with Flux integration for the new system environment.

✏️ Tip: You can customize this high-level summary in your review settings.

Copilot AI review requested due to automatic review settings December 5, 2025 19:34
@codeant-ai

codeant-ai bot commented Dec 5, 2025

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. If you're enjoying it, help us grow by sharing.


@coderabbitai
Contributor

coderabbitai bot commented Dec 5, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

The pull request adds support for a new LLNL system named Tuolumne (abbreviated as 't') to the toolchain infrastructure. This includes adding the system to the bootstrap module selection interface, configuring associated modules with CPU and GPU variants, and introducing a new template for batch job submission via Flux.

Changes

Cohort / File(s) Summary
Tuolumne system bootstrap configuration
toolchain/bootstrap/modules.sh
Added Tuolumne (t) option to system selection output and selection prompt; adjusted formatting for Brown (Oscar) entry
Tuolumne module configuration
toolchain/modules
Added new module slug 't' for OLCF Tuolumne with CPU variant (t-all: cpe/25.03, rocm/6.3.1, cray-fftw, cray-hdf5, cray-python, cmake) and GPU variant (t-gpu: craype-accel-amd-gfx942, HSA_XNACK=1)
Tuolumne batch submission template
toolchain/templates/tuo.mako
New Mako template generating Bash script for Flux batch job submission with support for MPI/non-MPI execution, GPU acceleration, module loading, and pre/post-execution hooks

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~18 minutes

  • Areas requiring attention:
    • Verify correctness of Tuolumne module names and versions (rocm/6.3.1, cpe/25.03, gfx942 target) align with system specifications
    • Review Mako template logic for HSA_XNACK flag and GPU support conditionals
    • Confirm GPU module naming convention (craype-accel-amd-gfx942) matches available module catalog

Poem

🐰 A hoppy new system joins the fold,
Tuolumne's tale begins to unfold!
With modules galore and Flux so bright,
Rocky computations run day and night,
t marks the spot—hopping forward with might! ⚡

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Description check: ⚠️ Warning. The PR description is incomplete and does not adequately follow the required template structure. Resolution: (1) mark the 'New feature' checkbox under 'Type of change', (2) check the 'Scope' checkbox to confirm related changes with a common goal, (3) add details about the test configuration/environment, (4) complete the relevant checklist items that apply (documentation, comments, tests, DRY principle).
✅ Passed checks (2 passed)
  • Title check: ✅ Passed. The title accurately summarizes the main changes (adding Tuolumne modules and a batch template), which aligns with the file modifications across bootstrap, modules, and template files.
  • Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@qodo-code-review
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 2 🔵🔵⚪⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

Duplicate hugepages/THP attributes are set unconditionally and again under the 'unified' flag; confirm intended duplication and correctness of flux directives syntax (missing space after '# flux:' on some lines).

# flux: --time=${walltime}
# flux: --exclusive
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB
% if account:
# flux: --bank=${account}
% endif
% if partition:
# flux: --queue=${partition}
% endif
% if unified:
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB
% endif
MPI/GPU Flags

Using MPICH_GPU_SUPPORT_ENABLED assumes MPICH; verify compatibility with the Flux/Tuolumne MPI stack and whether additional ROCm environment variables (e.g., HIP_VISIBLE_DEVICES, ROCR_VISIBLE_DEVICES, MIOPEN/ROCm tuning) are needed for correct GPU runs.

% if gpu:
    export MPICH_GPU_SUPPORT_ENABLED=1
% else:
    export MPICH_GPU_SUPPORT_ENABLED=0
% endif
Module Consistency

't-all' uses 'cray-python' while 'f-all' uses 'python'; ensure consistent Python toolchain naming and versioning across systems to avoid environment mismatches.

t     OLCF Tuolumne
t-all cpe/25.03 rocm/6.3.1
t-all cray-fftw cray-hdf5 cray-python cmake
t-gpu craype-accel-amd-gfx942
t-gpu HSA_XNACK=1

@codeant-ai codeant-ai bot added the size:M label (This PR changes 30-99 lines, ignoring generated files) Dec 5, 2025
Comment on lines +12 to +24
# flux: --exclusive
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB
% if account:
# flux: --bank=${account}
% endif
% if partition:
# flux: --queue=${partition}
% endif
% if unified:
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB
% endif

Suggestion: Fix a typo in the # flux:--setattr directive by adding a space. Also, remove the duplicated setattr and coral2-hugepages directives, keeping them only within the if unified: block. [possible issue, importance: 8]

Suggested change (before → after):

Before:
# flux: --exclusive
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB
% if account:
# flux: --bank=${account}
% endif
% if partition:
# flux: --queue=${partition}
% endif
% if unified:
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB
% endif

After:
# flux: --exclusive
% if account:
# flux: --bank=${account}
% endif
% if partition:
# flux: --queue=${partition}
% endif
% if unified:
# flux: --setattr=thp=always
# flux: --coral2-hugepages=512GB
% endif

f-all cray-fftw cray-hdf5 python cmake
f-gpu python craype-accel-amd-gfx90a rocprofiler-compute/3.0.0

t OLCF Tuolumne

Suggestion: Correct the inconsistent facility label for the 'Tuolumne' system. It is labeled OLCF in toolchain/modules but LLNL in toolchain/bootstrap/modules.sh; they should be consistent. [general, importance: 7]

Suggested change:
t OLCF Tuolumne
→ t LLNL Tuolumne

@wilfonba
Contributor Author

wilfonba commented Dec 5, 2025

This needs some work. For some reason I get a memory access fault at runtime now.

t OLCF Tuolumne
t-all cpe/25.03 rocm/6.3.1
t-all cray-fftw cray-hdf5 cray-python cmake
t-gpu craype-accel-amd-gfx942

Suggestion: Typo in module name: craype-accel-amd-gfx942 is almost certainly misspelled and will not match the real module name (typically craype-accel-amd-gfx90a), causing module load failures on Tuolumne GPU nodes. [possible bug]

Severity Level: Critical 🚨

Suggested change:
t-gpu craype-accel-amd-gfx942
→ t-gpu craype-accel-amd-gfx90a
Why it matters? ⭐

This is a real, likely-typo bug. Other entries (e.g. f-gpu) use craype-accel-amd-gfx90a which matches known Cray module names; gfx942 does not follow that pattern and will likely fail to load on Tuolumne GPU nodes. Replacing with gfx90a aligns with the rest of the file and fixes a concrete runtime failure.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** toolchain/modules
**Line:** 51:51
**Comment:**
	Possible Bug: Typo in module name: `craype-accel-amd-gfx942` is almost certainly misspelled and will not match the real module name (typically `craype-accel-amd-gfx90a`), causing module load failures on Tuolumne GPU nodes.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

# flux: --error="${name}.err"
# flux: --time=${walltime}
# flux: --exclusive
# flux:--setattr=thp=always

Suggestion: Flux directive formatting error: the directive line is missing a space after the colon ("# flux:--setattr=...") so the Flux scheduler may not recognize it as a directive; add a space after "flux:" so it reads "# flux: --setattr=...". [possible bug]

Severity Level: Critical 🚨

Suggested change:
# flux:--setattr=thp=always
→ # flux: --setattr=thp=always
Why it matters? ⭐

Flux job-file directives are typically written as "# flux: --option=..." (a space after the colon).
Without the space the scheduler parser may not recognize the line as a directive, so adding the space fixes a probable real parsing bug rather than being purely cosmetic.
The PR already contains this exact line at L13; changing it to "# flux: --setattr=thp=always" is the correct single-line fix.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** toolchain/templates/tuo.mako
**Line:** 13:13
**Comment:**
	Possible Bug: Flux directive formatting error: the directive line is missing a space after the colon ("# flux:--setattr=...") so the Flux scheduler may not recognize it as a directive; add a space after "flux:" so it reads "# flux: --setattr=...".

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

# flux: --queue=${partition}
% endif
% if unified:
# flux:--setattr=thp=always

Suggestion: Same formatting issue inside the unified block: the Flux attribute line is missing a space after the colon, which can cause the scheduler to ignore the directive; add the space to ensure the directive is parsed. [possible bug]

Severity Level: Critical 🚨

Suggested change:
# flux:--setattr=thp=always
→ # flux: --setattr=thp=always
Why it matters? ⭐

Same issue as the previous suggestion but in the unified block (L21-L24). The missing space can prevent Flux from parsing the directive.
Fixing it to "# flux: --setattr=thp=always" is correct and necessary for reliable scheduler behavior when the unified block is rendered.

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** toolchain/templates/tuo.mako
**Line:** 22:22
**Comment:**
	Possible Bug: Same formatting issue inside the `unified` block: the Flux attribute line is missing a space after the colon, which can cause the scheduler to ignore the directive; add the space to ensure the directive is parsed.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.

@codeant-ai

codeant-ai bot commented Dec 5, 2025

CodeAnt AI finished reviewing your PR.

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment


No issues found across 3 files

Contributor

Copilot AI left a comment


Pull request overview

This PR adds support for running MFC on LLNL's Tuolumne system by introducing a new batch template (tuo.mako) and corresponding module configurations. Tuolumne is configured similarly to ORNL's Frontier, using Cray Programming Environment with ROCm for AMD GPU support (specifically gfx942 architecture).

Key changes:

  • New Flux-based batch template for Tuolumne with GPU/CPU mode support
  • Module configuration for Tuolumne using CPE 25.03, ROCm 6.3.1, and Cray libraries
  • Integration of Tuolumne into the bootstrap system selection menu

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File / Description:
  • toolchain/templates/tuo.mako: New Flux batch system template for Tuolumne with MPICH GPU support and Flux job submission directives
  • toolchain/modules: Adds Tuolumne ('t') module definitions with Cray environment, ROCm, and AMD gfx942 GPU accelerator support
  • toolchain/bootstrap/modules.sh: Integrates Tuolumne into the system selection menu under LLNL and improves alignment for the Brown Oscar entry

ok ":) Loading modules:\n"
cd "${MFC_ROOT_DIR}"
% if engine == 'batch':
. ./mfc.sh load -c t -m ${'g' if gpu else 'c'}

Copilot AI Dec 5, 2025

Incorrect GPU check in module loading command. The gpu variable is a string ('no', 'acc', or 'mp'), not a boolean. Using if gpu will be True even when gpu == 'no' because 'no' is a truthy string. This should be ${'g' if gpu != 'no' else 'c'} to match the pattern used in other templates like frontier.mako (line 37).

Suggested change:
. ./mfc.sh load -c t -m ${'g' if gpu else 'c'}
→ . ./mfc.sh load -c t -m ${'g' if gpu != 'no' else 'c'}

# flux: --error="${name}.err"
# flux: --time=${walltime}
# flux: --exclusive
# flux:--setattr=thp=always

Copilot AI Dec 5, 2025

Missing space after flux: in the batch directive. Should be # flux: --setattr=thp=always for consistency with other flux directives in this file.

# flux: --queue=${partition}
% endif
% if unified:
# flux:--setattr=thp=always

Copilot AI Dec 5, 2025

Missing space after flux: in the batch directive. Should be # flux: --setattr=thp=always for consistency with other flux directives in this file.

Suggested change:
# flux:--setattr=thp=always
→ # flux: --setattr=thp=always

Comment on lines +37 to +40
% if gpu:
export MPICH_GPU_SUPPORT_ENABLED=1
% else:
export MPICH_GPU_SUPPORT_ENABLED=0

Copilot AI Dec 5, 2025

Incorrect GPU check. The gpu variable is a string ('no', 'acc', or 'mp'), not a boolean. Using if gpu: will be True even when gpu == 'no' because 'no' is a truthy string. This should be if gpu != 'no': to match the pattern used in other templates like frontier.mako (line 42).

Comment on lines +52 to +53
% if gpu:
--gpus-per-task 1 \

Copilot AI Dec 5, 2025

Incorrect GPU check. The gpu variable is a string ('no', 'acc', or 'mp'), not a boolean. Using if gpu: will be True even when gpu == 'no' because 'no' is a truthy string. This should be if gpu != 'no': to match the pattern used in other templates like frontier.mako (line 69).

Comment on lines +13 to +14
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB

Copilot AI Dec 5, 2025

Duplicate directives: Lines 13-14 are repeated unconditionally here and then again conditionally at lines 22-23 when unified is true. If these directives are always needed, they should only appear once (lines 13-14). If they should only apply when unified is true, move them inside the if unified: block and remove lines 13-14. The current code will result in these directives appearing twice when unified is enabled.

Suggested change (remove these two lines):
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
toolchain/bootstrap/modules.sh (1)

42-48: Tuolumne is correctly wired into the selector; prompt string could be simplified (optional).

The new LLNL/Tuolumne entry and inclusion of (t) in the selection prompt look consistent with existing systems and will flow correctly into the u_c slug logic.

The log_n line’s quoting pattern is quite dense and is what shellcheck flags (SC2027). It’s functionally fine and matches the surrounding style, but if you care about shellcheck noise, consider simplifying it to use a single quoted string with ${...} expansions (or adding a local # shellcheck disable=SC2027 for that line).
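
As a hedged sketch of that simplification: the actual log_n line in modules.sh is not quoted in this thread, so the color variables below (MAGENTA, COLOR_RESET) are hypothetical stand-ins for whatever the script interpolates:

# Before (SC2027): the inner quote pairs end and restart the outer string
log_n "Which system? ("$MAGENTA"t"$COLOR_RESET") "
# After: one double-quoted string; ${...} expands fine inside double quotes
log_n "Which system? (${MAGENTA}t${COLOR_RESET}) "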

toolchain/templates/tuo.mako (2)

5-25: Avoid duplicating THP/hugepage Flux directives when unified is true.

In the Flux header, --setattr=thp=always and --coral2-hugepages=512GB are emitted unconditionally (lines 13–14) and again inside the if unified block (lines 22–23). When unified is true this results in duplicate directives, which is probably harmless but redundant and a bit confusing.

Consider keeping these options either only unconditionally or only inside the unified block, and also normalizing the spacing to # flux: --... for consistency with other lines.


29-41: Confirm whether module loading should be restricted to batch engine.

The template always prints a “Loading modules” message but only calls . ./mfc.sh load -c t -m ${'g' if gpu else 'c'} when engine == 'batch'. For Tuolumne batch jobs this works as intended; however, if this template is ever used with a non-batch engine (e.g., an interactive mode), it will silently skip module loading.

If interactive runs on Tuolumne are expected to rely on this template, consider dropping the engine == 'batch' guard around the mfc.sh load call so both paths get a consistent environment.
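
Tying that together with Copilot's string-vs-boolean observation earlier in the thread, a hedged sketch of the unguarded load block might read (Mako; gpu takes 'no', 'acc', or 'mp' per the Copilot comments):

ok ":) Loading modules:\n"
cd "${MFC_ROOT_DIR}"
# load modules for both batch and interactive engines, with the corrected GPU check
. ./mfc.sh load -c t -m ${'g' if gpu != 'no' else 'c'}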

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6c380cd and e235612.

📒 Files selected for processing (3)
  • toolchain/bootstrap/modules.sh (1 hunks)
  • toolchain/modules (1 hunks)
  • toolchain/templates/tuo.mako (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Draft a step-by-step plan before making changes; build after each step using `./mfc.sh build -t pre_process simulation -j $(nproc)`

Applied to files:

  • toolchain/templates/tuo.mako
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to **/*.{fpp,f90} : Name modules with `m_<feature>` prefix (e.g., `m_transport`)

Applied to files:

  • toolchain/modules
📚 Learning: 2025-11-24T21:50:16.713Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2025-11-24T21:50:16.713Z
Learning: Applies to **/*.{fpp,f90} : Name modules with m_<feature> pattern (e.g., m_transport)

Applied to files:

  • toolchain/modules
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Applies to **/*.{fpp,f90} : Use `private` declaration followed by explicit `public` exports in modules

Applied to files:

  • toolchain/modules
📚 Learning: 2025-11-24T21:50:46.909Z
Learnt from: CR
Repo: MFlowCode/MFC PR: 0
File: .cursor/rules/mfc-agent-rules.mdc:0-0
Timestamp: 2025-11-24T21:50:46.909Z
Learning: Compile with Cray `ftn` or NVIDIA `nvfortran` for GPU offloading; also build CPU-only with GNU `gfortran` and Intel `ifx`/`ifort` for portability

Applied to files:

  • toolchain/modules
🧬 Code graph analysis (2)
toolchain/templates/tuo.mako (2)
toolchain/mfc/run/run.py (1)
  • __generate_job_script (93-113)
toolchain/mfc/run/input.py (1)
  • generate_fpp (60-95)
toolchain/bootstrap/modules.sh (1)
toolchain/util.sh (2)
  • log (11-11)
  • log_n (12-12)
🪛 Shellcheck (0.11.0)
toolchain/bootstrap/modules.sh

[warning] 48-48: The surrounding quotes actually unquote this. Remove or escape them.

(SC2027)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: Coverage Test on CodeCov
  • GitHub Check: Github (ubuntu, no-mpi, single, no-debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, true)
  • GitHub Check: Github (ubuntu, mpi, no-debug, false)
  • GitHub Check: Github (macos, mpi, no-debug, false)
  • GitHub Check: Github (macos, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, debug, false)
  • GitHub Check: Github (ubuntu, mpi, no-debug, true)
  • GitHub Check: Code Cleanliness Check
  • GitHub Check: Agent
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: Build & Publish
🔇 Additional comments (1)
toolchain/modules (1)

48-52: Tuolumne module slug and stacks look consistent with existing systems.

The t slug and its t-all / t-gpu entries mirror the Frontier setup and match how modules.sh composes "$u_c-all" and "$u_c-$cg". CPU runs will pick up t-all only (no t-cpu line), and GPU runs add craype-accel-amd-gfx942 plus HSA_XNACK=1, which fits the expected pattern.

I don’t see any issues with this addition.

@codecov

codecov bot commented Dec 5, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 44.16%. Comparing base (8d9a83b) to head (50a572d).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1079   +/-   ##
=======================================
  Coverage   44.16%   44.16%           
=======================================
  Files          71       71           
  Lines       20197    20197           
  Branches     1970     1970           
=======================================
  Hits         8921     8921           
  Misses      10148    10148           
  Partials     1128     1128           

☔ View full report in Codecov by Sentry.

@wilfonba
Contributor Author

wilfonba commented Dec 6, 2025

I fixed the memory access fault by replacing export HSA_XNACK=1 with export HSA_XNACK=0. I believe that OpenMP is required for HSA_XNACK=1. I also ran scaling tests to evaluate whether GPU-aware MPI was working, using the examples/3D_performance_test case. The strong scaling tests used a 780^3 base case, and the weak scaling tests used 300^3 per device.

[Image: strong and weak scaling results]
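
In terms of the files touched by this PR, that fix would presumably amount to changing the t-gpu environment line in toolchain/modules (a sketch, assuming the change lands in the modules file rather than in the template):

t-gpu craype-accel-amd-gfx942
t-gpu HSA_XNACK=0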

@sbryngelson sbryngelson marked this pull request as draft December 9, 2025 15:45

Labels

Review effort 2/5 · size:M (This PR changes 30-99 lines, ignoring generated files)
