Add tuo modules and template #1079
base: master
Changes from 3 commits
| Diff line change |
|---|
| @@ -45,6 +45,12 @@ f-all cpe/25.03 rocm/6.3.1 |
| f-all cray-fftw cray-hdf5 python cmake |
| f-gpu python craype-accel-amd-gfx90a rocprofiler-compute/3.0.0 |
| |
| t OLCF Tuolumne |
| t-all cpe/25.03 rocm/6.3.1 |
| t-all cray-fftw cray-hdf5 cray-python cmake |
| t-gpu craype-accel-amd-gfx942 |
Suggestion: Typo in module name: `craype-accel-amd-gfx942` is almost certainly misspelled and will not match the real module name (typically `craype-accel-amd-gfx90a`), causing module load failures on Tuolumne GPU nodes.
Severity Level: Critical 🚨
Suggested change
| t-gpu craype-accel-amd-gfx942 |
| t-gpu craype-accel-amd-gfx90a |
Why it matters? ⭐
This is a real, likely-typo bug. Other entries (e.g. f-gpu) use craype-accel-amd-gfx90a, which matches known Cray module names; gfx942 does not follow that pattern and will likely fail to load on Tuolumne GPU nodes. Replacing it with gfx90a aligns with the rest of the file and fixes a concrete runtime failure.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** toolchain/modules
**Line:** 51:51
**Comment:**
*Possible Bug: Typo in module name: `craype-accel-amd-gfx942` is almost certainly misspelled and will not match the real module name (typically `craype-accel-amd-gfx90a`), causing module load failures on Tuolumne GPU nodes.
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
| t-gpu HSA_XNACK=1 |
| |
| d NCSA Delta |
| d-all python/3.11.6 |
| d-cpu gcc/11.4.0 openmpi |
| Diff line change |
|---|
| @@ -0,0 +1,63 @@ |
| #!/usr/bin/env bash |
| |
| <%namespace name="helpers" file="helpers.mako"/> |
| |
| % if engine == 'batch': |
| # flux: -N ${nodes} |
| # flux: -n ${tasks_per_node*nodes} |
| # flux: --job-name="${name}" |
| # flux: --output="${name}.out" |
| # flux: --error="${name}.err" |
| # flux: --time=${walltime} |
| # flux: --exclusive |
| # flux:--setattr=thp=always |
Suggestion: Flux directive formatting error: the directive line is missing a space after the colon ("# flux:--setattr=...") so the Flux scheduler may not recognize it as a directive; add a space after "flux:" so it reads "# flux: --setattr=...". [possible bug]
Severity Level: Critical 🚨
Suggested change
| # flux:--setattr=thp=always |
| # flux: --setattr=thp=always |
Why it matters? ⭐
Flux job-file directives are typically written as "# flux: --option=..." (a space after the colon).
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** toolchain/templates/tuo.mako
**Line:** 13:13
**Comment:**
*Possible Bug: Flux directive formatting error: the directive line is missing a space after the colon ("# flux:--setattr=...") so the Flux scheduler may not recognize it as a directive; add a space after "flux:" so it reads "# flux: --setattr=...".
Validate the correctness of the flagged issue. If correct, how can I resolve this? If you propose a fix, implement it and please make it concise.
| # flux: --coral2-hugepages=512GB |

Comment on lines +13 to +14
| # flux:--setattr=thp=always |
| # flux: --coral2-hugepages=512GB |
Suggestion: Same formatting issue inside the unified block: the Flux attribute line is missing a space after the colon, which can cause the scheduler to ignore the directive; add the space to ensure the directive is parsed. [possible bug]
Severity Level: Critical 🚨
Suggested change
| # flux:--setattr=thp=always |
| # flux: --setattr=thp=always |
Why it matters? ⭐
Same issue as the previous suggestion but in the unified block (L21-L24). The missing space can prevent Flux from parsing the directive.
Fixing it to "# flux: --setattr=thp=always" is correct and necessary for reliable scheduler behavior when the unified block is rendered.
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** toolchain/templates/tuo.mako
**Line:** 22:22
**Comment:**
*Possible Bug: Same formatting issue inside the `unified` block: the Flux attribute line is missing a space after the colon, which can cause the scheduler to ignore the directive; add the space to ensure the directive is parsed.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Copilot AI, Dec 5, 2025
Missing space after `flux:` in the batch directive. Should be `# flux: --setattr=thp=always` for consistency with other flux directives in this file.
Suggested change
| # flux:--setattr=thp=always |
| # flux: --setattr=thp=always |
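The consistency point above can be checked mechanically. The following is a minimal sketch, not part of the PR: the helper name `find_unspaced_directives` and the sample text are hypothetical, and the check only flags `# flux:` comment lines with no space after the colon. It makes no claim about how Flux itself parses such lines.

```python
import re

# Hypothetical lint: a comment line whose "flux:" sentinel is followed
# immediately by a non-space character (e.g. "# flux:--setattr=...").
MISSING_SPACE = re.compile(r"^\s*#\s*flux:(?=\S)")


def find_unspaced_directives(text: str) -> list[int]:
    """Return 1-based line numbers of directives lacking a space after 'flux:'."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if MISSING_SPACE.match(line)
    ]


# Sample lines copied from the template under review.
template = """\
# flux: --exclusive
# flux:--setattr=thp=always
# flux: --coral2-hugepages=512GB
"""

print(find_unspaced_directives(template))  # [2]
```

Running a check like this over the rendered template would catch both occurrences of the unspaced directive (the batch header and the `unified` block) in one pass.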
Suggestion: Fix a typo in the `# flux:--setattr` directive by adding a space. Also, remove the duplicated `setattr` and `coral2-hugepages` directives, keeping them only within the `if unified:` block. [possible issue, importance: 8]
Suggested change
| # flux: --exclusive |
| # flux:--setattr=thp=always |
| # flux: --coral2-hugepages=512GB |
| % if account: |
| # flux: --bank=${account} |
| % endif |
| % if partition: |
| # flux: --queue=${partition} |
| % endif |
| % if unified: |
| # flux:--setattr=thp=always |
| # flux: --coral2-hugepages=512GB |
| % endif |

| # flux: --exclusive |
| % if account: |
| # flux: --bank=${account} |
| % endif |
| % if partition: |
| # flux: --queue=${partition} |
| % endif |
| % if unified: |
| # flux: --setattr=thp=always |
| # flux: --coral2-hugepages=512GB |
| % endif |
Copilot AI, Dec 5, 2025
Incorrect GPU check in the module loading command. The `gpu` variable is a string (`'no'`, `'acc'`, or `'mp'`), not a boolean. The condition `if gpu` is true even when `gpu == 'no'`, because `'no'` is a non-empty and therefore truthy string. This should be `${'g' if gpu != 'no' else 'c'}` to match the pattern used in other templates like frontier.mako (line 37).
Suggested change
| . ./mfc.sh load -c t -m ${'g' if gpu else 'c'} |
| . ./mfc.sh load -c t -m ${'g' if gpu != 'no' else 'c'} |
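The truthiness issue described above is plain Python behaviour and can be reproduced outside Mako. This is a minimal sketch: the function names `mode_buggy` and `mode_fixed` are hypothetical and exist only to wrap the two template expressions from the suggestion.

```python
def mode_buggy(gpu: str) -> str:
    # Expression as written in the PR: any non-empty string is truthy,
    # so 'no' still selects the GPU mode.
    return 'g' if gpu else 'c'


def mode_fixed(gpu: str) -> str:
    # Expression from the suggested change: compare against the 'no' sentinel.
    return 'g' if gpu != 'no' else 'c'


# The three values named in the review comment.
for gpu in ('no', 'acc', 'mp'):
    print(gpu, mode_buggy(gpu), mode_fixed(gpu))
# no g c
# acc g g
# mp g g
```

Only the `'no'` case differs between the two expressions, which is why the bug would surface only on CPU runs.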
Copilot AI, Dec 5, 2025
Incorrect GPU check. The `gpu` variable is a string (`'no'`, `'acc'`, or `'mp'`), not a boolean. The condition `if gpu:` is true even when `gpu == 'no'`, because `'no'` is a non-empty and therefore truthy string. This should be `if gpu != 'no':` to match the pattern used in other templates like frontier.mako (line 42).
Copilot AI, Dec 5, 2025
Incorrect GPU check. The `gpu` variable is a string (`'no'`, `'acc'`, or `'mp'`), not a boolean. The condition `if gpu:` is true even when `gpu == 'no'`, because `'no'` is a non-empty and therefore truthy string. This should be `if gpu != 'no':` to match the pattern used in other templates like frontier.mako (line 69).
Suggestion: Correct the inconsistent facility label for the 'Tuolumne' system. It is labeled `OLCF` in `toolchain/modules` but `LLNL` in `toolchain/bootstrap/modules.sh`; they should be consistent. [general, importance: 7]