Oscar Mako and module update #1082
toolchain/templates/oscar.mako

```diff
@@ -4,7 +4,7 @@
 % if engine == 'batch':
 #SBATCH --nodes=${nodes}
-#SBATCH --ntasks-per-node=${tasks_per_node}
+#SBATCH --tasks-per-node=${tasks_per_node}
```
Contributor:

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check all SBATCH-related options in oscar.mako and other templates
echo "=== oscar.mako SBATCH directives ==="
rg -n 'SBATCH' toolchain/templates/oscar.mako | head -20
echo ""
echo "=== All templates SBATCH directives with tasks-per-node or ntasks-per-node ==="
# --type-add is needed because ripgrep has no built-in "mako" file type
rg -n --type-add 'mako:*.mako' --type=mako '(tasks-per-node|ntasks-per-node)' toolchain/templates/
echo ""
echo "=== Full oscar.mako file (if small enough) ==="
wc -l toolchain/templates/oscar.mako
```

🌐 Web query result: the canonical SLURM option is `--ntasks-per-node` (request N tasks on each node). `--ntasks` (or `-n`) sets the total number of tasks and takes precedence if both are given; when combined with `--ntasks`, `--ntasks-per-node` is treated as a maximum per-node count. Some local documentation shows `--tasks-per-node` as an informal alternate wording, but the official SLURM documentation and manpages use `--ntasks-per-node`. [1][2][3]

References
[1] https://slurm.schedmd.com/srun.html

Revert the SBATCH option to `--ntasks-per-node`; `--tasks-per-node` is invalid. According to the official SLURM documentation, the canonical option is `--ntasks-per-node`.
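For reference, a minimal sketch of the directive pair with the canonical spelling, reusing the `${nodes}` and `${tasks_per_node}` template values from this file (the comment about total ranks is only illustrative):

```bash
#SBATCH --nodes=${nodes}                     # number of nodes requested
#SBATCH --ntasks-per-node=${tasks_per_node}  # canonical spelling: tasks launched on each node
# Total MPI ranks = nodes * tasks_per_node, which matches the
# `mpirun -np ${nodes*tasks_per_node}` invocation later in this template.
```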
```diff
 #SBATCH --cpus-per-task=1
 #SBATCH --job-name="${name}"
 #SBATCH --time=${walltime}
@@ -15,9 +15,8 @@
 #SBATCH --account="${account}"
 % endif
 % if gpu_enabled:
-#SBATCH --gpus-per-node=${tasks_per_node}
-#SBATCH --mem=64G
-#SBATCH --gpu-bind=closest
+#SBATCH --gpu-bind=verbose,closest
+#SBATCH --gres=gpu:v100-16:${tasks_per_node}
```
Suggestion: the hardcoded GPU resource/type may not exist on all clusters and can cause sbatch to reject the job; request the GPU count generically (or parameterize the GPU type) instead of forcing "v100-16". [possible bug]

Severity Level: Critical 🚨

Why it matters: this is a valid and practical suggestion. Hardcoding a specific GPU model (v100-16) can cause sbatch to reject jobs on clusters that do not expose that exact resource name. Replacing the line with a generic gres request, or better, parameterizing the GPU model with a template variable, is safer and fixes a real portability issue visible in the diff.
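A minimal sketch of a more portable request, assuming one GPU per MPI task on the node:

```bash
#SBATCH --gpus-per-node=${tasks_per_node}   # request a GPU count, not a specific model
#SBATCH --gpu-bind=verbose,closest
```

If a particular model must be pinned, the type could instead be threaded through the template as a variable (a hypothetical `gpu_type`, say) and rendered into `--gres=gpu:<type>:<count>`, rather than hardcoding `v100-16`.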
```diff
 % endif
 #SBATCH --output="${name}.out"
 #SBATCH --error="${name}.err"
@@ -31,7 +30,7 @@
 ${helpers.template_prologue()}

 ok ":) Loading modules:\n"
-cd "${MFC_ROOTDIR}"
+cd "${MFC_ROOT_DIR}"
```
Suggestion: the added unconditional `cd "${MFC_ROOT_DIR}"` will fail silently if `MFC_ROOT_DIR` is unset or not a directory, causing the subsequent source of `mfc.sh` to run in the wrong directory and break the script; check that the path exists and is a directory before changing into it, and exit with a clear error if it is invalid.

Severity Level: Minor

Why it matters: adding a guard to ensure MFC_ROOT_DIR is set and is a directory is sensible. As written, the script will attempt to cd and then source ./mfc.sh relative to the current directory, which will fail if the variable is unset or points nowhere. The proposed check makes the failure explicit and fails fast, which is desirable for a job script that must set up its environment.
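A minimal sketch of such a guard, assuming `${MFC_ROOT_DIR}` is substituted by Mako at render time so the shell only needs to validate the resulting path:

```bash
# Fail fast with a clear error instead of sourcing mfc.sh from the wrong directory.
if [ ! -d "${MFC_ROOT_DIR}" ]; then
    echo "Error: MFC root '${MFC_ROOT_DIR}' is not a directory" >&2
    exit 1
fi
cd "${MFC_ROOT_DIR}"
```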
```diff
 . ./mfc.sh load -c o -m ${'g' if gpu_enabled else 'c'}
 cd - > /dev/null
 echo
@@ -42,9 +41,8 @@ echo
 % if not mpi:
 (set -x; ${profiler} "${target.get_install_binpath(case)}")
 % else:
-(set -x; ${profiler} \
-mpirun -np ${nodes*tasks_per_node} \
-${' '.join([f"'{x}'" for x in ARG('--') ])} \
+(set -x; ${profiler} \
+mpirun -np ${nodes*tasks_per_node} \
 "${target.get_install_binpath(case)}")
```
Comment on lines +44 to 46

Contributor:

Suggestion: to correctly profile the parallel application instead of the MPI launcher, move the `${profiler}` invocation after `mpirun` so that it wraps the application binary rather than the launcher.
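A minimal sketch of that rearrangement, reusing the Mako expressions already present in this hunk; with the profiler placed after `mpirun`, each rank's binary is what gets profiled:

```bash
(set -x; mpirun -np ${nodes*tasks_per_node} \
    ${profiler} "${target.get_install_binpath(case)}")
```

Some profilers (Nsight Systems, for example) also offer MPI-aware launch modes, so the exact placement may depend on which `${profiler}` value the toolchain configures.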
```diff
 % endif
```
Suggestion: typo in the Python module name: the added token `python/3.13.10s` has an extra trailing "s", which will make the module name invalid/unavailable to the module loader and cause this entry to fail at load time; remove the stray "s" so the version matches the rest of the file. [possible bug]

Severity Level: Critical 🚨

Why it matters: the added token "python/3.13.10s" very likely contains a typo; all other entries use numeric version tokens (e.g., python/3.12.5). The stray "s" will most likely make the module name not match the expected modulefile and could break module loading. Changing it to python/3.13.10 is a straightforward, low-risk fix that improves correctness.
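One quick way to sanity-check the corrected name on the target cluster, assuming an Lmod or Environment Modules setup:

```bash
# The corrected name should resolve; the typo'd variant should report no match.
module avail python/3.13.10
module avail python/3.13.10s
```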