check process binding #312
I think I found a problem.
[satishk@tcn3 ~]$ hwloc-calc -p -H package.numanode core:0-5
Package:0.NUMANode:0
[satishk@tcn3 ~]$ module unload hwloc/2.9.1-GCCcore-12.3.0
[satishk@tcn3 ~]$ module load hwloc/2.8.0-GCCcore-12.2.0
The following have been reloaded with a version change:
1) GCCcore/12.3.0 => GCCcore/12.2.0 2) libpciaccess/0.17-GCCcore-12.3.0 => libpciaccess/0.17-GCCcore-12.2.0 3) libxml2/2.11.4-GCCcore-12.3.0 => libxml2/2.10.3-GCCcore-12.2.0 4) numactl/2.0.16-GCCcore-12.3.0 => numactl/2.0.16-GCCcore-12.2.0
[satishk@tcn3 ~]$ hwloc-calc -p -H package.numanode core:0-5
unsupported (non-normal) --hierarchical type numanode
Somewhere between
|
good catch. |
|
@satishskamath fallback added. i also added a check for the number of nodes. |
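To illustrate the kind of fallback discussed here, the following is a minimal sketch and not the code from this PR. It first tries the `-H package.numanode` query from the session above and, if the installed hwloc rejects it, falls back to `hwloc-calc -p -I numanode`, which lists the NUMA nodes intersecting the given cores. The choice of `-I numanode` as the fallback, the function names `locate_cores` and `run_hwloc_calc`, and the check for the word "unsupported" in the output are all assumptions made for illustration.

```python
import subprocess
from typing import Callable, Sequence, Tuple


def run_hwloc_calc(args: Sequence[str]) -> subprocess.CompletedProcess:
    """Run hwloc-calc with the given arguments (needs hwloc on PATH)."""
    return subprocess.run(["hwloc-calc", *args], capture_output=True, text=True)


def locate_cores(cores: str, run: Callable = run_hwloc_calc) -> Tuple[str, str]:
    """Return (method, output) describing where `cores` are located.

    `run` is injectable so the logic can be exercised without hwloc installed.
    """
    # Preferred query, copied from the session above.
    first = run(["-p", "-H", "package.numanode", cores])
    message = (first.stdout or "") + (first.stderr or "")
    # Assumption: an older hwloc signals the problem via a non-zero exit code
    # and/or the word "unsupported" on stdout or stderr.
    if first.returncode == 0 and "unsupported" not in message:
        return "hierarchical", first.stdout.strip()

    # Assumed fallback (not necessarily the PR's): list intersecting NUMA nodes.
    second = run(["-p", "-I", "numanode", cores])
    if second.returncode == 0 and second.stdout.strip():
        return "intersect", second.stdout.strip()

    raise RuntimeError(f"hwloc-calc could not resolve {cores!r}: {message.strip()}")
```

Returning the method alongside the output lets the caller parse the two formats differently, since the hierarchical form prints `Package:0.NUMANode:0` while the intersect form prints only node indexes.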
this runs a short test in a prerun cmd to get the process binding, which is checked with the check_process_binding.py script. the results are written into the job error file.
fixes #307
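As a rough illustration of what such a binding check could look like, here is a small sketch. It is not the actual check_process_binding.py script: the input format (one Linux-style CPU list per rank, such as the `Cpus_allowed_list` value in `/proc/self/status`), the function names, and the specific rules (too few CPUs or CPUs shared between ranks count as errors, extra CPUs as a warning) are assumptions. It also assumes all ranks run on the same node, since CPU ids only collide within a node.

```python
from typing import Dict, List, Set, Tuple


def parse_cpu_list(text: str) -> Set[int]:
    """Expand a Linux-style CPU list such as '0-3,8' into a set of CPU ids."""
    cpus: Set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def check_binding(bindings: Dict[int, str], cpus_per_task: int) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings) for a hypothetical rank -> CPU list mapping."""
    errors: List[str] = []
    warnings: List[str] = []
    owner: Dict[int, int] = {}  # CPU id -> first rank seen using it
    for rank in sorted(bindings):
        cpus = parse_cpu_list(bindings[rank])
        if len(cpus) < cpus_per_task:
            errors.append(f"rank {rank} has {len(cpus)} CPUs, expected {cpus_per_task}")
        elif len(cpus) > cpus_per_task:
            warnings.append(f"rank {rank} has {len(cpus)} CPUs, expected {cpus_per_task}")
        for cpu in sorted(cpus):
            if cpu in owner:
                errors.append(f"rank {rank} shares CPU {cpu} with rank {owner[cpu]}")
            else:
                owner[cpu] = rank
    return errors, warnings
```

Keeping errors and warnings in separate lists means the caller can decide later whether errors should fail the test or only be reported.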
Important
the test currently doesn't fail on binding errors, as we don't yet have a bullet-proof solution for setting the binding in all cases (see also the discussion in #305). so, for now, both errors and warnings are printed as warnings on screen. sanity checks can be added in a follow-up PR.
example output:
Note
i managed to get the correct launcher run command by updating the job resources in the assign_tasks_per_compute_unit function. this also allowed simplifying the openfoam test and making it more robust.