-
Notifications
You must be signed in to change notification settings - Fork 15
fix process binding for mpirun with OpenMPI #305
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This option of specifying PEs per slot was removed for the default mapping policy within OMPIv5. This is now being added back and will be merged in further OMPI updates but not the existing versions. I have asked for an equivalent environment variable for the command line option |
|
So... one option I see is to query the launcher and inject the right |
|
apparently intel mpirun happily ignores EDIT: this only works for 2025a, it fails in 2024a |
another solution (and probably easier) is to patch the current OpenMPI v5 easyconfigs with the changes posted by Ralph so we can continue using the envvars. |
That's not a bad idea, assuming the patch applies as a stand-alone thing on the OpenMPI v5 versions we have in EESSI. The only (minor) downside is that for people running the test-suite on a local software stack, this won't "help". But seeing as this is a transient problem anyway, I'd favor this solution above coming up with something that complicates the test-suite code unnecessarily (and that we then have to maintain / deprecate / take out once the time comes). So: thumbs-up from my side for this idea :) |
|
As long as the patch does not affect the existing code in any other manner and also because this is a transient issue (only affects certain versions of OMPI), I am also in favour of patching it at the EB level. But the patch should include only parts that activate these env variables, I think Ralph is the only one who can provide it in a clean manner. We can try and request it. May be good to open an issue in EB repo as well. |
| pattern = "|".join(ompi_patterns) | ||
| if any(re.search(pattern, x) for x in test.modules): | ||
| test.job.launcher.options.append(f'--map-by slot:PE={physical_cpus_per_task} --report-bindings') | ||
| log(f'Set launcher command to {test.job.launcher.run_command(test.job)}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the run command logged here is not correct, but that will be fixed once #312 is merged
fixes #297