@smoors
Collaborator

@smoors smoors commented Nov 29, 2025

fixes #297

@smoors smoors marked this pull request as draft November 29, 2025 21:01
Samuel Moors added 3 commits November 29, 2025 22:05
@satishskamath
Collaborator

satishskamath commented Dec 22, 2025

open-mpi/ompi#13586 (comment)

The option to specify PEs per slot was removed from the default mapping policy in OMPI v5. It is being added back and will be merged into future OMPI releases, but not into the existing versions. I have asked for an environment variable equivalent to the command-line option --map-by, since PEs per slot can definitely be specified there.

@casparvl
Collaborator

casparvl commented Dec 23, 2025

So... one option I see is to query the launcher and inject the right --map-by arguments (if the launcher is mpirun). The downside is: it needs to be OpenMPI's mpirun, and I'm not sure how EESSI it is to determine that (at least I don't think Intel's mpirun would know this argument?)
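
The idea above could look roughly like the following. This is a minimal sketch, not the test suite's code: `build_launcher_options` and its parameters are hypothetical names introduced for illustration, and the only option string used is the `--map-by slot:PE=N` form discussed in this thread.

```python
def build_launcher_options(launcher_name, is_openmpi, cpus_per_task):
    """Return extra launcher options (hypothetical helper, illustration only)."""
    options = []
    # Assumption: only OpenMPI's mpirun should receive --map-by slot:PE=N;
    # other launchers (e.g. srun) are left untouched.
    if launcher_name == "mpirun" and is_openmpi:
        options.append(f"--map-by slot:PE={cpus_per_task}")
    return options


# Example calls: the first returns one option, the second an empty list.
openmpi_opts = build_launcher_options("mpirun", True, 2)
srun_opts = build_launcher_options("srun", True, 2)
```

How `is_openmpi` would be determined is exactly the open question raised in the comment; the sketch only shows where the injection would happen once that is known.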

@smoors
Collaborator Author

smoors commented Dec 29, 2025

apparently intel mpirun happily ignores --map-by (but not --report-bindings), so it's doable

$ mpirun --map-by slot:PE=2 -genv I_MPI_DEBUG=4 hostname
Options -binding/-bind-to/-map-by/-membind are not supported and will be ignored.
        Please refer I_MPI_PIN environment variables family.
node800.hydra.os
node800.hydra.os
node800.hydra.os
node800.hydra.os

EDIT: this only works for 2025a, it fails in 2024a

@smoors
Collaborator Author

smoors commented Dec 31, 2025

> So... one option I see is to query the launcher and inject the right --map-by arguments (if the launcher is mpirun). The downside is: it needs to be OpenMPI's mpirun, and I'm not sure how EESSI it is to determine that (at least I don't think Intel's mpirun would know this argument?)

another solution (and probably easier) is to patch the current OpenMPI v5 easyconfigs with the changes posted by Ralph so we can continue using the envvars.

@smoors smoors mentioned this pull request Jan 3, 2026
@casparvl
Collaborator

casparvl commented Jan 5, 2026

> another solution (and probably easier) is to patch the current OpenMPI v5 easyconfigs with the changes posted by Ralph so we can continue using the envvars.

That's not a bad idea, assuming the patch applies as a stand-alone thing on the OpenMPI v5 versions we have in EESSI. The only (minor) downside is that for people running the test-suite on a local software stack, this won't "help". But seeing as this is a transient problem anyway, I'd favor this solution above coming up with something that complicates the test-suite code unnecessarily (and that we then have to maintain / deprecate / take out once the time comes).

So: thumbs-up from my side for this idea :)

@satishskamath
Collaborator

As long as the patch does not affect the existing code in any other way, and given that this is a transient issue (it only affects certain versions of OMPI), I am also in favour of patching it at the EB level. But the patch should include only the parts that activate these env variables, and I think Ralph is the only one who can provide it in a clean manner. We can try to request it. It may be good to open an issue in the EB repo as well.

@smoors
Collaborator Author

smoors commented Jan 5, 2026

ok, i found that it's actually not that complicated to detect it's OpenMPI's mpirun, see 4f860af

as @casparvl noted, the advantage is that this will also work for local stacks without the need to reinstall anything with a patch.

@smoors smoors changed the title from "add PRTE_MCA_rmaps_base_mapping_policy" to "fix process binding for mpirun with OpenMPI" Jan 5, 2026
@smoors smoors marked this pull request as ready for review January 5, 2026 17:53
# Add explicit mapping options when a loaded module matches an OpenMPI pattern
pattern = "|".join(ompi_patterns)
if any(re.search(pattern, x) for x in test.modules):
    test.job.launcher.options.append(f'--map-by slot:PE={physical_cpus_per_task} --report-bindings')
    log(f'Set launcher command to {test.job.launcher.run_command(test.job)}')
Collaborator Author

the run command logged here is not correct, but that will be fixed once #312 is merged
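
For readers who want to try the module-based detection outside the test suite, here is a self-contained sketch of the same regex approach. The contents of `OMPI_PATTERNS` and the example module names are assumptions made for illustration (the real `ompi_patterns` list is not shown in this thread), and `uses_openmpi` and `mapping_options` are hypothetical helper names.

```python
import re

# Hypothetical patterns: the actual `ompi_patterns` list is not shown here.
OMPI_PATTERNS = [r"^OpenMPI/", r"-foss-", r"-gompi-"]


def uses_openmpi(modules, patterns=OMPI_PATTERNS):
    """Return True if any module name matches one of the OpenMPI patterns."""
    pattern = "|".join(patterns)
    return any(re.search(pattern, name) for name in modules)


def mapping_options(modules, physical_cpus_per_task):
    """Return the launcher options discussed in this PR, or an empty list."""
    if uses_openmpi(modules):
        return [f"--map-by slot:PE={physical_cpus_per_task} --report-bindings"]
    return []


# Illustrative module names (assumed, not taken from the thread).
with_ompi = mapping_options(["GROMACS/2024.1-foss-2023b"], 4)
without_ompi = mapping_options(["impi/2021.9.0"], 4)
```

Matching on module names avoids running the launcher at all, at the cost of keeping the pattern list in sync with the toolchain names that bundle OpenMPI.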

Development

Successfully merging this pull request may close these issues.

Update OMPI_MCA environment variables used to their PRTE_MCA_ equivalents

3 participants