Draft kokkos-comm initialization/finalization using MPI_Sessions#68
Draft kokkos-comm initialization/finalization using MPI_Sessions#68dssgabriel wants to merge 20 commits intokokkos:developfrom
MPI_Sessions#68Conversation
src/KokkosComm_communicator.hpp
Outdated
| comm_kind = CommunicatorKind::User; | ||
| } else { | ||
| fprintf(stderr, "[KokkosComm] error: intercommunicators are not supported (yet).\n"); | ||
| std::terminate(); |
There was a problem hiding this comment.
Don't just terminate. Either throw an exception or call MPI_Abort (I've seen applications hang where one process aborted without calling MPI_Abort, it's nasty)
There was a problem hiding this comment.
Any specific error code you suggest?
There was a problem hiding this comment.
Not really, anything non-zero would work. This ties in with #29 so anything you do here is probably temporary anyway :)
|
|
||
| static auto dup(const Communicator &other) -> Communicator { return Communicator::dup_raw(other.as_raw()); } | ||
|
|
||
| static auto split_raw(MPI_Comm raw, Color color, Key key) -> Communicator { |
There was a problem hiding this comment.
Is there a reason why this isn't just an overload of split?
There was a problem hiding this comment.
I wanted to make it explicit that we're splitting from a raw MPI_Comm handle, not a wrapped KokkosComm::Communicator, but I guess we can also simply overload.
There was a problem hiding this comment.
I guess that's a design decision that the Kokkos community should make (to be consistent with the other projects)
There was a problem hiding this comment.
You can drop the Raw. We typically are happy with overloads.
src/KokkosComm_communicator.hpp
Outdated
| inline static auto self(void) -> Communicator { return Communicator::from_raw_unchecked(MPI_COMM_SELF); } | ||
|
|
||
| inline static auto world(void) -> Communicator { return Communicator::from_raw_unchecked(MPI_COMM_WORLD); } |
There was a problem hiding this comment.
That doesn't belong into the communicator. Since we use sessions, there should be no use of the world process model. It could even be that no one has called MPI_Init.
src/impl/KokkosComm_MPI_instance.hpp
Outdated
| } | ||
|
|
||
| template <KokkosExecutionSpace ExecSpace> | ||
| auto initialize(int &argc, char *argv[]) -> Universe<ExecSpace> { |
There was a problem hiding this comment.
It's not obvious that once version of initialize uses the WPM and one uses the SPM. I'd rather stay away from the WPM tbh.
There was a problem hiding this comment.
I'll remove the one using the WPM so that we only initialize using the SPM.
|
Thank you very much for the first review, @devreal! Regarding C++ destruction and the SPM finalization semantics: is |
|
Just a PSA: OMPI in combination with UCX won't support Sessions until the next major release of OMPI (6.0): open-mpi/ompi#12566 (comment) |
|
It looks like Linux's OpenMPI is too old to support @cwpearson Is there any way we can specifically install OpenMPI 5.x in the CI? The package from Ubuntu repos is simply not up-to-date. |
The standard says that the application is required to clean up its objects before
We had a similar issue in other projects and came down to a registry in which objects that needed destruction were referenced. There are two options they can get destroyed:
All of this must be thread-safe etc but communicators are heavy objects that are not created and destroyed regularly so I think the overhead involved in such a scheme is acceptable. |
|
Would you like to rebase this on top of current develop? |
Open MPI's session support has not been released yet. It will be included in the next major release (tbd when that comes out) |
This PR is a first attempt at initialization and finalization for
kokkos-commrelying onMPI_Sessions.Some noteworthy additions brought by this PR:
Communicatorclass that wraps theMPI_Command a Kokkos execution spaceUniverseclass that holds the handle to the MPI sessions, as well as a session-associated communicator.I expect lots of changes in the API, which I find kind of clunky at the moment. Reviews and comments on how to improve are welcome.