Reorganize for MPI / NCCL transport layers#109
Conversation
I haven't looked into the code in detail, but a few notes:
Thank you for the work on this big PR; things are progressing nicely!
I was debating this myself - I was thinking about the MPI interop angle. I think it makes sense for us to actually provide stable-ish MPI wrappers since that is one of the motivations for this library and what our current users want from it.
Will do
This doesn't actually implement anything except MPI, so no danger here 😄 . We could implement Serial (or not) in the future as needed.
Good idea.
@dssgabriel I'm proposing some modifications to how the comm mode is used in 9126092; this could potentially be a separate PR.
I fear that if we provide direct MPI wrappers, people will just start using those instead of our "core" API, as it may be easier for them to adopt. This won't add any value to their code, and they may as well keep the MPI interop they already have in place. Some other questions/remarks:
Let's discuss this PR during next week's meeting; we may want to split it up even further. I'll give a better overview of what NCCL offers and how it should interact with this PR. 👍
It's done with structs in Kokkos because you can't do partial function specialization - e.g. you CAN'T do

```cpp
template <typename View, typename CommunicationSpace>
void send(View v);

// KokkosComm implementation for MPI in here
template <typename View>
void send<View, Mpi>(View v) {...}
```

but you CAN do

```cpp
template <typename View, typename CommunicationSpace>
struct send;

// KokkosComm implementation for MPI in here
template <typename View>
struct send<View, Mpi> {...};
```
The value of MPI wrappers for the users is in two places: there is demonstrated interest in these, and they can be realized in the near term while providing value to the overall effort.

Like you (?), I think the core API should be more thoughtful and novel. However, if what we come up with is such a small step forward that people find MPI-Kokkos interop more compelling, then they weren't going to use the new thing anyway.
@dssgabriel how are we feeling about this? We have an external party asking about writing a backend to support their research, plus our own NCCL work.
I think this is on the right track! A few points of discussion that we probably need to figure out before merging:

I will review the code once again tomorrow so that we can discuss it on next Monday's meeting and quickly merge it afterwards. 👍

I tried reading the NCCL docs but I couldn't figure out what the semantics are if I do two `ncclSend`/`ncclRecv` in a row to the same destination. Do you know if they are non-overtaking like MPI? I think there needs to be some way to distinguish multiple in-flight messages with the same source and dest, but keeping messages ordered may be enough. I'll drop the tags for now and we can discuss.

I think just keep them for MPI since they are basically an MPI thing.

I think you're right about this, let me take another stab at it.

I prefer this too, I was just following Kokkos Core and Kernels. Let me see if I can make this change. Then we'll really be touching every file 😄
As far as I understand, NCCL enqueues communication operations on a CUDA stream. If two successive NCCL calls are on the same stream, they are non-overtaking. However, I think two communications enqueued on different streams may be able to overtake each other? I believe some folks at CEA know the lead developer of NCCL, so we'll try to set up a meeting with them and get answers to all our semantics-related questions.
LGTM 👍
I agree. Maybe we can automatically enable MPI if users compiled their code with
Great! I'm looking forward to reviewing every file 😁 Thank you for all the work on this PR!
I ended up moving all the perf tests to an
@dssgabriel I think all future work is waiting behind this PR, how are we feeling?
cedricchevalier19
left a comment
I am not particularly in favor of adding a public `KokkosComm::mpi` namespace. Can we put it in an `Experimental` namespace?
Otherwise, I am in favor of merging this PR, as it really improves the code architecture. Thanks a lot!
On `//@HEADER` (first line of the new file):

I am not sure naming this file "mpi.hpp" is a good idea.

On `#include "../concepts.hpp"`:

Try to avoid relative paths in "public" includes.
Do we want to go ahead and just merge this? We could open small issues and gradually fix the various remarks we have brought up here.
Let's merge this, and then I can open up issues for the comments here. I need an approving review though 😄
dssgabriel
left a comment
Let's merge this and open issues for the relevant remarks we've made in the PR.
Main points I have in mind at the moment are:
- re-organize the files to match ISO C++ library design
- move the `mpi` namespace behind `Experimental`
First part of #100
**Introduce `Transport` (`CommunicationSpace`) concept**

This is all modeled after Kokkos Core execution spaces. Each will have its own subdirectory under `src` (like Kokkos' backends). The interface that each comm space needs to implement is a struct - for example, `Irecv`, from `KokkosComm_fwd.hpp`.

Why a `struct`? Because in practice what this means is that the comm space will have a partial specialization (not allowed for functions), as in `mpi/KokkosComm_mpi_irecv.hpp`, where `Mpi` is a `struct` in the `KokkosComm` namespace, analogous to e.g. `Kokkos::Cuda`. A Serial transport would have a corresponding `Serial`, NCCL would have `Nccl`, and so on.
To support this, a lot of the include structure needs to be refined and adjusted to more closely match how Kokkos Core does it.
Future work:
**Introduce a `KokkosComm::Handle`**

This hides the CommunicationSpace details from the KokkosComm interface. It's specialized on each CommunicationSpace. For MPI, this holds a Kokkos execution space instance and an MPI communicator.
**Reduce main interface to 3 functions**
**Move MPI-specific wrappers to `mpi` namespace**

This is so we can still have useful MPI interop stuff, but it's not part of the "blessed" interface.
**Update unit tests and perf tests for new layout**

Many unit tests are now MPI-specific. All perf_tests are only built if MPI is built (this could be revisited when other Transports are done).
**Move `CommMode` and related interfaces to `mpi` namespace**

This is MPI-specific stuff. I also made the `CommMode` stuff more canonically C++:

- `ReadyCommMode` -> `CommModeReady`

**Add some additional support for non-contiguous views along existing lines**
A `Req<Mpi>` knows how to keep track of some lambda functions it can call after `MPI_Wait` (e.g. to unpack the result of an `MPI_Irecv`).

**Grab-bag**

- `req.wait()` -> `KokkosComm::wait(req)`
- `wait_any` and `wait_all`