Stream-ordered wait{_any,_all}, fix wait_any implementation#125
Stream-ordered wait{_any,_all}, fix wait_any implementation#125cwpearson wants to merge 5 commits intokokkos:developfrom
Conversation
0066522 to
66a68e1
Compare
src/KokkosComm/mpi/req.hpp
Outdated
|
|
||
| template <KokkosExecutionSpace ExecSpace> | ||
| void wait_all(const ExecSpace &space, std::vector<Req<Mpi>> &reqs) { | ||
| space.fence(); |
There was a problem hiding this comment.
I wonder whether this fence is necessary. If the semantics of KokkosExecutionSpace is such that enqueued operations are ordered then it is sufficient to block the enqueuing thread until the communication is complete. Subsequent submissions are necessarily ordered with the (then-completed) MPI operations.
There was a problem hiding this comment.
I think we do not need that fence there. There are two more occurrences of superfluous fences.
There was a problem hiding this comment.
Consider moving the wait_* functions to a separate file.
There was a problem hiding this comment.
Semantically we're trying to order this "wait" after anything else already in the execution space instance. There's no way to actually enqueue an arbitrary host function in a Kokkos execution space instance, so the only way to ensure everything already in the instance is done is to fence it. Then we can call the MPI functions to complete the communication.
There was a problem hiding this comment.
The difference is not observable. The operations have been started so they have seen the execution space at the time the operation was started. Subsequent kernels should not interfere with these operations, nor do operations submitted after the wait (because they will themselves synchronize the execution space before starting the operation). And I don't think wait_all should guarantee synchronization of the space because that is more than strictly needed.
66a68e1 to
4bc3385
Compare
|
Some of this was addressed in #216. Currently, the semantics of |
This adds
wait,wait_any,wait_allfunctions likeSemantically, this inserts the "wait" into the provided execution space instance - i.e., existing work in the instance is allowed to complete, and future work must wait for the communication to complete.
The 1-argument version without an execution space just calls the two-argument version with
Kokkos::DefaultExecutionSpace.