Description
Note
The below description reflects an earlier version of the MPI Continuations proposal and is kept for historical purposes. The current version of the proposal can be found in https://github.com/mpiwg-hybrid/mpi-standard/pull/1 and in the following PDF:
https://github.com/mpiwg-hybrid/mpi-standard/files/14565813/continuations_202403011.pdf
Background
MPI supports a range of non-blocking operations (point-to-point, collectives, RMA, I/O), each returning a request object that can be used to test and wait for the completion of the operation. Once an operation is complete, applications typically react to that change in state, e.g., by deallocating the buffer used by the operation, processing the received message, or starting subsequent operations. The polling this requires is impractical for applications that can overlap communication with additional work, such as processing available tasks. Request management can also become cumbersome and error-prone, especially in multi-threaded applications.
Proposal
This proposal introduces a flexible interface for attaching so-called continuations to operation requests. Continuations are actions that are invoked by the MPI library once the completion of an operation is detected. At most one continuation may be attached to any request object. When a continuation is attached, the MPI implementation takes back ownership of any non-persistent request, and no copy of the request may be used to test or wait for the completion of the operation. Persistent requests remain valid and may still be used to test or wait for the operation to complete after the continuation has been attached, but no second continuation may be attached before the request completes. It is unspecified whether the continuation has finished executing when a call to MPI_Test/MPI_Wait on a persistent operation request returns; execution of the continuation may be deferred to a later point.
Continuations may be attached to a single operation request (MPI_Continue) or a set of requests (MPI_Continueall):
typedef void MPI_Continue_cb_function(MPI_Status *array_of_statuses, void *cb_data);
int MPI_Continue(
    MPI_Request *op_request,
    MPI_Continue_cb_function cb,
    void *cb_data,
    MPI_Status *status,
    MPI_Request cont_request);
int MPI_Continueall(
    int count,
    MPI_Request array_of_op_requests[],
    MPI_Continue_cb_function cb,
    void *cb_data,
    MPI_Status array_of_statuses[],
    MPI_Request cont_request);
The latter invokes the continuation once all of the provided operations have completed. For each operation request, a status may be provided that will be set before the continuation is invoked. The provided buffer containing the status(es) will be passed to the continuation callback, along with the provided cb_data pointer. Alternatively, MPI_STATUS_IGNORE/MPI_STATUSES_IGNORE may be passed to the registration function, in which case that value is passed to the callback in place of a status buffer.
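The "invoke once all operations have completed" behaviour can be illustrated without an MPI library. The following is only a minimal sketch in plain C: every `mock_*` name, `demo_state`, and `run_continueall_demo` is a hypothetical stand-in invented for this example and is not part of MPI or of the proposed interface. It assumes the callback learns the number of statuses through `cb_data`, since the callback signature above does not carry a count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; NOT MPI types or the proposed API. */
typedef struct { int source; int tag; } mock_status;
typedef void mock_cb_function(mock_status *array_of_statuses, void *cb_data);

typedef struct {
    int               pending;   /* operations that have not completed yet */
    mock_cb_function *cb;
    void             *cb_data;
    mock_status      *statuses;  /* caller-provided buffer, may be NULL */
} mock_continuation;

/* Mimics MPI_Continueall: attach one callback to `count` operations. */
static void mock_continueall(mock_continuation *c, int count,
                             mock_cb_function *cb, void *cb_data,
                             mock_status *statuses)
{
    assert(count > 0);
    c->pending  = count;
    c->cb       = cb;
    c->cb_data  = cb_data;
    c->statuses = statuses;
}

/* The mock "library" reports that operation `idx` has completed. */
static void mock_complete_op(mock_continuation *c, int idx, int source, int tag)
{
    assert(c->pending > 0);
    if (c->statuses != NULL) {           /* set the status before the callback runs */
        c->statuses[idx].source = source;
        c->statuses[idx].tag    = tag;
    }
    if (--c->pending == 0)
        c->cb(c->statuses, c->cb_data);  /* all operations done: invoke once */
}

/* Application state handed to the callback through cb_data. */
typedef struct { int count; int tag_sum; int calls; int calls_after_first; } demo_state;

/* Example callback: counts invocations and sums the tags of all statuses. */
static void on_all_complete(mock_status *array_of_statuses, void *cb_data)
{
    demo_state *st = cb_data;
    st->calls++;
    for (int i = 0; i < st->count; i++)
        st->tag_sum += array_of_statuses[i].tag;
}

demo_state run_continueall_demo(void)
{
    mock_status statuses[2];
    demo_state st = { 2, 0, 0, 0 };
    mock_continuation c;

    mock_continueall(&c, st.count, on_all_complete, &st, statuses);
    mock_complete_op(&c, 0, /*source=*/0, /*tag=*/10);
    st.calls_after_first = st.calls;     /* callback has not run: one op pending */
    mock_complete_op(&c, 1, /*source=*/1, /*tag=*/32);
    return st;
}
```

In this sketch the callback runs exactly once, after the second completion, and reads both statuses from the buffer that was handed over at registration time.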
Continuation Requests
The continuation is attached to the operation request(s) and registered to the continuation request (cont_request above). Continuation requests are allocated using MPI_Continue_init:
int MPI_Continue_init(MPI_Info info, MPI_Request *cont_req);
Continuation requests accumulate outstanding continuations and can be used to test/wait for their completion. A continuation request may itself have a continuation attached to it, which will be invoked once all registered continuations have completed executing. Continuation requests can also be used to progress outstanding continuations by calling MPI_Test on them.
Continuation requests are persistent but are not started explicitly. Instead, a continuation request is started implicitly when the first continuation is registered after initialization or after a previous completion.
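The lifecycle described above (inactive after initialization, implicitly started by the first registration, complete once every registered continuation has executed, then reusable) can be modelled as a small state machine. This is a sketch under assumptions: `lifecycle_request` and all `lifecycle_*` functions are hypothetical names for this example only, and the model assumes that testing an inactive request reports completion, as MPI_Test does for inactive persistent requests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a continuation request; not the proposed MPI API. */
typedef struct {
    bool active;       /* started and not yet complete */
    int  outstanding;  /* registered continuations that have not executed */
} lifecycle_request;

/* Mimics MPI_Continue_init: the request starts out inactive. */
static void lifecycle_init(lifecycle_request *cr)
{
    cr->active = false;
    cr->outstanding = 0;
}

/* Registering a continuation implicitly starts an inactive request. */
static void lifecycle_register(lifecycle_request *cr)
{
    cr->active = true;
    cr->outstanding++;
}

/* A registered continuation finished executing. */
static void lifecycle_executed(lifecycle_request *cr)
{
    assert(cr->outstanding > 0);
    if (--cr->outstanding == 0)
        cr->active = false;   /* complete; the request can be reused */
}

/* Mimics the flag of MPI_Test: true when the request is not active. */
static bool lifecycle_test(const lifecycle_request *cr)
{
    return !cr->active;
}

/* Records the test flag after each step of a small scenario. */
void run_lifecycle_demo(bool flags[4])
{
    lifecycle_request cr;

    lifecycle_init(&cr);
    lifecycle_register(&cr);            /* implicit start */
    lifecycle_register(&cr);
    flags[0] = lifecycle_test(&cr);     /* two continuations outstanding */
    lifecycle_executed(&cr);
    flags[1] = lifecycle_test(&cr);     /* one continuation outstanding */
    lifecycle_executed(&cr);
    flags[2] = lifecycle_test(&cr);     /* all executed: complete */
    lifecycle_register(&cr);            /* implicit restart after completion */
    flags[3] = lifecycle_test(&cr);     /* active again */
}
```

The point of the model is that no explicit start call appears anywhere: registration alone moves the request from inactive to active.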
Execution Context
By default, continuations may be invoked by any application thread calling into the MPI library. Two info keys for calls to MPI_Continue_init are provided to restrict the execution:
- "mpi_continue_poll_only": if set to "true", continuations are only invoked when MPI_Test or MPI_Wait is called on the continuation request with which the continuations are registered. (default: "false", i.e., the continuation may be executed at any time)
- "mpi_continue_thread": may be "application" (only application threads may execute continuations) or "any" (any thread may execute continuations, including MPI progress threads, if available). (default: "application")
Further Info Keys
- "mpi_continue_enqueue_complete": if "true" and all operations are already complete when a continuation is attached to a set of requests, the continuation is enqueued for later execution (e.g., while polling on the continuation request). Otherwise, continuations may be executed immediately inside the call to MPI_Continue/MPI_Continueall if all operations were immediately complete. (default: "false")
- "mpi_continue_max_poll": the maximum number of continuations to execute when polling (calling MPI_Test) on the continuation request. (default: "-1", i.e., as many as possible)
- "mpi_continue_async_signal_safe": if "true", the continuation is async-signal-safe and may be called from within a signal handler. (default: "false")
Resources
The current PDF:
mpi40-report-continuations.pdf
Proposal PR: TBD
Open Questions
A list of open questions (to be used to track discussions):
Integration with Sessions
- How to connect continuation requests to a session? Do we need MPI_Session_continue_init?
- Is it legal to register continuations for requests from different sessions with the same continuation request?
Status handling
- Can we use the same callback signature for MPI_Continue and MPI_Continueall, given that one would be passed MPI_STATUS_IGNORE and the other MPI_STATUSES_IGNORE?
General
- Should we pass the number of completed requests to the callback function?
- Is there any use for something like MPI_Continueany (potentially more resource efficient by reusing the same data structure for several continuations) or MPI_Continuesome (what would the semantics be?)?
- Should the execution of a continuation be required for the completion of a persistent request?