Draft
68 commits
39aae5e
Implementation of the current state of MPI Continuations proposal
devreal Oct 5, 2021
0240721
Move parts of ompi_request_cont_data_t into custom request struct
devreal Oct 18, 2021
c4ad16f
Rename functions to avoid confusion
devreal Oct 18, 2021
4da0ec0
SQUASHME: Replace opal_show_help instead of fprintf
devreal Oct 18, 2021
321fad0
Use OMPI_COPY_STATUS to set status
devreal Oct 18, 2021
acb7580
Add documentation for ompi_continue_register_request_progress
devreal Oct 18, 2021
d77ca1c
Remove re-iteration of continuation requests in test_any
devreal Oct 21, 2021
4433165
Enable the continuations extension only if explicitly requested
devreal Oct 21, 2021
ecbb8c6
Remove lock out of critical path of creating/completing continuations
devreal Oct 25, 2021
8abe765
Fix printing of tear-down warning
devreal Nov 8, 2021
8c78d1d
Fix logic error when creating a new continuation
devreal Nov 8, 2021
2d0812e
Use new OMPI_MPIEXT_continue_POST_CONFIG hook
devreal Nov 8, 2021
d66be08
Request test/wait: prevent continuation requests from being set to in…
devreal Nov 8, 2021
175c224
Don't enqueue continuations twice for execution if everything is comp…
devreal Nov 8, 2021
9515ffa
Poll-only continuations should only be executed by the thread testing…
devreal Nov 9, 2021
09afd05
Allow the waiting thread to execute continuations if not blocked in t…
devreal Nov 9, 2021
6afa21b
Make sure threads waiting on continuation request execute continuatio…
devreal Jan 21, 2022
5c417cf
Add tests for MPI continuations (single- and multi-threaded)
devreal Apr 19, 2022
55ea6b3
Don't execute callbacks immediately if we didn't see any completed re…
devreal Jun 29, 2022
d95228e
Continuations: updated API and support for error handling
devreal Jun 9, 2022
601e89c
Continuations: Add callback return value
devreal Jun 9, 2022
181b56d
Move ompi_mpi_object_t to separate header file
devreal Jun 10, 2022
dea0fa7
Error handler: query stored MPI object to invoke error on
devreal Jun 10, 2022
24dd50b
Minor fixes: break include cycle and remove unnecessary atomic
devreal Jun 18, 2022
b3e941c
Atomics: add relaxed load and store
devreal Jun 18, 2022
d7a7eee
Use OPAL_ATOMIC_RELAXED_STORE in continuations
devreal Jun 18, 2022
43c54c3
Add explicit start to continuation requests
devreal Jun 21, 2022
e5db72f
Remove locks if request buffer is volatile
devreal Jun 23, 2022
59ec7dd
Add missing MPIX_Continue_get_failed function implementation
devreal Jun 23, 2022
a78a4ee
Fix rebase conflict
devreal Jun 30, 2022
a82b0b0
Fix starting and completion of CRs and reduce atomic operations
devreal Jul 1, 2022
f62b2b4
Revert "Atomics: add relaxed load and store"
devreal Jul 1, 2022
04cf179
Remove use of OPAL_ATOMIC_RELAXED_STORE
devreal Jul 11, 2022
4fe6369
Make sure CRs are restarted in tests
devreal Jul 11, 2022
b7c6811
Don't execute a continuation immediately if the CR is inactive
devreal Jul 11, 2022
33264ed
Avoid using thread_local data if multi-threading is not enabled
devreal Jul 11, 2022
a148e7b
Use atomic locks instead of pthread locks to avoid library calls
devreal Jul 11, 2022
2123735
Make sure the progress callback is always registered
devreal Jul 12, 2022
937e46d
Fix logic errors for continuation flags
devreal Jul 12, 2022
fc7eb16
Only set in_progress around code that directly executes continuations
devreal Jul 12, 2022
be38ca6
Always return 1 from request_completion_cb
devreal Jul 12, 2022
3f87912
Reduce atomic operations by checking for empty list
devreal Jul 12, 2022
e9caa09
Reorder progress of individual requests
devreal Jul 12, 2022
57d9c50
Avoid atomic operation if all requests have completed
devreal Jul 12, 2022
d1a4206
Let the test and wait calls set CRs to inactive
devreal Jul 12, 2022
9e8505c
Progress CR in ompi_request_wait_completion before creating a sync
devreal Jul 12, 2022
e26a9ba
Continuation tests: test for delayed start
devreal Jul 12, 2022
8e3be49
Infrastructure to test for MPI object error handler, not used yet
devreal Jul 12, 2022
488909e
Remove an atomic op from wait_sync_update
devreal Jul 13, 2022
0bb7352
Outline constructor for thread-local data
devreal Jul 19, 2022
584c8b0
Minor changes to the continuation API to reflect current proposal
devreal Nov 3, 2022
27736e0
Remove left over use of OPAL_ATOMIC_RELAXED_STORE
devreal Nov 29, 2022
4e71e1b
Make sure completed continuations requests are marked pending
devreal Dec 12, 2022
0984ecd
Don't set persistent requests to MPI_REQUEST_NULL
devreal Dec 16, 2022
d90b4e1
Fix wrong locking when freeing a continuation request
devreal Dec 19, 2022
1b042c5
Continuations: Fix potential race condition in debug mode
devreal Jan 31, 2023
a8ef1ce
Hotfix for lost callbacks
devreal Mar 2, 2023
d88917e
Properly defer continuations if the cont_req is inactive
devreal Aug 21, 2023
e605570
Slight code cleanup and remove locks if threads are not enabled
devreal Aug 21, 2023
7cf2ffd
Docs: don't mention MPI_UNDEFINED
devreal Aug 21, 2023
7066d62
Fixup documentation and remove mention of MPIX_CONT_PERSISTENT
devreal Aug 21, 2023
ffe1341
Continuations: Cleanup dox and implement invoke-failed
devreal Sep 7, 2023
d6e372b
Put a continuation back into the cont_incomplete_list if rechecking
devreal Feb 6, 2024
b5596a8
Add write memory barrier before invoking request callback
devreal Apr 2, 2024
63b25f4
Add REQUEST_CB_PENDING that got lost during rebase
devreal Apr 2, 2024
3f33c7d
Fix compiler errors in continuation.c
devreal Apr 2, 2024
68b887c
Don't cast function pointers to void*
devreal Apr 2, 2024
dcb9df5
pml/ucx: Properly reinitialize requests
devreal Aug 8, 2024
12 changes: 12 additions & 0 deletions ompi/communicator/ft/comm_ft_revoke.c
@@ -18,6 +18,10 @@
#include "ompi/communicator/communicator.h"
#include "ompi/mca/pml/pml.h"

#if OMPI_HAVE_MPI_EXT_CONTINUE
#include "ompi/mpiext/continue/c/continuation.h"
#endif /* OMPI_HAVE_MPI_EXT_CONTINUE */

static int ompi_comm_revoke_local(ompi_communicator_t* comm,
ompi_comm_rbcast_message_t* msg);
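
The include above is guarded with `#if OMPI_HAVE_MPI_EXT_CONTINUE`. `#if` and `#ifdef` only agree when the macro is either undefined or defined to a non-zero value, and this diff does not show how configure defines it. The sketch below uses a stand-in macro, `DEMO_HAVE_EXT` (hypothetical, deliberately defined to 0), to show the two guards diverging.

```c
#include <assert.h>

/* Hypothetical stand-in for a configure-generated feature macro. */
#define DEMO_HAVE_EXT 0

/* Returns 1 if code guarded by `#if DEMO_HAVE_EXT` is compiled in. */
int demo_guarded_by_if(void)
{
#if DEMO_HAVE_EXT
    return 1;
#else
    return 0;
#endif
}

/* Returns 1 if code guarded by `#ifdef DEMO_HAVE_EXT` is compiled in. */
int demo_guarded_by_ifdef(void)
{
#ifdef DEMO_HAVE_EXT
    return 1;
#else
    return 0;
#endif
}

/* With the macro defined to 0, only the #ifdef guard is taken. */
void demo_guard_self_check(void)
{
    assert(0 == demo_guarded_by_if());
    assert(1 == demo_guarded_by_ifdef());
}
```

If the real macro is only ever defined to 1 or left undefined, both spellings behave the same.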

@@ -93,6 +97,14 @@ static int ompi_comm_revoke_local(ompi_communicator_t* comm, ompi_comm_rbcast_me
MCA_PML_CALL(revoke_comm(comm, false));
/* Signal the point-to-point stack to recheck requests */
wait_sync_global_wakeup(MPI_ERR_REVOKED);

#if OMPI_HAVE_MPI_EXT_CONTINUE
/* Continuations:
* Release continuations and mark them as failed.
*/
ompi_continue_global_wakeup(MPI_ERR_PROC_FAILED);
#endif // OMPI_HAVE_MPI_EXT_CONTINUE

return true;
}
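
The comment in the hunk above says the wakeup releases continuations and marks them as failed. A minimal sketch of that idea, assuming pending continuations sit on a singly linked list. `demo_cont_t` and `demo_global_wakeup` are hypothetical names, and this is not the implementation of `ompi_continue_global_wakeup`, which this hunk does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one registered continuation. */
typedef struct demo_cont {
    struct demo_cont *next;
    int complete;   /* 0 = still pending, 1 = released */
    int error;      /* 0 = success, otherwise an error code */
} demo_cont_t;

/* Release every pending continuation on the list and record `status`
 * as its error. Already-released entries are left untouched.
 * Returns the number of continuations released by this call. */
int demo_global_wakeup(demo_cont_t *head, int status)
{
    int released = 0;
    assert(0 != status);  /* a wakeup is only meaningful with an error code */
    for (demo_cont_t *c = head; NULL != c; c = c->next) {
        if (!c->complete) {
            c->error = status;
            c->complete = 1;
            ++released;
        }
    }
    return released;
}
```

A real implementation would also have to deal with concurrent registration and completion, which the sketch ignores.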

11 changes: 11 additions & 0 deletions ompi/errhandler/errhandler.c
@@ -40,6 +40,10 @@
#include "opal/mca/backtrace/backtrace.h"
#include "ompi/runtime/mpiruntime.h"

#if OMPI_HAVE_MPI_EXT_CONTINUE
#include "ompi/mpiext/continue/c/continuation.h"
#endif /* OMPI_HAVE_MPI_EXT_CONTINUE */

/*
* Table for Fortran <-> C errhandler handle conversion
*/
@@ -415,6 +419,13 @@ int ompi_errhandler_proc_failed_internal(ompi_proc_t* ompi_proc, int status, boo
*/
wait_sync_global_wakeup(PMIX_ERR_PROC_ABORTED == status? MPI_ERR_PROC_ABORTED: MPI_ERR_PROC_FAILED);

#if OMPI_HAVE_MPI_EXT_CONTINUE
/* Continuations:
* Release continuations and mark them as failed.
*/
ompi_continue_global_wakeup(MPI_ERR_PROC_FAILED);
#endif // OMPI_HAVE_MPI_EXT_CONTINUE

/* Collectives:
* Propagate the error (this has been selected rather than the "roll
* forward through errors in collectives" as this is less intrusive to the
10 changes: 10 additions & 0 deletions ompi/errhandler/errhandler_invoke.c
@@ -34,6 +34,9 @@
#include "ompi/errhandler/errhandler.h"
#include "ompi/mpi/fortran/base/fint_2_int.h"

#if OMPI_HAVE_MPI_EXT_CONTINUE
#include "ompi/mpiext/continue/c/continuation.h"
#endif /* OMPI_HAVE_MPI_EXT_CONTINUE */

int ompi_errhandler_invoke(ompi_errhandler_t *errhandler, void *mpi_object,
int object_type, int err_code, const char *message)
@@ -174,6 +177,13 @@ int ompi_errhandler_request_invoke(int count,
mpi_object = requests[i]->req_mpi_object;
type = requests[i]->req_type;

#if OMPI_HAVE_MPI_EXT_CONTINUE
if (OMPI_REQUEST_CONT == type) {
/* take the mpi object stored in the continuation request */
ompi_continue_get_error_info(requests[i], &mpi_object, &type);
}
#endif // OMPI_HAVE_MPI_EXT_CONTINUE

/* Since errors on requests cause them to not be freed (until we
can examine them here), go through and free all requests with
errors. We only invoke the error on the *first* request
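
For a continuation request, the hunk above swaps in the MPI object stored in the continuation request before the error handler is invoked. A rough sketch of that selection, using hypothetical stand-in types (`demo_req_t`, `demo_error_target`) rather than the real `ompi_request_t` layout or the real `ompi_continue_get_error_info`:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request kinds; only the distinction matters here. */
typedef enum { DEMO_REQ_PML, DEMO_REQ_CONT } demo_req_type_t;

typedef struct demo_req {
    demo_req_type_t type;         /* kind of the request itself */
    void *mpi_object;             /* object the request is attached to */
    void *stored_object;          /* continuation requests only: object stored in the request */
    demo_req_type_t stored_type;  /* continuation requests only: kind that goes with it */
} demo_req_t;

/* Pick the object and type the error handler should be invoked on.
 * Defaults come from the request; a continuation request overrides
 * both with the values it stored. */
void demo_error_target(const demo_req_t *req, void **object,
                       demo_req_type_t *type)
{
    assert(NULL != req && NULL != object && NULL != type);
    *object = req->mpi_object;
    *type = req->type;
    if (DEMO_REQ_CONT == req->type) {
        *object = req->stored_object;
        *type = req->stored_type;
    }
}
```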
2 changes: 2 additions & 0 deletions ompi/include/mpi.h.in
@@ -754,6 +754,8 @@ enum {
#define MPI_ERR_SESSION 78
#define MPI_ERR_VALUE_TOO_LARGE 79

#define MPI_ERR_CONT 78

/* Per MPI-3 p349 47, MPI_ERR_LASTCODE must be >= the last predefined
MPI_ERR_<foo> code. Set the last code to allow some room for adding
error codes without breaking ABI. */
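
As shown in the hunk above, `MPI_ERR_CONT` and `MPI_ERR_SESSION` both expand to 78, with `MPI_ERR_VALUE_TOO_LARGE` at 79. Whether that overlap is intended is not stated here. A small helper along the following lines could flag such collisions in a list of codes. `demo_first_duplicate` is a hypothetical name and not part of Open MPI.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the index of the first entry that repeats an earlier value,
 * or -1 if all `n` entries are distinct. Quadratic, which is fine for
 * a few dozen predefined error codes. */
int demo_first_duplicate(const int *codes, size_t n)
{
    assert(NULL != codes || 0 == n);
    for (size_t i = 1; i < n; ++i) {
        for (size_t j = 0; j < i; ++j) {
            if (codes[i] == codes[j]) {
                return (int)i;
            }
        }
    }
    return -1;
}
```

Called on the three values visible in the hunk, in order `{78, 79, 78}`, it reports index 2.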
4 changes: 2 additions & 2 deletions ompi/mca/pml/ob1/pml_ob1_recvfrag.c
@@ -369,7 +369,7 @@ int mca_pml_ob1_revoke_comm( struct ompi_communicator_t* ompi_comm, bool coll_on
/* note this is not an ompi_proc, but a ob1_comm_proc, thus we don't
* use ompi_proc_is_sentinel to verify if initialized. */
if( NULL == proc ) continue;
- /* remove the frag from the unexpected list, add to the nack list
+ /* remove the frag from the unexpected list, add to the nack list
* so that we can send the nack as needed to remote cancel the send
* from outside the match lock.
*/
@@ -384,7 +384,7 @@ int mca_pml_ob1_revoke_comm( struct ompi_communicator_t* ompi_comm, bool coll_on
}
}
/* same for the cantmatch queue/heap; this list is more complicated
- * Keep it simple: we pop all of the complex list, put the bad items
+ * Keep it simple: we pop all of the complex list, put the bad items
* in the nack_list, and keep the good items in the keep_list;
* then we reinsert the good items in the cantmatch heaplist */
mca_pml_ob1_recv_frag_t* frag;
1 change: 1 addition & 0 deletions ompi/mca/pml/ucx/pml_ucx_request.h
@@ -157,6 +157,7 @@ int mca_pml_ucx_request_cancel_send(ompi_request_t *req, int flag);

static inline void mca_pml_ucx_request_reset(ompi_request_t *req)
{
OMPI_REQUEST_INIT(req, req->req_persistent);
req->req_complete = REQUEST_PENDING;
}

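
The added `OMPI_REQUEST_INIT` call reinitializes the request before it is marked pending again. The sketch below assumes, without confirmation from this diff, that reinitializing means returning the request to an inactive state and clearing any attached completion callback while keeping its persistence flag. All names are stand-ins, not the real macro or struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much reduced request record. */
typedef struct demo_ucx_req {
    bool persistent;
    bool active;
    bool complete;
    void (*complete_cb)(struct demo_ucx_req *req);
    void *complete_cb_data;
} demo_ucx_req_t;

/* Reset a request so it can be reused. */
void demo_request_reset(demo_ucx_req_t *req)
{
    assert(NULL != req);
    /* Assumed init step: inactive, no callback attached, persistence kept. */
    req->active = false;
    req->complete_cb = NULL;
    req->complete_cb_data = NULL;
    /* Mirrors the line already present in the hunk: pending again. */
    req->complete = false;
}
```

Without the callback fields being cleared, a callback left over from a previous use could fire again on the next completion, which is one plausible reason for the fix.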
4 changes: 2 additions & 2 deletions ompi/mpi/c/start.c
@@ -78,8 +78,8 @@ int MPI_Start(MPI_Request *request)
case OMPI_REQUEST_PML:
case OMPI_REQUEST_COLL:
case OMPI_REQUEST_PART:
- if ( MPI_PARAM_CHECK && !((*request)->req_persistent &&
- OMPI_REQUEST_INACTIVE == (*request)->req_state)) {
+ case OMPI_REQUEST_CONT:
+ if ( MPI_PARAM_CHECK && !(*request)->req_persistent) {
return OMPI_ERRHANDLER_NOHANDLE_INVOKE(MPI_ERR_REQUEST, FUNC_NAME);
}

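
With 2 additions and 2 deletions, the `MPI_Start` hunk above replaces the check "persistent and inactive" with "persistent" and adds `OMPI_REQUEST_CONT` to the accepted request types. The two predicates can be compared side by side. The types and function names below are hypothetical stand-ins for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DEMO_INACTIVE, DEMO_ACTIVE } demo_state_t;

typedef struct {
    bool persistent;
    demo_state_t state;
} demo_start_req_t;

/* Check before the change: persistent and currently inactive. */
bool demo_start_ok_old(const demo_start_req_t *req)
{
    assert(NULL != req);
    return req->persistent && DEMO_INACTIVE == req->state;
}

/* Check after the change: persistence alone is enough. */
bool demo_start_ok_new(const demo_start_req_t *req)
{
    assert(NULL != req);
    return req->persistent;
}
```

The only input on which the two differ is a persistent request that is still active: the old check rejects it with `MPI_ERR_REQUEST`, the new one lets it through.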
1 change: 1 addition & 0 deletions ompi/mpi/c/startall.c
@@ -70,6 +70,7 @@ int MPI_Startall(int count, MPI_Request requests[])
(OMPI_REQUEST_PML != requests[i]->req_type &&
OMPI_REQUEST_COLL != requests[i]->req_type &&
OMPI_REQUEST_PART != requests[i]->req_type &&
OMPI_REQUEST_CONT != requests[i]->req_type &&
OMPI_REQUEST_NOOP != requests[i]->req_type)) {
rc = MPI_ERR_REQUEST;
break;
22 changes: 22 additions & 0 deletions ompi/mpiext/continue/Makefile.am
@@ -0,0 +1,22 @@
# -*- shell-script -*-
#
# Copyright (c) 2021 The University of Tennessee and The University
# of Tennessee Research Foundation. All rights
# reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#

# This Makefile is not traversed during a normal "make all" in an OMPI
# build. It *is* traversed during "make dist", however. So you can
# put EXTRA_DIST targets in here.
#
# You can also use this as a convenience for building this MPI
# extension (i.e., "make all" in this directory to invoke "make all"
# in all the subdirectories).

SUBDIRS = c

43 changes: 43 additions & 0 deletions ompi/mpiext/continue/c/Makefile.am
@@ -0,0 +1,43 @@
#
# Copyright (c) 2021 The University of Tennessee and The University
# of Tennessee Research Foundation. All rights
# reserved.
# $COPYRIGHT$
#
# Additional copyrights may follow
#
# $HEADER$
#

# OMPI_BUILD_MPI_PROFILING is enabled when we want our generated MPI_* symbols
# to be replaced by PMPI_*.
# In this directory, we need it to be 0

AM_CPPFLAGS = -DOMPI_BUILD_MPI_PROFILING=0 -DOMPI_COMPILING_FORTRAN_WRAPPERS=0

include $(top_srcdir)/Makefile.ompi-rules

noinst_LTLIBRARIES = libmpiext_continue_c.la

# This is where the top-level header file (that is included in
# <mpi-ext.h>) must be installed.
ompidir = $(ompiincludedir)/mpiext

# This is the header file that is installed.
nodist_ompi_HEADERS = mpiext_continue_c.h

libmpiext_continue_c_la_SOURCES = \
continuation.c \
continue.c \
continueall.c \
continue_init.c \
mpiext_continue_module.c

#libmpiext_continue_c_la_LDFLAGS = -module -avoid-version

dist_ompidata_DATA = help-mpi-continue.txt

ompi_HEADERS = $(headers)

MAINTAINERCLEANFILES = $(nodist_libmpiext_continue_c_la_SOURCES)
