Reduce the overhead of UCX add_procs with intercommunicators
* When creating a large number of intercommunicators with
`MPI_Intercomm_create`, the UCX pml add_procs routine is called for each
"new" process. Each call invokes `ucp_ep_create` and overwrites the old
endpoint at the PML level if one was already in place. However, the new
endpoint is added to the underlying UCX instance without removing the old
one, so a large number of endpoints accumulate on the UCX worker. Creating
these endpoints has overhead, which contributes to the slowdown of
`MPI_Intercomm_create`.
* On Finalize, these endpoints are cleaned up in the `ucp_worker_destroy`
function. Since there are a significant number of endpoints, the cleanup
takes quite a while.
* In this patch, we first check whether an endpoint has already been created
for this process. If so, we skip adding it again; otherwise we create a new
endpoint.
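
The check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual patch: `mock_ep_t`, `mock_ep_create`, `proc_ep`, `add_procs_always`, and `add_procs_checked` are hypothetical stand-ins, with `mock_ep_create` playing the role of `ucp_ep_create` and `eps_created` counting the endpoints held by the worker.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROCS 8
#define MAX_EPS   (MAX_PROCS * 4)

/* Hypothetical stand-in for an endpoint handle (the real code uses UCX types). */
typedef struct { int proc; } mock_ep_t;

static mock_ep_t  ep_pool[MAX_EPS];
static size_t     eps_created = 0;      /* endpoints accumulated on the mock "worker" */
static mock_ep_t *proc_ep[MAX_PROCS];   /* per-process endpoint slot; NULL if none yet */

/* Stand-in for endpoint creation: every call adds one endpoint to the worker. */
static mock_ep_t *mock_ep_create(int proc)
{
    assert(eps_created < MAX_EPS);
    mock_ep_t *ep = &ep_pool[eps_created++];
    ep->proc = proc;
    return ep;
}

/* Behavior before the patch: always create, overwriting the per-process slot.
 * The previous endpoint stays on the worker, so endpoints pile up. */
static void add_procs_always(const int *procs, size_t nprocs)
{
    for (size_t i = 0; i < nprocs; i++) {
        assert(procs[i] >= 0 && procs[i] < MAX_PROCS);
        proc_ep[procs[i]] = mock_ep_create(procs[i]);
    }
}

/* Behavior after the patch: skip processes that already have an endpoint. */
static void add_procs_checked(const int *procs, size_t nprocs)
{
    for (size_t i = 0; i < nprocs; i++) {
        assert(procs[i] >= 0 && procs[i] < MAX_PROCS);
        if (proc_ep[procs[i]] != NULL) {
            continue;                   /* endpoint already exists; reuse it */
        }
        proc_ep[procs[i]] = mock_ep_create(procs[i]);
    }
}
```

In this sketch, calling `add_procs_checked` repeatedly with the same process list leaves `eps_created` unchanged after the first call, while `add_procs_always` grows it on every call.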
Signed-off-by: Joshua Hursey <[email protected]>