Commit caf7824

dpm: don't use locality info for multi PMIX namespace environments
Some of our collective frameworks are now locality aware and use this information to decide how to handle application collective operations. It turns out that in certain multi-namespace situations (a namespace is a jobid in ompi speak), some procs can get locality info about other procs through PMIx mechanisms, but not in a symmetric fashion. This can lead to communicators with different locality information on different procs, which in turn can lead to deadlock when using certain collectives.

This situation can be seen with ompi-tests/ibm/dynamic/intercomm_merge.c. In this test the following happens:

1. process set A is started with mpirun
2. process set A spawns a set of processes B
3. processes in sets A and B create an intra comm using the intercomm from MPI_Comm_spawn and MPI_Comm_get_parent in the spawners and spawnees respectively
4. process set A spawns a set of processes C
5. processes in sets A and C create an intra comm using the intercomm from MPI_Comm_spawn and MPI_Comm_get_parent in the spawners and spawnees respectively
6. processes in A and B create a new intercomm
7. processes in A and C create a new intercomm
8. processes in A, B, and C create a new intra comm using the intercomms from steps 6 and 7
9. processes in A, B, and C try to do an MPI_Barrier using the intra comm from step 8

It turns out that in step 8 the locality info supplied by PMIx is asymmetric. Processes in sets B and C aren't able to determine locality info for each other (PMIx returns not found when attempts are made to get locality info for the remote processes). This causes issues when step 9 is executed. Processes in set A try to use the tuned collective component for the barrier, while processes in sets B and C try to use the HAN collective component. In process sets B and C, HAN thinks that the communicator has both local and remote procs, so it tries to use a hierarchical algorithm. Meanwhile, procs in set A can retrieve locality from all procs in sets B and C and think the collective is occurring on a single node, which in fact it is.

This behavior can be observed using prrte master at 8ecee645de and openpmix master at a083d8f9.

This patch restricts the use of locality info to procs in the same PMIx namespace. It also removes some comments which are no longer accurate.

Signed-off-by: Howard Pritchard <[email protected]>
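The mismatch described above can be illustrated with a toy model. This is only a sketch, not Open MPI code: `loc_result_t`, `view_is_single_node`, and the two sample views are hypothetical names. It assumes one process per set, all on one node, and that a failed lookup is treated as "not local". It also assumes B can resolve A's locality, which the message above does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of a locality lookup for one peer. */
typedef enum { LOC_LOCAL, LOC_NOT_FOUND } loc_result_t;

/* Toy model (assumption): a process treats the communicator as
 * single-node only if every peer lookup returned "local". A lookup
 * that returned "not found" is counted as non-local. */
static bool view_is_single_node(const loc_result_t *peers, size_t npeers)
{
    assert(peers != NULL);
    for (size_t i = 0; i < npeers; i++) {
        if (peers[i] != LOC_LOCAL) {
            return false;
        }
    }
    return true;
}

/* Views of the merged A+B+C communicator, in the order {A, B, C}.
 * A resolves everyone; B cannot resolve C. */
static const loc_result_t view_from_A[] = { LOC_LOCAL, LOC_LOCAL, LOC_LOCAL };
static const loc_result_t view_from_B[] = { LOC_LOCAL, LOC_LOCAL, LOC_NOT_FOUND };
```

In this model the process in A concludes "single node" while the process in B concludes "local and remote procs", so the two sides can pick different barrier algorithms on the same communicator.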
1 parent ecd206d commit caf7824

3 files changed

+10
-6
lines changed


3rd-party/openpmix

Submodule openpmix updated 203 files

3rd-party/prrte

Submodule prrte updated 278 files

ompi/dpm/dpm.c

Lines changed: 8 additions & 4 deletions
@@ -436,14 +436,18 @@ int ompi_dpm_connect_accept(ompi_communicator_t *comm, int root,
         opal_list_remove_item(&ilist, (opal_list_item_t*)cd); // TODO: do we need to release cd ?
         OBJ_RELEASE(cd);
         /* ompi_proc_complete_init_single() initializes and optionally retrieves
-         * OPAL_PMIX_LOCALITY and OPAL_PMIX_HOSTNAME. since we can live without
-         * them, we are just fine */
+         * OPAL_PMIX_LOCALITY and OPAL_PMIX_HOSTNAME.
+         */
         ompi_proc_complete_init_single(proc);
         /* if this proc is local, then get its locality */
         if (NULL != local_ranks_in_jobid) {
-            uint16_t u16;
+            uint16_t u16 = 0;
             for (prn=0; prn < nprn; prn++) {
-                if (local_ranks_in_jobid[prn] == proc->super.proc_name.vpid) {
+                /*
+                 * exclude procs not in our job id (aka pmix namespace) from localization optimizations
+                 */
+                if ((local_ranks_in_jobid[prn] == proc->super.proc_name.vpid)
+                    && (OMPI_PROC_MY_NAME->jobid == proc->super.proc_name.jobid)) {
                     /* get their locality string */
                     val = NULL;
                     OPAL_MODEX_RECV_VALUE_IMMEDIATE(rc, PMIX_LOCALITY_STRING,
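The condition added in this hunk can be read as a standalone predicate. The following is a minimal sketch under stated assumptions, not the Open MPI implementation: `name_t` is a hypothetical stand-in for the process name (jobid plus vpid), and `eligible_for_locality` is an illustrative helper that does not exist in the code base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a process name: jobid (PMIx namespace) + vpid (rank). */
typedef struct {
    uint32_t jobid;
    uint32_t vpid;
} name_t;

/* Returns true only if the peer's rank appears in the list of local ranks
 * AND the peer is in the caller's own jobid. This mirrors the two-part
 * condition added by the patch; a peer from another namespace is never
 * considered for locality optimizations. */
static bool eligible_for_locality(const name_t *me, const name_t *peer,
                                  const uint32_t *local_ranks, size_t nranks)
{
    assert(me != NULL && peer != NULL);
    if (me->jobid != peer->jobid) {
        return false; /* different namespace: skip the locality lookup */
    }
    for (size_t i = 0; i < nranks; i++) {
        if (local_ranks[i] == peer->vpid) {
            return true;
        }
    }
    return false;
}
```

Because every process applies the same namespace filter, a peer from another namespace is skipped regardless of whether its locality could be resolved, which removes the dependence on asymmetric lookup results.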
