Description
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
5.1.0 (but also tested on 5.0.5)
> git log --pretty=format:'%H' -n 1
f31a9be313a0a6be7db2117050b1b608c115c4d6
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
git clone https://github.com/open-mpi/ompi.git
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
e32e017 3rd-party/openpmix (v1.1.3-4036-ge32e0179)
8d0a25b8c9b868bc66de8df3b8abd7f477f07a0f 3rd-party/prrte (psrvr-v2.0.0rc1-4799-g8d0a25b8c9)
dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (dfff675)
Please describe the system on which you are running
- Operating system/version: Red Hat Enterprise Server; Linux 5.14.x (EL9-based distribution)
- Computer hardware: Intel Sapphire Rapids 8480
- Network type: NDR200; so pml UCX, but also reproducible with OB1 pml
Details of the problem
In OpenMPI5, children spawned by MPI_Comm_spawn are terminated once the last process belonging to the parent communicator (original MPI_COMM_WORLD) returns. This is not the behavior in OpenMPI4, where the children can live on past this point.
I am wondering whether this is intended behavior, and whether there is any way to configure Open MPI so that it does not happen. Here's a code snippet to reproduce:
#include <mpi.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#define ROOT 0
#define MAX_PROCS 2
#define EXTRA_CHILD_TIME 10
int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);
    char const child[] = "child";
    char const parent[] = "parent";
    MPI_Comm inter_comm;
    MPI_Comm parent_comm;
    int error_codes[MAX_PROCS];
    int i_am_parent;
    /* parent_comm is MPI_COMM_NULL in the original (non-spawned) processes */
    MPI_Comm_get_parent(&parent_comm);
    int my_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    if (parent_comm == MPI_COMM_NULL)
    {
        /* Parent: spawn the children, then exit before they finish */
        i_am_parent = 1;
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, MAX_PROCS, MPI_INFO_NULL, ROOT, MPI_COMM_WORLD, &inter_comm, error_codes);
        MPI_Comm_free(&inter_comm);
        if (my_rank == ROOT)
        {
            printf("Root parent terminating...\n");
        }
        else
        {
            sleep(5);
            printf("Other parent exiting\n");
        }
    }
    else
    {
        /* Child: outlive the parents by counting down EXTRA_CHILD_TIME seconds */
        i_am_parent = 0;
        MPI_Comm_free(&parent_comm);
        for (int i = 0; i < EXTRA_CHILD_TIME; i++)
        {
            if (my_rank == ROOT)
            {
                printf("Child sleeping for %d more seconds...\n", EXTRA_CHILD_TIME - i);
            }
            sleep(1);
        }
        if (my_rank == ROOT)
        {
            printf("Child terminating...\n");
        }
    }
    MPI_Finalize();
    printf("Rank %d (%s) passed MPI finalize\n", my_rank, i_am_parent ? parent : child);
    return 0;
}
In OpenMPI 4.1.5, the output is along the lines of
Child sleeping for 10 more seconds...
Root parent terminating...
Child sleeping for 9 more seconds...
Child sleeping for 8 more seconds...
Child sleeping for 7 more seconds...
Child sleeping for 6 more seconds...
Other parent exiting
Child sleeping for 5 more seconds...
Rank 0 (parent) passed MPI finalize
Rank 1 (parent) passed MPI finalize
Child sleeping for 4 more seconds...
Child sleeping for 3 more seconds...
Child sleeping for 2 more seconds...
Child sleeping for 1 more seconds...
Child terminating...
Rank 1 (child) passed MPI finalize
Rank 0 (child) passed MPI finalize
However, in OpenMPI 5.0.5 and 5.1.0, the whole program stops as soon as the parents terminate, without any error message or sign of abnormal termination. The children never print their remaining countdown lines or reach MPI_Finalize:
Root parent terminating...
Child sleeping for 10 more seconds...
Child sleeping for 9 more seconds...
Child sleeping for 8 more seconds...
Child sleeping for 7 more seconds...
Child sleeping for 6 more seconds...
Other parent exiting
Child sleeping for 5 more seconds...
Rank 0 (parent) passed MPI finalize
Rank 1 (parent) passed MPI finalize
I ran this with
mpirun -np 2 ./only-child-procs
and then
mpirun --mca pml ob1 --mca btl ^uct -np 2 ./only-child-procs
The results are the same with both command lines, in each version.
I figured out that keeping one of the original processes 'hostage' for the lifetime of the children sidesteps the issue and produces the expected output in OMPI5:
... //same code initially
    MPI_Finalize();
    printf("Rank %d (%s) passed MPI finalize\n", my_rank, i_am_parent ? parent : child);
    if (i_am_parent && my_rank == 0)
    {
        int file_exists;
        while (1)
        {
            file_exists = system("[ -f .iamdone ]");
            if (file_exists == 0)
            {
                system("rm .iamdone");
                break;
            }
            sleep(1);
        }
    }
    else if (!i_am_parent)
    {
        system("touch .iamdone");
    }
Obviously it is not an ideal solution, but it might give you some insight as to what is going on.
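For anyone who wants to try the same workaround without shelling out through system() every second, here is a minimal sketch of the marker-file handshake using POSIX access() and unlink() directly. The helper names signal_done and wait_for_done are hypothetical, made up for this illustration. Unlike the loop above, this version gives up after max_polls attempts instead of waiting forever. It only reproduces the 'hostage' trick and says nothing about why OMPI5 tears the children down.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper (not part of MPI): create an empty marker file,
 * like `touch` does for a file that does not exist yet.
 * Returns 0 on success, -1 on failure. */
int signal_done(const char *path)
{
    FILE *f = fopen(path, "a");
    if (f == NULL)
    {
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper (not part of MPI): check for the marker file up to
 * max_polls times, sleeping poll_seconds between checks. When the marker
 * is found it is removed and 0 is returned; -1 means the wait timed out. */
int wait_for_done(const char *path, unsigned poll_seconds, int max_polls)
{
    for (int i = 0; i < max_polls; i++)
    {
        if (access(path, F_OK) == 0)
        {
            unlink(path);
            return 0;
        }
        sleep(poll_seconds);
    }
    return -1;
}
```

In the reproducer, parent rank 0 would call wait_for_done(".iamdone", 1, some_limit) after MPI_Finalize, and each child would call signal_done(".iamdone") where it currently runs touch; some_limit is a placeholder to pick based on how long the children are expected to run.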