## Background information

### What version of Open MPI are you using?
v3.1.2, v3.1.3
### Describe how Open MPI was installed
We have an internally managed Conda environment and build our own Conda packages, including `openmpi` (I think it was built from the tarball downloaded from the Open MPI website). It was then installed with `conda install openmpi`.
### Please describe the system on which you are running
- Operating system/version: Debian GNU/Linux 8 (jessie)
- Computer hardware: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz + 512 GB RAM + 4 Nvidia V100 GPUs
- Network type: TCP/IP
## Details of the problem
If one spawns a few MPI processes, lets them do some work, but terminates them abnormally (Ctrl-C and the like), one can see that `/dev/shm/` contains shared memory segments related to the vader component that are not unlinked by Open MPI during the cleanup phase:
```console
leofang@xf03id-srv5:~$ mpirun -n 4 python do_very_long_calculation_in_mpi.py
...output not important...
^C
leofang@xf03id-srv5:~$ ls -lt /dev/shm | more
total 28464
-rw------- 1 leofang leofang 4194312 Jan 25 23:06 vader_segment.xf03id-srv5.73dc0001.2
-rw------- 1 leofang leofang 4194312 Jan 25 23:06 vader_segment.xf03id-srv5.73dc0001.3
-rw------- 1 leofang leofang 4194312 Jan 25 23:06 vader_segment.xf03id-srv5.73dc0001.1
-rw------- 1 leofang leofang 4194312 Jan 25 23:06 vader_segment.xf03id-srv5.73dc0001.0
```

I know that we didn't have this issue with v3.1.1, and I'll test v3.1.3 later (UPDATE: 3.1.3 also has this problem). For now I just need to know whether this is a known bug in v3.1.2 so that I can avoid this version in our Conda settings. Thanks!
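As a stopgap we clean these up by hand. A minimal sketch of that cleanup in Python (the glob pattern is just an assumption based on the file names listed above; it assumes no MPI job is still running on the node, since the pattern would also match segments in active use):

```python
# Minimal cleanup sketch: remove leftover vader segments from /dev/shm.
# ASSUMPTION: no MPI job is currently running on this node, since the
# glob pattern below would also match segments that are still in use.
import glob
import os

for seg in glob.glob("/dev/shm/vader_segment.*"):
    try:
        os.unlink(seg)
        print("removed", seg)
    except OSError as exc:
        print("could not remove", seg, ":", exc)
```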
UPDATE: a minimal working example (MWE) in Python is provided below:
```python
import time

import mpi4py  # I used v3.0.0
mpi4py.rc.initialize = False  # avoid auto-initialization
from mpi4py import MPI

MPI.Init_thread()  # manually initialize
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
print("I'm rank", rank, "from a pool of size", size, ", begin sleeping...")
time.sleep(30)
print("rank", rank, "awakes!")
MPI.Finalize()
```

Terminate it as described above during the 30-second sleep. Note that if `-n 1` is used, no residual segments are left in `/dev/shm`, even after an abnormal abort. I double-checked that in the 3.1.x series this problem only happens for 3.1.2 and 3.1.3.
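After interrupting the run, I check for leftovers with a snippet like the following (again, the file-name pattern is an assumption inferred from the `ls` listing above):

```python
# Check /dev/shm for residual vader segments after an interrupted run.
import glob
import socket

# ASSUMPTION: segment names follow "vader_segment.<hostname>.<jobid>.<rank>"
# as seen in the listing above.
pattern = "/dev/shm/vader_segment.%s.*" % socket.gethostname()
leftovers = sorted(glob.glob(pattern))
if leftovers:
    print("residual segments:")
    for seg in leftovers:
        print(" ", seg)
else:
    print("no residual segments in /dev/shm")
```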
UPDATE 2: an identical MWE in C:
```c
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    int size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    printf("I am process %d of %d. Start sleeping...\n", rank, size);
    sleep(30);
    printf("rank %d awakes!\n", rank);
    MPI_Finalize();
    return 0;
}
```
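To reproduce with the C version, compile with `mpicc`, launch with `mpirun -n 4` as above, and hit Ctrl-C during the sleep; the same `vader_segment.*` files should be left behind in `/dev/shm`.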