shared memory segments not cleaned up by vader BTL after program aborted #6322

@leofang

Description

Background information

What version of Open MPI are you using?

v3.1.2, v3.1.3

Describe how Open MPI was installed

We have an internally managed Conda environment and we build our own Conda packages, including openmpi. (I think it was built from the tarball downloaded from the Open MPI website.) It was then installed with conda install openmpi.

Please describe the system on which you are running

  • Operating system/version: Debian GNU/Linux 8 (jessie)
  • Computer hardware: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz + 512 GB RAM + 4 Nvidia V100 GPUs
  • Network type: TCP/IP

Details of the problem

If one spawns a few MPI processes, lets them do some work, and then terminates them abnormally (Ctrl-C and the like), shared memory segments belonging to the vader component are left behind in /dev/shm/; Open MPI does not unlink them during the cleanup phase:

leofang@xf03id-srv5:~$ mpirun -n 4 python do_very_long_calculation_in_mpi.py
...output not important...
^C
leofang@xf03id-srv5:~$ ls -lt /dev/shm | more
total 28464
-rw------- 1 leofang leofang 4194312 Jan 25 23:06 vader_segment.xf03id-srv5.73dc0001.2
-rw------- 1 leofang leofang 4194312 Jan 25 23:06 vader_segment.xf03id-srv5.73dc0001.3
-rw------- 1 leofang leofang 4194312 Jan 25 23:06 vader_segment.xf03id-srv5.73dc0001.1
-rw------- 1 leofang leofang 4194312 Jan 25 23:06 vader_segment.xf03id-srv5.73dc0001.0

I know that we didn't have this issue with v3.1.1, and I'll test v3.1.3 later (UPDATE: v3.1.3 also has this problem). For now I just need to know whether this is a known bug in v3.1.2 so that I can avoid this version in our Conda settings. Thanks!
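In the meantime, stale segments can be removed by hand. Below is a minimal cleanup sketch in Python (my own, not part of Open MPI), matching the vader_segment.* naming shown above; only run it when none of your Open MPI jobs are active, since it cannot tell a stale segment from one that is still in use:

import glob
import os

# Remove leftover vader segments owned by the current user.
# WARNING: only safe when no Open MPI job is running.
for path in glob.glob("/dev/shm/vader_segment.*"):
    try:
        os.remove(path)
        print("removed", path)
    except OSError as exc:
        print("could not remove", path, ":", exc)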

UPDATE: a minimal working example in Python is provided below

import time
import mpi4py # I used v3.0.0
mpi4py.rc.initialize = False # avoid auto initialization
from mpi4py import MPI

MPI.Init_thread() # manually initialize
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
print("I'm rank", rank, "from a pool of size", size, ", begin sleeping...")
time.sleep(30)
print("rank", rank, "awakes!")
MPI.Finalize()

and terminate it as mentioned above during the 30-second sleep. Note that if -n 1 is used, no residual segment is left in /dev/shm even after an abnormal abort. I double-checked that in the 3.1.x series this problem only happens with 3.1.2 and 3.1.3.
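As a possible Python-side mitigation (untested, and assuming mpirun delivers SIGTERM before SIGKILL), one could install a signal handler so that MPI.Finalize() still runs on termination; the segments are unlinked during finalization, though Finalize is collective and may hang if some ranks are already gone. A sketch:

import signal
import sys
import time

import mpi4py
mpi4py.rc.initialize = False  # avoid auto initialization
from mpi4py import MPI

def graceful_exit(signum, frame):
    # Raising SystemExit lets the finally block below run.
    sys.exit(1)

signal.signal(signal.SIGTERM, graceful_exit)

MPI.Init_thread()  # manually initialize
try:
    time.sleep(30)
finally:
    if MPI.Is_initialized() and not MPI.Is_finalized():
        MPI.Finalize()  # segments are unlinked on a clean finalize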

UPDATE 2: an equivalent MWE in C

#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
  int size, rank;

  MPI_Init(&argc, &argv);

  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  printf("I am process %d of %d. Start sleeping...\n", rank, size);
  sleep(30);
  printf("rank %d awakes!\n", rank);

  MPI_Finalize();
  return 0;
}
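The C version reproduces the problem the same way (the file name mwe.c is just an example):

$ mpicc mwe.c -o mwe
$ mpirun -n 4 ./mwe
^C
$ ls /dev/shm | grep vader_segment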
