OpenMPI doesn't work when docker is running #1

@G-Ragghianti

Problem: While a Docker container is running, simple OpenMPI jobs that use the TCP interface do not complete. For example, a broadcast test hangs partway through (after the 32768-byte message size in the run below).

Steps to reproduce:

$ spack install osu-micro-benchmarks ^openmpi~rsh fabrics=ucx
$ spack load osu-micro-benchmarks
$ mpirun -n 2 osu_bcast

# OSU MPI Broadcast Latency Test v7.1
# Datatype: MPI_CHAR.
# Size       Avg Latency(us)
1                       4.32
2                       4.32
4                       4.36
8                       4.30
16                      4.30
32                      4.32
64                      4.33
128                     4.10
256                     4.30
512                     5.72
1024                    5.81
2048                    6.07
4096                    5.74
8192                    6.67
16384                   7.74
32768                  13.65
<hangs>

Expected result:

$ mpirun -n 2 --mca oob_base_verbose 100 osu_bcast

# OSU MPI Broadcast Latency Test v7.1
# Datatype: MPI_CHAR.
# Size       Avg Latency(us)
1                       3.26
2                       4.05
4                       4.40
8                       7.55
16                      5.53
32                      5.53
64                      4.06
128                     4.49
256                     6.37
512                     7.11
1024                    5.92
2048                    7.26
4096                    6.74
8192                    8.74
16384                  10.93
32768                  14.40
65536                  33.09
131072                 48.18
262144                 70.30
524288                118.22
1048576               200.32

Verbose output:

[histamine0:1785348] mca: base: components_register: registering framework oob components
[histamine0:1785348] mca: base: components_register: found loaded component tcp
[histamine0:1785348] mca: base: components_register: component tcp register function successful
[histamine0:1785348] mca: base: components_open: opening oob components
[histamine0:1785348] mca: base: components_open: found loaded component tcp
[histamine0:1785348] mca: base: components_open: component tcp open function successful
[histamine0:1785348] mca:oob:select: checking available component tcp
[histamine0:1785348] mca:oob:select: Querying component [tcp]
[histamine0:1785348] oob:tcp: component_available called
[histamine0:1785348] WORKING INTERFACE 1 KERNEL INDEX 1 FAMILY: V4
[histamine0:1785348] [[3819,0],0] oob:tcp:init rejecting loopback interface lo
[histamine0:1785348] WORKING INTERFACE 2 KERNEL INDEX 2 FAMILY: V4
[histamine0:1785348] [[3819,0],0] oob:tcp:init adding 10.0.0.49 to our list of V4 connections
[histamine0:1785348] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
[histamine0:1785348] [[3819,0],0] oob:tcp:init adding 172.17.0.1 to our list of V4 connections
[histamine0:1785348] [[3819,0],0] TCP STARTUP
[histamine0:1785348] [[3819,0],0] attempting to bind to IPv4 port 0
[histamine0:1785348] [[3819,0],0] assigned IPv4 port 36725
[histamine0:1785348] mca:oob:select: Adding component to end
[histamine0:1785348] mca:oob:select: Found 1 active transports
[histamine0:1785348] [[3819,0],0]: get transports
[histamine0:1785348] [[3819,0],0]:get transports for component tcp

# OSU MPI Broadcast Latency Test v7.1
# Datatype: MPI_CHAR.
# Size       Avg Latency(us)
1                       4.45
2                       4.61
4                       4.66
8                       4.63
16                      4.02
32                      4.06
64                      4.07
128                     4.10
256                     4.13
512                     5.82
1024                    5.92
2048                    6.27
4096                    5.98
8192                    6.69
16384                   7.57
32768                  14.08
<hangs>
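
To make it easier to see which interfaces the oob tcp component picked up, the addresses can be pulled out of a saved copy of the verbose log. This is only a rough sketch: the function name `list_oob_addrs` is made up for illustration, and it assumes the log lines have the same format as the ones pasted above.

```shell
# Hypothetical helper (not part of OpenMPI): print the IPv4 addresses that
# the oob tcp component reports adding to its connection list.
# Reads a saved verbose log from the given files, or from stdin if none given.
list_oob_addrs() {
    sed -n 's/.*oob:tcp:init adding \([0-9.]*\) to our list of V4 connections.*/\1/p' "$@"
}

# Example using three lines copied from the verbose output above.
list_oob_addrs <<'EOF'
[histamine0:1785348] [[3819,0],0] oob:tcp:init rejecting loopback interface lo
[histamine0:1785348] [[3819,0],0] oob:tcp:init adding 10.0.0.49 to our list of V4 connections
[histamine0:1785348] [[3819,0],0] oob:tcp:init adding 172.17.0.1 to our list of V4 connections
EOF
```

On the log above this prints 10.0.0.49 and 172.17.0.1. The second address matches the default address of Docker's bridge network, which fits with the hang only showing up while a container is running.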
