At job startup, the program deadlocks with the output below. On quick investigation, the opal_proc name received over the socket is valid (same jobid, but a different, valid rank), yet it does not match the identifier of the peer the endpoint expected.
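For context, the check that fails lives in mca_btl_tcp_endpoint_recv_connect_ack: during the connect handshake the endpoint reads the peer's process identifier from the socket and rejects the connection when it is not the identifier the endpoint was opened for. Below is a minimal standalone sketch of that comparison, not the actual Open MPI code; proc_name_t and the helper name are hypothetical stand-ins for opal_process_name_t and the real validation routine, and the sample values merely mirror the log in this report.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for opal_process_name_t: a process identifier
 * is a (jobid, vpid) pair, rendered as [[jobid,1],vpid] in the BTL
 * error message quoted below. */
typedef struct {
    uint32_t jobid;  /* job identifier, shared by all ranks of the job */
    uint32_t vpid;   /* rank within the job */
} proc_name_t;

/* Sketch of the connect-ack validation: the identifier read from the
 * socket must equal the identifier of the peer this endpoint was
 * created for.  In this report both identifiers belong to the same
 * job, but the rank differs, so the ack is rejected and the job hangs. */
static int recv_connect_ack_check(proc_name_t expected, proc_name_t received)
{
    if (expected.jobid != received.jobid || expected.vpid != received.vpid) {
        fprintf(stderr,
                "received unexpected process identifier [[%u,1],%u]\n",
                received.jobid, received.vpid);
        return -1;
    }
    return 0;
}

int main(void)
{
    /* Illustrative values only: the jobid from the log, an arbitrary
     * expected rank, and the unexpected rank 13 actually received. */
    proc_name_t expected = { 19494, 8 };
    proc_name_t received = { 19494, 13 };
    return recv_connect_ack_check(expected, received) ? 1 : 0;
}
```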
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
3a4a1f93 (HEAD -> master, origin/master, origin/HEAD) Merge pull request #6239 from hppritcha/topic/swat_orte_shutdown

Please describe the system on which you are running
- Operating system/version: CentOS7
- Computer hardware: x86_64
- Network type: TCP
Details of the problem
salloc -N4 -Ccauchy /home/bouteill/ompi/master.debug/bin/mpirun -mca btl tcp,self IMB-MPI1 pingpong
salloc: Granted job allocation 245932
salloc: Waiting for resource configuration
salloc: Nodes c[00-03] are ready for job
#------------------------------------------------------------
# Intel(R) MPI Benchmarks 2019 Update 1, MPI-1 part
#------------------------------------------------------------
# Date : Fri Jan 4 18:52:06 2019
# Machine : x86_64
# System : Linux
# Release : 3.10.0-514.26.1.el7.x86_64
# Version : #1 SMP Wed Jun 28 15:10:01 CDT 2017
# MPI Version : 3.1
# MPI Thread Environment:
[...]
# PingPong
[c01][[19494,1],8][../../../../../master/opal/mca/btl/tcp/btl_tcp_endpoint.c:630:mca_btl_tcp_endpoint_recv_connect_ack] received unexpected process identifier [[19494,1],13]

The same run with options -mca btl openib,vader,self completes successfully.