Background information
What version of Open MPI are you using?
mpirun (Open MPI) 5.0.6
Describe how Open MPI was installed
Source release tarball.
./configure --prefix=/opt/openmpi-5.0.6 --with-pmix=internal
Please describe the system on which you are running
- Operating system/version: Debian 12.5
- Computer hardware: x86_64 Intel(R) Core(TM) i3-13100
 
Details of the problem
I have a test that works fine with Open MPI 4.x but fails with Open MPI 5.0.6 (the latest version at the time of writing).
The test exercises the MPI_Comm_spawn, MPI_Comm_connect, and MPI_Comm_disconnect functions. It was copied from the MPICH repository: https://raw.githubusercontent.com/pmodels/mpich/refs/heads/main/test/mpi/spawn/disconnect_reconnect.c. Compiling it requires the test/mpi directory from the MPICH source tree.
Compile:
/opt/openmpi-5.0.6/bin/mpicc src/spawn/disconnect_reconnect.c -o disconnect_reconnect -I src/include -I /opt/openmpi-5.0.6/include -L /opt/openmpi-5.0.6/lib src/util/mtest.c
Run:
MPITEST_VERBOSE=1 /opt/openmpi-5.0.6/bin/mpirun --allow-run-as-root -np 1 ./disconnect_reconnect    # with verbose
/opt/openmpi-5.0.6/bin/mpirun --allow-run-as-root -np 1 ./disconnect_reconnect                      # no verbose
Output:
- The test hangs (freezes), or
- it aborts with the following error:
 
[0] accepting connection
[0] connecting to port (loop 1)
[1] connecting to port (loop 1)
[2] connecting to port (loop 1)
[mongoose:00000] *** An error occurred in MPI_Comm_accept
[mongoose:00000] *** reported by process [1767047169,0]
[mongoose:00000] *** on communicator MPI_COMM_WORLD
[mongoose:00000] *** MPI_ERR_UNKNOWN: unknown error
[mongoose:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[mongoose:00000] ***    and MPI will try to terminate your MPI job as well)
[mongoose:00000] *** An error occurred in Socket closed
[mongoose:00000] *** reported by process [1767047170,2]
[mongoose:00000] *** on a NULL communicator
[mongoose:00000] *** Unknown error
[mongoose:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[mongoose:00000] ***    and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun has exited due to process rank 1 with PID 0 on node mongoose calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------
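Because the failure shows up either as a hang or as an abort, it can help to run the test several times under a time limit and tally the outcomes. The following is only a sketch of how that could be done: the helper names `classify_exit` and `repeat_run` are hypothetical (they are not part of Open MPI or the MPICH test suite), and it assumes GNU coreutils `timeout`, which exits with status 124 when the time limit is reached.

```shell
#!/bin/sh
# Hypothetical helpers for reproducing an intermittent hang/abort.
# Assumes GNU coreutils `timeout` (exit status 124 means the limit was hit).

# Map an exit status to an outcome label.
classify_exit() {
    case "$1" in
        0)   echo pass ;;
        124) echo hang ;;
        *)   echo error ;;
    esac
}

# Run a command several times under a time limit, one outcome line per run.
# Usage: repeat_run <runs> <seconds> <command...>
repeat_run() {
    runs=$1
    limit=$2
    shift 2
    i=1
    while [ "$i" -le "$runs" ]; do
        # Discard the command's output; only the exit status is of interest.
        timeout "$limit" "$@" >/dev/null 2>&1
        status=$?
        echo "run $i: $(classify_exit "$status")"
        i=$((i + 1))
    done
}
```

For example, `repeat_run 10 60 /opt/openmpi-5.0.6/bin/mpirun --allow-run-as-root -np 1 ./disconnect_reconnect` would run the test ten times with a 60-second limit per run; the limit is an arbitrary choice and may need adjusting.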
Questions
- Is it expected that this test fails with Open MPI 5? Isn't this a violation of the MPI standard? Would it be better to go back to ompi-server?
- What is the best way to make this test work? I found "Error using MPI_Comm_connect/MPI_Comm_accept" #6916 and "Comm_connect/accept fails" openpmix/prrte#398, but the solution described there goes beyond the scope of MPI. It would also require revising the test (running prte as an additional step).
 