Freeze or error using function MPI_Comm_connect #13076

@mixen56

Background information

What version of Open MPI are you using?

mpirun (Open MPI) 5.0.6

Describe how Open MPI was installed

Source release tarball.

./configure --prefix=/opt/openmpi-5.0.6 --with-pmix=internal

Please describe the system on which you are running

  • Operating system/version: debian 12.5
  • Computer hardware: x86_64 Intel(R) Core(TM) i3-13100

Details of the problem

I have a test that works fine with openmpi-4.x.x but does not work with openmpi-5.0.6 (the latest version at the time of writing).
The test exercises the MPI functions MPI_Comm_spawn, MPI_Comm_connect, and MPI_Comm_disconnect. It was copied from the mpich repo: https://raw.githubusercontent.com/pmodels/mpich/refs/heads/main/test/mpi/spawn/disconnect_reconnect.c. Compiling it requires the test/mpi directory from the mpich source.

Compile:

/opt/openmpi-5.0.6/bin/mpicc src/spawn/disconnect_reconnect.c -o disconnect_reconnect -I src/include -I /opt/openmpi-5.0.6/include -L /opt/openmpi-5.0.6/lib src/util/mtest.c

Run:

MPITEST_VERBOSE=1 /opt/openmpi-5.0.6/bin/mpirun --allow-run-as-root -np 1 ./disconnect_reconnect    # with verbose
/opt/openmpi-5.0.6/bin/mpirun --allow-run-as-root -np 1 ./disconnect_reconnect                      # no verbose

Output:

  1. The run freezes, or
  2. it aborts with the following error:
[0] accepting connection
[0] connecting to port (loop 1)
[1] connecting to port (loop 1)
[2] connecting to port (loop 1)
[mongoose:00000] *** An error occurred in MPI_Comm_accept
[mongoose:00000] *** reported by process [1767047169,0]
[mongoose:00000] *** on communicator MPI_COMM_WORLD
[mongoose:00000] *** MPI_ERR_UNKNOWN: unknown error
[mongoose:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[mongoose:00000] ***    and MPI will try to terminate your MPI job as well)
[mongoose:00000] *** An error occurred in Socket closed
[mongoose:00000] *** reported by process [1767047170,2]
[mongoose:00000] *** on a NULL communicator
[mongoose:00000] *** Unknown error
[mongoose:00000] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[mongoose:00000] ***    and MPI will try to terminate your MPI job as well)
--------------------------------------------------------------------------
prterun has exited due to process rank 1 with PID 0 on node mongoose calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------
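
Since the run ends either in a freeze or in an abort, a small wrapper can help tell the two apart when reproducing this. The following is only a sketch: the function names `classify_exit` and `run_with_limit` are hypothetical helpers, and it assumes the `timeout` utility from GNU coreutils, which exits with status 124 when the time limit expires.

```shell
#!/bin/sh
# Hypothetical helpers for labelling the outcome of a test run.

# Map an exit status to a short label.
# 124 is the status GNU `timeout` returns when the limit expires.
classify_exit() {
    case "$1" in
        0)   echo "pass" ;;
        124) echo "freeze" ;;
        *)   echo "error" ;;
    esac
}

# Run a command under a time limit, then print the label.
# $1 = limit in seconds, remaining arguments = the command to run.
run_with_limit() {
    limit="$1"
    shift
    timeout "$limit" "$@"
    classify_exit "$?"
}
```

For example, `run_with_limit 60 /opt/openmpi-5.0.6/bin/mpirun --allow-run-as-root -np 1 ./disconnect_reconnect` would print `freeze` on the last line if the test hangs for longer than 60 seconds (the limit is an arbitrary choice). Processes spawned by the test may outlive the timeout and might need to be cleaned up by hand.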

Questions

  1. Is it expected that this test fails with openmpi-5? Isn't this a violation of the MPI standard? Would it be better to go back to ompi-server?
  2. What is the best way to make this test work? I found Error using MPI_Comm_connect/MPI_Comm_accept #6916 and Comm_connect/accept fails openpmix/prrte#398, but that solution goes beyond the scope of MPI. It also requires revising the test (prte has to be run as an additional executable).
