MPI_Cart_sub segfault #13081

@mwiesenberger

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

mpirun --version reports 4.0.0

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

I run Linux Mint 21.3 v6.0.4 and installed Open MPI from the operating system package with apt install libopenmpi-dev

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

Please describe the system on which you are running

  • Operating system/version: Linux Mint 21.3 v6.0.4
  • Computer hardware: Intel Xeon W-2133 CPU
  • Network type: n/a
  • Compiler: mpic++ --version reports g++ 11.4.0

Details of the problem

The following program segfaults inside MPI_Cart_sub when run with two processes:

// segfault.cpp

#include <iostream>
#include <cassert>
#include <mpi.h>

int main( int argc, char* argv[] )
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank);
    MPI_Comm_size( MPI_COMM_WORLD, &size);

    MPI_Comm comm = MPI_COMM_WORLD;
    int reduce_rank = rank % 2;
    int color = reduce_rank == 0 ? 1 : MPI_UNDEFINED;
    MPI_Comm comm_split;
    MPI_Comm_split( comm, color, 0, &comm_split);

    MPI_Comm comm2;
    int dims[2] = {0,0};
    int periods[2] = {1,1};
    assert( MPI_Dims_create( size, 2, dims) == MPI_SUCCESS);
    assert( MPI_Cart_create( MPI_COMM_WORLD, 2, dims, periods, true, &comm2) == MPI_SUCCESS);
    int remains[2] = {1,0};
    MPI_Comm comm2_01;
    assert( comm2 != MPI_COMM_NULL);
    assert( MPI_Cart_sub( comm2, remains, &comm2_01) == MPI_SUCCESS); // Segfault here

    MPI_Finalize();
}

shell$ mpic++ segfault.cpp -o segfault -g
shell$ mpirun -n 2 ./segfault
[titanxp:09308] *** Process received signal ***
[titanxp:09308] Signal: Segmentation fault (11)
[titanxp:09308] Signal code: Address not mapped (1)
[titanxp:09308] Failing at address: 0x8
[titanxp:09308] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fa3ee5ed520]
[titanxp:09308] [ 1] /usr/local/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_match+0x995)[0x7fa3cac121d5]
[titanxp:09308] [ 2] /usr/local/lib/openmpi/mca_btl_smcuda.so(mca_btl_smcuda_component_progress+0x324)[0x7fa3d9406af4]
[titanxp:09308] [ 3] /usr/local/lib/libopen-pal.so.40(opal_progress+0x2c)[0x7fa3ede36c2c]
[titanxp:09308] [ 4] /usr/local/lib/libmpi.so.40(ompi_request_default_wait+0x4d)[0x7fa3eea4b95d]
[titanxp:09308] [ 5] /usr/local/lib/libmpi.so.40(ompi_coll_base_sendrecv_actual+0xc1)[0x7fa3eeaa85d1]
[titanxp:09308] [ 6] /usr/local/lib/libmpi.so.40(ompi_coll_base_allgather_intra_two_procs+0x89)[0x7fa3eeaa77c9]
[titanxp:09308] [ 7] /usr/local/lib/libmpi.so.40(ompi_comm_split+0xc5)[0x7fa3eea2eba5]
[titanxp:09308] [ 8] /usr/local/lib/libmpi.so.40(mca_topo_base_cart_sub+0xe4)[0x7fa3eead0054]
[titanxp:09308] [ 9] /usr/local/lib/libmpi.so.40(PMPI_Cart_sub+0xca)[0x7fa3eea6805a]
[titanxp:09308] [10] ./segfault(+0x1469)[0x560e774d9469]
[titanxp:09308] [11] /lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fa3ee5d4d90]
[titanxp:09308] [12] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fa3ee5d4e40]
[titanxp:09308] [13] ./segfault(+0x11e5)[0x560e774d91e5]
[titanxp:09308] *** End of error message ***
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node titanxp exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
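
For reference, here is a minimal MPI-free sketch of what the reproducer asks for with two processes. It is only a model of the semantics described in the MPI standard, not Open MPI's implementation. The names kUndefined, kDims, kRemains, split_color, cart_coords, subgrid_id and subgrid_sizes are hypothetical helpers introduced for illustration. It assumes that MPI_Dims_create returns {2,1} for two processes and that ranks in the Cartesian communicator are numbered in row-major order.

```cpp
// Plain C++ model (no MPI required) of how the reproducer partitions ranks.
#include <array>
#include <cassert>
#include <vector>

// Stand-in for MPI_UNDEFINED in this model (hypothetical value).
constexpr int kUndefined = -1;
// Assumed result of MPI_Dims_create(2, 2, dims) for two processes.
constexpr std::array<int, 2> kDims{2, 1};
// Same remains array as in the reproducer: keep dimension 0, drop dimension 1.
constexpr std::array<int, 2> kRemains{1, 0};

// Color each rank passes to MPI_Comm_split in the reproducer:
// even ranks pass 1, odd ranks pass MPI_UNDEFINED and get MPI_COMM_NULL back.
int split_color(int rank) {
    return (rank % 2 == 0) ? 1 : kUndefined;
}

// Row-major Cartesian coordinates of a rank in a 2D grid.
std::array<int, 2> cart_coords(int rank, const std::array<int, 2>& dims) {
    return {rank / dims[1], rank % dims[1]};
}

// Ranks that agree on every dropped dimension (remains[d] == 0)
// belong to the same subgrid after MPI_Cart_sub.
int subgrid_id(int rank, const std::array<int, 2>& dims,
               const std::array<int, 2>& remains) {
    const std::array<int, 2> c = cart_coords(rank, dims);
    int id = 0;
    for (int d = 0; d < 2; ++d) {
        if (!remains[d]) id = id * dims[d] + c[d];
    }
    return id;
}

// Number of ranks in each subgrid; expects size == dims[0] * dims[1].
std::vector<int> subgrid_sizes(int size, const std::array<int, 2>& dims,
                               const std::array<int, 2>& remains) {
    int n_sub = 1;
    for (int d = 0; d < 2; ++d) {
        if (!remains[d]) n_sub *= dims[d];
    }
    std::vector<int> counts(n_sub, 0);
    for (int r = 0; r < size; ++r) {
        ++counts[subgrid_id(r, dims, remains)];
    }
    return counts;
}
```

Under these assumptions, subgrid_sizes(2, kDims, kRemains) yields a single subgrid holding both ranks, so MPI_Cart_sub would be expected to return a two-process communicator on each rank. split_color(1) is the MPI_UNDEFINED stand-in, so rank 1 receives MPI_COMM_NULL from the earlier MPI_Comm_split. Because reorder is true in MPI_Cart_create, the ranks in this model are those of the Cartesian communicator, which may differ from the ranks in MPI_COMM_WORLD.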
