TCP btl segfaulting with ipv6 #5350

@PeterGottesman

Description

Following #5328, all v2.x tests are segfaulting on MTT when configured with --enable-ipv6. I confirmed that this always happens with ipv6 enabled, and that it was not present before that PR was merged.

I am unsure of the cause, but in gdb I see that at line 365 index is 4294951708. I suspect this is incorrect, since the size of the array is 8 (MAX_KERNEL_INTERFACES), as defined on line 283 and confirmed through gdb.

Configure args:

--enable-picky --enable-debug --enable-mpirun-prefix-by-default --enable-mpi-cxx
--disable-dlopen --enable-ipv6

Run command:

$ mpirun --hetero-nodes -np 2 --mca orte_startup_timeout 10000 --mca oob tcp --mca btl tcp,self ./c_ring

Sample output: https://mtt.open-mpi.org/index.php?do_redir=2643

(gdb) bt
#0  0x00002aaaabc44d9f in mca_btl_tcp_retrieve_local_interfaces (proc_data=0x7fffffffc280) at btl_tcp_proc.c:365
#1  0x00002aaaabc4514d in mca_btl_tcp_proc_insert (btl_proc=0x751450, btl_endpoint=0x755a70) at btl_tcp_proc.c:453
#2  0x00002aaaabc3b02a in mca_btl_tcp_add_procs (btl=0x726050, nprocs=2, procs=0x74aa70, peers=0x74aab0, reachable=0x7fffffffc630) at btl_tcp.c:118
#3  0x00002aaaaadb2002 in mca_bml_r2_add_procs (nprocs=2, procs=0x74aa70, reachable=0x7fffffffc630) at bml_r2.c:521
#4  0x00002aaaaaec6489 in mca_pml_ob1_add_procs (procs=0x74ccf0, nprocs=2) at pml_ob1.c:302
#5  0x00002aaaaad42e80 in ompi_mpi_init (argc=1, argv=0x7fffffffc888, requested=0, provided=0x7fffffffc73c, reinit_ok=false) at runtime/ompi_mpi_init.c:777
#6  0x00002aaaaad76c9a in PMPI_Init (argc=0x7fffffffc76c, argv=0x7fffffffc760) at pinit.c:66
#7  0x00000000004009e3 in main ()

This issue is also present in #5330 and #5331, which have not yet been merged.
