Open
Description
Even after pre-allocating with verbs->resizeTag(1000) to avoid #59, the ZeroTests.resizeMemreg test still fails: rank 0 crashes with signal 4 (Illegal instruction). Command and output:
/storage/home/kdichev/LPF-gitlab2/build/lpfrun_build -engine zero -n 2 /storage/home/kdichev/LPF-gitlab2/build/src/MPI/zero_test --gtest_filter=ZeroTests.resizeMemreg
Running main() from /scratch/kdichev/.spack/stage/spack-stage-googletest-1.14.0-th5nac5n2cvmf3nluwlgarz242h2bug6/spack-src/googletest/src/gtest_main.cc
Note: Google Test filter = ZeroTests.resizeMemreg
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from ZeroTests
Running main() from /scratch/kdichev/.spack/stage/spack-stage-googletest-1.14.0-th5nac5n2cvmf3nluwlgarz242h2bug6/spack-src/googletest/src/gtest_main.cc
Note: Google Test filter = ZeroTests.resizeMemreg
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from ZeroTests
[srv04:1240383:0:1240385] Caught signal 4 (Illegal instruction: illegal opcode)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
BFD: DWARF error: section .debug_info is larger than its filesize! (0x536af6 vs 0x3d6ae8)
==== backtrace (tid:1240385) ====
0 0x00000000000e1e4c munmap() ???:0
1 0x000000000008a39c timer_settime() ???:0
2 0x000000000008a868 timer_settime() ???:0
3 0x000000000008cfa0 __default_morecore() ???:0
4 0x000000000008d778 malloc() ???:0
5 0x0000000000029370 get_print_name_buffer() /build-result/src/hpcx-v2.13.1-gcc-MLNX_OFED_LINUX-5-ubuntu22.04-cuda11-gdrcopy2-nccl2.12-aarch64/ompi-5abd86cc8c5d75c5fe7894b379515d97839c1416/orte/util/name_fns.c:106
6 0x0000000000029370 get_print_name_buffer() /build-result/src/hpcx-v2.13.1-gcc-MLNX_OFED_LINUX-5-ubuntu22.04-cuda11-gdrcopy2-nccl2.12-aarch64/ompi-5abd86cc8c5d75c5fe7894b379515d97839c1416/orte/util/name_fns.c:88
7 0x00000000000293d4 orte_util_print_jobids() /build-result/src/hpcx-v2.13.1-gcc-MLNX_OFED_LINUX-5-ubuntu22.04-cuda11-gdrcopy2-nccl2.12-aarch64/ompi-5abd86cc8c5d75c5fe7894b379515d97839c1416/orte/util/name_fns.c:171
8 0x00000000000297c4 orte_util_print_name_args() /build-result/src/hpcx-v2.13.1-gcc-MLNX_OFED_LINUX-5-ubuntu22.04-cuda11-gdrcopy2-nccl2.12-aarch64/ompi-5abd86cc8c5d75c5fe7894b379515d97839c1416/orte/util/name_fns.c:143
9 0x0000000000098034 _process_name_print_for_opal() /build-result/src/hpcx-v2.13.1-gcc-MLNX_OFED_LINUX-5-ubuntu22.04-cuda11-gdrcopy2-nccl2.12-aarch64/ompi-5abd86cc8c5d75c5fe7894b379515d97839c1416/orte/runtime/orte_init.c:68
10 0x0000000000005870 process_event() /build-result/src/hpcx-v2.13.1-gcc-MLNX_OFED_LINUX-5-ubuntu22.04-cuda11-gdrcopy2-nccl2.12-aarch64/ompi-5abd86cc8c5d75c5fe7894b379515d97839c1416/opal/mca/pmix/pmix3x/pmix3x.c:256
11 0x00000000000803b8 event_process_active_single_queue() /build-result/src/hpcx-v2.13.1-gcc-MLNX_OFED_LINUX-5-ubuntu22.04-cuda11-gdrcopy2-nccl2.12-aarch64/ompi-5abd86cc8c5d75c5fe7894b379515d97839c1416/opal/mca/event/libevent2022/libevent/event.c:1370
12 0x00000000000803b8 event_process_active() /build-result/src/hpcx-v2.13.1-gcc-MLNX_OFED_LINUX-5-ubuntu22.04-cuda11-gdrcopy2-nccl2.12-aarch64/ompi-5abd86cc8c5d75c5fe7894b379515d97839c1416/opal/mca/event/libevent2022/libevent/event.c:1440
13 0x00000000000803b8 opal_libevent2022_event_base_loop() /build-result/src/hpcx-v2.13.1-gcc-MLNX_OFED_LINUX-5-ubuntu22.04-cuda11-gdrcopy2-nccl2.12-aarch64/ompi-5abd86cc8c5d75c5fe7894b379515d97839c1416/opal/mca/event/libevent2022/libevent/event.c:1644
14 0x000000000003c6cc progress_engine() /build-result/src/hpcx-v2.13.1-gcc-MLNX_OFED_LINUX-5-ubuntu22.04-cuda11-gdrcopy2-nccl2.12-aarch64/ompi-5abd86cc8c5d75c5fe7894b379515d97839c1416/opal/runtime/opal_progress_threads.c:105
15 0x000000000007d5b8 pthread_condattr_setpshared() ???:0
16 0x00000000000e5edc clone() ???:0
=================================
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 1240383 on node srv04 exited on signal 4 (Illegal instruction).