Closed
Description
On master with --mca osc rdma, a number of the one-sided tests hang. The affected tests include c_accumulate, c_accumulate_atomic, c_create_dynamic, c_fence_lock, c_flush, c_get_accumulate, c_post_start, c_reqops, c_strided_acc_indexed, and others.
Am I the only one who sees this?
[rvandevaart@drossetti-ivy4 onesided]$ mpirun --mca osc pt2pt --mca btl self,sm,openib --mca btl_openib_warn_default_gid_prefix 0 -np 2 -host drossetti-ivy4,drossetti-ivy5 c_accumulate
[rvandevaart@drossetti-ivy4 onesided]$ mpirun --mca osc rdma --mca btl self,sm,openib --mca btl_openib_warn_default_gid_prefix 0 -np 2 -host drossetti-ivy4,drossetti-ivy5 c_accumulate
....hang....
[rvandevaart@drossetti-ivy4 src]$ pstack 29329
Thread 3 (Thread 0x7fefc03fb700 (LWP 29330)):
#0 0x00007fefc28b3173 in epoll_wait () from /lib64/libc.so.6
#1 0x00007fefc2236506 in epoll_dispatch () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libopen-pal.so.0
#2 0x00007fefc223bcc6 in opal_libevent2022_event_base_loop () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libopen-pal.so.0
#3 0x00007fefc041d44d in progress_engine () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libpmix.so.0
#4 0x00007fefc2b659d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fefc28b2b7d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7fefbf3fa700 (LWP 29331)):
#0 0x00007fefc28a9353 in poll () from /lib64/libc.so.6
#1 0x00007fefc2241d1a in poll_dispatch () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libopen-pal.so.0
#2 0x00007fefc223bcc6 in opal_libevent2022_event_base_loop () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libopen-pal.so.0
#3 0x00007fefc21e0b40 in progress_engine () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libopen-pal.so.0
#4 0x00007fefc2b659d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007fefc28b2b7d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7fefc3308700 (LWP 29329)):
#0 0x00007fefbdbeacc2 in opal_list_remove_item () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_rcache_vma.so
#1 0x00007fefbdbec841 in mca_rcache_vma_tree_delete () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_rcache_vma.so
#2 0x00007fefbdbea7e8 in mca_rcache_vma_delete () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_rcache_vma.so
#3 0x00007fefbd7e28d6 in mca_mpool_grdma_register () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_mpool_grdma.so
#4 0x00007fefbcd87f6a in mca_btl_openib_register_mem () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_btl_openib.so
#5 0x00007fefbb494039 in _ompi_osc_rdma_register () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#6 0x00007fefbb494623 in ompi_osc_rdma_frag_alloc () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#7 0x00007fefbb49479c in ompi_osc_rdma_lock_try_acquire_exclusive () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#8 0x00007fefbb494954 in ompi_osc_rdma_lock_acquire_exclusive () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#9 0x00007fefbb494ef0 in ompi_osc_rdma_gacc_local () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#10 0x00007fefbb49716b in ompi_osc_rdma_rget_accumulate_internal () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#11 0x00007fefbb497b3c in ompi_osc_rdma_accumulate () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#12 0x00007fefc2e30062 in PMPI_Accumulate () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libmpi.so.0
#13 0x00000000004010ba in main ()
[rvandevaart@drossetti-ivy4 src]$
[rvandevaart@drossetti-ivy5 dbg-nocuda]$ pstack 28334
Thread 3 (Thread 0x7f6420992700 (LWP 28335)):
#0 0x00007f6422e49173 in epoll_wait () from /lib64/libc.so.6
#1 0x00007f64227cc506 in epoll_dispatch () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libopen-pal.so.0
#2 0x00007f64227d1cc6 in opal_libevent2022_event_base_loop () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libopen-pal.so.0
#3 0x00007f64209b444d in progress_engine () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libpmix.so.0
#4 0x00007f64230fb9d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f6422e48b7d in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f641f991700 (LWP 28336)):
#0 0x00007f6422e3f353 in poll () from /lib64/libc.so.6
#1 0x00007f64227d7d1a in poll_dispatch () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libopen-pal.so.0
#2 0x00007f64227d1cc6 in opal_libevent2022_event_base_loop () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libopen-pal.so.0
#3 0x00007f6422776b40 in progress_engine () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libopen-pal.so.0
#4 0x00007f64230fb9d1 in start_thread () from /lib64/libpthread.so.0
#5 0x00007f6422e48b7d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f642389f700 (LWP 28334)):
#0 0x00007f641e181cc2 in opal_list_remove_item () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_rcache_vma.so
#1 0x00007f641e183841 in mca_rcache_vma_tree_delete () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_rcache_vma.so
#2 0x00007f641e1817e8 in mca_rcache_vma_delete () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_rcache_vma.so
#3 0x00007f641dd798d6 in mca_mpool_grdma_register () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_mpool_grdma.so
#4 0x00007f641d31ef6a in mca_btl_openib_register_mem () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_btl_openib.so
#5 0x00007f641ba2d039 in _ompi_osc_rdma_register () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#6 0x00007f641ba2d623 in ompi_osc_rdma_frag_alloc () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#7 0x00007f641ba2d79c in ompi_osc_rdma_lock_try_acquire_exclusive () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#8 0x00007f641ba2d954 in ompi_osc_rdma_lock_acquire_exclusive () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#9 0x00007f641ba2e70c in ompi_osc_rdma_gacc_contig () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#10 0x00007f641ba2ed71 in ompi_osc_rdma_gacc_master () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#11 0x00007f641ba301d9 in ompi_osc_rdma_rget_accumulate_internal () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#12 0x00007f641ba30b3c in ompi_osc_rdma_accumulate () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/openmpi/mca_osc_rdma.so
#13 0x00007f64233c6062 in PMPI_Accumulate () from /ivylogin/home/rvandevaart/ompi-repos/ompi-master-rolfv/dbg-nocuda/lib/libmpi.so.0
#14 0x00000000004010ba in main ()
[rvandevaart@drossetti-ivy5 dbg-nocuda]$
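
To compare the two Thread 1 traces above without reading them side by side, a small helper along these lines may be handy. It is only a sketch: parse_pstack and common_inner_frames are made-up names, not part of Open MPI or pstack, and the sample input is frames #0 to #9 of each Thread 1 trace with the library paths trimmed.

```python
import re

# Matches a pstack frame line such as:
#   #0 0x00007fefbdbeacc2 in opal_list_remove_item () from /path/to/lib.so
FRAME_RE = re.compile(r"^#(\d+)\s+0x[0-9a-fA-F]+\s+in\s+(\S+)\s*\(")
# Matches a thread header such as:
#   Thread 1 (Thread 0x7fefc3308700 (LWP 29329)):
THREAD_RE = re.compile(r"^Thread\s+(\d+)\s+\(")


def parse_pstack(text):
    """Return {thread_number: [function names, innermost frame first]}."""
    threads = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = THREAD_RE.match(line)
        if header:
            current = int(header.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(2))
    return threads


def common_inner_frames(a, b):
    """Frames shared by both stacks, counted from the innermost frame outward."""
    shared = []
    for fa, fb in zip(a, b):
        if fa != fb:
            break
        shared.append(fa)
    return shared


# Sample input: Thread 1, frames #0-#9, library paths trimmed (assumption:
# the real pstack output would be pasted in or read from a file instead).
IVY4 = """Thread 1 (Thread 0x7fefc3308700 (LWP 29329)):
#0 0x00007fefbdbeacc2 in opal_list_remove_item ()
#1 0x00007fefbdbec841 in mca_rcache_vma_tree_delete ()
#2 0x00007fefbdbea7e8 in mca_rcache_vma_delete ()
#3 0x00007fefbd7e28d6 in mca_mpool_grdma_register ()
#4 0x00007fefbcd87f6a in mca_btl_openib_register_mem ()
#5 0x00007fefbb494039 in _ompi_osc_rdma_register ()
#6 0x00007fefbb494623 in ompi_osc_rdma_frag_alloc ()
#7 0x00007fefbb49479c in ompi_osc_rdma_lock_try_acquire_exclusive ()
#8 0x00007fefbb494954 in ompi_osc_rdma_lock_acquire_exclusive ()
#9 0x00007fefbb494ef0 in ompi_osc_rdma_gacc_local ()
"""

IVY5 = """Thread 1 (Thread 0x7f642389f700 (LWP 28334)):
#0 0x00007f641e181cc2 in opal_list_remove_item ()
#1 0x00007f641e183841 in mca_rcache_vma_tree_delete ()
#2 0x00007f641e1817e8 in mca_rcache_vma_delete ()
#3 0x00007f641dd798d6 in mca_mpool_grdma_register ()
#4 0x00007f641d31ef6a in mca_btl_openib_register_mem ()
#5 0x00007f641ba2d039 in _ompi_osc_rdma_register ()
#6 0x00007f641ba2d623 in ompi_osc_rdma_frag_alloc ()
#7 0x00007f641ba2d79c in ompi_osc_rdma_lock_try_acquire_exclusive ()
#8 0x00007f641ba2d954 in ompi_osc_rdma_lock_acquire_exclusive ()
#9 0x00007f641ba2e70c in ompi_osc_rdma_gacc_contig ()
"""

stack4 = parse_pstack(IVY4)[1]
stack5 = parse_pstack(IVY5)[1]
shared = common_inner_frames(stack4, stack5)

print("shared innermost frames:", len(shared))
for name in shared:
    print("  ", name)
print("first differing caller:", stack4[len(shared)], "vs", stack5[len(shared)])
```

On these samples the nine innermost frames are identical on both hosts, from opal_list_remove_item up to ompi_osc_rdma_lock_acquire_exclusive. The traces differ only in the caller: ompi_osc_rdma_gacc_local on drossetti-ivy4 and ompi_osc_rdma_gacc_contig on drossetti-ivy5.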