This repository was archived by the owner on Sep 30, 2022. It is now read-only.

Commit 71e56be

btl/openib: XRC save SRQ#s on the loopback endpoint
This commit fixes a bug that can occur when communicating via XRC to peers on the same node. UDCM was not saving the SRQ numbers on the loopback endpoint (which shares its ib_addr info with all local peers) so any messages to local peers use an invalid SRQ number.

Fixes open-mpi/ompi#1383

Signed-off-by: Nathan Hjelm <[email protected]>

(cherry picked from commit open-mpi/ompi@2031bb6)

Signed-off-by: Nathan Hjelm <[email protected]>
1 parent 8109dca commit 71e56be

1 file changed: +12 -0 lines changed

opal/mca/btl/openib/connect/btl_openib_connect_udcm.c

Lines changed: 12 additions & 0 deletions
@@ -551,6 +551,18 @@ static int udcm_endpoint_init_self_xrc (struct mca_btl_base_endpoint_t *lcl_ep)
             break;
         }
 
+    for (int i = 0 ; i < mca_btl_openib_component.num_xrc_qps ; ++i) {
+        uint32_t srq_num;
+#if OPAL_HAVE_CONNECTX_XRC_DOMAINS
+        if (ibv_get_srq_num(lcl_ep->endpoint_btl->qps[i].u.srq_qp.srq, &srq_num)) {
+            BTL_ERROR(("BTL openib UDCM internal error: can't get srq num"));
+        }
+#else
+        srq_num = lcl_ep->endpoint_btl->qps[i].u.srq_qp.srq->xrc_srq_num;
+#endif
+        lcl_ep->rem_info.rem_srqs[i].rem_srq_num = srq_num;
+    }
+
 #if OPAL_HAVE_CONNECTX_XRC_DOMAINS
     recv_qpn = lcl_ep->xrc_recv_qp->qp_num;
 #else
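
The idea behind the added loop can be illustrated outside of Open MPI with a small, self-contained model. This is only a sketch under stated assumptions: every type and function below (`mock_srq`, `mock_endpoint`, `mock_get_srq_num`, `save_loopback_srq_nums`, `MAX_XRC_QPS`) is hypothetical and stands in for the real openib BTL structures and for `ibv_get_srq_num`, which the diff calls. One deliberate difference from the commit: the sketch returns early when a query fails instead of logging and continuing, so it never stores an unset number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_XRC_QPS 8 /* hypothetical upper bound, for this sketch only */

/* Hypothetical stand-in for a local shared receive queue (SRQ). */
struct mock_srq {
    uint32_t srq_num;
};

/* Hypothetical stand-in for the loopback endpoint. */
struct mock_endpoint {
    const struct mock_srq *local_srqs[MAX_XRC_QPS]; /* SRQs owned locally */
    size_t num_xrc_qps;
    uint32_t rem_srq_num[MAX_XRC_QPS]; /* numbers a sender would target */
};

/* Mock query: returns 0 on success and nonzero on failure, the same
 * convention the diff relies on for ibv_get_srq_num. */
static int mock_get_srq_num(const struct mock_srq *srq, uint32_t *out)
{
    if (NULL == srq) {
        return -1;
    }
    *out = srq->srq_num;
    return 0;
}

/* Copy each local SRQ number into the endpoint's "remote" table, so that
 * sends to same-node peers through this endpoint target a valid SRQ.
 * Returns 0 on success, -1 as soon as one query fails. */
static int save_loopback_srq_nums(struct mock_endpoint *ep)
{
    assert(ep->num_xrc_qps <= MAX_XRC_QPS);

    for (size_t i = 0; i < ep->num_xrc_qps; ++i) {
        uint32_t srq_num;

        if (mock_get_srq_num(ep->local_srqs[i], &srq_num)) {
            return -1; /* do not store an unset value */
        }
        ep->rem_srq_num[i] = srq_num;
    }
    return 0;
}
```

In this model, calling `save_loopback_srq_nums` once during endpoint setup fills `rem_srq_num` for every XRC QP. Without that call the table keeps whatever values it was initialized with, which corresponds to the invalid SRQ numbers described in the commit message.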
