Commit 4836da2

da-x authored and Trond Myklebust committed
rpcrdma: fix handling for RDMA_CM_EVENT_DEVICE_REMOVAL
Under the scenario of IB device bonding, when bringing down one of the ports, or all ports, we saw xprtrdma entering a non-recoverable state where it is not even possible to complete the disconnect and shut down the mount, requiring a reboot. Following debug, we saw that transport connect never ended after receiving the RDMA_CM_EVENT_DEVICE_REMOVAL callback.

The DEVICE_REMOVAL callback is delivered irrespective of whether the CM_ID is connected, and ESTABLISHED may not have happened. So we need to handle each of these states accordingly.

Fixes: 2acc5ca ('xprtrdma: Prevent dereferencing r_xprt->rx_ep after it is freed')
Cc: Sagi Grimberg <[email protected]>
Signed-off-by: Dan Aloni <[email protected]>
Reviewed-by: Sagi Grimberg <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
1 parent d1404e4 commit 4836da2

1 file changed, +5 -1 lines changed

net/sunrpc/xprtrdma/verbs.c

Lines changed: 5 additions & 1 deletion
@@ -244,7 +244,11 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
 	case RDMA_CM_EVENT_DEVICE_REMOVAL:
 		pr_info("rpcrdma: removing device %s for %pISpc\n",
 			ep->re_id->device->name, sap);
-		fallthrough;
+		switch (xchg(&ep->re_connect_status, -ENODEV)) {
+		case 0: goto wake_connect_worker;
+		case 1: goto disconnected;
+		}
+		return 0;
 	case RDMA_CM_EVENT_ADDR_CHANGE:
 		ep->re_connect_status = -ENODEV;
 		goto disconnected;
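
The dispatch added by this hunk can be modelled outside the kernel. The following is a minimal userspace sketch, not the kernel code: it uses C11 `atomic_exchange` as a stand-in for the kernel's `xchg()`, and the names `model_ep`, `removal_action` and `on_device_removal` are hypothetical. It assumes, based on the commit message and the diff, that a previous status of 0 means a connect attempt is still pending, 1 means the connection reached ESTABLISHED, and any other value means there is nothing left to do.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical action names; the kernel handler uses goto labels instead. */
enum removal_action {
	ACTION_NONE,                /* corresponds to "return 0" in the diff */
	ACTION_WAKE_CONNECT_WORKER, /* corresponds to "goto wake_connect_worker" */
	ACTION_DISCONNECT,          /* corresponds to "goto disconnected" */
};

/* Stand-in for the endpoint; only the status field is modelled. */
struct model_ep {
	atomic_int connect_status;
};

/*
 * Model of the DEVICE_REMOVAL branch: atomically store -ENODEV and
 * decide what to do from the status that was there before the store.
 */
static enum removal_action on_device_removal(struct model_ep *ep)
{
	int prev = atomic_exchange(&ep->connect_status, -ENODEV);

	switch (prev) {
	case 0:
		/* Assumed: connect still pending, so unblock the waiter. */
		return ACTION_WAKE_CONNECT_WORKER;
	case 1:
		/* Assumed: connection was established, so tear it down. */
		return ACTION_DISCONNECT;
	default:
		/* Any other previous status: no further action. */
		return ACTION_NONE;
	}
}
```

Reading the old value and writing the new one in a single atomic step means the handler acts on the state that held at the moment of removal, which appears to be why the patch replaces the plain `fallthrough` with an exchange. In this model a second removal event finds `-ENODEV` already stored and returns `ACTION_NONE`.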
