Commit 8ac0e66

Zhu Yanjun authored and jgunthorpe committed
RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq
When running stress tests with RXE, the following call trace often occurs:

  watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0]
  ...
  Call Trace:
   <IRQ>
   create_object+0x3f/0x3b0
   kmem_cache_alloc_node_trace+0x129/0x2d0
   __kmalloc_reserve.isra.52+0x2e/0x80
   __alloc_skb+0x83/0x270
   rxe_init_packet+0x99/0x150 [rdma_rxe]
   rxe_requester+0x34e/0x11a0 [rdma_rxe]
   rxe_do_task+0x85/0xf0 [rdma_rxe]
   tasklet_action_common.isra.21+0xeb/0x100
   __do_softirq+0xd0/0x298
   irq_exit+0xc5/0xd0
   smp_apic_timer_interrupt+0x68/0x120
   apic_timer_interrupt+0xf/0x20
   </IRQ>
  ...

The root cause is that a tasklet is actually a softirq. In a tasklet handler, another softirq handler is triggered. These softirq handlers usually run on the same CPU core, so this causes the "soft lockup" bug.

Fixes: 8700e3e ("Soft RoCE driver")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Zhu Yanjun <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
1 parent 9b6d3bb commit 8ac0e66

1 file changed: +4 -4 lines changed

drivers/infiniband/sw/rxe/rxe_comp.c

Lines changed: 4 additions & 4 deletions
@@ -329,7 +329,7 @@ static inline enum comp_state check_ack(struct rxe_qp *qp,
 		qp->comp.psn = pkt->psn;
 		if (qp->req.wait_psn) {
 			qp->req.wait_psn = 0;
-			rxe_run_task(&qp->req.task, 1);
+			rxe_run_task(&qp->req.task, 0);
 		}
 	}
 	return COMPST_ERROR_RETRY;
@@ -463,7 +463,7 @@ static void do_complete(struct rxe_qp *qp, struct rxe_send_wqe *wqe)
 	 */
 	if (qp->req.wait_fence) {
 		qp->req.wait_fence = 0;
-		rxe_run_task(&qp->req.task, 1);
+		rxe_run_task(&qp->req.task, 0);
 	}
 }

@@ -479,7 +479,7 @@ static inline enum comp_state complete_ack(struct rxe_qp *qp,
 		if (qp->req.need_rd_atomic) {
 			qp->comp.timeout_retry = 0;
 			qp->req.need_rd_atomic = 0;
-			rxe_run_task(&qp->req.task, 1);
+			rxe_run_task(&qp->req.task, 0);
 		}
 	}

@@ -725,7 +725,7 @@ int rxe_completer(void *arg)
 					RXE_CNT_COMP_RETRY);
 			qp->req.need_retry = 1;
 			qp->comp.started_retry = 1;
-			rxe_run_task(&qp->req.task, 1);
+			rxe_run_task(&qp->req.task, 0);
 		}

 		if (pkt) {
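
Every hunk in this commit changes only the second argument of rxe_run_task() from 1 to 0. The sketch below is a userspace model of that choice, not kernel code. It assumes that a non-zero second argument defers the work to a later tasklet (softirq) pass, and that zero runs the task body inline in the caller's context. The names model_task, model_do_task, model_run_task and model_drain are hypothetical and exist only for illustration.

```c
#include <stddef.h>

/* Hypothetical userspace stand-in for struct rxe_task (not the kernel type). */
struct model_task {
	void (*func)(void *arg);	/* work to perform, e.g. the requester */
	void *arg;			/* argument passed to func */
	int pending;			/* 1 while a deferred run is queued */
	int runs;			/* how many times the task body has run */
};

/* Execute the task body immediately in the caller's context. */
static void model_do_task(struct model_task *t)
{
	t->runs++;
	if (t->func)
		t->func(t->arg);
}

/*
 * Assumed model of the two modes selected by the second argument:
 * sched != 0 only queues the work for a later softirq-like pass,
 * sched == 0 runs it inline before returning to the caller.
 */
static void model_run_task(struct model_task *t, int sched)
{
	if (sched)
		t->pending = 1;
	else
		model_do_task(t);
}

/* Stand-in for the later tasklet pass: run the task if one is queued. */
static int model_drain(struct model_task *t)
{
	if (!t->pending)
		return 0;
	t->pending = 0;
	model_do_task(t);
	return 1;
}
```

In this model, model_run_task(&t, 1) leaves the work queued until model_drain() is called, whereas model_run_task(&t, 0) finishes the work before returning. With 0, the completer no longer queues a second deferred pass from inside a handler that is itself already running as a tasklet.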
