Commit edc4ef0

zhuyj authored and rleon committed
RDMA/rxe: Fix the warning "__rxe_cleanup+0x12c/0x170 [rdma_rxe]"
The Call Trace is as below:
"
<TASK>
? show_regs.cold+0x1a/0x1f
? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
? __warn+0x84/0xd0
? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
? report_bug+0x105/0x180
? handle_bug+0x46/0x80
? exc_invalid_op+0x19/0x70
? asm_exc_invalid_op+0x1b/0x20
? __rxe_cleanup+0x12c/0x170 [rdma_rxe]
? __rxe_cleanup+0x124/0x170 [rdma_rxe]
rxe_destroy_qp.cold+0x24/0x29 [rdma_rxe]
ib_destroy_qp_user+0x118/0x190 [ib_core]
rdma_destroy_qp.cold+0x43/0x5e [rdma_cm]
rtrs_cq_qp_destroy.cold+0x1d/0x2b [rtrs_core]
rtrs_srv_close_work.cold+0x1b/0x31 [rtrs_server]
process_one_work+0x21d/0x3f0
worker_thread+0x4a/0x3c0
? process_one_work+0x3f0/0x3f0
kthread+0xf0/0x120
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x22/0x30
</TASK>
"

When too many rdma resources are allocated, rxe needs more time to
handle these rdma resources. Sometimes with the current timeout, rxe
can not release the rdma resources correctly.

Compared with other rdma drivers, a bigger timeout is used.

Fixes: 215d0a7 ("RDMA/rxe: Stop lookup of partially built objects")
Signed-off-by: Zhu Yanjun <[email protected]>
Link: https://patch.msgid.link/[email protected]
Tested-by: Joe Klein <[email protected]>
Signed-off-by: Leon Romanovsky <[email protected]>
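The sleepable path in this commit waits for `msecs_to_jiffies(50000)` ticks, so the wait is 50 seconds of wall-clock time regardless of the kernel's tick rate. The sketch below is a simplified userspace model of that conversion, not the kernel helper itself: `model_msecs_to_jiffies` and its `hz` parameter (standing in for the kernel's configured tick rate) are hypothetical names, and the overflow clamping the real helper performs is left out.

```c
#include <assert.h>

/* Simplified model of a milliseconds-to-ticks conversion that rounds up,
 * as the kernel's msecs_to_jiffies() does. "hz" is an assumed stand-in
 * for the kernel tick rate; clamping of huge inputs is omitted.
 */
static unsigned long model_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
	assert(hz > 0);
	return (unsigned long)(((unsigned long long)ms * hz + 999ULL) / 1000ULL);
}
```

Under this model, 50000 ms is 12500 ticks at 250 ticks per second and 50000 ticks at 1000 ticks per second. A raw tick count used as a timeout, by contrast, lasts a different amount of time on each configuration.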
1 parent 42e6ddd commit edc4ef0

1 file changed, 5 additions and 6 deletions
drivers/infiniband/sw/rxe/rxe_pool.c

Lines changed: 5 additions & 6 deletions
@@ -178,7 +178,6 @@ int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
 {
 	struct rxe_pool *pool = elem->pool;
 	struct xarray *xa = &pool->xa;
-	static int timeout = RXE_POOL_TIMEOUT;
 	int ret, err = 0;
 	void *xa_ret;
 
@@ -202,19 +201,19 @@ int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
 	 * return to rdma-core
 	 */
 	if (sleepable) {
-		if (!completion_done(&elem->complete) && timeout) {
+		if (!completion_done(&elem->complete)) {
 			ret = wait_for_completion_timeout(&elem->complete,
-					timeout);
+					msecs_to_jiffies(50000));
 
 			/* Shouldn't happen. There are still references to
 			 * the object but, rather than deadlock, free the
 			 * object or pass back to rdma-core.
 			 */
 			if (WARN_ON(!ret))
-				err = -EINVAL;
+				err = -ETIMEDOUT;
 		}
 	} else {
-		unsigned long until = jiffies + timeout;
+		unsigned long until = jiffies + RXE_POOL_TIMEOUT;
 
 		/* AH objects are unique in that the destroy_ah verb
 		 * can be called in atomic context. This delay
@@ -226,7 +225,7 @@ int __rxe_cleanup(struct rxe_pool_elem *elem, bool sleepable)
 			mdelay(1);
 
 		if (WARN_ON(!completion_done(&elem->complete)))
-			err = -EINVAL;
+			err = -ETIMEDOUT;
 	}
 
 	if (pool->cleanup)
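
The non-sleepable branch in the diff polls for completion until a deadline and, after this change, reports `-ETIMEDOUT` instead of `-EINVAL` when the deadline passes. The following is a minimal userspace sketch of that polling pattern, not the driver code: `model_now_ms`, `model_poll_cleanup`, and the `done` flag are hypothetical names, and `nanosleep` stands in for the kernel's `mdelay(1)`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

/* Monotonic clock in milliseconds; plays the role of jiffies here. */
static long long model_now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll *done about once per millisecond until it is set or timeout_ms
 * has elapsed. Returns 0 if the flag was set, -ETIMEDOUT otherwise.
 */
static int model_poll_cleanup(const volatile bool *done, long timeout_ms)
{
	const struct timespec one_ms = { .tv_sec = 0, .tv_nsec = 1000000L };
	long long until;

	assert(timeout_ms >= 0);
	until = model_now_ms() + timeout_ms;

	while (!*done && model_now_ms() < until)
		nanosleep(&one_ms, NULL); /* assumed stand-in for mdelay(1) */

	return *done ? 0 : -ETIMEDOUT;
}
```

A caller can tell "the wait expired" apart from "the arguments were invalid" by checking for `-ETIMEDOUT`, which appears to be the point of the error-code change in both branches.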
