Commit 54a485e

mmarcini authored and jgunthorpe committed
IB/rdmavt: Fix RQ counting issues causing use of an invalid RWQE
The lookaside count is improperly initialized to the size of the Receive Queue with the additional +1. In the traces below, the RQ size is 384, so the count was set to 385.

The lookaside count is then rarely refreshed. Note the high and incorrect count in the trace below:

    rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9008 wr_id 55c7206d75a0 qpn c qpt 2 pid 3018 num_sge 1 head 1 tail 0, count 385
    rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1] <- rvt_get_rwqe) ret=0x1

The head,tail indicate there is only one RWQE posted although the count says 385 and we correctly return the element 0.

The next call to rvt_get_rwqe with the decremented count:

    rvt_get_rwqe: [hfi1_0] wqe ffffc900078e9058 wr_id 0 qpn c qpt 2 pid 3018 num_sge 0 head 1 tail 1, count 384
    rvt_get_rwqe: (hfi1_rc_rcv+0x4eb/0x1480 [hfi1] <- rvt_get_rwqe) ret=0x1

Note that the RQ is empty (head == tail) yet we return the RWQE at tail 1, which is not valid because of the bogus high count.

Best case, the RWQE has never been posted and the rc logic sees an RWQE that is too small (all zeros) and puts the QP into an error state.

In the worst case, a server slow at posting receive buffers might fool rvt_get_rwqe() into fetching an old RWQE and corrupt memory.

Fix by deleting the faulty initialization code and creating an inline to fetch the posted count and convert all callers to use new inline.

Fixes: f592ae3 ("IB/rdmavt: Fracture single lock used for posting and processing RWQEs")
Link: https://lore.kernel.org/r/[email protected]
Reported-by: Zhaojuan Guo <[email protected]>
Cc: <[email protected]> # 5.4.x
Reviewed-by: Kaike Wan <[email protected]>
Signed-off-by: Mike Marciniszyn <[email protected]>
Tested-by: Honggang Li <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
1 parent 81530ab commit 54a485e
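
The failure described in the commit message can be reproduced in a small userspace model. The sketch below is not the kernel code: `struct model_rq`, `model_posted()`, `model_get_rwqe()` and `MODEL_COUNT_THRESHOLD` are hypothetical stand-ins, locking and memory barriers are omitted, and the refresh-then-consume flow is simplified from what the diff shows of rvt_get_rwqe(). It assumes the cached count is refreshed only when it drops below a threshold, which is why a bogus initial value of 385 is never corrected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for RVT_RWQ_COUNT_THRESHOLD; the value is an assumption. */
#define MODEL_COUNT_THRESHOLD 16

/* Simplified, hypothetical model of a receive queue with a lookaside count. */
struct model_rq {
	uint32_t size;	/* ring slots (requested size + 1) */
	uint32_t head;	/* producer index, advanced when a receive is posted */
	uint32_t tail;	/* consumer index */
	uint32_t count;	/* lookaside count cached by the consumer */
};

/* Number of entries really posted, using the head - tail arithmetic from the patch. */
static uint32_t model_posted(const struct model_rq *rq)
{
	uint32_t count = rq->head - rq->tail;

	if ((int32_t)count < 0)
		count += rq->size;
	return count;
}

/*
 * Consume one entry. Returns false if the queue is believed empty.
 * On success, *valid reports whether the consumed slot was really posted.
 */
static bool model_get_rwqe(struct model_rq *rq, bool *valid)
{
	assert(rq->size > 0);

	/* The cached count is only refreshed once it runs low. */
	if (rq->count < MODEL_COUNT_THRESHOLD)
		rq->count = model_posted(rq);
	if (rq->count == 0)
		return false;

	*valid = model_posted(rq) > 0;
	if (++rq->tail >= rq->size)
		rq->tail = 0;
	rq->count--;
	return true;
}
```

Starting the model with `size = 385, head = 1, tail = 0, count = 385` mirrors the traces: the first call consumes the one posted entry, and the second call still succeeds even though head == tail. Starting with `count = 0` instead (assuming that is the state left once the faulty initialization is deleted) forces a refresh on every early call, so the second call reports an empty queue.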

3 files changed: 24 additions, 32 deletions
drivers/infiniband/sw/rdmavt/qp.c

Lines changed: 4 additions & 29 deletions
@@ -901,8 +901,6 @@ static void rvt_init_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp,
 	qp->s_tail_ack_queue = 0;
 	qp->s_acked_ack_queue = 0;
 	qp->s_num_rd_atomic = 0;
-	if (qp->r_rq.kwq)
-		qp->r_rq.kwq->count = qp->r_rq.size;
 	qp->r_sge.num_sge = 0;
 	atomic_set(&qp->s_reserved_used, 0);
 }
@@ -2366,31 +2364,6 @@ static int init_sge(struct rvt_qp *qp, struct rvt_rwqe *wqe)
 	return 0;
 }
 
-/**
- * get_count - count numbers of request work queue entries
- * in circular buffer
- * @rq: data structure for request queue entry
- * @tail: tail indices of the circular buffer
- * @head: head indices of the circular buffer
- *
- * Return - total number of entries in the circular buffer
- */
-static u32 get_count(struct rvt_rq *rq, u32 tail, u32 head)
-{
-	u32 count;
-
-	count = head;
-
-	if (count >= rq->size)
-		count = 0;
-	if (count < tail)
-		count += rq->size - tail;
-	else
-		count -= tail;
-
-	return count;
-}
-
 /**
  * get_rvt_head - get head indices of the circular buffer
  * @rq: data structure for request queue entry
@@ -2465,7 +2438,7 @@ int rvt_get_rwqe(struct rvt_qp *qp, bool wr_id_only)
 
 	if (kwq->count < RVT_RWQ_COUNT_THRESHOLD) {
 		head = get_rvt_head(rq, ip);
-		kwq->count = get_count(rq, tail, head);
+		kwq->count = rvt_get_rq_count(rq, head, tail);
 	}
 	if (unlikely(kwq->count == 0)) {
 		ret = 0;
@@ -2500,7 +2473,9 @@ int rvt_get_rwqe(struct rvt_qp *qp, bool wr_id_only)
 		 * the number of remaining WQEs.
 		 */
 		if (kwq->count < srq->limit) {
-			kwq->count = get_count(rq, tail, get_rvt_head(rq, ip));
+			kwq->count =
+				rvt_get_rq_count(rq,
+						 get_rvt_head(rq, ip), tail);
 			if (kwq->count < srq->limit) {
 				struct ib_event ev;
 
drivers/infiniband/sw/rdmavt/rc.c

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -127,9 +127,7 @@ __be32 rvt_compute_aeth(struct rvt_qp *qp)
 		 * not atomic, which is OK, since the fuzziness is
 		 * resolved as further ACKs go out.
 		 */
-		credits = head - tail;
-		if ((int)credits < 0)
-			credits += qp->r_rq.size;
+		credits = rvt_get_rq_count(&qp->r_rq, head, tail);
 	}
 	/*
 	 * Binary search the credit table to find the code to

include/rdma/rdmavt_qp.h

Lines changed: 19 additions & 0 deletions
@@ -305,6 +305,25 @@ struct rvt_rq {
 	spinlock_t lock ____cacheline_aligned_in_smp;
 };
 
+/**
+ * rvt_get_rq_count - count numbers of request work queue entries
+ * in circular buffer
+ * @rq: data structure for request queue entry
+ * @head: head indices of the circular buffer
+ * @tail: tail indices of the circular buffer
+ *
+ * Return - total number of entries in the Receive Queue
+ */
+
+static inline u32 rvt_get_rq_count(struct rvt_rq *rq, u32 head, u32 tail)
+{
+	u32 count = head - tail;
+
+	if ((s32)count < 0)
+		count += rq->size;
+	return count;
+}
+
 /*
  * This structure holds the information that the send tasklet needs
  * to send a RDMA read response or atomic operation.
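
The wraparound arithmetic in rvt_get_rq_count() can be checked outside the kernel. The following is a minimal userspace sketch, assuming both indices are already validated to lie inside the ring; `struct ring` and `ring_count()` are hypothetical names chosen for illustration, and `uint32_t`/`int32_t` stand in for the kernel's `u32`/`s32`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for struct rvt_rq; only the ring size matters here. */
struct ring {
	uint32_t size;	/* number of slots in the circular buffer */
};

/*
 * Same arithmetic as rvt_get_rq_count() in the diff above: the number of
 * entries between tail (consumer index) and head (producer index).
 */
static inline uint32_t ring_count(const struct ring *rq,
				  uint32_t head, uint32_t tail)
{
	uint32_t count = head - tail;	/* wraps modulo 2^32 when head < tail */

	/* Assumption of this sketch: callers pass indices inside the ring. */
	assert(head < rq->size && tail < rq->size);

	/* A "negative" difference means head has wrapped past the end. */
	if ((int32_t)count < 0)
		count += rq->size;
	return count;
}
```

With a ring of 385 slots, `ring_count(&rq, 1, 0)` gives 1 and `ring_count(&rq, 1, 1)` gives 0, matching the head and tail values in the traces from the commit message. When head has wrapped, for example `ring_count(&rq, 0, 384)`, the unsigned difference is corrected by adding the ring size and the result is again 1.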
