
Commit eb9f854

Ming Lei authored and kawasaki committed
nvme/io_uring: optimize IOPOLL completions for local ring context
When multiple io_uring rings poll on the same NVMe queue, one ring can find completions belonging to another ring. The current code always uses task_work to handle this, but this adds overhead for the common single-ring case.

This patch passes the polling io_ring_ctx through io_comp_batch's new poll_ctx field. In io_do_iopoll(), the polling ring's context is stored in iob.poll_ctx before calling the iopoll callbacks. In nvme_uring_cmd_end_io(), we now compare iob->poll_ctx with the request's owning io_ring_ctx (via io_uring_cmd_ctx_handle()). If they match (local context), we complete inline with io_uring_cmd_done32(). If they differ (remote context) or iob is NULL (non-iopoll path), we use task_work as before.

This optimization eliminates task_work scheduling overhead for the common case where a ring polls and finds its own completions. ~10% IOPS improvement is observed in the following benchmark:

    fio/t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -O0 -P1 -u1 -n1 /dev/ng0n1

Signed-off-by: Ming Lei <[email protected]>
1 parent 7dd1f96 commit eb9f854
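
For orientation, here is a minimal userspace sketch, not part of the commit, of the single-ring IOPOLL passthrough pattern this change targets: one ring issuing IORING_OP_URING_CMD against an NVMe generic char device and reaping its own completions, so every completion io_do_iopoll() finds is local. The device path, queue depth, and the abbreviated nvme_uring_cmd setup are illustrative assumptions; error handling is omitted.

    /* Illustrative only: single IOPOLL ring doing NVMe passthrough, the case
     * that can now complete inline instead of bouncing through task_work.
     * Build against liburing. */
    #include <fcntl.h>
    #include <unistd.h>
    #include <liburing.h>
    #include <linux/nvme_ioctl.h>

    int main(void)
    {
            struct io_uring ring;
            struct io_uring_sqe *sqe;
            struct io_uring_cqe *cqe;
            int fd = open("/dev/ng0n1", O_RDONLY);  /* device path is an example */

            /* Passthrough needs 128-byte SQEs and 32-byte CQEs; IOPOLL selects
             * polled completions, driven by io_do_iopoll() on this ring. */
            io_uring_queue_init(128, &ring, IORING_SETUP_IOPOLL |
                                IORING_SETUP_SQE128 | IORING_SETUP_CQE32);

            sqe = io_uring_get_sqe(&ring);
            io_uring_prep_rw(IORING_OP_URING_CMD, sqe, fd, NULL, 0, 0);
            sqe->cmd_op = NVME_URING_CMD_IO;
            /* The struct nvme_uring_cmd payload (opcode, nsid, addr, data_len,
             * cdw10..cdw15) would be filled in at sqe->cmd here. */

            io_uring_submit(&ring);
            io_uring_wait_cqe(&ring, &cqe);         /* reaped via IOPOLL */
            io_uring_cqe_seen(&ring, cqe);

            io_uring_queue_exit(&ring);
            close(fd);
            return 0;
    }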

3 files changed: 20 additions & 7 deletions

drivers/nvme/host/ioctl.c

Lines changed: 13 additions & 7 deletions
@@ -426,14 +426,20 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
 	pdu->result = le64_to_cpu(nvme_req(req)->result.u64);
 
 	/*
-	 * IOPOLL could potentially complete this request directly, but
-	 * if multiple rings are polling on the same queue, then it's possible
-	 * for one ring to find completions for another ring. Punting the
-	 * completion via task_work will always direct it to the right
-	 * location, rather than potentially complete requests for ringA
-	 * under iopoll invocations from ringB.
+	 * For IOPOLL, check if this completion is happening in the context
+	 * of the same io_ring that owns the request (local context). If so,
+	 * we can complete inline without task_work overhead. Otherwise, we
+	 * must punt to task_work to ensure completion happens in the correct
+	 * ring's context.
 	 */
-	io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
+	if (blk_rq_is_poll(req) && iob &&
+	    iob->poll_ctx == io_uring_cmd_ctx_handle(ioucmd)) {
+		if (pdu->bio)
+			blk_rq_unmap_user(pdu->bio);
+		io_uring_cmd_done32(ioucmd, pdu->status, pdu->result, 0);
+	} else {
+		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
+	}
 	return RQ_END_IO_FREE;
 }
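
The inline branch above repeats the unmap-then-complete steps because, on the task_work path, nvme_uring_task_cb normally performs them. For comparison, a rough from-memory sketch of that existing callback follows; it is not part of this diff, and the exact completion helper may differ between kernel versions.

    /* Approximate shape of the existing task_work callback that the else
     * branch still defers to; shown only to make the inline path's
     * blk_rq_unmap_user() + completion pairing easier to follow. */
    static void nvme_uring_task_cb(struct io_uring_cmd *ioucmd,
                                   unsigned issue_flags)
    {
            struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);

            if (pdu->bio)
                    blk_rq_unmap_user(pdu->bio);
            /* the completion helper may be io_uring_cmd_done() in older trees */
            io_uring_cmd_done32(ioucmd, pdu->status, pdu->result, issue_flags);
    }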

include/linux/blkdev.h

Lines changed: 1 addition & 0 deletions
@@ -1822,6 +1822,7 @@ struct io_comp_batch {
 	struct rq_list req_list;
 	bool need_ts;
 	void (*complete)(struct io_comp_batch *);
+	void *poll_ctx;
 };
 
 static inline bool blk_atomic_write_start_sect_aligned(sector_t sector,

io_uring/rw.c

Lines changed: 6 additions & 0 deletions
@@ -1320,6 +1320,12 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 	DEFINE_IO_COMP_BATCH(iob);
 	int nr_events = 0;
 
+	/*
+	 * Store the polling io_ring_ctx so drivers can detect if they're
+	 * completing a request in the same ring context that's polling.
+	 */
+	iob.poll_ctx = ctx;
+
 	/*
 	 * Only spin for completions if we don't have multiple devices hanging
 	 * off our complete list.
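
To connect the two hunks: the batch stamped here is the same one each polled request's iopoll callback receives, which is how nvme_uring_cmd_end_io() ends up seeing poll_ctx. Roughly, from the existing dispatch in the io_do_iopoll() loop (shown for context only, not part of this diff):

    /* Existing dispatch in io_do_iopoll(): every callback gets &iob, so the
     * poll_ctx stamped above travels with each polled completion down into
     * the driver's completion handler. */
    if (req->opcode == IORING_OP_URING_CMD) {
            struct io_uring_cmd *ioucmd = io_kiocb_to_cmd(req, struct io_uring_cmd);

            ret = file->f_op->uring_cmd_iopoll(ioucmd, &iob, poll_flags);
    } else {
            struct io_rw *rw = io_kiocb_to_cmd(req, struct io_rw);

            ret = file->f_op->iopoll(&rw->kiocb, &iob, poll_flags);
    }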
