
Commit a17a32f

Ming Lei authored and kawasaki committed
nvme: optimize passthrough IOPOLL completion for local ring context
When multiple io_uring rings poll on the same NVMe queue, one ring can find
completions belonging to another ring. The current code always uses task_work
to handle this, but that adds overhead for the common single-ring case.

This patch passes the polling io_ring_ctx through the iopoll callback chain
via io_comp_batch and stores it in the request. In the NVMe end_io handler,
we compare the polling context with the request's owning context. If they
match (local), we complete inline. If they differ (remote), or this is a
non-IOPOLL path, we use task_work as before.

Changes:

- Add poll_ctx field to struct io_comp_batch
- Add poll_ctx to struct request's hash/ipi_list union
- Set iob.poll_ctx in io_do_iopoll() before calling iopoll callbacks
- Store poll_ctx in the request in nvme_ns_chr_uring_cmd_iopoll()
- Check local vs. remote context in nvme_uring_cmd_end_io()

~10% IOPS improvement is observed in the following benchmark:

fio/t/io_uring -b512 -d128 -c32 -s32 -p1 -F1 -B[0|1] -O0 -P1 -u1 -n1 /dev/ng0n1

Signed-off-by: Ming Lei <[email protected]>
1 parent 8a473ad commit a17a32f
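The new branch in nvme_uring_cmd_end_io() boils down to one pointer comparison. Below is a minimal user-space sketch of that decision; the toy_* structs and complete_inline() are illustrative stand-ins rather than kernel APIs, and only the condition mirrors the patch: complete inline when the request was polled and req->poll_ctx matches io_uring_cmd_ctx_handle(ioucmd), otherwise punt to task_work.

/*
 * Toy, user-space model of the local vs. remote completion decision.
 * All toy_* names are hypothetical; only the comparison mirrors the patch.
 */
#include <stdbool.h>
#include <stdio.h>

struct toy_ring_ctx { int id; };

struct toy_request {
	bool polled;		/* stands in for blk_rq_is_poll(req) */
	void *poll_ctx;		/* set from iob->poll_ctx while polling */
};

struct toy_uring_cmd {
	void *owning_ctx;	/* stands in for io_uring_cmd_ctx_handle(ioucmd) */
};

/* true: complete inline; false: punt to task_work */
static bool complete_inline(const struct toy_request *req,
			    const struct toy_uring_cmd *cmd)
{
	return req->polled && req->poll_ctx == cmd->owning_ctx;
}

int main(void)
{
	struct toy_ring_ctx ring_a = { .id = 0 }, ring_b = { .id = 1 };
	struct toy_uring_cmd cmd = { .owning_ctx = &ring_a };

	struct toy_request local  = { .polled = true,  .poll_ctx = &ring_a };
	struct toy_request remote = { .polled = true,  .poll_ctx = &ring_b };
	struct toy_request irq    = { .polled = false, .poll_ctx = NULL };

	printf("local ring polls its own request: %s\n",
	       complete_inline(&local, &cmd) ? "inline" : "task_work");
	printf("other ring finds the completion:  %s\n",
	       complete_inline(&remote, &cmd) ? "inline" : "task_work");
	printf("IRQ/softirq (non-IOPOLL) path:    %s\n",
	       complete_inline(&irq, &cmd) ? "inline" : "task_work");
	return 0;
}

Only the local case skips the task_work round trip; remote and IRQ-driven completions keep the existing behavior.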

File tree

4 files changed: +39 −9 lines

drivers/nvme/host/ioctl.c

Lines changed: 28 additions & 8 deletions
@@ -425,14 +425,28 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
 	pdu->result = le64_to_cpu(nvme_req(req)->result.u64);
 
 	/*
-	 * IOPOLL could potentially complete this request directly, but
-	 * if multiple rings are polling on the same queue, then it's possible
-	 * for one ring to find completions for another ring. Punting the
-	 * completion via task_work will always direct it to the right
-	 * location, rather than potentially complete requests for ringA
-	 * under iopoll invocations from ringB.
+	 * For IOPOLL, check if this completion is happening in the context
+	 * of the same io_ring that owns the request (local context). If so,
+	 * we can complete inline without task_work overhead. Otherwise, we
+	 * must punt to task_work to ensure completion happens in the correct
+	 * ring's context.
 	 */
-	io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
+	if (blk_rq_is_poll(req) && req->poll_ctx == io_uring_cmd_ctx_handle(ioucmd)) {
+		/*
+		 * Local context: the polling ring owns this request.
+		 * Complete inline for optimal performance.
+		 */
+		if (pdu->bio)
+			blk_rq_unmap_user(pdu->bio);
+		io_uring_cmd_done32(ioucmd, pdu->status, pdu->result, 0);
+	} else {
+		/*
+		 * Remote or non-IOPOLL context: either a different ring found
+		 * this completion, or this is IRQ/softirq completion. Use
+		 * task_work to direct completion to the correct location.
+		 */
+		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
+	}
 	return RQ_END_IO_FREE;
 }
 
@@ -677,8 +691,14 @@ int nvme_ns_chr_uring_cmd_iopoll(struct io_uring_cmd *ioucmd,
 	struct nvme_uring_cmd_pdu *pdu = nvme_uring_cmd_pdu(ioucmd);
 	struct request *req = pdu->req;
 
-	if (req && blk_rq_is_poll(req))
+	if (req && blk_rq_is_poll(req)) {
+		/*
+		 * Store the polling context in the request so end_io can
+		 * detect if it's completing in the local ring's context.
+		 */
+		req->poll_ctx = iob ? iob->poll_ctx : NULL;
 		return blk_rq_poll(req, iob, poll_flags);
+	}
 	return 0;
 }
 #ifdef CONFIG_NVME_MULTIPATH

include/linux/blk-mq.h

Lines changed: 3 additions & 1 deletion
@@ -175,11 +175,13 @@ struct request {
 	 * request reaches the dispatch list. The ipi_list is only used
 	 * to queue the request for softirq completion, which is long
 	 * after the request has been unhashed (and even removed from
-	 * the dispatch list).
+	 * the dispatch list). poll_ctx is used during iopoll to track
+	 * the io_ring_ctx that initiated the poll operation.
 	 */
 	union {
 		struct hlist_node hash;		/* merge hash */
 		struct llist_node ipi_list;
+		void *poll_ctx;			/* iopoll context */
 	};
 
 	/*

include/linux/blkdev.h

Lines changed: 1 addition & 0 deletions
@@ -1820,6 +1820,7 @@ void bdev_fput(struct file *bdev_file);
 
 struct io_comp_batch {
 	struct rq_list req_list;
+	void *poll_ctx;
 	bool need_ts;
 	void (*complete)(struct io_comp_batch *);
 };

io_uring/rw.c

Lines changed: 7 additions & 0 deletions
@@ -1320,6 +1320,13 @@ int io_do_iopoll(struct io_ring_ctx *ctx, bool force_nonspin)
 	DEFINE_IO_COMP_BATCH(iob);
 	int nr_events = 0;
 
+	/*
+	 * Store the polling ctx so drivers can detect if they're completing
+	 * a request from the same ring that's polling (local) vs a different
+	 * ring (remote). This enables optimizations for local completions.
+	 */
+	iob.poll_ctx = ctx;
+
 	/*
 	 * Only spin for completions if we don't have multiple devices hanging
 	 * off our complete list.
