
Commit 9961a78

Merge tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux
Pull io_uring updates from Jens Axboe:

 - Greatly improve send zerocopy performance, by enabling coalescing of
   sent buffers.

   MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the
   io_uring side did not. In local testing, the crossover point for send
   zerocopy being faster is now around 3000 byte packets, and it
   performs better than the sync syscall variants as well.

   This feature relies on a shared branch with net-next, which was
   pulled into both branches.

 - Unification of how async preparation is done across opcodes.

   Previously, opcodes that required extra memory for async retry would
   allocate that as needed, using on-stack state until that was the
   case. If async retry was needed, the on-stack state was adjusted
   appropriately for a retry and then copied to the allocated memory.

   This led to some fragile and ugly code, particularly for read/write
   handling, and made storage retries more difficult than they needed to
   be. Allocate the memory upfront, as it's cheap from our pools, and
   use that state consistently both initially and also from the retry
   side.

 - Move away from using remap_pfn_range() for mapping the rings.

   This is really not the right interface to use and can cause lifetime
   issues or leaks. Additionally, it means the ring sq/cq arrays need to
   be physically contiguous, which can cause problems in production with
   larger rings when services are restarted, as memory can be very
   fragmented at that point.

   Move to using vm_insert_page(s) for the ring sq/cq arrays, and apply
   the same treatment to mapped ring provided buffers. This also helps
   unify the code we have dealing with allocating and mapping memory.

   Hard to see in the diffstat as we're adding a few features as well,
   but this kills about ~400 lines of code from the codebase as well.

 - Add support for bundles for send/recv.

   When used with provided buffers, bundles support sending or receiving
   more than one buffer at a time, improving the efficiency by only
   needing to call into the networking stack once for multiple sends or
   receives (a userspace usage sketch follows the shortlog below).

 - Tweaks for our accept operations, supporting both a DONTWAIT flag for
   skipping poll arm and retry if we can, and a POLLFIRST flag that the
   application can use to skip the initial accept attempt and rely
   purely on poll for triggering the operation. Both of these have
   identical flags on the receive side already.

 - Make the task_work ctx locking unconditional.

   We had various code paths here that would do a mix of lock/trylock
   and set the task_work state to whether or not it was locked. All of
   that goes away, we lock it unconditionally and get rid of the state
   flag indicating whether it's locked or not.

   The state struct still exists as an empty type, can go away in the
   future.

 - Add support for specifying NOP completion values, allowing it to be
   used for error handling testing.

 - Use set/test bit for io-wq worker flags. Not strictly needed, but
   also doesn't hurt and helps silence a KCSAN warning.

 - Cleanups for io-wq locking and work assignments, closing a tiny race
   where cancelations would not be able to find the work item reliably.
 - Misc fixes, cleanups, and improvements

* tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux: (97 commits)
  io_uring: support to inject result for NOP
  io_uring: fail NOP if non-zero op flags is passed in
  io_uring/net: add IORING_ACCEPT_POLL_FIRST flag
  io_uring/net: add IORING_ACCEPT_DONTWAIT flag
  io_uring/filetable: don't unnecessarily clear/reset bitmap
  io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
  io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
  io_uring: Require zeroed sqe->len on provided-buffers send
  io_uring/notif: disable LAZY_WAKE for linked notifs
  io_uring/net: fix sendzc lazy wake polling
  io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it
  io_uring/rw: reinstate thread check for retries
  io_uring/notif: implement notification stacking
  io_uring/notif: simplify io_notif_flush()
  net: add callback for setting a ubuf_info to skb
  net: extend ubuf_info callback to ops structure
  io_uring/net: support bundles for recv
  io_uring/net: support bundles for send
  io_uring/kbuf: add helpers for getting/peeking multiple buffers
  io_uring/net: add provided buffer support for IORING_OP_SEND
  ...
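As a rough illustration of the bundle feature described above (the "support bundles for send/recv" commits in the shortlog), here is a minimal userspace sketch using liburing. It assumes a provided-buffer ring is already registered as buffer group 0 and that the headers in use define IORING_RECVSEND_BUNDLE; the exact SQE field placement is an assumption for illustration, not something shown in this diff.

	/*
	 * Sketch: queue a bundled recv that may complete with data from
	 * multiple provided buffers in a single operation.
	 */
	#include <liburing.h>

	static void queue_bundle_recv(struct io_uring *ring, int sockfd)
	{
		struct io_uring_sqe *sqe = io_uring_get_sqe(ring);

		if (!sqe)
			return;		/* SQ full; real code would handle this */

		/* len == 0 and no buffer: buffers come from the provided-buffer group */
		io_uring_prep_recv(sqe, sockfd, NULL, 0, 0);
		sqe->flags |= IOSQE_BUFFER_SELECT;	/* select from a buffer group */
		sqe->buf_group = 0;			/* assumed group id */
		sqe->ioprio |= IORING_RECVSEND_BUNDLE;	/* allow multiple buffers per recv */
	}

A bundled send would be prepared the same way with io_uring_prep_send() and a zero length, in line with the "io_uring: Require zeroed sqe->len on provided-buffers send" commit above.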
2 parents f4e8d80 + deb1e49 commit 9961a78


51 files changed: +2050 -1762 lines

drivers/net/tap.c

Lines changed: 1 addition & 1 deletion
@@ -754,7 +754,7 @@ static ssize_t tap_get_user(struct tap_queue *q, void *msg_control,
 		skb_zcopy_init(skb, msg_control);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(NULL, uarg, false);
+		uarg->ops->complete(NULL, uarg, false);
 	}
 
 	dev_queue_xmit(skb);

drivers/net/tun.c

Lines changed: 1 addition & 1 deletion
@@ -1906,7 +1906,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		skb_zcopy_init(skb, msg_control);
 	} else if (msg_control) {
 		struct ubuf_info *uarg = msg_control;
-		uarg->callback(NULL, uarg, false);
+		uarg->ops->complete(NULL, uarg, false);
 	}
 
 	skb_reset_network_header(skb);

drivers/net/xen-netback/common.h

Lines changed: 2 additions & 3 deletions
@@ -390,9 +390,8 @@ bool xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
 
 void xenvif_carrier_on(struct xenvif *vif);
 
-/* Callback from stack when TX packet can be released */
-void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf,
-			      bool zerocopy_success);
+/* Callbacks from stack when TX packet can be released */
+extern const struct ubuf_info_ops xenvif_ubuf_ops;
 
 static inline pending_ring_idx_t nr_pending_reqs(struct xenvif_queue *queue)
 {

drivers/net/xen-netback/interface.c

Lines changed: 1 addition & 1 deletion
@@ -593,7 +593,7 @@ int xenvif_init_queue(struct xenvif_queue *queue)
 
 	for (i = 0; i < MAX_PENDING_REQS; i++) {
 		queue->pending_tx_info[i].callback_struct = (struct ubuf_info_msgzc)
-			{ { .callback = xenvif_zerocopy_callback },
+			{ { .ops = &xenvif_ubuf_ops },
 			  { { .ctx = NULL,
 			      .desc = i } } };
 		queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;

drivers/net/xen-netback/netback.c

Lines changed: 8 additions & 3 deletions
@@ -1156,7 +1156,7 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
 	uarg = skb_shinfo(skb)->destructor_arg;
 	/* increase inflight counter to offset decrement in callback */
 	atomic_inc(&queue->inflight_packets);
-	uarg->callback(NULL, uarg, true);
+	uarg->ops->complete(NULL, uarg, true);
 	skb_shinfo(skb)->destructor_arg = NULL;
 
 	/* Fill the skb with the new (local) frags. */
@@ -1278,8 +1278,9 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
 	return work_done;
 }
 
-void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf_base,
-			      bool zerocopy_success)
+static void xenvif_zerocopy_callback(struct sk_buff *skb,
+				     struct ubuf_info *ubuf_base,
+				     bool zerocopy_success)
 {
 	unsigned long flags;
 	pending_ring_idx_t index;
@@ -1312,6 +1313,10 @@ void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf_base,
 	xenvif_skb_zerocopy_complete(queue);
 }
 
+const struct ubuf_info_ops xenvif_ubuf_ops = {
+	.complete = xenvif_zerocopy_callback,
+};
+
 static inline void xenvif_tx_dealloc_action(struct xenvif_queue *queue)
 {
 	struct gnttab_unmap_grant_ref *gop;

drivers/nvme/host/ioctl.c

Lines changed: 11 additions & 4 deletions
@@ -423,13 +423,20 @@ static enum rq_end_io_ret nvme_uring_cmd_end_io(struct request *req,
 	pdu->result = le64_to_cpu(nvme_req(req)->result.u64);
 
 	/*
-	 * For iopoll, complete it directly.
+	 * For iopoll, complete it directly. Note that using the uring_cmd
+	 * helper for this is safe only because we check blk_rq_is_poll().
+	 * As that returns false if we're NOT on a polled queue, then it's
+	 * safe to use the polled completion helper.
+	 *
 	 * Otherwise, move the completion to task work.
 	 */
-	if (blk_rq_is_poll(req))
-		nvme_uring_task_cb(ioucmd, IO_URING_F_UNLOCKED);
-	else
+	if (blk_rq_is_poll(req)) {
+		if (pdu->bio)
+			blk_rq_unmap_user(pdu->bio);
+		io_uring_cmd_iopoll_done(ioucmd, pdu->result, pdu->status);
+	} else {
 		io_uring_cmd_do_in_task_lazy(ioucmd, nvme_uring_task_cb);
+	}
 
 	return RQ_END_IO_FREE;
 }

drivers/vhost/net.c

Lines changed: 6 additions & 2 deletions
@@ -380,7 +380,7 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 	}
 }
 
-static void vhost_zerocopy_callback(struct sk_buff *skb,
+static void vhost_zerocopy_complete(struct sk_buff *skb,
 				    struct ubuf_info *ubuf_base, bool success)
 {
 	struct ubuf_info_msgzc *ubuf = uarg_to_msgzc(ubuf_base);
@@ -408,6 +408,10 @@ static void vhost_zerocopy_callback(struct sk_buff *skb,
 	rcu_read_unlock_bh();
 }
 
+static const struct ubuf_info_ops vhost_ubuf_ops = {
+	.complete = vhost_zerocopy_complete,
+};
+
 static inline unsigned long busy_clock(void)
 {
 	return local_clock() >> 10;
@@ -879,7 +883,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 	vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
 	ubuf->ctx = nvq->ubufs;
 	ubuf->desc = nvq->upend_idx;
-	ubuf->ubuf.callback = vhost_zerocopy_callback;
+	ubuf->ubuf.ops = &vhost_ubuf_ops;
 	ubuf->ubuf.flags = SKBFL_ZEROCOPY_FRAG;
 	refcount_set(&ubuf->ubuf.refcnt, 1);
 	msg.msg_control = &ctl;

include/linux/io_uring.h

Lines changed: 0 additions & 6 deletions
@@ -11,7 +11,6 @@ void __io_uring_cancel(bool cancel_all);
 void __io_uring_free(struct task_struct *tsk);
 void io_uring_unreg_ringfd(void);
 const char *io_uring_get_opcode(u8 opcode);
-int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
 bool io_is_uring_fops(struct file *file);
 
 static inline void io_uring_files_cancel(void)
@@ -45,11 +44,6 @@ static inline const char *io_uring_get_opcode(u8 opcode)
 {
 	return "";
 }
-static inline int io_uring_cmd_sock(struct io_uring_cmd *cmd,
-				    unsigned int issue_flags)
-{
-	return -EOPNOTSUPP;
-}
 static inline bool io_is_uring_fops(struct file *file)
 {
 	return false;

include/linux/io_uring/cmd.h

Lines changed: 24 additions & 0 deletions
@@ -26,12 +26,25 @@ static inline const void *io_uring_sqe_cmd(const struct io_uring_sqe *sqe)
 #if defined(CONFIG_IO_URING)
 int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw,
 			      struct iov_iter *iter, void *ioucmd);
+
+/*
+ * Completes the request, i.e. posts an io_uring CQE and deallocates @ioucmd
+ * and the corresponding io_uring request.
+ *
+ * Note: the caller should never hard code @issue_flags and is only allowed
+ * to pass the mask provided by the core io_uring code.
+ */
 void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2,
 			unsigned issue_flags);
+
 void __io_uring_cmd_do_in_task(struct io_uring_cmd *ioucmd,
 			void (*task_work_cb)(struct io_uring_cmd *, unsigned),
 			unsigned flags);
 
+/*
+ * Note: the caller should never hard code @issue_flags and only use the
+ * mask provided by the core io_uring code.
+ */
 void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
 		unsigned int issue_flags);
 
@@ -56,6 +69,17 @@ static inline void io_uring_cmd_mark_cancelable(struct io_uring_cmd *cmd,
 }
 #endif
 
+/*
+ * Polled completions must ensure they are coming from a poll queue, and
+ * hence are completed inside the usual poll handling loops.
+ */
+static inline void io_uring_cmd_iopoll_done(struct io_uring_cmd *ioucmd,
+					    ssize_t ret, ssize_t res2)
+{
+	lockdep_assert(in_task());
+	io_uring_cmd_done(ioucmd, ret, res2, 0);
+}
+
 /* users must follow the IOU_F_TWQ_LAZY_WAKE semantics */
 static inline void io_uring_cmd_do_in_task_lazy(struct io_uring_cmd *ioucmd,
 			void (*task_work_cb)(struct io_uring_cmd *, unsigned))

include/linux/io_uring/net.h

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _LINUX_IO_URING_NET_H
+#define _LINUX_IO_URING_NET_H
+
+struct io_uring_cmd;
+
+#if defined(CONFIG_IO_URING)
+int io_uring_cmd_sock(struct io_uring_cmd *cmd, unsigned int issue_flags);
+
+#else
+static inline int io_uring_cmd_sock(struct io_uring_cmd *cmd,
+				    unsigned int issue_flags)
+{
+	return -EOPNOTSUPP;
+}
+#endif
+
+#endif
