bpf/sockmap: add splice support for tcp_bpf #11277
kernel-patches-daemon-bpf[bot] wants to merge 7 commits into bpf-next_base
Conversation
Upstream branch: 05c9b2e

AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
Add a splice_read function pointer to struct proto, between recvmsg and splice_eof. Set it to tcp_splice_read in both tcp_prot and tcpv6_prot.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
…am_ops

Add inet_splice_read(), which dispatches to sk->sk_prot->splice_read via INDIRECT_CALL_1. Replace the direct tcp_splice_read reference in inet_stream_ops and inet6_stream_ops with inet_splice_read.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Refactor the read operation with no functional changes.

tcp_bpf has two read paths: strparser and non-strparser. Currently the differences between them are implemented directly in their respective recvmsg functions, which works fine. However, upcoming splice support would require duplicating the same logic for both paths. To avoid this, extract the strparser-specific differences into an independent abstraction that splice can reuse.

For ingress_msg data processing, introduce a function pointer callback approach. The current implementation passes sk_msg_recvmsg_actor(), which performs copy_page_to_iter() - the same copy logic previously embedded in sk_msg_recvmsg(). This provides the extension point for future splice support, where a different actor can be plugged in.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Implement splice_read for sockmap using an always-copy approach. Each page from the psock ingress scatterlist is copied to a newly allocated page before being added to the pipe, avoiding lifetime and slab-page issues.

Add sk_msg_splice_actor(), which allocates a fresh page via alloc_page(), copies the data with memcpy(), then passes it to add_to_pipe(). The newly allocated page already has a refcount of 1, so no additional get_page() is needed. On add_to_pipe() failure, no explicit cleanup is needed since add_to_pipe() internally calls pipe_buf_release().

Also fix sk_msg_read_core() to update msg_rx->sg.start when the actor returns 0 mid-way through processing. The loop processes msg_rx->sg entries sequentially — if the actor fails (e.g. pipe full for splice, or user buffer fault for recvmsg), prior entries may already be consumed with sge->length set to 0. Without advancing sg.start, subsequent calls would revisit these zero-length entries and return -EFAULT. This is especially common with the splice actor since the pipe has a small fixed capacity (16 slots), but theoretically affects recvmsg as well.

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
The previous splice_read implementation copies all data through intermediate pages (alloc_page + memcpy). This is wasteful for skb fragment pages, which are allocated from the page allocator and can be safely referenced via get_page().

Optimize by checking PageSlab() to distinguish between linear skb data (slab-backed) and fragment pages (page allocator-backed):

- For slab pages (skb linear data): copy to a page fragment via sk_page_frag, matching what linear_to_page() does in the standard TCP splice path (skb_splice_bits). get_page() is invalid on slab pages, so a copy is unavoidable here.
- For non-slab pages (skb frags): use get_page() directly for true zero-copy, the same as skb_splice_bits does for fragments.

Both paths use nosteal_pipe_buf_ops. The sk_page_frag approach is more memory-efficient than alloc_page for small linear copies, as multiple copies can share a single page fragment.

Benchmark results with rx-verdict-ingress mode (loopback, 8 CPUs):

    splice(2) + always-copy: ~2770 MB/s (before this patch)
    splice(2) + zero-copy:   ~4270 MB/s (after this patch, +54%)
    read(2):                 ~4292 MB/s (baseline for reference)

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Add splice_read coverage to the sockmap_basic and sockmap_strp selftests. Each test suite now runs twice: once with normal recv_timeout() and once with splice-based reads, verifying that data read via splice(2) through a pipe produces identical results.

A recv_timeout_with_splice() helper is added to sockmap_helpers.h that creates a temporary pipe, splices data from the socket into the pipe, then reads from the pipe into the user buffer. MSG_PEEK calls fall back to native recv since splice does not support peek. Non-TCP sockets also fall back to native recv.

The splice subtests are distinguished by appending " splice" to each subtest name via a test__start_subtest macro override.

    ./test_progs -a sockmap_*
    ...
    Summary: 5/830 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Add a --splice option to bench_sockmap that uses splice(2) instead of read(2) in the consumer path. A global pipe is created once during setup and reused across iterations to avoid per-call pipe creation overhead.

When --splice is enabled, the consumer splices data from the socket into the pipe, then reads from the pipe into the user buffer. The socket is set to O_NONBLOCK to prevent tcp_splice_read() from blocking indefinitely, as it only checks sock->file->f_flags for non-blocking mode, ignoring SPLICE_F_NONBLOCK.

Also increase SO_RCVBUF to 16MB to avoid sk_psock_backlog being throttled by the default sk_rcvbuf limit, and add a --verify option to optionally enable data correctness checking (disabled by default for benchmark accuracy).

Benchmark results with rx-verdict-ingress mode (loopback, 8 CPUs):

    read(2):                 ~4292 MB/s
    splice(2) + zero-copy:   ~4270 MB/s
    splice(2) + always-copy: ~2770 MB/s

Zero-copy splice achieves near-parity with read(2), while the always-copy fallback is ~35% slower.

Usage:

    # Steer softirqs to CPU 7 to avoid contending with the producer CPU
    echo 80 > /sys/class/net/lo/queues/rx-0/rps_cpus
    # Raise the receive buffer ceiling so the benchmark can set 16MB rcvbuf
    sysctl -w net.core.rmem_max=16777216
    # Run the benchmark
    ./bench sockmap --rx-verdict-ingress --splice -c 2 -p 1 -a -d 30

Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Pull request for series with
subject: bpf/sockmap: add splice support for tcp_bpf
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1061046