Commit f87b8fe
bpf: Introduce SK_BPF_BYPASS_PROT_MEM.
If a socket has sk->sk_bypass_prot_mem flagged, the socket opts out
of the global protocol memory accounting.
This is easily controlled by net.core.bypass_prot_mem sysctl, but it
lacks flexibility.
Let's support flagging (and clearing) sk->sk_bypass_prot_mem via
bpf_setsockopt() at the BPF_CGROUP_INET_SOCK_CREATE hook.
int val = 1;
bpf_setsockopt(ctx, SOL_SOCKET, SK_BPF_BYPASS_PROT_MEM,
&val, sizeof(val));
As with net.core.bypass_prot_mem, this is inherited to child sockets,
and BPF always takes precedence over sysctl at socket(2) and accept(2).
SK_BPF_BYPASS_PROT_MEM is only supported at BPF_CGROUP_INET_SOCK_CREATE
and not supported on other hooks for some reasons:
1. UDP charges memory under sk->sk_receive_queue.lock instead
of lock_sock()
2. Modifying the flag after skb is charged to sk requires such
adjustment during bpf_setsockopt() and complicates the logic
unnecessarily
We can support other hooks later if a real use case justifies that.
Most changes are inline and hard to trace, but a microbenchmark on
__sk_mem_raise_allocated() during neper/tcp_stream showed that more
samples completed faster with sk->sk_bypass_prot_mem == 1. This will
be more visible under tcp_mem pressure (but it's not a fair comparison).
# bpftrace -e 'kprobe:__sk_mem_raise_allocated { @start[tid] = nsecs; }
kretprobe:__sk_mem_raise_allocated /@start[tid]/
{ @EnD[tid] = nsecs - @start[tid]; @times = hist(@EnD[tid]); delete(@start[tid]); }'
# tcp_stream -6 -F 1000 -N -T 256
Without bpf prog:
[128, 256) 3846 | |
[256, 512) 1505326 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512, 1K) 1371006 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[1K, 2K) 198207 |@@@@@@ |
[2K, 4K) 31199 |@ |
With bpf prog in the next patch:
(must be attached before tcp_stream)
# bpftool prog load sk_bypass_prot_mem.bpf.o /sys/fs/bpf/test type cgroup/sock_create
# bpftool cgroup attach /sys/fs/cgroup/test cgroup_inet_sock_create pinned /sys/fs/bpf/test
[128, 256) 6413 | |
[256, 512) 1868425 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[512, 1K) 1101697 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |
[1K, 2K) 117031 |@@@@ |
[2K, 4K) 11773 | |
Signed-off-by: Kuniyuki Iwashima <[email protected]>1 parent b50085c commit f87b8fe
3 files changed
+34
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
7200 | 7200 | | |
7201 | 7201 | | |
7202 | 7202 | | |
| 7203 | + | |
| 7204 | + | |
7203 | 7205 | | |
7204 | 7206 | | |
7205 | 7207 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
5731 | 5731 | | |
5732 | 5732 | | |
5733 | 5733 | | |
| 5734 | + | |
| 5735 | + | |
| 5736 | + | |
| 5737 | + | |
| 5738 | + | |
| 5739 | + | |
| 5740 | + | |
| 5741 | + | |
| 5742 | + | |
| 5743 | + | |
| 5744 | + | |
| 5745 | + | |
| 5746 | + | |
| 5747 | + | |
| 5748 | + | |
| 5749 | + | |
| 5750 | + | |
| 5751 | + | |
| 5752 | + | |
| 5753 | + | |
| 5754 | + | |
| 5755 | + | |
| 5756 | + | |
| 5757 | + | |
| 5758 | + | |
5734 | 5759 | | |
5735 | 5760 | | |
5736 | 5761 | | |
| 5762 | + | |
| 5763 | + | |
| 5764 | + | |
5737 | 5765 | | |
5738 | 5766 | | |
5739 | 5767 | | |
| |||
5751 | 5779 | | |
5752 | 5780 | | |
5753 | 5781 | | |
| 5782 | + | |
| 5783 | + | |
| 5784 | + | |
5754 | 5785 | | |
5755 | 5786 | | |
5756 | 5787 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
7200 | 7200 | | |
7201 | 7201 | | |
7202 | 7202 | | |
| 7203 | + | |
7203 | 7204 | | |
7204 | 7205 | | |
7205 | 7206 | | |
| |||
0 commit comments