Commit 2e7199b

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-08-04

The following pull-request contains BPF updates for your *net-next* tree.

We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).

The main changes are:

1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for
   the BPF syscall allowing processes with BPF link FD to force-detach,
   from Andrii Nakryiko.

2) Add BPF iterator for map elements and to iterate all BPF programs for
   efficient in-kernel inspection, from Yonghong Song and Alexei Starovoitov.

3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to
   avoid unwinder errors, from Song Liu.

4) Allow cgroup local storage map to be shared between programs on the
   same cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.

5) Add BPF exception tables to ARM64 JIT in order to be able to JIT
   BPF_PROBE_MEM load instructions, from Jean-Philippe Brucker.

6) Follow-up fixes on BPF socket lookup in combination with reuseport
   group handling. Also add related BPF selftests, from Jakub Sitnicki.

7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs
   for socket create/release as well as bind functions, from Stanislav Fomichev.

8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old
   struct xdp_statistics, from Peilin Ye.

9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.

10) Extend BPF kernel test infra with skb->family and
    skb->{local,remote}_ip{4,6} fields and allow user space to specify
    skb->dev via ifindex, from Dmitry Yakunin.

11) Fix a bpftool segfault due to missing program type name and make it
    more robust to prevent them in future gaps, from Quentin Monnet.

12) Consolidate cgroup helper functions across selftests and fix a v6
    localhost resolver issue, from John Fastabend.
====================

Signed-off-by: David S. Miller <[email protected]>
2 parents 76769c3 + 21594c4 commit 2e7199b

135 files changed, +4603 -1013 lines changed

Documentation/bpf/index.rst

Lines changed: 15 additions & 6 deletions

@@ -5,10 +5,10 @@ BPF Documentation
 This directory contains documentation for the BPF (Berkeley Packet
 Filter) facility, with a focus on the extended BPF version (eBPF).
 
-This kernel side documentation is still work in progress. The main
+This kernel side documentation is still work in progress. The main
 textual documentation is (for historical reasons) described in
-`Documentation/networking/filter.rst`_, which describe both classical
-and extended BPF instruction-set.
+:ref:`networking-filter`, which describe both classical and extended
+BPF instruction-set.
 The Cilium project also maintains a `BPF and XDP Reference Guide`_
 that goes into great technical depth about the BPF Architecture.
 
@@ -48,6 +48,15 @@ Program types
    bpf_lsm
 
 
+Map types
+=========
+
+.. toctree::
+   :maxdepth: 1
+
+   map_cgroup_storage
+
+
 Testing and debugging BPF
 =========================
 
@@ -59,7 +68,7 @@ Testing and debugging BPF
 
 
 .. Links:
-.. _Documentation/networking/filter.rst: ../networking/filter.txt
+.. _networking-filter: ../networking/filter.rst
 .. _man-pages: https://www.kernel.org/doc/man-pages/
-.. _bpf(2): http://man7.org/linux/man-pages/man2/bpf.2.html
-.. _BPF and XDP Reference Guide: http://cilium.readthedocs.io/en/latest/bpf/
+.. _bpf(2): https://man7.org/linux/man-pages/man2/bpf.2.html
+.. _BPF and XDP Reference Guide: https://docs.cilium.io/en/latest/bpf/
Lines changed: 169 additions & 0 deletions

@@ -0,0 +1,169 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+.. Copyright (C) 2020 Google LLC.
+
+===========================
+BPF_MAP_TYPE_CGROUP_STORAGE
+===========================
+
+The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-size
+storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that
+attach to cgroups; the programs are made available by the same Kconfig. The
+storage is identified by the cgroup the program is attached to.
+
+The map provides local storage at the cgroup that the BPF program is attached
+to. It provides faster and simpler access than the general-purpose hash
+table, which performs hash table lookups and requires the user to track live
+cgroups on their own.
+
+This document describes the usage and semantics of the
+``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behavior changed in
+Linux 5.9, and this document describes the differences.
+
+Usage
+=====
+
+The map uses a key of type either ``__u64 cgroup_inode_id`` or
+``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::
+
+    struct bpf_cgroup_storage_key {
+            __u64 cgroup_inode_id;
+            __u32 attach_type;
+    };
+
+``cgroup_inode_id`` is the inode id of the cgroup directory.
+``attach_type`` is the program's attach type.
+
+Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type.
+When this key type is used, then all attach types of the particular cgroup and
+map will share the same storage. Otherwise, if the type is
+``struct bpf_cgroup_storage_key``, then programs of different attach types
+are isolated and see different storages.
+
+To access the storage in a program, use ``bpf_get_local_storage``::
+
+    void *bpf_get_local_storage(void *map, u64 flags)
+
+``flags`` is reserved for future use and must be 0.
+
+There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
+can be accessed by multiple programs across different CPUs, and users should
+take care of synchronization by themselves. The BPF infrastructure provides
+``struct bpf_spin_lock`` to synchronize the storage. See
+``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
+
+Examples
+========
+
+Usage with key type as ``struct bpf_cgroup_storage_key``::
+
+    #include <bpf/bpf.h>
+
+    struct {
+            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
+            __type(key, struct bpf_cgroup_storage_key);
+            __type(value, __u32);
+    } cgroup_storage SEC(".maps");
+
+    int program(struct __sk_buff *skb)
+    {
+            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
+            __sync_fetch_and_add(ptr, 1);
+
+            return 0;
+    }
+
+Userspace accessing map declared above::
+
+    #include <linux/bpf.h>
+    #include <linux/libbpf.h>
+
+    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
+    {
+            struct bpf_cgroup_storage_key key = {
+                    .cgroup_inode_id = cgrp,
+                    .attach_type = type,
+            };
+            __u32 value;
+            bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
+            // error checking omitted
+            return value;
+    }
+
+Alternatively, using just ``__u64 cgroup_inode_id`` as key type::
+
+    #include <bpf/bpf.h>
+
+    struct {
+            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
+            __type(key, __u64);
+            __type(value, __u32);
+    } cgroup_storage SEC(".maps");
+
+    int program(struct __sk_buff *skb)
+    {
+            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
+            __sync_fetch_and_add(ptr, 1);
+
+            return 0;
+    }
+
+And userspace::
+
+    #include <linux/bpf.h>
+    #include <linux/libbpf.h>
+
+    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
+    {
+            __u32 value;
+            bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
+            // error checking omitted
+            return value;
+    }
+
+Semantics
+=========
+
+``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This
+per-CPU variant will have different memory regions for each CPU for each
+storage. The non-per-CPU variant will have the same memory region for each storage.
+
+Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
+for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
+that uses the map. A program may be attached to multiple cgroups or have
+multiple attach types, and each attach creates a fresh zeroed storage. The
+storage is freed upon detach.
+
+There is a one-to-one association between the map of each type (per-CPU and
+non-per-CPU) and the BPF program during load verification time. As a result,
+each map can only be used by one BPF program and each BPF program can only use
+one storage map of each type. Because a map can only be used by one BPF
+program, sharing of this cgroup's storage with other BPF programs was
+impossible.
+
+Since Linux 5.9, storage can be shared by multiple programs. When a program is
+attached to a cgroup, the kernel creates a new storage only if the map
+does not already contain an entry for the cgroup and attach type pair, or else
+the old storage is reused for the new attachment. If the map is attach type
+shared, then attach type is simply ignored during comparison. Storage is freed
+only when either the map or the cgroup attached to is being freed. Detaching
+will not directly free the storage, but it may cause the reference to the map
+to reach zero, indirectly freeing all storage in the map.
+
+The map is not associated with any BPF program, thus making sharing possible.
+However, the BPF program can still only associate with one map of each type
+(per-CPU and non-per-CPU). A BPF program cannot use more than one
+``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
+``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.
+
+In all versions, userspace may use the attach parameters of cgroup and
+attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map
+APIs to read or update the storage for a given attachment. For Linux 5.9
+attach type shared storages, only the first value in the struct, cgroup inode
+id, is used during comparison, so userspace may just specify a ``__u64``
+directly.
+
+The storage is bound at attach time. Even if the program is attached to a parent
+and triggers in a child, the storage still belongs to the parent.
+
+Userspace cannot create a new entry in the map or delete an existing entry.
+Program test runs always use a temporary storage.
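
The Linux 5.9 attach-time rule described in the Semantics section above (reuse an existing storage when the map already holds an entry for the cgroup and attach type pair, and ignore the attach type when the map is attach type shared) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct storage_key` is a local mirror of the key layout quoted in the document, while `struct toy_map`, `key_matches`, `toy_attach`, and `MAX_STORAGES` are hypothetical names invented for illustration and are not kernel or libbpf APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the key layout quoted in the document. A real program
 * would use struct bpf_cgroup_storage_key from linux/bpf.h instead. */
struct storage_key {
	uint64_t cgroup_inode_id;
	uint32_t attach_type;
};

#define MAX_STORAGES 8	/* arbitrary capacity for this toy model */

/* Hypothetical toy map, not a kernel data structure. */
struct toy_map {
	bool attach_type_shared;	/* models a map keyed by a bare __u64 */
	size_t used;
	struct storage_key keys[MAX_STORAGES];
	uint32_t values[MAX_STORAGES];
};

/* Compare two keys; the attach type is ignored for attach type shared maps. */
static bool key_matches(const struct toy_map *m,
			const struct storage_key *a,
			const struct storage_key *b)
{
	if (a->cgroup_inode_id != b->cgroup_inode_id)
		return false;
	return m->attach_type_shared || a->attach_type == b->attach_type;
}

/* Model of an attach: reuse a matching storage, otherwise create a zeroed
 * one. Returns NULL when the toy map is full. */
static uint32_t *toy_attach(struct toy_map *m, uint64_t cgrp, uint32_t type)
{
	struct storage_key key = { .cgroup_inode_id = cgrp, .attach_type = type };

	for (size_t i = 0; i < m->used; i++)
		if (key_matches(m, &m->keys[i], &key))
			return &m->values[i];

	if (m->used == MAX_STORAGES)
		return NULL;

	m->keys[m->used] = key;
	m->values[m->used] = 0;
	return &m->values[m->used++];
}
```

In this model, attaching twice to the same cgroup with different attach types yields two storages when `attach_type_shared` is false and a single storage when it is true, which matches the isolated versus shared behavior the document describes for the two key types.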

Documentation/networking/filter.rst

Lines changed: 2 additions & 0 deletions

@@ -1,5 +1,7 @@
 .. SPDX-License-Identifier: GPL-2.0
 
+.. _networking-filter:
+
 =======================================================
 Linux Socket Filtering aka Berkeley Packet Filter (BPF)
 =======================================================

arch/arm64/include/asm/extable.h

Lines changed: 12 additions & 0 deletions

@@ -22,5 +22,17 @@ struct exception_table_entry
 
 #define ARCH_HAS_RELATIVE_EXTABLE
 
+#ifdef CONFIG_BPF_JIT
+int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
+			      struct pt_regs *regs);
+#else /* !CONFIG_BPF_JIT */
+static inline
+int arm64_bpf_fixup_exception(const struct exception_table_entry *ex,
+			      struct pt_regs *regs)
+{
+	return 0;
+}
+#endif /* !CONFIG_BPF_JIT */
+
 extern int fixup_exception(struct pt_regs *regs);
 #endif

arch/arm64/mm/extable.c

Lines changed: 9 additions & 3 deletions

@@ -11,8 +11,14 @@ int fixup_exception(struct pt_regs *regs)
 	const struct exception_table_entry *fixup;
 
 	fixup = search_exception_tables(instruction_pointer(regs));
-	if (fixup)
-		regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;
+	if (!fixup)
+		return 0;
 
-	return fixup != NULL;
+	if (IS_ENABLED(CONFIG_BPF_JIT) &&
+	    regs->pc >= BPF_JIT_REGION_START &&
+	    regs->pc < BPF_JIT_REGION_END)
+		return arm64_bpf_fixup_exception(fixup, regs);
+
+	regs->pc = (unsigned long)&fixup->fixup + fixup->fixup;
+	return 1;
 }
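
The control flow of the new fixup_exception() can be modelled in plain userspace C to show how the relative fixup field resolves to an address and when the BPF path is taken. This is a rough sketch, not the kernel implementation: `struct toy_regs`, `rel_store`, `rel_resolve`, `toy_bpf_fixup`, and `toy_fixup_exception` are hypothetical names, the JIT region bounds are passed as parameters instead of using BPF_JIT_REGION_START and BPF_JIT_REGION_END, and `toy_bpf_fixup` is a placeholder that only counts calls, because the body of arm64_bpf_fixup_exception() is not part of the diff shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Relative exception table entry: each field holds the signed offset from
 * its own address to the target, as implied by the computation
 * (unsigned long)&fixup->fixup + fixup->fixup in the diff. */
struct rel_extable_entry {
	int32_t insn;
	int32_t fixup;
};

/* Hypothetical stand-in for struct pt_regs; only pc is modelled. */
struct toy_regs {
	uintptr_t pc;
};

/* Stand-ins for kernel text and a one-entry exception table. */
static unsigned char fake_text[64];
static struct rel_extable_entry fake_entry;
static int toy_bpf_fixup_calls;

/* Encode target as an offset relative to the field's own address. */
static void rel_store(int32_t *field, const void *target)
{
	intptr_t delta = (intptr_t)target - (intptr_t)field;

	assert(delta >= INT32_MIN && delta <= INT32_MAX);
	*field = (int32_t)delta;
}

/* Decode a relative field back into an absolute address. */
static uintptr_t rel_resolve(const int32_t *field)
{
	return (uintptr_t)field + (uintptr_t)(intptr_t)*field;
}

/* Placeholder for the BPF-specific handler: records the call, leaves regs alone. */
static int toy_bpf_fixup(const struct rel_extable_entry *ex, struct toy_regs *regs)
{
	(void)ex;
	(void)regs;
	toy_bpf_fixup_calls++;
	return 1;
}

/* Same branch structure as the patched fixup_exception(). */
static int toy_fixup_exception(struct toy_regs *regs,
			       const struct rel_extable_entry *ex,
			       uintptr_t jit_start, uintptr_t jit_end)
{
	if (!ex)
		return 0;

	if (regs->pc >= jit_start && regs->pc < jit_end)
		return toy_bpf_fixup(ex, regs);

	regs->pc = rel_resolve(&ex->fixup);
	return 1;
}
```

A caller would first encode a landing address with `rel_store(&fake_entry.fixup, &fake_text[16])` and then invoke `toy_fixup_exception()` with a faulting pc. Outside the given JIT range the pc is redirected to the decoded address; inside it the decision is delegated to the BPF handler.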
