Commit 9a82cdc

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-04-21

We've added 71 non-merge commits during the last 8 day(s) which contain
a total of 116 files changed, 13397 insertions(+), 8896 deletions(-).

The main changes are:

1) Add a new BPF netfilter program type and minimal support to hook
   BPF programs to netfilter hooks such as prerouting or forward,
   from Florian Westphal.

2) Fix race between btf_put and btf_idr walk which caused a deadlock,
   from Alexei Starovoitov.

3) Second big batch to migrate test_verifier unit tests into test_progs
   for ease of readability and debugging, from Eduard Zingerman.

4) Add support for refcounted local kptrs to the verifier for allowing
   shared ownership, useful for adding a node to both the BPF list and
   rbtree, from Dave Marchevsky.

5) Migrate bpf_for(), bpf_for_each() and bpf_repeat() macros from BPF
   selftests into libbpf-provided bpf_helpers.h header and improve
   kfunc handling, from Andrii Nakryiko.

6) Support 64-bit pointers to kfuncs needed for archs like s390x,
   from Ilya Leoshkevich.

7) Support BPF progs under getsockopt with a NULL optval,
   from Stanislav Fomichev.

8) Improve verifier u32 scalar equality checking in order to enable
   LLVM transformations which earlier had to be disabled specifically
   for BPF backend, from Yonghong Song.

9) Extend bpftool's struct_ops object loading to support links,
   from Kui-Feng Lee.

10) Add xsk selftest follow-up fixes for hugepage allocated umem,
    from Magnus Karlsson.

11) Support BPF redirects from tc BPF to ifb devices,
    from Daniel Borkmann.

12) Add BPF support for integer type when accessing variable length
    arrays, from Feng Zhou.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (71 commits)
  selftests/bpf: verifier/value_ptr_arith converted to inline assembly
  selftests/bpf: verifier/value_illegal_alu converted to inline assembly
  selftests/bpf: verifier/unpriv converted to inline assembly
  selftests/bpf: verifier/subreg converted to inline assembly
  selftests/bpf: verifier/spin_lock converted to inline assembly
  selftests/bpf: verifier/sock converted to inline assembly
  selftests/bpf: verifier/search_pruning converted to inline assembly
  selftests/bpf: verifier/runtime_jit converted to inline assembly
  selftests/bpf: verifier/regalloc converted to inline assembly
  selftests/bpf: verifier/ref_tracking converted to inline assembly
  selftests/bpf: verifier/map_ptr_mixing converted to inline assembly
  selftests/bpf: verifier/map_in_map converted to inline assembly
  selftests/bpf: verifier/lwt converted to inline assembly
  selftests/bpf: verifier/loops1 converted to inline assembly
  selftests/bpf: verifier/jeq_infer_not_null converted to inline assembly
  selftests/bpf: verifier/direct_packet_access converted to inline assembly
  selftests/bpf: verifier/d_path converted to inline assembly
  selftests/bpf: verifier/ctx converted to inline assembly
  selftests/bpf: verifier/btf_ctx_access converted to inline assembly
  selftests/bpf: verifier/bpf_get_stack converted to inline assembly
  ...
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2 parents 418a730 + 4db10a8 commit 9a82cdc

File tree: 116 files changed, +13397 / -8896 lines changed


Documentation/bpf/kfuncs.rst

Lines changed: 6 additions & 15 deletions
@@ -184,16 +184,7 @@ in. All copies of the pointer being released are invalidated as a result of
 invoking kfunc with this flag. KF_RELEASE kfuncs automatically receive the
 protection afforded by the KF_TRUSTED_ARGS flag described below.
 
-2.4.4 KF_KPTR_GET flag
-----------------------
-
-The KF_KPTR_GET flag is used to indicate that the kfunc takes the first argument
-as a pointer to kptr, safely increments the refcount of the object it points to,
-and returns a reference to the user. The rest of the arguments may be normal
-arguments of a kfunc. The KF_KPTR_GET flag should be used in conjunction with
-KF_ACQUIRE and KF_RET_NULL flags.
-
-2.4.5 KF_TRUSTED_ARGS flag
+2.4.4 KF_TRUSTED_ARGS flag
 --------------------------
 
 The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
@@ -205,7 +196,7 @@ exception described below).
 There are two types of pointers to kernel objects which are considered "valid":
 
 1. Pointers which are passed as tracepoint or struct_ops callback arguments.
-2. Pointers which were returned from a KF_ACQUIRE or KF_KPTR_GET kfunc.
+2. Pointers which were returned from a KF_ACQUIRE kfunc.
 
 Pointers to non-BTF objects (e.g. scalar pointers) may also be passed to
 KF_TRUSTED_ARGS kfuncs, and may have a non-zero offset.
@@ -232,13 +223,13 @@ In other words, you must:
 2. Specify the type and name of the trusted nested field. This field must match
    the field in the original type definition exactly.
 
-2.4.6 KF_SLEEPABLE flag
+2.4.5 KF_SLEEPABLE flag
 -----------------------
 
 The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
 be called by sleepable BPF programs (BPF_F_SLEEPABLE).
 
-2.4.7 KF_DESTRUCTIVE flag
+2.4.6 KF_DESTRUCTIVE flag
 --------------------------
 
 The KF_DESTRUCTIVE flag is used to indicate functions calling which is
@@ -247,7 +238,7 @@ rebooting or panicking. Due to this additional restrictions apply to these
 calls. At the moment they only require CAP_SYS_BOOT capability, but more can be
 added later.
 
-2.4.8 KF_RCU flag
+2.4.7 KF_RCU flag
 -----------------
 
 The KF_RCU flag is a weaker version of KF_TRUSTED_ARGS. The kfuncs marked with
@@ -260,7 +251,7 @@ also be KF_RET_NULL.
 
 .. _KF_deprecated_flag:
 
-2.4.9 KF_DEPRECATED flag
+2.4.8 KF_DEPRECATED flag
 ------------------------
 
 The KF_DEPRECATED flag is used for kfuncs which are scheduled to be

arch/s390/net/bpf_jit_comp.c

Lines changed: 5 additions & 0 deletions
@@ -2001,6 +2001,11 @@ bool bpf_jit_supports_kfunc_call(void)
 	return true;
 }
 
+bool bpf_jit_supports_far_kfunc_call(void)
+{
+	return true;
+}
+
 int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
 		       void *old_addr, void *new_addr)
 {

include/linux/bpf.h

Lines changed: 68 additions & 25 deletions
@@ -187,6 +187,7 @@ enum btf_field_type {
 	BPF_RB_NODE = (1 << 7),
 	BPF_GRAPH_NODE_OR_ROOT = BPF_LIST_NODE | BPF_LIST_HEAD |
 				 BPF_RB_NODE | BPF_RB_ROOT,
+	BPF_REFCOUNT = (1 << 8),
 };
 
 typedef void (*btf_dtor_kfunc_t)(void *);
@@ -210,6 +211,7 @@ struct btf_field_graph_root {
 
 struct btf_field {
 	u32 offset;
+	u32 size;
 	enum btf_field_type type;
 	union {
 		struct btf_field_kptr kptr;
@@ -222,15 +224,10 @@ struct btf_record {
 	u32 field_mask;
 	int spin_lock_off;
 	int timer_off;
+	int refcount_off;
 	struct btf_field fields[];
 };
 
-struct btf_field_offs {
-	u32 cnt;
-	u32 field_off[BTF_FIELDS_MAX];
-	u8 field_sz[BTF_FIELDS_MAX];
-};
-
 struct bpf_map {
 	/* The first two cachelines with read-mostly members of which some
 	 * are also accessed in fast-path (e.g. ops, max_entries).
@@ -257,7 +254,6 @@ struct bpf_map {
 	struct obj_cgroup *objcg;
 #endif
 	char name[BPF_OBJ_NAME_LEN];
-	struct btf_field_offs *field_offs;
 	/* The 3rd and 4th cacheline with misc members to avoid false sharing
 	 * particularly with refcounting.
 	 */
@@ -299,6 +295,8 @@ static inline const char *btf_field_type_name(enum btf_field_type type)
 		return "bpf_rb_root";
 	case BPF_RB_NODE:
 		return "bpf_rb_node";
+	case BPF_REFCOUNT:
+		return "bpf_refcount";
 	default:
 		WARN_ON_ONCE(1);
 		return "unknown";
@@ -323,6 +321,8 @@ static inline u32 btf_field_type_size(enum btf_field_type type)
 		return sizeof(struct bpf_rb_root);
 	case BPF_RB_NODE:
 		return sizeof(struct bpf_rb_node);
+	case BPF_REFCOUNT:
+		return sizeof(struct bpf_refcount);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
@@ -347,27 +347,57 @@ static inline u32 btf_field_type_align(enum btf_field_type type)
 		return __alignof__(struct bpf_rb_root);
 	case BPF_RB_NODE:
 		return __alignof__(struct bpf_rb_node);
+	case BPF_REFCOUNT:
+		return __alignof__(struct bpf_refcount);
 	default:
 		WARN_ON_ONCE(1);
 		return 0;
 	}
 }
 
+static inline void bpf_obj_init_field(const struct btf_field *field, void *addr)
+{
+	memset(addr, 0, field->size);
+
+	switch (field->type) {
+	case BPF_REFCOUNT:
+		refcount_set((refcount_t *)addr, 1);
+		break;
+	case BPF_RB_NODE:
+		RB_CLEAR_NODE((struct rb_node *)addr);
+		break;
+	case BPF_LIST_HEAD:
+	case BPF_LIST_NODE:
+		INIT_LIST_HEAD((struct list_head *)addr);
+		break;
+	case BPF_RB_ROOT:
+		/* RB_ROOT_CACHED 0-inits, no need to do anything after memset */
+	case BPF_SPIN_LOCK:
+	case BPF_TIMER:
+	case BPF_KPTR_UNREF:
+	case BPF_KPTR_REF:
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		return;
+	}
+}
+
 static inline bool btf_record_has_field(const struct btf_record *rec, enum btf_field_type type)
 {
 	if (IS_ERR_OR_NULL(rec))
 		return false;
 	return rec->field_mask & type;
 }
 
-static inline void bpf_obj_init(const struct btf_field_offs *foffs, void *obj)
+static inline void bpf_obj_init(const struct btf_record *rec, void *obj)
 {
 	int i;
 
-	if (!foffs)
+	if (IS_ERR_OR_NULL(rec))
 		return;
-	for (i = 0; i < foffs->cnt; i++)
-		memset(obj + foffs->field_off[i], 0, foffs->field_sz[i]);
+	for (i = 0; i < rec->cnt; i++)
+		bpf_obj_init_field(&rec->fields[i], obj + rec->fields[i].offset);
 }
 
 /* 'dst' must be a temporary buffer and should not point to memory that is being
@@ -379,7 +409,7 @@ static inline void bpf_obj_init(const struct btf_record *rec, void *obj)
  */
 static inline void check_and_init_map_value(struct bpf_map *map, void *dst)
 {
-	bpf_obj_init(map->field_offs, dst);
+	bpf_obj_init(map->record, dst);
 }
 
 /* memcpy that is used with 8-byte aligned pointers, power-of-8 size and
@@ -399,64 +429,64 @@ static inline void bpf_long_memcpy(void *dst, const void *src, u32 size)
 }
 
 /* copy everything but bpf_spin_lock, bpf_timer, and kptrs. There could be one of each. */
-static inline void bpf_obj_memcpy(struct btf_field_offs *foffs,
+static inline void bpf_obj_memcpy(struct btf_record *rec,
 				  void *dst, void *src, u32 size,
 				  bool long_memcpy)
 {
 	u32 curr_off = 0;
 	int i;
 
-	if (likely(!foffs)) {
+	if (IS_ERR_OR_NULL(rec)) {
 		if (long_memcpy)
 			bpf_long_memcpy(dst, src, round_up(size, 8));
 		else
 			memcpy(dst, src, size);
 		return;
 	}
 
-	for (i = 0; i < foffs->cnt; i++) {
-		u32 next_off = foffs->field_off[i];
+	for (i = 0; i < rec->cnt; i++) {
+		u32 next_off = rec->fields[i].offset;
 		u32 sz = next_off - curr_off;
 
 		memcpy(dst + curr_off, src + curr_off, sz);
-		curr_off += foffs->field_sz[i] + sz;
+		curr_off += rec->fields[i].size + sz;
 	}
 	memcpy(dst + curr_off, src + curr_off, size - curr_off);
 }
 
 static inline void copy_map_value(struct bpf_map *map, void *dst, void *src)
 {
-	bpf_obj_memcpy(map->field_offs, dst, src, map->value_size, false);
+	bpf_obj_memcpy(map->record, dst, src, map->value_size, false);
 }
 
 static inline void copy_map_value_long(struct bpf_map *map, void *dst, void *src)
 {
-	bpf_obj_memcpy(map->field_offs, dst, src, map->value_size, true);
+	bpf_obj_memcpy(map->record, dst, src, map->value_size, true);
 }
 
-static inline void bpf_obj_memzero(struct btf_field_offs *foffs, void *dst, u32 size)
+static inline void bpf_obj_memzero(struct btf_record *rec, void *dst, u32 size)
 {
 	u32 curr_off = 0;
 	int i;
 
-	if (likely(!foffs)) {
+	if (IS_ERR_OR_NULL(rec)) {
 		memset(dst, 0, size);
 		return;
 	}
 
-	for (i = 0; i < foffs->cnt; i++) {
-		u32 next_off = foffs->field_off[i];
+	for (i = 0; i < rec->cnt; i++) {
+		u32 next_off = rec->fields[i].offset;
 		u32 sz = next_off - curr_off;
 
 		memset(dst + curr_off, 0, sz);
-		curr_off += foffs->field_sz[i] + sz;
+		curr_off += rec->fields[i].size + sz;
 	}
 	memset(dst + curr_off, 0, size - curr_off);
 }
 
 static inline void zero_map_value(struct bpf_map *map, void *dst)
 {
-	bpf_obj_memzero(map->field_offs, dst, map->value_size);
+	bpf_obj_memzero(map->record, dst, map->value_size);
 }
 
 void copy_map_value_locked(struct bpf_map *map, void *dst, void *src,
@@ -2234,6 +2264,9 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
 int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog,
 				const union bpf_attr *kattr,
 				union bpf_attr __user *uattr);
+int bpf_prog_test_run_nf(struct bpf_prog *prog,
+			 const union bpf_attr *kattr,
+			 union bpf_attr __user *uattr);
 bool btf_ctx_access(int off, int size, enum bpf_access_type type,
 		    const struct bpf_prog *prog,
 		    struct bpf_insn_access_aux *info);
@@ -2295,6 +2328,9 @@ bool bpf_prog_has_kfunc_call(const struct bpf_prog *prog);
 const struct btf_func_model *
 bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
			 const struct bpf_insn *insn);
+int bpf_get_kfunc_addr(const struct bpf_prog *prog, u32 func_id,
+		       u16 btf_fd_idx, u8 **func_addr);
+
 struct bpf_core_ctx {
 	struct bpf_verifier_log *log;
 	const struct btf *btf;
@@ -2545,6 +2581,13 @@ bpf_jit_find_kfunc_model(const struct bpf_prog *prog,
 	return NULL;
 }
 
+static inline int
+bpf_get_kfunc_addr(const struct bpf_prog *prog, u32 func_id,
+		   u16 btf_fd_idx, u8 **func_addr)
+{
+	return -ENOTSUPP;
+}
+
 static inline bool unprivileged_ebpf_enabled(void)
 {
 	return false;
include/linux/bpf_types.h

Lines changed: 4 additions & 0 deletions
@@ -79,6 +79,10 @@ BPF_PROG_TYPE(BPF_PROG_TYPE_LSM, lsm,
 #endif
 BPF_PROG_TYPE(BPF_PROG_TYPE_SYSCALL, bpf_syscall,
 	      void *, void *)
+#ifdef CONFIG_NETFILTER
+BPF_PROG_TYPE(BPF_PROG_TYPE_NETFILTER, netfilter,
+	      struct bpf_nf_ctx, struct bpf_nf_ctx)
+#endif
 
 BPF_MAP_TYPE(BPF_MAP_TYPE_ARRAY, array_map_ops)
 BPF_MAP_TYPE(BPF_MAP_TYPE_PERCPU_ARRAY, percpu_array_map_ops)

include/linux/bpf_verifier.h

Lines changed: 6 additions & 1 deletion
@@ -464,7 +464,12 @@ struct bpf_insn_aux_data {
 		 */
 		struct bpf_loop_inline_state loop_inline_state;
 	};
-	u64 obj_new_size; /* remember the size of type passed to bpf_obj_new to rewrite R1 */
+	union {
+		/* remember the size of type passed to bpf_obj_new to rewrite R1 */
+		u64 obj_new_size;
+		/* remember the offset of node field within type to rewrite */
+		u64 insert_off;
+	};
 	struct btf_struct_meta *kptr_struct_meta;
 	u64 map_key_state; /* constant (32 bit) key tracking for maps */
 	int ctx_field_size; /* the ctx field size for load insn, maybe 0 */

include/linux/btf.h

Lines changed: 0 additions & 3 deletions
@@ -18,7 +18,6 @@
 #define KF_ACQUIRE	(1 << 0) /* kfunc is an acquire function */
 #define KF_RELEASE	(1 << 1) /* kfunc is a release function */
 #define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
-#define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
 /* Trusted arguments are those which are guaranteed to be valid when passed to
  * the kfunc. It is used to enforce that pointers obtained from either acquire
  * kfuncs, or from the main kernel on a tracepoint or struct_ops callback
@@ -113,7 +112,6 @@ struct btf_id_dtor_kfunc {
 struct btf_struct_meta {
 	u32 btf_id;
 	struct btf_record *record;
-	struct btf_field_offs *field_offs;
 };
 
 struct btf_struct_metas {
@@ -207,7 +205,6 @@ int btf_find_timer(const struct btf *btf, const struct btf_type *t);
 struct btf_record *btf_parse_fields(const struct btf *btf, const struct btf_type *t,
 				    u32 field_mask, u32 value_size);
 int btf_check_and_fixup_fields(const struct btf *btf, struct btf_record *rec);
-struct btf_field_offs *btf_parse_field_offs(struct btf_record *rec);
 bool btf_type_is_void(const struct btf_type *t);
 s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind);
 const struct btf_type *btf_type_skip_modifiers(const struct btf *btf,

include/linux/filter.h

Lines changed: 1 addition & 0 deletions
@@ -920,6 +920,7 @@ void bpf_jit_compile(struct bpf_prog *prog);
 bool bpf_jit_needs_zext(void);
 bool bpf_jit_supports_subprog_tailcalls(void);
 bool bpf_jit_supports_kfunc_call(void);
+bool bpf_jit_supports_far_kfunc_call(void);
 bool bpf_helper_changes_pkt_data(void *func);
 
 static inline bool bpf_dump_raw_ok(const struct cred *cred)

include/linux/netfilter.h

Lines changed: 1 addition & 0 deletions
@@ -80,6 +80,7 @@ typedef unsigned int nf_hookfn(void *priv,
 enum nf_hook_ops_type {
 	NF_HOOK_OP_UNDEFINED,
 	NF_HOOK_OP_NF_TABLES,
+	NF_HOOK_OP_BPF,
 };
 
 struct nf_hook_ops {

include/linux/skbuff.h

Lines changed: 9 additions & 0 deletions
@@ -5066,6 +5066,15 @@ static inline void skb_reset_redirect(struct sk_buff *skb)
 	skb->redirected = 0;
 }
 
+static inline void skb_set_redirected_noclear(struct sk_buff *skb,
+					      bool from_ingress)
+{
+	skb->redirected = 1;
+#ifdef CONFIG_NET_REDIRECT
+	skb->from_ingress = from_ingress;
+#endif
+}
+
 static inline bool skb_csum_is_sctp(struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_IP_SCTP)
