Commit 71b4a99

Alexei Starovoitov committed
Merge branch 'bpf-standard-streams'
Kumar Kartikeya Dwivedi says:

====================
BPF Standard Streams

This set introduces a standard output interface with two streams, namely
stdout and stderr, for BPF programs. The idea is that these streams will
be written to by BPF programs and the kernel, and serve as standard
interfaces for informing user space of any BPF runtime violations. Users
can also utilize them for printing normal messages for debugging usage,
as is the case with bpf_printk() and trace pipe interface.

BPF programs and the kernel can use these streams to output messages.
User space can dump these messages using bpftool.

The stream interface itself is implemented using a lockless list, so
that we can queue messages from any context. Every printk statement into
the stream leads to memory allocation. Allocation itself relies on
try_alloc_pages() to construct a bespoke bump allocator to carve out
elements. If this fails, we finally give up and drop the message.

See commit logs for more details.

Two scenarios are covered:
 - Deadlocks and timeouts in rqspinlock.
 - Timeouts for may_goto.

In each we provide the stack trace and source information for the
offending BPF programs. Both the C source line and the file and line
numbers are printed. The output format is as follows:

ERROR: AA or ABBA deadlock detected for bpf_res_spin_lock
Attempted lock = 0xff11000108f3a5e0
Total held locks = 1
Held lock[ 0] = 0xff11000108f3a5e0
CPU: 48 UID: 0 PID: 786 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xb0/0xd0
bpf_prog_report_rqspinlock_violation+0x10b/0x130
bpf_res_spin_lock+0x8c/0xa0
bpf_prog_3699ea119d1f6ed8_foo+0xe5/0x140
  if (!bpf_res_spin_lock(&v2->lock)) @ stream_bpftool.c:62
bpf_prog_9b324ec4a1b2a5c0_stream_bpftool_dump_prog_stream+0x7e/0x2d0
  foo(stream); @ stream_bpftool.c:93
bpf_prog_test_run_syscall+0x102/0x240
__sys_bpf+0xd68/0x2bf0
__x64_sys_bpf+0x1e/0x30
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e

ERROR: Timeout detected for may_goto instruction
CPU: 48 UID: 0 PID: 786 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xb0/0xd0
bpf_prog_report_may_goto_violation+0x6a/0x90
bpf_check_timed_may_goto+0x4d/0xa0
arch_bpf_timed_may_goto+0x21/0x40
bpf_prog_3699ea119d1f6ed8_foo+0x12f/0x140
  while (can_loop) @ stream_bpftool.c:71
bpf_prog_9b324ec4a1b2a5c0_stream_bpftool_dump_prog_stream+0x7e/0x2d0
  foo(stream); @ stream_bpftool.c:93
bpf_prog_test_run_syscall+0x102/0x240
__sys_bpf+0xd68/0x2bf0
__x64_sys_bpf+0x1e/0x30
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e

Changelog:
----------
v4 -> v5
v4: https://lore.kernel.org/bpf/[email protected]

 * Add acks from Emil.
 * Address various nits.
 * Add extra failure tests.
 * Make deadlock test a little more robust to catch problems.

v3 -> v4
v3: https://lore.kernel.org/bpf/[email protected]

 * Switch to alloc_pages_nolock(), avoid incorrect memcg accounting. (Alexei)
 * We will figure out proper accounting later.
 * Drop error limit logic, restrict stream capacity to 100,000 bytes. (Alexei)
 * Remove extra invocation of is_bpf_text_address(). (Jiri)
 * Avoid emitting NULL byte into the stream text, adjust regex in selftests. (Alexei)
 * Add comment around rcu_read_lock() for bpf_prog_ksym_find. (Alexei)
 * Tighten stream capacity check selftest.
 * Add acks from Andrii.

v2 -> v3
v2: https://lore.kernel.org/bpf/[email protected]

 * Fix bug when handling single element stream stage. (Eduard)
 * Move to mutex for protection of stream read and copy_to_user(). (Alexei)
 * Split bprintf refactor into its own patch. (Alexei)
 * Move kfunc definition to common_btf_ids to avoid initcall proliferation. (Alexei)
 * Return line number by reference in bpf_prog_get_file_line. (Alexei)
 * Remove NULL checks for BTF name pointer. (Alexei)
 * Add WARN_ON_ONCE(!rcu_read_lock_held()) in bpf_prog_ksym_find. (Eduard)
 * Remove hardcoded stream stage from macros. (Alexei, Eduard)
 * Move refactoring hunks to their own patch. (Alexei)
 * Add empty opts parameter for future extensibility to libbpf API. (Andrii, Eduard)
 * Add BPF_STREAM_{STDOUT,STDERR} to UAPI. (Andrii)
 * Add code to match on backtrace output. (Eduard)
 * Fix misc nits.
 * Add acks.

v1 -> v2
v1: https://lore.kernel.org/bpf/[email protected]

 * Drop arena page fault prints, will be done as follow up. (Alexei)
 * Defer Andrii's request to reuse code and Alan's suggestion of error
   counts to follow up.
 * Drop bpf_dynptr_from_mem_slice patch.
 * Drop some acks due to heavy reworking.
 * Fix KASAN splat in bpf_prog_get_file_line. (Eduard)
 * Collapse bpf_prog_ksym_find and is_bpf_text_address into single call. (Eduard)
 * Add missing RCU read lock in bpf_prog_ksym_find.
 * Fix incorrect error handling in dump_stack_cb.
 * Simplify libbpf macro. (Eduard, Andrii)
 * Introduce bpf_prog_stream_read() libbpf API. (Eduard, Alexei, Andrii)
 * Drop BPF prog from the bpftool, use libbpf API.
 * Rework selftests.

RFC v1 -> v1
RFC v1: https://lore.kernel.org/bpf/[email protected]

 * Rebase on bpf-next/master.
 * Change output in dump_stack to also print source line. (Alexei)
 * Simplify API to single pop() operation. (Eduard, Alexei)
 * Add kdoc for bpf_dynptr_from_mem_slice.
 * Fix -EINVAL returned from prog_dump_stream. (Eduard)
 * Split dump_stack() patch into multiple commits.
 * Add macro wrapping stream staging API.
 * Change bpftool command from dump to tracelog. (Quentin)
 * Add bpftool documentation and bash completion. (Quentin)
 * Change license of bpftool to Dual BSD/GPL.
 * Simplify memory allocator. (Alexei)
 * No overflow into second page.
 * Remove bpf_mem_alloc() fallback.
 * Symlink bpftool BPF program and exercise as selftest. (Eduard)
 * Verify output after dumping from ringbuf. (Eduard)
 * More failure cases to check API invariants.
 * Remove patches for dynptr lifetime fixes (split into separate set).
 * Limit maximum error messages, and add stream capacity. (Eduard)
====================

Link: https://patch.msgid.link/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
2 parents da7e9c0 + 5697683 commit 71b4a99
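
The cover letter says stream elements are carved out of pages by a bespoke bump allocator, and that a message is dropped when allocation fails. The allocator itself is not part of the hunks shown here, so the following is only a minimal userspace sketch of the general technique. `bump_arena`, `bump_alloc`, `round_up8` and the 4096-byte arena size are hypothetical names and values chosen for illustration, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for one page obtained from the kernel. */
#define ARENA_SIZE 4096

struct bump_arena {
	_Alignas(max_align_t) unsigned char mem[ARENA_SIZE];
	size_t used;	/* bytes handed out so far */
};

/* Round n up to a multiple of 8 so every element stays aligned. */
static size_t round_up8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/*
 * Carve `size` bytes out of the arena by bumping the `used` offset.
 * Returns NULL when the request does not fit; a caller in the spirit
 * of the cover letter would then drop the message.
 */
static void *bump_alloc(struct bump_arena *a, size_t size)
{
	size_t need = round_up8(size);
	void *p;

	/* `need < size` catches wraparound in round_up8(). */
	if (need < size || need > ARENA_SIZE - a->used)
		return NULL;
	p = a->mem + a->used;
	a->used += need;
	return p;
}
```

Individual elements are never freed in this scheme; the whole arena is released at once, which is what keeps the allocation path short enough to call from restricted contexts.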

21 files changed, +1206 -24 lines changed

arch/x86/net/bpf_jit_comp.c

Lines changed: 0 additions & 1 deletion
@@ -3845,7 +3845,6 @@ void arch_bpf_stack_walk(bool (*consume_fn)(void *cookie, u64 ip, u64 sp, u64 bp
 	}
 	return;
 #endif
-	WARN(1, "verification of programs using bpf_throw should have failed\n");
 }

 void bpf_arch_poke_desc_update(struct bpf_jit_poke_descriptor *poke,

include/linux/bpf.h

Lines changed: 72 additions & 1 deletion
@@ -1538,6 +1538,37 @@ struct btf_mod_pair {

 struct bpf_kfunc_desc_tab;

+enum bpf_stream_id {
+	BPF_STDOUT = 1,
+	BPF_STDERR = 2,
+};
+
+struct bpf_stream_elem {
+	struct llist_node node;
+	int total_len;
+	int consumed_len;
+	char str[];
+};
+
+enum {
+	/* 100k bytes */
+	BPF_STREAM_MAX_CAPACITY = 100000ULL,
+};
+
+struct bpf_stream {
+	atomic_t capacity;
+	struct llist_head log; /* list of in-flight stream elements in LIFO order */
+
+	struct mutex lock; /* lock protecting backlog_{head,tail} */
+	struct llist_node *backlog_head; /* list of in-flight stream elements in FIFO order */
+	struct llist_node *backlog_tail; /* tail of the list above */
+};
+
+struct bpf_stream_stage {
+	struct llist_head log;
+	int len;
+};
+
 struct bpf_prog_aux {
 	atomic64_t refcnt;
 	u32 used_map_cnt;
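
The comments on `struct bpf_stream` describe a `log` list kept in LIFO order and a backlog kept in FIFO order. The code that moves elements between the two lives in stream.c, which is not shown in this view, so the sketch below only illustrates the general pattern such a split suggests: producers push with a compare-and-swap loop, and a consumer detaches the whole chain and reverses it once to read oldest-first. It uses C11 atomics in userspace; `struct node`, `struct lifo`, `lifo_push`, `lifo_take_all` and `reverse` are hypothetical names, not the kernel's llist API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a lockless singly linked list node. */
struct node {
	struct node *next;
	int value;
};

struct lifo {
	_Atomic(struct node *) first;
};

/* Producer side: push with a compare-and-swap loop, no lock taken. */
static void lifo_push(struct lifo *l, struct node *n)
{
	struct node *old = atomic_load(&l->first);

	do {
		n->next = old;	/* `old` is refreshed when the CAS fails */
	} while (!atomic_compare_exchange_weak(&l->first, &old, n));
}

/* Consumer side: detach the whole chain in one atomic exchange. */
static struct node *lifo_take_all(struct lifo *l)
{
	return atomic_exchange(&l->first, (struct node *)NULL);
}

/* Reverse the detached LIFO chain so it reads oldest-first (FIFO). */
static struct node *reverse(struct node *head)
{
	struct node *prev = NULL;

	while (head) {
		struct node *next = head->next;

		head->next = prev;
		prev = head;
		head = next;
	}
	return prev;
}
```

Pushing never blocks, which fits the cover letter's goal of queueing messages from any context; the reversal cost is paid only on the reader side.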
@@ -1646,6 +1677,7 @@ struct bpf_prog_aux {
 		struct work_struct work;
 		struct rcu_head rcu;
 	};
+	struct bpf_stream stream[2];
 };

 struct bpf_prog {
@@ -2409,6 +2441,7 @@ int generic_map_delete_batch(struct bpf_map *map,
 struct bpf_map *bpf_map_get_curr_or_next(u32 *id);
 struct bpf_prog *bpf_prog_get_curr_or_next(u32 *id);

+
 int bpf_map_alloc_pages(const struct bpf_map *map, int nid,
 			unsigned long nr_pages, struct page **page_array);
 #ifdef CONFIG_MEMCG
@@ -3551,16 +3584,50 @@ bool btf_id_set_contains(const struct btf_id_set *set, u32 id);
 #define MAX_BPRINTF_VARARGS 12
 #define MAX_BPRINTF_BUF 1024

+/* Per-cpu temp buffers used by printf-like helpers to store the bprintf binary
+ * arguments representation.
+ */
+#define MAX_BPRINTF_BIN_ARGS 512
+
+struct bpf_bprintf_buffers {
+	char bin_args[MAX_BPRINTF_BIN_ARGS];
+	char buf[MAX_BPRINTF_BUF];
+};
+
 struct bpf_bprintf_data {
 	u32 *bin_args;
 	char *buf;
 	bool get_bin_args;
 	bool get_buf;
 };

-int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args,
+int bpf_bprintf_prepare(const char *fmt, u32 fmt_size, const u64 *raw_args,
 			u32 num_args, struct bpf_bprintf_data *data);
 void bpf_bprintf_cleanup(struct bpf_bprintf_data *data);
+int bpf_try_get_buffers(struct bpf_bprintf_buffers **bufs);
+void bpf_put_buffers(void);
+
+void bpf_prog_stream_init(struct bpf_prog *prog);
+void bpf_prog_stream_free(struct bpf_prog *prog);
+int bpf_prog_stream_read(struct bpf_prog *prog, enum bpf_stream_id stream_id, void __user *buf, int len);
+void bpf_stream_stage_init(struct bpf_stream_stage *ss);
+void bpf_stream_stage_free(struct bpf_stream_stage *ss);
+__printf(2, 3)
+int bpf_stream_stage_printk(struct bpf_stream_stage *ss, const char *fmt, ...);
+int bpf_stream_stage_commit(struct bpf_stream_stage *ss, struct bpf_prog *prog,
+			    enum bpf_stream_id stream_id);
+int bpf_stream_stage_dump_stack(struct bpf_stream_stage *ss);
+
+#define bpf_stream_printk(ss, ...) bpf_stream_stage_printk(&ss, __VA_ARGS__)
+#define bpf_stream_dump_stack(ss) bpf_stream_stage_dump_stack(&ss)
+
+#define bpf_stream_stage(ss, prog, stream_id, expr) \
+	({ \
+		bpf_stream_stage_init(&ss); \
+		(expr); \
+		bpf_stream_stage_commit(&ss, prog, stream_id); \
+		bpf_stream_stage_free(&ss); \
+	})

 #ifdef CONFIG_BPF_LSM
 void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype);
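
The `bpf_stream_stage()` macro above wraps an expression between init, commit and free calls, so a multi-line report is built privately and published in one step. The userspace sketch below mirrors only that shape. Its buffers are fixed-size arrays rather than lockless lists, and `struct stage`, `struct stream`, `stage_printk`, `stage_commit` and `stream_stage` are hypothetical stand-ins, not the kernel functions. It relies on GNU statement expressions, as the kernel macro does.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-ins; sizes are arbitrary for illustration. */
struct stage  { char buf[256];  int len; };
struct stream { char buf[1024]; int len; };

static void stage_init(struct stage *ss) { ss->len = 0; ss->buf[0] = '\0'; }
static void stage_free(struct stage *ss) { ss->len = 0; }

/* Append formatted text to the stage; on overflow keep the old contents. */
__attribute__((format(printf, 2, 3)))
static int stage_printk(struct stage *ss, const char *fmt, ...)
{
	int room = (int)sizeof(ss->buf) - ss->len;
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(ss->buf + ss->len, room, fmt, ap);
	va_end(ap);
	if (n < 0 || n >= room) {
		ss->buf[ss->len] = '\0';	/* undo the truncated write */
		return -1;
	}
	ss->len += n;
	return 0;
}

/* Publish the whole stage at once, or nothing if it does not fit. */
static int stage_commit(struct stage *ss, struct stream *s)
{
	if (ss->len > (int)sizeof(s->buf) - 1 - s->len)
		return -1;
	memcpy(s->buf + s->len, ss->buf, ss->len);
	s->len += ss->len;
	s->buf[s->len] = '\0';
	return 0;
}

/* Same shape as bpf_stream_stage(): init, run expr, commit, free. */
#define stream_stage(ss, s, expr)	\
	({				\
		stage_init(&ss);	\
		(expr);			\
		stage_commit(&ss, s);	\
		stage_free(&ss);	\
	})
```

Because nothing reaches the shared stream until the commit step, a reader in this sketch never observes a half-written report.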
@@ -3596,4 +3663,8 @@ static inline bool bpf_is_subprog(const struct bpf_prog *prog)
 	return prog->aux->func_idx != 0;
 }

+int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep,
+			   const char **linep, int *nump);
+struct bpf_prog *bpf_prog_find_from_stack(void);
+
 #endif /* _LINUX_BPF_H */

include/uapi/linux/bpf.h

Lines changed: 24 additions & 0 deletions
@@ -906,6 +906,17 @@ union bpf_iter_link_info {
  *		A new file descriptor (a nonnegative integer), or -1 if an
  *		error occurred (in which case, *errno* is set appropriately).
  *
+ * BPF_PROG_STREAM_READ_BY_FD
+ *	Description
+ *		Read data of a program's BPF stream. The program is identified
+ *		by *prog_fd*, and the stream is identified by the *stream_id*.
+ *		The data is copied to a buffer pointed to by *stream_buf*, and
+ *		filled less than or equal to *stream_buf_len* bytes.
+ *
+ *	Return
+ *		Number of bytes read from the stream on success, or -1 if an
+ *		error occurred (in which case, *errno* is set appropriately).
+ *
  * NOTES
  *	eBPF objects (maps and programs) can be shared between processes.
  *
@@ -961,6 +972,7 @@ enum bpf_cmd {
 	BPF_LINK_DETACH,
 	BPF_PROG_BIND_MAP,
 	BPF_TOKEN_CREATE,
+	BPF_PROG_STREAM_READ_BY_FD,
 	__MAX_BPF_CMD,
 };

@@ -1463,6 +1475,11 @@ struct bpf_stack_build_id {

 #define BPF_OBJ_NAME_LEN 16U

+enum {
+	BPF_STREAM_STDOUT = 1,
+	BPF_STREAM_STDERR = 2,
+};
+
 union bpf_attr {
 	struct { /* anonymous struct used by BPF_MAP_CREATE command */
 		__u32 map_type; /* one of enum bpf_map_type */
@@ -1849,6 +1866,13 @@ union bpf_attr {
 		__u32 bpffs_fd;
 	} token_create;

+	struct {
+		__aligned_u64 stream_buf;
+		__u32 stream_buf_len;
+		__u32 stream_id;
+		__u32 prog_fd;
+	} prog_stream_read;
+
 } __attribute__((aligned(8)));

 /* The description below is an attempt at providing documentation to eBPF
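
To make the new `prog_stream_read` attribute concrete, here is a hedged sketch of how user space might populate it before issuing the `BPF_PROG_STREAM_READ_BY_FD` command. It deliberately stops short of the syscall and redefines the layout locally so that it builds without updated headers. `struct stream_read_attr`, `fill_stream_read_attr` and the `STREAM_*` constants are hypothetical mirrors of the UAPI definitions above; real code would include `<linux/bpf.h>` from a kernel carrying this change.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values copied from the UAPI enum added in this diff. */
enum { STREAM_STDOUT = 1, STREAM_STDERR = 2 };

/*
 * Hypothetical local mirror of the prog_stream_read member of
 * union bpf_attr, field for field.
 */
struct stream_read_attr {
	uint64_t stream_buf __attribute__((aligned(8)));	/* user buffer address */
	uint32_t stream_buf_len;	/* capacity of that buffer in bytes */
	uint32_t stream_id;		/* STREAM_STDOUT or STREAM_STDERR */
	uint32_t prog_fd;		/* fd of the loaded BPF program */
};

/* Zero the whole struct first so trailing padding is well defined. */
static void fill_stream_read_attr(struct stream_read_attr *attr, int prog_fd,
				  uint32_t stream_id, void *buf, uint32_t len)
{
	memset(attr, 0, sizeof(*attr));
	attr->stream_buf = (uint64_t)(uintptr_t)buf;
	attr->stream_buf_len = len;
	attr->stream_id = stream_id;
	attr->prog_fd = (uint32_t)prog_fd;
}
```

Per the UAPI comment, a successful read returns the number of bytes copied, so a caller would presumably loop until the call returns 0 to drain a stream; that loop is an assumption here, not something this diff shows.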

kernel/bpf/Makefile

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ obj-$(CONFIG_BPF_SYSCALL) += bpf_local_storage.o bpf_task_storage.o
 obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
 obj-$(CONFIG_BPF_SYSCALL) += disasm.o mprog.o
 obj-$(CONFIG_BPF_JIT) += trampoline.o
-obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o rqspinlock.o
+obj-$(CONFIG_BPF_SYSCALL) += btf.o memalloc.o rqspinlock.o stream.o
 ifeq ($(CONFIG_MMU)$(CONFIG_64BIT),yy)
 obj-$(CONFIG_BPF_SYSCALL) += arena.o range_tree.o
 endif

kernel/bpf/core.c

Lines changed: 108 additions & 2 deletions
@@ -134,6 +134,10 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
 	mutex_init(&fp->aux->ext_mutex);
 	mutex_init(&fp->aux->dst_mutex);

+#ifdef CONFIG_BPF_SYSCALL
+	bpf_prog_stream_init(fp);
+#endif
+
 	return fp;
 }

@@ -778,7 +782,10 @@ bool is_bpf_text_address(unsigned long addr)

 struct bpf_prog *bpf_prog_ksym_find(unsigned long addr)
 {
-	struct bpf_ksym *ksym = bpf_ksym_find(addr);
+	struct bpf_ksym *ksym;
+
+	WARN_ON_ONCE(!rcu_read_lock_held());
+	ksym = bpf_ksym_find(addr);

 	return ksym && ksym->prog ?
 	       container_of(ksym, struct bpf_prog_aux, ksym)->prog :
@@ -2862,6 +2869,7 @@ static void bpf_prog_free_deferred(struct work_struct *work)
 	aux = container_of(work, struct bpf_prog_aux, work);
 #ifdef CONFIG_BPF_SYSCALL
 	bpf_free_kfunc_btf_tab(aux->kfunc_btf_tab);
+	bpf_prog_stream_free(aux->prog);
 #endif
 #ifdef CONFIG_CGROUP_BPF
 	if (aux->cgroup_atype != CGROUP_BPF_ATTACH_TYPE_INVALID)
@@ -3160,6 +3168,22 @@ u64 __weak arch_bpf_timed_may_goto(void)
 	return 0;
 }

+static noinline void bpf_prog_report_may_goto_violation(void)
+{
+#ifdef CONFIG_BPF_SYSCALL
+	struct bpf_stream_stage ss;
+	struct bpf_prog *prog;
+
+	prog = bpf_prog_find_from_stack();
+	if (!prog)
+		return;
+	bpf_stream_stage(ss, prog, BPF_STDERR, ({
+		bpf_stream_printk(ss, "ERROR: Timeout detected for may_goto instruction\n");
+		bpf_stream_dump_stack(ss);
+	}));
+#endif
+}
+
 u64 bpf_check_timed_may_goto(struct bpf_timed_may_goto *p)
 {
 	u64 time = ktime_get_mono_fast_ns();
@@ -3170,8 +3194,10 @@ u64 bpf_check_timed_may_goto(struct bpf_timed_may_goto *p)
 		return BPF_MAX_TIMED_LOOPS;
 	}
 	/* Check if we've exhausted our time slice, and zero count. */
-	if (time - p->timestamp >= (NSEC_PER_SEC / 4))
+	if (unlikely(time - p->timestamp >= (NSEC_PER_SEC / 4))) {
+		bpf_prog_report_may_goto_violation();
 		return 0;
+	}
 	/* Refresh the count for the stack frame. */
 	return BPF_MAX_TIMED_LOOPS;
 }
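
The hunk above reports a violation once the elapsed time reaches `NSEC_PER_SEC / 4`, that is, 250 ms. The arithmetic can be checked in isolation with the small userspace sketch below. `refill_or_stop` is a hypothetical name, timestamps are passed in as plain integers instead of being read from a clock, and `MAX_TIMED_LOOPS` is a placeholder value rather than the kernel's `BPF_MAX_TIMED_LOOPS` definition.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define MAX_TIMED_LOOPS	0xffffULL	/* placeholder refill value for this sketch */

/*
 * Return 0 once a loop that started at start_ns has used a quarter of
 * a second, otherwise hand back a fresh iteration budget. The unsigned
 * subtraction stays correct even if the counter wraps around.
 */
static uint64_t refill_or_stop(uint64_t now_ns, uint64_t start_ns)
{
	if (now_ns - start_ns >= NSEC_PER_SEC / 4)
		return 0;
	return MAX_TIMED_LOOPS;
}
```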
@@ -3208,3 +3234,83 @@ EXPORT_SYMBOL(bpf_stats_enabled_key);

 EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_exception);
 EXPORT_TRACEPOINT_SYMBOL_GPL(xdp_bulk_tx);
+
+#ifdef CONFIG_BPF_SYSCALL
+
+int bpf_prog_get_file_line(struct bpf_prog *prog, unsigned long ip, const char **filep,
+			   const char **linep, int *nump)
+{
+	int idx = -1, insn_start, insn_end, len;
+	struct bpf_line_info *linfo;
+	void **jited_linfo;
+	struct btf *btf;
+
+	btf = prog->aux->btf;
+	linfo = prog->aux->linfo;
+	jited_linfo = prog->aux->jited_linfo;
+
+	if (!btf || !linfo || !jited_linfo)
+		return -EINVAL;
+	len = prog->aux->func ? prog->aux->func[prog->aux->func_idx]->len : prog->len;
+
+	linfo = &prog->aux->linfo[prog->aux->linfo_idx];
+	jited_linfo = &prog->aux->jited_linfo[prog->aux->linfo_idx];
+
+	insn_start = linfo[0].insn_off;
+	insn_end = insn_start + len;
+
+	for (int i = 0; i < prog->aux->nr_linfo &&
+	     linfo[i].insn_off >= insn_start && linfo[i].insn_off < insn_end; i++) {
+		if (jited_linfo[i] >= (void *)ip)
+			break;
+		idx = i;
+	}
+
+	if (idx == -1)
+		return -ENOENT;
+
+	/* Get base component of the file path. */
+	*filep = btf_name_by_offset(btf, linfo[idx].file_name_off);
+	*filep = kbasename(*filep);
+	/* Obtain the source line, and strip whitespace in prefix. */
+	*linep = btf_name_by_offset(btf, linfo[idx].line_off);
+	while (isspace(**linep))
+		*linep += 1;
+	*nump = BPF_LINE_INFO_LINE_NUM(linfo[idx].line_col);
+	return 0;
+}
+
+struct walk_stack_ctx {
+	struct bpf_prog *prog;
+};
+
+static bool find_from_stack_cb(void *cookie, u64 ip, u64 sp, u64 bp)
+{
+	struct walk_stack_ctx *ctxp = cookie;
+	struct bpf_prog *prog;
+
+	/*
+	 * The RCU read lock is held to safely traverse the latch tree, but we
+	 * don't need its protection when accessing the prog, since it has an
+	 * active stack frame on the current stack trace, and won't disappear.
+	 */
+	rcu_read_lock();
+	prog = bpf_prog_ksym_find(ip);
+	rcu_read_unlock();
+	if (!prog)
+		return true;
+	if (bpf_is_subprog(prog))
+		return true;
+	ctxp->prog = prog;
+	return false;
+}
+
+struct bpf_prog *bpf_prog_find_from_stack(void)
+{
+	struct walk_stack_ctx ctx = {};
+
+	arch_bpf_stack_walk(find_from_stack_cb, &ctx);
+	return ctx.prog;
+}
+
+#endif
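
`bpf_prog_get_file_line()` scans the JITed line table for the last entry whose address lies strictly below `ip`, then trims the file path to its base name and strips leading whitespace from the source line. The userspace sketch below reproduces that lookup on a flat array. `struct line_rec`, `find_line`, `base_name` and `skip_ws` are hypothetical names, the records are assumed to be sorted by address, and the sketch omits the BTF lookups and subprogram bounds checks that the kernel function performs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened line record; the kernel spreads this over BTF tables. */
struct line_rec {
	uintptr_t jited_addr;	/* first machine-code address for the line */
	const char *file;	/* path as recorded by the compiler */
	const char *text;	/* source line, possibly indented */
	int num;		/* line number */
};

/* Index of the last record that starts strictly below ip, or -1 if none. */
static int find_line(const struct line_rec *recs, int n, uintptr_t ip)
{
	int idx = -1;

	for (int i = 0; i < n; i++) {
		if (recs[i].jited_addr >= ip)
			break;
		idx = i;
	}
	return idx;
}

/* Base component of a path, in the spirit of the kernel's kbasename(). */
static const char *base_name(const char *path)
{
	const char *slash = strrchr(path, '/');

	return slash ? slash + 1 : path;
}

/* Skip leading whitespace; the cast keeps isspace() defined for any char. */
static const char *skip_ws(const char *s)
{
	while (isspace((unsigned char)*s))
		s++;
	return s;
}
```

Note that an `ip` equal to the first address of a line resolves to the previous line in this scheme, which matches the `>=` comparison in the kernel loop.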
