Skip to content

Commit a578b54

Browse files
author
Alexei Starovoitov
committed
Merge branch 'bpf-report-arena-faults-to-bpf-streams'
Puranjay Mohan says: ==================== bpf: report arena faults to BPF streams Changes in v6->v7: v6: https://lore.kernel.org/all/[email protected]/ - Added comments about the usage of arena_reg in x86 and arm64 jits. (Alexei) - Used clear_lo32() for clearing the lower 32-bits of user_vm_start. (Alexei) - Moved update of the old tests to use __stderr to a separate commit (Eduard) - Used test__skip() in prog_tests/stream.c (Eduard) - Start a sub-test for read / write Changes in v5->v6: v5: https://lore.kernel.org/all/[email protected]/ - Introduces __stderr and __stdout for easy testing of bpf streams (Eduard) - Add more test cases for arena fault reporting (subprog and callback) - Fix main_prog_aux usage and return main_prog from find_from_stack_cb (Kumar) - Properly fix the build issue reported by kernel test robot Changes in v4->v5: v4: https://lore.kernel.org/all/[email protected]/ - Added patch 2 to introducing main_prog_aux for easier access to streams. - Fixed bug in fault handlers when arena_reg == dst_reg - Updated selftest to check test above edge case. - Added comments about the usage of barrier_var() in code and commit message. Changes in v3->v4: v3: https://lore.kernel.org/all/[email protected]/ - Fixed a build issue when CONFIG_BPF_JIT=y and # CONFIG_BPF_SYSCALL is not set Changes in v2->v3: v2: https://lore.kernel.org/all/[email protected]/ - Improved the selftest to check the exact fault address - Dropped BPF_NO_KFUNC_PROTOTYPES and bpf_arena_alloc/free_pages() usage - Rebased on bpf-next/master Changes in v1->v2: v1: https://lore.kernel.org/all/[email protected]/ - Changed variable and mask names for consistency (Yonghong) - Added Acked-by: Yonghong Song <[email protected]> on two patches This set adds the support of reporting page faults inside arena to BPF stderr stream. The reported address is the one that a user would expect to see if they pass it to bpf_printk(); Here is an example output from the stderr stream and bpf_printk() ERROR: Arena WRITE access at unmapped address 0xdeaddead0000 CPU: 9 UID: 0 PID: 502 Comm: test_progs Call trace: bpf_stream_stage_dump_stack+0xc0/0x150 bpf_prog_report_arena_violation+0x98/0xf0 ex_handler_bpf+0x5c/0x78 fixup_exception+0xf8/0x160 __do_kernel_fault+0x40/0x188 do_bad_area+0x70/0x88 do_translation_fault+0x54/0x98 do_mem_abort+0x4c/0xa8 el1_abort+0x44/0x70 el1h_64_sync_handler+0x50/0x108 el1h_64_sync+0x6c/0x70 bpf_prog_a64a9778d31b8e88_stream_arena_write_fault+0x84/0xc8 *(page) = 1; @ stream.c:100 bpf_prog_test_run_syscall+0x100/0x328 __sys_bpf+0x508/0xb98 __arm64_sys_bpf+0x2c/0x48 invoke_syscall+0x50/0x120 el0_svc_common.constprop.0+0x48/0xf8 do_el0_svc+0x28/0x40 el0_svc+0x48/0xf8 el0t_64_sync_handler+0xa0/0xe8 el0t_64_sync+0x198/0x1a0 Same address is printed by bpf_printk(): 1389.078831: bpf_trace_printk: Read Address: 0xdeaddead0000 To make this possible, some extra metadata has to be passed to the bpf exception handler, so the bpf exception handling mechanism for both x86-64 and arm64 have been improved in this set. The streams selftest has been updated to test this new feature. ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2 parents 5d87e96 + 86f2225 commit a578b54

File tree

10 files changed

+491
-112
lines changed

10 files changed

+491
-112
lines changed

arch/arm64/net/bpf_jit_comp.c

Lines changed: 63 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -1066,19 +1066,53 @@ static void build_epilogue(struct jit_ctx *ctx, bool was_classic)
10661066
emit(A64_RET(A64_LR), ctx);
10671067
}
10681068

1069-
#define BPF_FIXUP_OFFSET_MASK GENMASK(26, 0)
1069+
/*
1070+
* Metadata encoding for exception handling in JITed code.
1071+
*
1072+
* Format of `fixup` field in `struct exception_table_entry`:
1073+
*
1074+
* Bit layout of `fixup` (32-bit):
1075+
*
1076+
* +-----------+--------+-----------+-----------+----------+
1077+
* | 31-27 | 26-22 | 21 | 20-16 | 15-0 |
1078+
* | | | | | |
1079+
* | FIXUP_REG | Unused | ARENA_ACC | ARENA_REG | OFFSET |
1080+
* +-----------+--------+-----------+-----------+----------+
1081+
*
1082+
* - OFFSET (16 bits): Offset used to compute address for Load/Store instruction.
1083+
* - ARENA_REG (5 bits): Register that is used to calculate the address for load/store when
1084+
* accessing the arena region.
1085+
* - ARENA_ACCESS (1 bit): This bit is set when the faulting instruction accessed the arena region.
1086+
* - FIXUP_REG (5 bits): Destination register for the load instruction (cleared on fault) or set to
1087+
* DONT_CLEAR if it is a store instruction.
1088+
*/
1089+
1090+
#define BPF_FIXUP_OFFSET_MASK GENMASK(15, 0)
1091+
#define BPF_FIXUP_ARENA_REG_MASK GENMASK(20, 16)
1092+
#define BPF_ARENA_ACCESS BIT(21)
10701093
#define BPF_FIXUP_REG_MASK GENMASK(31, 27)
10711094
#define DONT_CLEAR 5 /* Unused ARM64 register from BPF's POV */
10721095

10731096
bool ex_handler_bpf(const struct exception_table_entry *ex,
10741097
struct pt_regs *regs)
10751098
{
1076-
off_t offset = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
10771099
int dst_reg = FIELD_GET(BPF_FIXUP_REG_MASK, ex->fixup);
1100+
s16 off = FIELD_GET(BPF_FIXUP_OFFSET_MASK, ex->fixup);
1101+
int arena_reg = FIELD_GET(BPF_FIXUP_ARENA_REG_MASK, ex->fixup);
1102+
bool is_arena = !!(ex->fixup & BPF_ARENA_ACCESS);
1103+
bool is_write = (dst_reg == DONT_CLEAR);
1104+
unsigned long addr;
1105+
1106+
if (is_arena) {
1107+
addr = regs->regs[arena_reg] + off;
1108+
bpf_prog_report_arena_violation(is_write, addr, regs->pc);
1109+
}
10781110

10791111
if (dst_reg != DONT_CLEAR)
10801112
regs->regs[dst_reg] = 0;
1081-
regs->pc = (unsigned long)&ex->fixup - offset;
1113+
/* Skip the faulting instruction */
1114+
regs->pc += AARCH64_INSN_SIZE;
1115+
10821116
return true;
10831117
}
10841118

@@ -1088,7 +1122,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
10881122
int dst_reg)
10891123
{
10901124
off_t ins_offset;
1091-
off_t fixup_offset;
1125+
s16 off = insn->off;
1126+
bool is_arena;
1127+
int arena_reg;
10921128
unsigned long pc;
10931129
struct exception_table_entry *ex;
10941130

@@ -1102,6 +1138,9 @@ static int add_exception_handler(const struct bpf_insn *insn,
11021138
BPF_MODE(insn->code) != BPF_PROBE_ATOMIC)
11031139
return 0;
11041140

1141+
is_arena = (BPF_MODE(insn->code) == BPF_PROBE_MEM32) ||
1142+
(BPF_MODE(insn->code) == BPF_PROBE_ATOMIC);
1143+
11051144
if (!ctx->prog->aux->extable ||
11061145
WARN_ON_ONCE(ctx->exentry_idx >= ctx->prog->aux->num_exentries))
11071146
return -EINVAL;
@@ -1119,22 +1158,6 @@ static int add_exception_handler(const struct bpf_insn *insn,
11191158
if (WARN_ON_ONCE(ins_offset >= 0 || ins_offset < INT_MIN))
11201159
return -ERANGE;
11211160

1122-
/*
1123-
* Since the extable follows the program, the fixup offset is always
1124-
* negative and limited to BPF_JIT_REGION_SIZE. Store a positive value
1125-
* to keep things simple, and put the destination register in the upper
1126-
* bits. We don't need to worry about buildtime or runtime sort
1127-
* modifying the upper bits because the table is already sorted, and
1128-
* isn't part of the main exception table.
1129-
*
1130-
* The fixup_offset is set to the next instruction from the instruction
1131-
* that may fault. The execution will jump to this after handling the
1132-
* fault.
1133-
*/
1134-
fixup_offset = (long)&ex->fixup - (pc + AARCH64_INSN_SIZE);
1135-
if (!FIELD_FIT(BPF_FIXUP_OFFSET_MASK, fixup_offset))
1136-
return -ERANGE;
1137-
11381161
/*
11391162
* The offsets above have been calculated using the RO buffer but we
11401163
* need to use the R/W buffer for writes.
@@ -1147,8 +1170,26 @@ static int add_exception_handler(const struct bpf_insn *insn,
11471170
if (BPF_CLASS(insn->code) != BPF_LDX)
11481171
dst_reg = DONT_CLEAR;
11491172

1150-
ex->fixup = FIELD_PREP(BPF_FIXUP_OFFSET_MASK, fixup_offset) |
1151-
FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
1173+
ex->fixup = FIELD_PREP(BPF_FIXUP_REG_MASK, dst_reg);
1174+
1175+
if (is_arena) {
1176+
ex->fixup |= BPF_ARENA_ACCESS;
1177+
/*
1178+
* insn->src_reg/dst_reg holds the address in the arena region with upper 32-bits
1179+
* being zero because of a preceding addr_space_cast(r<n>, 0x0, 0x1) instruction.
1180+
* This address is adjusted with the addition of arena_vm_start (see the
1181+
* implementation of BPF_PROBE_MEM32 and BPF_PROBE_ATOMIC) before being used for the
1182+
* memory access. Pass the reg holding the unmodified 32-bit address to
1183+
* ex_handler_bpf.
1184+
*/
1185+
if (BPF_CLASS(insn->code) == BPF_LDX)
1186+
arena_reg = bpf2a64[insn->src_reg];
1187+
else
1188+
arena_reg = bpf2a64[insn->dst_reg];
1189+
1190+
ex->fixup |= FIELD_PREP(BPF_FIXUP_OFFSET_MASK, off) |
1191+
FIELD_PREP(BPF_FIXUP_ARENA_REG_MASK, arena_reg);
1192+
}
11521193

11531194
ex->type = EX_TYPE_BPF;
11541195

arch/x86/net/bpf_jit_comp.c

Lines changed: 80 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@
88
#include <linux/netdevice.h>
99
#include <linux/filter.h>
1010
#include <linux/if_vlan.h>
11+
#include <linux/bitfield.h>
1112
#include <linux/bpf.h>
1213
#include <linux/memory.h>
1314
#include <linux/sort.h>
@@ -1388,16 +1389,67 @@ static int emit_atomic_ld_st_index(u8 **pprog, u32 atomic_op, u32 size,
13881389
return 0;
13891390
}
13901391

1392+
/*
1393+
* Metadata encoding for exception handling in JITed code.
1394+
*
1395+
* Format of `fixup` and `data` fields in `struct exception_table_entry`:
1396+
*
1397+
* Bit layout of `fixup` (32-bit):
1398+
*
1399+
* +-----------+--------+-----------+---------+----------+
1400+
* | 31 | 30-24 | 23-16 | 15-8 | 7-0 |
1401+
* | | | | | |
1402+
* | ARENA_ACC | Unused | ARENA_REG | DST_REG | INSN_LEN |
1403+
* +-----------+--------+-----------+---------+----------+
1404+
*
1405+
* - INSN_LEN (8 bits): Length of faulting insn (max x86 insn = 15 bytes (fits in 8 bits)).
1406+
* - DST_REG (8 bits): Offset of dst_reg from reg2pt_regs[] (max offset = 112 (fits in 8 bits)).
1407+
* This is set to DONT_CLEAR if the insn is a store.
1408+
* - ARENA_REG (8 bits): Offset of the register that is used to calculate the
1409+
* address for load/store when accessing the arena region.
1410+
* - ARENA_ACCESS (1 bit): This bit is set when the faulting instruction accessed the arena region.
1411+
*
1412+
* Bit layout of `data` (32-bit):
1413+
*
1414+
* +--------------+--------+--------------+
1415+
* | 31-16 | 15-8 | 7-0 |
1416+
* | | | |
1417+
* | ARENA_OFFSET | Unused | EX_TYPE_BPF |
1418+
* +--------------+--------+--------------+
1419+
*
1420+
* - ARENA_OFFSET (16 bits): Offset used to calculate the address for load/store when
1421+
* accessing the arena region.
1422+
*/
1423+
13911424
#define DONT_CLEAR 1
1425+
#define FIXUP_INSN_LEN_MASK GENMASK(7, 0)
1426+
#define FIXUP_REG_MASK GENMASK(15, 8)
1427+
#define FIXUP_ARENA_REG_MASK GENMASK(23, 16)
1428+
#define FIXUP_ARENA_ACCESS BIT(31)
1429+
#define DATA_ARENA_OFFSET_MASK GENMASK(31, 16)
13921430

13931431
bool ex_handler_bpf(const struct exception_table_entry *x, struct pt_regs *regs)
13941432
{
1395-
u32 reg = x->fixup >> 8;
1433+
u32 reg = FIELD_GET(FIXUP_REG_MASK, x->fixup);
1434+
u32 insn_len = FIELD_GET(FIXUP_INSN_LEN_MASK, x->fixup);
1435+
bool is_arena = !!(x->fixup & FIXUP_ARENA_ACCESS);
1436+
bool is_write = (reg == DONT_CLEAR);
1437+
unsigned long addr;
1438+
s16 off;
1439+
u32 arena_reg;
1440+
1441+
if (is_arena) {
1442+
arena_reg = FIELD_GET(FIXUP_ARENA_REG_MASK, x->fixup);
1443+
off = FIELD_GET(DATA_ARENA_OFFSET_MASK, x->data);
1444+
addr = *(unsigned long *)((void *)regs + arena_reg) + off;
1445+
bpf_prog_report_arena_violation(is_write, addr, regs->ip);
1446+
}
13961447

13971448
/* jump over faulting load and clear dest register */
13981449
if (reg != DONT_CLEAR)
13991450
*(unsigned long *)((void *)regs + reg) = 0;
1400-
regs->ip += x->fixup & 0xff;
1451+
regs->ip += insn_len;
1452+
14011453
return true;
14021454
}
14031455

@@ -2070,6 +2122,7 @@ st: if (is_imm8(insn->off))
20702122
{
20712123
struct exception_table_entry *ex;
20722124
u8 *_insn = image + proglen + (start_of_ldx - temp);
2125+
u32 arena_reg, fixup_reg;
20732126
s64 delta;
20742127

20752128
if (!bpf_prog->aux->extable)
@@ -2089,8 +2142,29 @@ st: if (is_imm8(insn->off))
20892142

20902143
ex->data = EX_TYPE_BPF;
20912144

2092-
ex->fixup = (prog - start_of_ldx) |
2093-
((BPF_CLASS(insn->code) == BPF_LDX ? reg2pt_regs[dst_reg] : DONT_CLEAR) << 8);
2145+
/*
2146+
* src_reg/dst_reg holds the address in the arena region with upper
2147+
* 32-bits being zero because of a preceding addr_space_cast(r<n>,
2148+
* 0x0, 0x1) instruction. This address is adjusted with the addition
2149+
* of arena_vm_start (see the implementation of BPF_PROBE_MEM32 and
2150+
* BPF_PROBE_ATOMIC) before being used for the memory access. Pass
2151+
* the reg holding the unmodified 32-bit address to
2152+
* ex_handler_bpf().
2153+
*/
2154+
if (BPF_CLASS(insn->code) == BPF_LDX) {
2155+
arena_reg = reg2pt_regs[src_reg];
2156+
fixup_reg = reg2pt_regs[dst_reg];
2157+
} else {
2158+
arena_reg = reg2pt_regs[dst_reg];
2159+
fixup_reg = DONT_CLEAR;
2160+
}
2161+
2162+
ex->fixup = FIELD_PREP(FIXUP_INSN_LEN_MASK, prog - start_of_ldx) |
2163+
FIELD_PREP(FIXUP_ARENA_REG_MASK, arena_reg) |
2164+
FIELD_PREP(FIXUP_REG_MASK, fixup_reg);
2165+
ex->fixup |= FIXUP_ARENA_ACCESS;
2166+
2167+
ex->data |= FIELD_PREP(DATA_ARENA_OFFSET_MASK, insn->off);
20942168
}
20952169
break;
20962170

@@ -2208,7 +2282,8 @@ st: if (is_imm8(insn->off))
22082282
* End result: x86 insn "mov rbx, qword ptr [rax+0x14]"
22092283
* of 4 bytes will be ignored and rbx will be zero inited.
22102284
*/
2211-
ex->fixup = (prog - start_of_ldx) | (reg2pt_regs[dst_reg] << 8);
2285+
ex->fixup = FIELD_PREP(FIXUP_INSN_LEN_MASK, prog - start_of_ldx) |
2286+
FIELD_PREP(FIXUP_REG_MASK, reg2pt_regs[dst_reg]);
22122287
}
22132288
break;
22142289

include/linux/bpf.h

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1633,6 +1633,7 @@ struct bpf_prog_aux {
16331633
/* function name for valid attach_btf_id */
16341634
const char *attach_func_name;
16351635
struct bpf_prog **func;
1636+
struct bpf_prog_aux *main_prog_aux;
16361637
void *jit_data; /* JIT specific data. arch dependent */
16371638
struct bpf_jit_poke_descriptor *poke_tab;
16381639
struct bpf_kfunc_desc_tab *kfunc_tab;
@@ -2880,6 +2881,7 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data,
28802881
enum bpf_dynptr_type type, u32 offset, u32 size);
28812882
void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
28822883
void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
2884+
void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
28832885

28842886
#else /* !CONFIG_BPF_SYSCALL */
28852887
static inline struct bpf_prog *bpf_prog_get(u32 ufd)
@@ -3167,6 +3169,11 @@ static inline void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr)
31673169
static inline void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr)
31683170
{
31693171
}
3172+
3173+
static inline void bpf_prog_report_arena_violation(bool write, unsigned long addr,
3174+
unsigned long fault_ip)
3175+
{
3176+
}
31703177
#endif /* CONFIG_BPF_SYSCALL */
31713178

31723179
static __always_inline int

kernel/bpf/arena.c

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -633,3 +633,33 @@ static int __init kfunc_init(void)
633633
return register_btf_kfunc_id_set(BPF_PROG_TYPE_UNSPEC, &common_kfunc_set);
634634
}
635635
late_initcall(kfunc_init);
636+
637+
void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip)
638+
{
639+
struct bpf_stream_stage ss;
640+
struct bpf_prog *prog;
641+
u64 user_vm_start;
642+
643+
/*
644+
* The RCU read lock is held to safely traverse the latch tree, but we
645+
* don't need its protection when accessing the prog, since it will not
646+
* disappear while we are handling the fault.
647+
*/
648+
rcu_read_lock();
649+
prog = bpf_prog_ksym_find(fault_ip);
650+
rcu_read_unlock();
651+
if (!prog)
652+
return;
653+
654+
/* Use main prog for stream access */
655+
prog = prog->aux->main_prog_aux->prog;
656+
657+
user_vm_start = bpf_arena_get_user_vm_start(prog->aux->arena);
658+
addr += clear_lo32(user_vm_start);
659+
660+
bpf_stream_stage(ss, prog, BPF_STDERR, ({
661+
bpf_stream_printk(ss, "ERROR: Arena %s access at unmapped address 0x%lx\n",
662+
write ? "WRITE" : "READ", addr);
663+
bpf_stream_dump_stack(ss);
664+
}));
665+
}

kernel/bpf/core.c

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -120,6 +120,7 @@ struct bpf_prog *bpf_prog_alloc_no_stats(unsigned int size, gfp_t gfp_extra_flag
120120

121121
fp->pages = size / PAGE_SIZE;
122122
fp->aux = aux;
123+
fp->aux->main_prog_aux = aux;
123124
fp->aux->prog = fp;
124125
fp->jit_requested = ebpf_jit_enabled();
125126
fp->blinding_requested = bpf_jit_blinding_enabled(fp);
@@ -3297,9 +3298,8 @@ static bool find_from_stack_cb(void *cookie, u64 ip, u64 sp, u64 bp)
32973298
rcu_read_unlock();
32983299
if (!prog)
32993300
return true;
3300-
if (bpf_is_subprog(prog))
3301-
return true;
3302-
ctxp->prog = prog;
3301+
/* Make sure we return the main prog if we found a subprog */
3302+
ctxp->prog = prog->aux->main_prog_aux->prog;
33033303
return false;
33043304
}
33053305

kernel/bpf/verifier.c

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21601,6 +21601,7 @@ static int jit_subprogs(struct bpf_verifier_env *env)
2160121601
func[i]->aux->func_info_cnt = prog->aux->func_info_cnt;
2160221602
func[i]->aux->poke_tab = prog->aux->poke_tab;
2160321603
func[i]->aux->size_poke_tab = prog->aux->size_poke_tab;
21604+
func[i]->aux->main_prog_aux = prog->aux;
2160421605

2160521606
for (j = 0; j < prog->aux->size_poke_tab; j++) {
2160621607
struct bpf_jit_poke_descriptor *poke;

0 commit comments

Comments
 (0)