Description
Hi
Context
We are currently using the AWS EKS 1.33 AMI (al2023_arm64), and Parca Agent is running on the node.
The problem
On Amazon Linux 2023 (aarch64, Graviton), a kernel Oops occurs while Parca is running. The kernel call trace runs from the perf overflow handler into the BPF program and then into the kernel stack unwinder (perf_swevent_hrtimer → __perf_event_overflow → bpf_overflow_handler → bpf_prog_..._native_tracer_e → bpf_get_stackid_pe → arch_stack_walk) and ends in a level 3 translation fault (the PTE for the faulting address is zero, i.e. unmapped). bpftool shows a loaded perf_event BPF program named native_tracer_e along with stack_trace maps. Conclusion: Parca's native perf_event eBPF tracer calls bpf_get_stackid, which triggers kernel stack unwinding (arch_stack_walk) in an IRQ/softirq context, and that unwinding dereferences an unmapped kernel address, causing the Oops.
Here are the logs:
Crash log (relevant excerpt)
[ 8097.188177] Unable to handle kernel paging request at virtual address ffff8000acecc0b0
[ 8097.189406] ESR = 0x0000000096000007
[ 8097.191161] FSC = 0x07: level 3 translation fault
[ 8097.194532] [ffff8000acecc0b0] pgd=10000000433e3003,..., pte=0000000000000000
[ 8097.195878] Internal error: Oops: 0000000096000007 [#1] SMP
...
[ 8097.216574] arch_stack_walk+0x218/0x5a0 (P)
[ 8097.217060] perf_callchain_kernel+0x48/0x60 (P)
[ 8097.217581] get_perf_callchain+0xa0/0x260 (P)
[ 8097.218085] bpf_get_stackid+0x7c/0xc8 (P)
[ 8097.218543] bpf_get_stackid_pe+0xec/0x128 (P)
[ 8097.219054] bpf_prog_a6db16dc005b1a9f_native_tracer_e+0x1dc/0xc00 (P)
[ 8097.219754] bpf_overflow_handler+0x90/0x198
[ 8097.220236] __perf_event_overflow+0x20c/0x2e8
[ 8097.220739] perf_swevent_hrtimer+0xc4/0x140
...
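To make the fault class in the excerpt above easier to check, here is a minimal sketch that decodes the ESR value from the log. The field layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and DFSC in ISS bits 5:0) follows the Arm architecture description of ESR_ELx; the function and table names are made up for illustration, and only the handful of encodings relevant here are covered.

```python
# Sketch: decode an arm64 ESR_ELx value such as the one in the Oops above.
# Only data-abort exception classes and a few DFSC encodings are handled;
# decode_esr, EC_NAMES and FAULT_KINDS are illustrative names, not kernel APIs.

EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

# DFSC[5:2] selects the fault kind, DFSC[1:0] the translation table level.
FAULT_KINDS = {
    0b0000: "address size fault",
    0b0001: "translation fault",
    0b0010: "access flag fault",
    0b0011: "permission fault",
}


def decode_esr(esr):
    """Split an ESR value into EC, IL, ISS and, for data aborts, WnR/DFSC."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    result = {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "not decoded in this sketch"),
        "il": il,
        "iss": iss,
    }
    if ec in (0x24, 0x25):
        dfsc = iss & 0x3F
        kind = FAULT_KINDS.get(dfsc >> 2)
        result["wnr"] = (iss >> 6) & 0x1  # 0 = read access, 1 = write access
        result["dfsc"] = dfsc
        result["fault"] = (
            f"level {dfsc & 0x3} {kind}" if kind else "not decoded in this sketch"
        )
    return result


if __name__ == "__main__":
    for key, value in decode_esr(0x96000007).items():
        print(f"{key}: {value}")
```

For 0x96000007 this gives EC 0x25 (a data abort taken from kernel mode itself), WnR 0 (a read) and DFSC 0x07, which matches the "level 3 translation fault" line the kernel printed.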
bpftool / runtime evidence collected on the node
I ran the following commands and captured these outputs:
Show BPF programs:
$ sudo bpftool prog show | egrep -i 'native|tracer|perf|stack'
141: perf_event name do_perf_event tag 611f1d00ba06548a gpl
6773: perf_event name unwind_stop tag d6a19d535d7887ec gpl
...
6782: perf_event name native_tracer_e tag a6db16dc005b1a9f gpl
Stack maps:
$ sudo bpftool map show | grep stack
39: stack_trace name stacks flags 0x0
1987: stack_trace name kernel_stackmap flags 0x0
perf attachments:
$ sudo bpftool perf show
pid 10262 fd 44: prog_id 140 kprobe func disassociate_ctty offset 0
...
pid 10262 fd 266: prog_id 188 uprobe ...
... (many tracepoints/kprobes/uprobes)
pid 1070534 fd 73: prog_id 6781 tracepoint sched_process_exit
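Note: the native_tracer_e program was loaded while Parca was running. This is native_tracer_entry; the kernel truncates BPF program names to 15 characters, which is why the shorter name appears in both the bpftool output and the crash trace.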
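For anyone trying to confirm the same setup on another node, the lookup above can be scripted. This is a rough sketch that assumes the JSON output of `sudo bpftool -j prog show` is a list of objects with `id`, `type` and `name` fields; the sample data below is hand-written from the three program lines shown in this report, not a real capture, and the helper names are hypothetical.

```python
import json

# Kernel limit for BPF object names (BPF_OBJ_NAME_LEN), including the
# trailing NUL, so at most 15 visible characters survive.
BPF_OBJ_NAME_LEN = 16


def kernel_name(full_name):
    """Return the name as the kernel stores it (truncated to 15 chars)."""
    return full_name[: BPF_OBJ_NAME_LEN - 1]


def find_programs(bpftool_json, full_name, prog_type="perf_event"):
    """Return the ids of loaded programs matching the (truncated) name and type.

    bpftool_json is the text produced by `bpftool -j prog show`
    (assumed shape: a JSON list of objects with id/type/name keys).
    """
    wanted = kernel_name(full_name)
    return [
        prog["id"]
        for prog in json.loads(bpftool_json)
        if prog.get("type") == prog_type and prog.get("name") == wanted
    ]


# Hand-built sample mirroring the bpftool lines quoted in this report.
SAMPLE = json.dumps([
    {"id": 141, "type": "perf_event", "name": "do_perf_event"},
    {"id": 6773, "type": "perf_event", "name": "unwind_stop"},
    {"id": 6782, "type": "perf_event", "name": "native_tracer_e"},
])

if __name__ == "__main__":
    print(kernel_name("native_tracer_entry"))
    print(find_programs(SAMPLE, "native_tracer_entry"))
```

On a live node the sample string would be replaced by the captured output of `sudo bpftool -j prog show`.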