Parca Agent causes a kernel panic on the AWS AL2023 AMI while using perf_handler native_tracer #3112

@minseok-prestolabs

Hi

Context

We are running Parca Agent on nodes that use the AWS EKS AMI for version 1.33 (al2023_arm64).

The problem

On Amazon Linux 2023 (aarch64, Graviton), a kernel Oops occurs while Parca Agent is running. The kernel call trace shows perf → bpf overflow → bpf_prog_native_tracer_e → bpf_get_stackid_pe → arch_stack_walk, ending in a level-3 translation fault (the PTE for the faulting address is zero, i.e. unmapped). bpftool shows a live perf_event BPF program named native_tracer_e and stack_trace maps.

Conclusion: Parca’s native/perf eBPF tracer calls bpf_get_stackid from the perf overflow handler, which triggers kernel stack unwinding (arch_stack_walk) in an IRQ/softirq context. The unwinder then dereferences an unmapped kernel address, which causes the Oops.

Here are the logs.

Crash log (relevant excerpt)

[ 8097.188177] Unable to handle kernel paging request at virtual address ffff8000acecc0b0
[ 8097.189406]   ESR = 0x0000000096000007
[ 8097.191161]   FSC = 0x07: level 3 translation fault
[ 8097.194532] [ffff8000acecc0b0] pgd=10000000433e3003,..., pte=0000000000000000
[ 8097.195878] Internal error: Oops: 0000000096000007 [#1] SMP
...
[ 8097.216574]  arch_stack_walk+0x218/0x5a0 (P)
[ 8097.217060]  perf_callchain_kernel+0x48/0x60 (P)
[ 8097.217581]  get_perf_callchain+0xa0/0x260 (P)
[ 8097.218085]  bpf_get_stackid+0x7c/0xc8 (P)
[ 8097.218543]  bpf_get_stackid_pe+0xec/0x128 (P)
[ 8097.219054]  bpf_prog_a6db16dc005b1a9f_native_tracer_e+0x1dc/0xc00 (P)
[ 8097.219754]  bpf_overflow_handler+0x90/0x198
[ 8097.220236]  __perf_event_overflow+0x20c/0x2e8
[ 8097.220739]  perf_swevent_hrtimer+0xc4/0x140
...
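
For readers who want to double-check the fault type, the ESR value in the log can be decoded by hand. The sketch below is not part of the original report; it assumes the standard AArch64 ESR_EL1 layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, with WnR and DFSC inside the ISS for Data Aborts). The function name `decode_esr` and the lookup tables are illustrative only.

```python
# Decode the ESR value printed in the Oops above.
# Assumed layout (Arm architecture): EC = bits [31:26], IL = bit 25,
# ISS = bits [24:0]; for Data Aborts, WnR = ISS bit 6, DFSC = ISS bits [5:0].

# Only the exception classes relevant to this report are named here.
EC_NAMES = {
    0x24: "Data Abort from a lower Exception level",
    0x25: "Data Abort taken without a change in Exception level",
}

# DFSC values below 0x10 have the form 0b00ttLL: fault type tt, table level LL.
FAULT_TYPES = {
    0b00: "address size fault",
    0b01: "translation fault",
    0b10: "access flag fault",
    0b11: "permission fault",
}


def decode_esr(esr: int) -> dict:
    """Split an ESR value into its fields (DFSC/WnR are only meaningful for aborts)."""
    ec = (esr >> 26) & 0x3F
    il = (esr >> 25) & 0x1
    iss = esr & 0x1FFFFFF
    dfsc = iss & 0x3F
    if dfsc < 0x10:
        fault = f"{FAULT_TYPES[dfsc >> 2]}, level {dfsc & 0x3}"
    else:
        fault = "other (not decoded in this sketch)"
    return {
        "ec": ec,
        "ec_name": EC_NAMES.get(ec, "other (not decoded in this sketch)"),
        "il": il,
        "write": bool((iss >> 6) & 0x1),  # WnR: True = write, False = read
        "dfsc": dfsc,
        "fault": fault,
    }


if __name__ == "__main__":
    for key, value in decode_esr(0x96000007).items():
        print(f"{key}: {value}")
```

For 0x96000007 this gives EC 0x25 (a Data Abort taken at the current exception level, i.e. in the kernel itself), a read access, and DFSC 0x07 (translation fault, level 3), which matches the "FSC = 0x07: level 3 translation fault" line and the zero PTE in the log.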

bpftool / runtime evidence collected on the node

I ran the following commands and captured these outputs:

Show BPF programs:

$ sudo bpftool prog show | egrep -i 'native|tracer|perf|stack'
141: perf_event  name do_perf_event  tag 611f1d00ba06548a  gpl
6773: perf_event  name unwind_stop  tag d6a19d535d7887ec  gpl
...
6782: perf_event  name native_tracer_e  tag a6db16dc005b1a9f  gpl

Stack maps:

$ sudo bpftool map show | grep stack
39: stack_trace  name stacks  flags 0x0
1987: stack_trace  name kernel_stackmap  flags 0x0

perf attachments:

$ sudo bpftool perf show
pid 10262  fd 44: prog_id 140  kprobe  func disassociate_ctty  offset 0
...
pid 10262  fd 266: prog_id 188  uprobe  ...
... (many tracepoints/kprobes/uprobes)
pid 1070534  fd 73: prog_id 6781  tracepoint  sched_process_exit

Note: the native_tracer_e program (presumably native_tracer_entry, truncated to the kernel's 15-character BPF object name limit) was loaded while Parca was running.
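
To repeat this check on other nodes without grepping by eye, the JSON output of bpftool (`bpftool -j prog show`) can be filtered with a short script. This is a rough sketch, not something taken from Parca: the helper name `find_programs` is hypothetical, the sample JSON is a hand-written rendering of the three programs listed above, and the 15-character truncation is an assumption based on the kernel's 16-byte BPF object name buffer.

```python
import json

# Assumption: the kernel stores BPF object names in a 16-byte buffer
# (15 characters plus NUL), so longer program names appear truncated.
BPF_OBJ_NAME_MAX = 15


def find_programs(bpftool_json: str, full_name: str, prog_type: str = "perf_event") -> list:
    """Return entries whose type matches and whose name equals the truncated full_name."""
    truncated = full_name[:BPF_OBJ_NAME_MAX]
    return [
        prog
        for prog in json.loads(bpftool_json)
        if prog.get("type") == prog_type and prog.get("name") == truncated
    ]


# Hand-written sample mirroring the bpftool output quoted in this issue.
sample = json.dumps([
    {"id": 141, "type": "perf_event", "name": "do_perf_event", "tag": "611f1d00ba06548a"},
    {"id": 6773, "type": "perf_event", "name": "unwind_stop", "tag": "d6a19d535d7887ec"},
    {"id": 6782, "type": "perf_event", "name": "native_tracer_e", "tag": "a6db16dc005b1a9f"},
])

if __name__ == "__main__":
    for prog in find_programs(sample, "native_tracer_entry"):
        print(prog["id"], prog["name"], prog["tag"])
```

On a live node, the `sample` string would be replaced by the captured output of `sudo bpftool -j prog show`. The tag of the matching program (a6db16dc005b1a9f here) is the same one embedded in the bpf_prog_a6db16dc005b1a9f_native_tracer_e frame of the call trace, which ties the loaded program to the crash.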
