forked from torvalds/linux
-
Notifications
You must be signed in to change notification settings - Fork 475
Commit fe4f9b4
perf trace: Fix tracing itself, creating feedback loops
There exists a pids_filtered map in augmented_raw_syscalls.bpf.c that
ceases to provide functionality after the BPF skeleton migration done
in:
5e6da6b ("perf trace: Migrate BPF augmentation to use a skeleton")
Before the migration, pid_filtered map works, courtesy of Arnaldo
Carvalho de Melo <[email protected]>:
⬢ [acme@toolbox perf-tools]$ git log --oneline -5
6f769c3 (HEAD) perf tests trace+probe_vfs_getname.sh: Accept quotes surrounding the filename
7777ac3 perf test trace+probe_vfs_getname.sh: Remove stray \ before /
33d9c50 perf script python: Add stub for PMU symbol to the python binding
e59fea4 perf symbols: Fix DSO kernel load and symbol process to correctly map DSO to its long_name, type and adjust_symbols
878460e perf build: Remove -Wno-unused-but-set-variable from the flex flags when building with clang < 13.0.0
root@x1:/home/acme/git/perf-tools# perf trace -e /tmp/augmented_raw_syscalls.o -e write* --max-events=30 &
[1] 180632
root@x1:/home/acme/git/perf-tools# 0.000 ( 0.051 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8
0.115 ( 0.010 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8
0.916 ( 0.068 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 246) = 246
1.699 ( 0.047 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121
2.167 ( 0.041 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121
2.739 ( 0.042 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121
3.138 ( 0.027 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121
3.477 ( 0.027 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121
3.738 ( 0.023 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121
3.946 ( 0.024 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121
4.195 ( 0.024 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 121) = 121
4.212 ( 0.026 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8
4.285 ( 0.006 ms): NetworkManager/1127 write(fd: 3, buf: 0x7ffeb508ef70, count: 8) = 8
4.445 ( 0.018 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 260) = 260
4.508 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 124) = 124
4.592 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 116) = 116
4.666 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 130) = 130
4.715 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 95) = 95
4.765 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 102) = 102
4.815 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 79) = 79
4.890 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 57) = 57
4.937 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 89) = 89
5.009 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 112) = 112
5.059 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 112) = 112
5.116 ( 0.007 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 79) = 79
5.152 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 33) = 33
5.215 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 37) = 37
5.293 ( 0.010 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 128) = 128
5.339 ( 0.009 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 89) = 89
5.384 ( 0.008 ms): sudo/156867 write(fd: 8, buf: 0x55cb4cd2f650, count: 100) = 100
[1]+ Done perf trace -e /tmp/augmented_raw_syscalls.o -e write* --max-events=30
root@x1:/home/acme/git/perf-tools#
No events for the 'perf trace' (pid 180632), i.e. no feedback loop.
If we leave it running:
root@x1:/home/acme/git/perf-tools# perf trace -e /tmp/augmented_raw_syscalls.o -e landlock_add_rule &
[1] 181068
root@x1:/home/acme/git/perf-tools#
And then look at what maps it sets up:
root@x1:/home/acme/git/perf-tools# bpftool map | grep pids_filtered -A3
1190: hash name pids_filtered flags 0x0
key 4B value 1B max_entries 64 memlock 7264B
btf_id 1613
pids perf(181068)
root@x1:/home/acme/git/perf-tools#
And ask for dumping its contents:
We see that we are _also_ setting it to filter those:
root@x1:/home/acme/git/perf-tools# bpftool map dump id 1190
[{
"key": 181068,
"value": 1
},{
"key": 156801,
"value": 1
}
]
Now testing the migration commit:
perf $ git log
commit 5e6da6b (HEAD)
Author: Ian Rogers <[email protected]>
Date: Thu Aug 10 11:48:51 2023 -0700
perf trace: Migrate BPF augmentation to use a skeleton
perf $ ./perf trace -e write --max-events=10 & echo #!
[1] 1808653
perf $
0.000 ( 0.010 ms): :1808671/1808671 write(fd: 1, buf: 0x6003f5b26fc0, count: 11) = 11
0.162 ( ): perf/1808653 write(fd: 2, buf: 0x7fffc2174e50, count: 11) ...
0.174 ( ): perf/1808653 write(fd: 2, buf: 0x74ce21804563, count: 1) ...
0.184 ( ): perf/1808653 write(fd: 2, buf: 0x57b936589052, count: 5)
The feedback loop is there.
Keep it running, look into the bpf map:
perf $ bpftool map | grep pids_filtered
10675: hash name pids_filtered flags 0x0
perf $ bpftool map dump id 10675
[]
The map is empty.
Now, this commit:
64917f4 ("perf trace: Use heuristic when deciding if a syscall tracepoint "const char *" field is really a string")
Temporarily fixed the feedback loop for perf trace -e write, that's
because before using the heuristic, write is hooked to sys_enter_openat:
perf $ git log
commit 83a0943 (HEAD)
Author: Arnaldo Carvalho de Melo <[email protected]>
Date: Thu Aug 17 12:11:51 2023 -0300
perf trace: Use the augmented_raw_syscall BPF skel only for tracing syscalls
perf $ ./perf trace -e write --max-events=10 -v 2>&1 | grep Reusing
Reusing "openat" BPF sys_enter augmenter for "write"
And after the heuristic fix, it's unaugmented:
perf $ git log
commit 64917f4 (HEAD)
Author: Arnaldo Carvalho de Melo <[email protected]>
Date: Thu Aug 17 15:14:21 2023 -0300
perf trace: Use heuristic when deciding if a syscall tracepoint "const char *" field is really a string
perf $ ./perf trace -e write --max-events=10 -v 2>&1 | grep Reusing
perf $
After using the heuristic, write is hooked to syscall_unaugmented, which
returns 1.
SEC("tp/raw_syscalls/sys_enter")
int syscall_unaugmented(struct syscall_enter_args *args)
{
return 1;
}
If the BPF program returns 1, the tracepoint filter will filter it
(since the tracepoint filter for perf is correctly set), but before the
heuristic, when it was hooked to a sys_enter_openat(), which is a BPF
program that calls bpf_perf_event_output() and writes to the buffer, it
didn't get filtered, thus creating feedback loop. So switching write to
unaugmented accidentally fixed the problem.
But some syscalls are not so lucky, for example newfstatat:
perf $ ./perf trace -e newfstatat --max-events=100 & echo #!
[1] 2166948
457.718 ( ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/self/ns/mnt", statbuf: 0x7fff0132a9f0) ...
457.749 ( ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/2166950/ns/mnt", statbuf: 0x7fff0132aa80) ...
457.962 ( ): perf/2166948 newfstatat(dfd: CWD, filename: "/proc/self/ns/mnt", statbuf: 0x7fff0132a9f0) ...
Currently, write is augmented by the new BTF general augmenter (which
calls bpf_perf_event_output()). The problem, which luckily got fixed,
resurfaced, and that’s how it was discovered.
Fixes: 5e6da6b ("perf trace: Migrate BPF augmentation to use a skeleton")
Signed-off-by: Howard Chu <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: James Clark <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[ Check if trace->skel is non-NULL, as it is only initialized if trace->trace_syscalls is set ]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>1 parent 31bedc1 commit fe4f9b4Copy full SHA for fe4f9b4
File tree
Expand file treeCollapse file tree
1 file changed
+3
-0
lines changedFilter options
- tools/perf
Expand file treeCollapse file tree
1 file changed
+3
-0
lines changed+3Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change | |
---|---|---|---|
| |||
4326 | 4326 |
| |
4327 | 4327 |
| |
4328 | 4328 |
| |
| 4329 | + | |
| 4330 | + | |
| 4331 | + | |
4329 | 4332 |
| |
4330 | 4333 |
| |
4331 | 4334 |
| |
|
0 commit comments