Commit 4f90ed0

namhyung authored and acmel committed
perf trace: Fix unaligned access for augmented args
Some compiler versions report unaligned accesses in perf trace when the undefined-behavior sanitizer is on. I found that it uses the raw data in the sample directly, assuming it's properly aligned.

Unlike other sample fields, the raw data is not 8-byte aligned because there's a size field (u32) before the actual data. So I added a static buffer in syscall__augmented_args() and return it instead. This is not ideal but should work well as perf trace is single-threaded.

A better approach would be aligning the raw data by adding 4 bytes of data before the augmented args, but I'm afraid it'd break backward compatibility.

Committer testing:

To build with the undefined behaviour sanitizer:

  $ make CC=clang EXTRA_CFLAGS=-fsanitize=undefined -C tools/perf

Checking if the resulting binary is instrumented:

  root@number:~# nm ~/bin/perf | grep ubsan | wc -l
  113
  root@number:~# nm ~/bin/perf | grep ubsan | tail -5
  000000000043d5b0 t _ZN7__ubsanL19UBsanOnDeadlySignalEiPvS0_
  000000000043ce50 T _ZNK7__ubsan5Value12getSIntValueEv
  000000000043cf40 T _ZNK7__ubsan5Value12getUIntValueEv
  000000000043d140 T _ZNK7__ubsan5Value13getFloatValueEv
  000000000043cfd0 T _ZNK7__ubsan5Value19getPositiveIntValueEv
  root@number:~#

Now running something that will access timespec, as reported in the Closes URL:

  root@number:~# perf trace --max-events=1 -e *nano* sleep 1.1
  trace/beauty/timespec.c:10:64: runtime error: member access within misaligned address 0x7fc583cfb2a4 for type 'struct augmented_arg', which requires 8 byte alignment
  0x7fc583cfb2a4: note: pointer points here
    99 99 11 00 10 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 e1 f5 05 00 00 00 00 00 00 00 00
                ^
  SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior trace/beauty/timespec.c:10:64
  <SNIP>

As Namhyung said, we need to make the raw_data 64-bit aligned. We probably need to add a PERF_SAMPLE_ALIGNED_RAW with a 64-bit raw_size instead of the current u32 written in kernel/events/core.c, perf_output_sample(), by perf_output_put(handle, raw->size), where raw->size is a u32, so the raw_data is always 64-bit unaligned...

After the patch:

  root@number:~# perf trace -e *nano* sleep 1.1
       0.000 (1100.064 ms): sleep/1984224 clock_nanosleep(rqtp: { .tv_sec: 1, .tv_nsec: 100000001 }, rmtp: 0x7fff5b3fe970) = 0
  root@number:~#

Closes: https://lore.kernel.org/r/[email protected]
Reviewed-by: Howard Chu <[email protected]>
Signed-off-by: Namhyung Kim <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Cc: Adrian Hunter <[email protected]>
Cc: Ian Rogers <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Kan Liang <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
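The alignment argument in the message can be checked with a little arithmetic. The following is only a sketch: `struct raw_record` is a hypothetical, simplified stand-in for "a u32 size followed by the payload", not the real perf ring-buffer layout, and the helper names are made up for illustration. It assumes the record itself starts on an 8-byte boundary, as the message states for the ring buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a raw sample record: a u32 size that is
 * immediately followed by the payload bytes.
 */
struct raw_record {
	uint32_t size;
	unsigned char data[];	/* flexible array member, starts right after size */
};

/* Distance of 'addr' from the previous 8-byte boundary (0 means aligned). */
static inline size_t misalignment8(uintptr_t addr)
{
	return (size_t)(addr & 7u);
}

/* Misalignment of the payload when the record starts 8-byte aligned. */
static inline size_t raw_payload_misalignment(void)
{
	return misalignment8(offsetof(struct raw_record, data));
}

/* The address from the UBSan report above is 4 bytes past a boundary. */
static inline void check_reported_address(void)
{
	assert(misalignment8((uintptr_t)0x7fc583cfb2a4ULL) == 4);
}
```

Under these assumptions the payload always sits 4 bytes past an 8-byte boundary, which matches the address ending in `...a4` in the sanitizer output.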
1 parent 0ba2022 commit 4f90ed0

1 file changed, +17 -4 lines changed


tools/perf/builtin-trace.c

Lines changed: 17 additions & 4 deletions
@@ -2559,7 +2559,6 @@ static int trace__fprintf_sample(struct trace *trace, struct evsel *evsel,
 
 static void *syscall__augmented_args(struct syscall *sc, struct perf_sample *sample, int *augmented_args_size, int raw_augmented_args_size)
 {
-	void *augmented_args = NULL;
 	/*
 	 * For now with BPF raw_augmented we hook into raw_syscalls:sys_enter
 	 * and there we get all 6 syscall args plus the tracepoint common fields
@@ -2577,10 +2576,24 @@ static void *syscall__augmented_args(struct syscall *sc, struct perf_sample *sam
 	int args_size = raw_augmented_args_size ?: sc->args_size;
 
 	*augmented_args_size = sample->raw_size - args_size;
-	if (*augmented_args_size > 0)
-		augmented_args = sample->raw_data + args_size;
+	if (*augmented_args_size > 0) {
+		static uintptr_t argbuf[1024]; /* assuming single-threaded */
+
+		if ((size_t)(*augmented_args_size) > sizeof(argbuf))
+			return NULL;
+
+		/*
+		 * The perf ring-buffer is 8-byte aligned but sample->raw_data
+		 * is not because it's preceded by u32 size. Later, beautifier
+		 * will use the augmented args with stricter alignments like in
+		 * some struct. To make sure it's aligned, let's copy the args
+		 * into a static buffer as it's single-threaded for now.
+		 */
+		memcpy(argbuf, sample->raw_data + args_size, *augmented_args_size);
 
-	return augmented_args;
+		return argbuf;
+	}
+	return NULL;
 }
 
 static void syscall__exit(struct syscall *sc)
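The copy-into-an-aligned-buffer pattern used by the patch can be tried in isolation. This is a minimal sketch, not the perf code: `struct demo_timespec` and `copy_to_aligned()` are hypothetical names, and the buffer is declared with `_Alignas(8)` and `uint64_t` (a variation on the patch's `uintptr_t argbuf[1024]`) so that the 8-byte alignment is stated explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical payload type whose members need 8-byte alignment. */
struct demo_timespec {
	uint64_t tv_sec;
	uint64_t tv_nsec;
};

/*
 * Copy 'len' bytes from a possibly misaligned source into a static,
 * 8-byte aligned buffer and return that buffer. Returns NULL when there
 * is nothing to copy or the data does not fit.
 *
 * Not reentrant: every call reuses the same storage, so this is only
 * safe for a single-threaded caller that is done with the previous
 * result before calling again.
 */
static void *copy_to_aligned(const void *src, size_t len)
{
	static _Alignas(8) uint64_t buf[1024];

	if (len == 0 || len > sizeof(buf))
		return NULL;

	memcpy(buf, src, len);
	assert(((uintptr_t)buf & 7u) == 0);
	return buf;
}
```

A caller would pass a pointer such as `raw + 4` (payload after a 4-byte size field) and then read the result through a `struct demo_timespec *`. The trade-off is one extra copy per sample and a fixed size limit in exchange for well-defined member access.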
