Commit 10a3efd

Merge tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
 "perf stat:

   - Add support for hybrid PMUs to support systems such as Intel
     Alderlake and its BIG/little core/atom cpus.

   - Introduce 'bperf' to share hardware PMCs with BPF.

   - New --iostat option to collect and present IO stats on Intel
     hardware.

     This functionality is based on recently introduced sysfs attributes
     for Intel® Xeon® Scalable processor family (code name Skylake-SP)
     in commit bb42b3d ("perf/x86/intel/uncore: Expose an Uncore unit
     to IIO PMON mapping")

     It is intended to provide four I/O performance metrics in MB per
     each PCIe root port:

       - Inbound Read: I/O devices below root port read from the host memory
       - Inbound Write: I/O devices below root port write to the host memory
       - Outbound Read: CPU reads from I/O devices below root port
       - Outbound Write: CPU writes to I/O devices below root port

   - Align CSV output for summary.

   - Clarify --null use cases: Assess raw overhead of 'perf stat' or
     measure just wall clock time.

   - Improve readability of shadow stats.

  perf record:

   - Change the COMM when starting the workload so that --exclude-perf
     doesn't seem to be not honoured.

   - Improve 'Workload failed' message printing events + what was
     exec'ed.

   - Fix cross-arch support for TIME_CONV.

  perf report:

   - Add option to disable raw event ordering.

   - Dump the contents of PERF_RECORD_TIME_CONV in 'perf report -D'.

   - Improvements to --stat output, that shows information about
     PERF_RECORD_ events.

   - Preserve identifier id in OCaml demangler.

  perf annotate:

   - Show full source location with 'l' hotkey in the 'perf annotate'
     TUI.

   - Add line number like in TUI and source location at EOL to the
     'perf annotate' --stdio mode.

   - Add --demangle and --demangle-kernel to 'perf annotate'.

   - Allow configuring annotate.demangle{,_kernel} in 'perf config'.

   - Fix sample events lost in stdio mode.

  perf data:

   - Allow converting a perf.data file to JSON.

  libperf:

   - Add support for user space counter access.

   - Update topdown documentation to permit rdpmc calls.

  perf test:

   - Add 'perf test' for 'perf stat' CSV output.

   - Add 'perf test' entries to test the hybrid PMU support.

   - Cleanup 'perf test daemon' if its 'perf test' is interrupted.

   - Handle metric reuse in pmu-events parsing 'perf test' entry.

   - Add test for PE executable support.

   - Add timeout for wait for daemon start in its 'perf test' entries.

  Build:

   - Enable libtraceevent dynamic linking.

   - Improve feature detection output.

   - Fix caching of feature checks.

   - First round of updates for tools copies of kernel headers.

   - Enable warnings when compiling BPF programs.

  Vendor specific events:

   - Intel:
      - Add missing skylake & icelake model numbers.

   - arm64:
      - Add Hisi hip08 L1, L2 and L3 metrics.
      - Add Fujitsu A64FX PMU events.

   - PowerPC:
      - Initial JSON/events list for power10 platform.
      - Remove unsupported power9 metrics.

   - AMD:
      - Add Zen3 events.
      - Fix broken L2 Cache Hits from L2 HWPF metric.
      - Use lowercases for all the eventcodes and umasks.

  Hardware tracing:

   - arm64:
      - Update CoreSight ETM metadata format.
      - Fix bitmap for CS-ETM option.
      - Support PID tracing in config.
      - Detect pid in VMID for kernel running at EL2.

  Arch specific updates:

   - MIPS:
      - Support MIPS unwinding and dwarf-regs.
      - Generate mips syscalls_n64.c syscall table.

   - PowerPC:
      - Add support for PERF_SAMPLE_WEIGHT_STRUCT on PowerPC.
      - Support pipeline stage cycles for powerpc.

  libbeauty:

   - Fix fsconfig generator"

* tag 'perf-tools-for-v5.13-2021-04-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (132 commits)
  perf build: Defer printing detected features to the end of all feature checks
  tools build: Allow deferring printing the results of feature detection
  perf build: Regenerate the FEATURE_DUMP file after extra feature checks
  perf session: Dump PERF_RECORD_TIME_CONV event
  perf session: Add swap operation for event TIME_CONV
  perf jit: Let convert_timestamp() to be backwards-compatible
  perf tools: Change fields type in perf_record_time_conv
  perf tools: Enable libtraceevent dynamic linking
  perf Documentation: Document intel-hybrid support
  perf tests: Skip 'perf stat metrics (shadow stat) test' for hybrid
  perf tests: Support 'Convert perf time to TSC' test for hybrid
  perf tests: Support 'Session topology' test for hybrid
  perf tests: Support 'Parse and process metrics' test for hybrid
  perf tests: Support 'Track with sched_switch' test for hybrid
  perf tests: Skip 'Setup struct perf_event_attr' test for hybrid
  perf tests: Add hybrid cases for 'Roundtrip evsel->name' test
  perf tests: Add hybrid cases for 'Parse event definition strings' test
  perf record: Uniquify hybrid event name
  perf stat: Warn group events from different hybrid PMU
  perf stat: Filter out unmatched aggregation for hybrid event
  ...

2 parents 22650f1 + c6e3bf4 commit 10a3efd

244 files changed, +9952 -883 lines changed

MAINTAINERS

Lines changed: 2 additions & 0 deletions
@@ -14290,8 +14290,10 @@ R: Mark Rutland <[email protected]>
 R: Alexander Shishkin <[email protected]>
 R: Jiri Olsa <[email protected]>
 R: Namhyung Kim <[email protected]>
+[line not captured]
 [line not captured]
 S: Supported
+W: https://perf.wiki.kernel.org/
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core
 F: arch/*/events/*
 F: arch/*/events/*/*

tools/build/Makefile.feature

Lines changed: 18 additions & 10 deletions
@@ -52,6 +52,7 @@ FEATURE_TESTS_BASIC := \
         libpython-version \
         libslang \
         libslang-include-subdir \
+        libtraceevent \
         libcrypto \
         libunwind \
         pthread-attr-setaffinity-np \
@@ -239,17 +240,24 @@ ifeq ($(VF),1)
   feature_verbose := 1
 endif
 
-ifeq ($(feature_display),1)
-  $(info )
-  $(info Auto-detecting system features:)
-  $(foreach feat,$(FEATURE_DISPLAY),$(call feature_print_status,$(feat),))
-  ifneq ($(feature_verbose),1)
+feature_display_entries = $(eval $(feature_display_entries_code))
+define feature_display_entries_code
+  ifeq ($(feature_display),1)
     $(info )
+    $(info Auto-detecting system features:)
+    $(foreach feat,$(FEATURE_DISPLAY),$(call feature_print_status,$(feat),))
+    ifneq ($(feature_verbose),1)
+      $(info )
+    endif
   endif
-endif
 
-ifeq ($(feature_verbose),1)
-  TMP := $(filter-out $(FEATURE_DISPLAY),$(FEATURE_TESTS))
-  $(foreach feat,$(TMP),$(call feature_print_status,$(feat),))
-  $(info )
+  ifeq ($(feature_verbose),1)
+    TMP := $(filter-out $(FEATURE_DISPLAY),$(FEATURE_TESTS))
+    $(foreach feat,$(TMP),$(call feature_print_status,$(feat),))
+    $(info )
+  endif
+endef
+
+ifeq ($(FEATURE_DISPLAY_DEFERRED),)
+  $(call feature_display_entries)
 endif

tools/build/feature/Makefile

Lines changed: 4 additions & 0 deletions
@@ -36,6 +36,7 @@ FILES= \
          test-libpython-version.bin \
          test-libslang.bin \
          test-libslang-include-subdir.bin \
+         test-libtraceevent.bin \
          test-libcrypto.bin \
          test-libunwind.bin \
          test-libunwind-debug-frame.bin \
@@ -196,6 +197,9 @@ $(OUTPUT)test-libslang.bin:
 $(OUTPUT)test-libslang-include-subdir.bin:
 	$(BUILD) -lslang
 
+$(OUTPUT)test-libtraceevent.bin:
+	$(BUILD) -ltraceevent
+
 $(OUTPUT)test-libcrypto.bin:
 	$(BUILD) -lcrypto

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <traceevent/trace-seq.h>
+
+int main(void)
+{
+	int rv = 0;
+	struct trace_seq s;
+	trace_seq_init(&s);
+	rv += !(s.state == TRACE_SEQ__GOOD);
+	trace_seq_destroy(&s);
+	return rv;
+}

tools/include/linux/math64.h

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MATH64_H
+#define _LINUX_MATH64_H
+
+#include <linux/types.h>
+
+#ifdef __x86_64__
+static inline u64 mul_u64_u64_div64(u64 a, u64 b, u64 c)
+{
+	u64 q;
+
+	asm ("mulq %2; divq %3" : "=a" (q)
+				: "a" (a), "rm" (b), "rm" (c)
+				: "rdx");
+
+	return q;
+}
+#define mul_u64_u64_div64 mul_u64_u64_div64
+#endif
+
+#ifdef __SIZEOF_INT128__
+static inline u64 mul_u64_u32_shr(u64 a, u32 b, unsigned int shift)
+{
+	return (u64)(((unsigned __int128)a * b) >> shift);
+}
+
+#else
+
+#ifdef __i386__
+static inline u64 mul_u32_u32(u32 a, u32 b)
+{
+	u32 high, low;
+
+	asm ("mull %[b]" : "=a" (low), "=d" (high)
+			 : [a] "a" (a), [b] "rm" (b) );
+
+	return low | ((u64)high) << 32;
+}
+#else
+static inline u64 mul_u32_u32(u32 a, u32 b)
+{
+	return (u64)a * b;
+}
+#endif
+
+static inline u64 mul_u64_u32_shr(u64 a, u32 b, unsigned int shift)
+{
+	u32 ah, al;
+	u64 ret;
+
+	al = a;
+	ah = a >> 32;
+
+	ret = mul_u32_u32(al, b) >> shift;
+	if (ah)
+		ret += mul_u32_u32(ah, b) << (32 - shift);
+
+	return ret;
+}
+
+#endif	/* __SIZEOF_INT128__ */
+
+#ifndef mul_u64_u64_div64
+static inline u64 mul_u64_u64_div64(u64 a, u64 b, u64 c)
+{
+	u64 quot, rem;
+
+	quot = a / c;
+	rem = a % c;
+
+	return quot * b + (rem * b) / c;
+}
+#endif
+
+#endif /* _LINUX_MATH64_H */
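
The non-`__int128` path of `mul_u64_u32_shr()` above relies on the identity a * b = ((ah * b) << 32) + al * b, where ah and al are the high and low 32-bit halves of a. The following is a minimal sketch of that fallback and of the generic `mul_u64_u64_div64()`, ported to `<stdint.h>` types so it builds outside the kernel tree. The names `split_mul_u64_u32_shr` and `generic_mul_u64_u64_div64` are hypothetical, and the `assert()` preconditions are additions for illustration, not part of the commit.

```c
#include <assert.h>
#include <stdint.h>

/* 32x32 -> 64 multiply, mirroring the portable mul_u32_u32() in the hunk. */
static inline uint64_t mul_u32_u32(uint32_t a, uint32_t b)
{
	return (uint64_t)a * b;
}

/*
 * (a * b) >> shift without a 128-bit type (hypothetical name).
 * The high partial product is shifted left by (32 - shift), so the
 * split is only valid for shift <= 32.
 */
static inline uint64_t split_mul_u64_u32_shr(uint64_t a, uint32_t b,
					     unsigned int shift)
{
	uint32_t al = (uint32_t)a;
	uint32_t ah = (uint32_t)(a >> 32);
	uint64_t ret;

	assert(shift <= 32);	/* precondition added for this sketch */

	ret = mul_u32_u32(al, b) >> shift;
	if (ah)
		ret += mul_u32_u32(ah, b) << (32 - shift);

	return ret;
}

/*
 * a * b / c via quotient and remainder (hypothetical name).
 * Note that rem * b can still overflow 64 bits when both are large.
 */
static inline uint64_t generic_mul_u64_u64_div64(uint64_t a, uint64_t b,
						 uint64_t c)
{
	uint64_t quot, rem;

	assert(c != 0);		/* precondition added for this sketch */

	quot = a / c;
	rem = a % c;

	return quot * b + (rem * b) / c;
}
```

For example, `split_mul_u64_u32_shr(1ULL << 40, 3, 8)` takes the `ah` branch only (the low half is zero) and yields 3 * 2^32, while `generic_mul_u64_u64_div64(10, 7, 3)` computes 3 * 7 + (1 * 7) / 3 = 23, the same as 70 / 3 rounded down.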

tools/include/linux/types.h

Lines changed: 3 additions & 0 deletions
@@ -61,6 +61,9 @@ typedef __u32 __bitwise __be32;
 typedef __u64 __bitwise __le64;
 typedef __u64 __bitwise __be64;
 
+typedef __u16 __bitwise __sum16;
+typedef __u32 __bitwise __wsum;
+
 typedef struct {
 	int counter;
 } atomic_t;

tools/include/uapi/linux/perf_event.h

Lines changed: 15 additions & 0 deletions
@@ -37,6 +37,21 @@ enum perf_type_id {
 	PERF_TYPE_MAX,				/* non-ABI */
 };
 
+/*
+ * attr.config layout for type PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE
+ * PERF_TYPE_HARDWARE:			0xEEEEEEEE000000AA
+ *					AA: hardware event ID
+ *					EEEEEEEE: PMU type ID
+ * PERF_TYPE_HW_CACHE:			0xEEEEEEEE00DDCCBB
+ *					BB: hardware cache ID
+ *					CC: hardware cache op ID
+ *					DD: hardware cache op result ID
+ *					EEEEEEEE: PMU type ID
+ * If the PMU type ID is 0, the PERF_TYPE_RAW will be applied.
+ */
+#define PERF_PMU_TYPE_SHIFT		32
+#define PERF_HW_EVENT_MASK		0xffffffff
+
 /*
  * Generalized performance event event_id types, used by the
  * attr.event_id parameter of the sys_perf_event_open()
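
The `attr.config` layout described in the comment above can be sketched as a pair of pack and unpack helpers. This is only an illustration of the bit layout: the two macros are copied from the hunk, while every function name below is hypothetical and the PMU type IDs used in the example values are made up (real IDs are assigned by the kernel at runtime).

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the hunk above. */
#define PERF_PMU_TYPE_SHIFT	32
#define PERF_HW_EVENT_MASK	0xffffffff

/* PERF_TYPE_HARDWARE: 0xEEEEEEEE000000AA (hypothetical helper). */
static inline uint64_t hybrid_hw_config(uint32_t pmu_type, uint64_t hw_event)
{
	/* The event ID must fit in the low 32 bits. */
	assert(hw_event <= PERF_HW_EVENT_MASK);

	return ((uint64_t)pmu_type << PERF_PMU_TYPE_SHIFT) | hw_event;
}

/* PERF_TYPE_HW_CACHE: 0xEEEEEEEE00DDCCBB (hypothetical helper). */
static inline uint64_t hybrid_cache_config(uint32_t pmu_type, uint8_t cache_id,
					   uint8_t op_id, uint8_t result_id)
{
	uint64_t event = (uint64_t)cache_id |		/* BB */
			 ((uint64_t)op_id << 8) |	/* CC */
			 ((uint64_t)result_id << 16);	/* DD */

	return ((uint64_t)pmu_type << PERF_PMU_TYPE_SHIFT) | event;
}

/* Extract EEEEEEEE, the PMU type ID. */
static inline uint32_t config_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> PERF_PMU_TYPE_SHIFT);
}

/* Extract the low 32 bits, the event part of the config. */
static inline uint64_t config_hw_event(uint64_t config)
{
	return config & PERF_HW_EVENT_MASK;
}
```

With a made-up PMU type ID of 8 and hardware event ID 1, `hybrid_hw_config(8, 1)` gives `0x0000000800000001`; `config_pmu_type()` and `config_hw_event()` recover 8 and 1 from it.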

tools/lib/perf/Documentation/libperf.txt

Lines changed: 3 additions & 0 deletions
@@ -136,6 +136,9 @@ SYNOPSIS
                        struct perf_thread_map *threads);
   void perf_evsel__close(struct perf_evsel *evsel);
   void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu);
+  int perf_evsel__mmap(struct perf_evsel *evsel, int pages);
+  void perf_evsel__munmap(struct perf_evsel *evsel);
+  void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread);
   int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
                        struct perf_counts_values *count);
   int perf_evsel__enable(struct perf_evsel *evsel);

tools/lib/perf/evsel.c

Lines changed: 80 additions & 0 deletions
@@ -11,10 +11,12 @@
 #include <stdlib.h>
 #include <internal/xyarray.h>
 #include <internal/cpumap.h>
+#include <internal/mmap.h>
 #include <internal/threadmap.h>
 #include <internal/lib.h>
 #include <linux/string.h>
 #include <sys/ioctl.h>
+#include <sys/mman.h>
 
 void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr)
 {
@@ -38,6 +40,7 @@ void perf_evsel__delete(struct perf_evsel *evsel)
 }
 
 #define FD(e, x, y) (*(int *) xyarray__entry(e->fd, x, y))
+#define MMAP(e, x, y) (e->mmap ? ((struct perf_mmap *) xyarray__entry(e->mmap, x, y)) : NULL)
 
 int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
 {
@@ -55,6 +58,13 @@ int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
 	return evsel->fd != NULL ? 0 : -ENOMEM;
 }
 
+static int perf_evsel__alloc_mmap(struct perf_evsel *evsel, int ncpus, int nthreads)
+{
+	evsel->mmap = xyarray__new(ncpus, nthreads, sizeof(struct perf_mmap));
+
+	return evsel->mmap != NULL ? 0 : -ENOMEM;
+}
+
 static int
 sys_perf_event_open(struct perf_event_attr *attr,
 		    pid_t pid, int cpu, int group_fd,
@@ -156,6 +166,72 @@ void perf_evsel__close_cpu(struct perf_evsel *evsel, int cpu)
 	perf_evsel__close_fd_cpu(evsel, cpu);
 }
 
+void perf_evsel__munmap(struct perf_evsel *evsel)
+{
+	int cpu, thread;
+
+	if (evsel->fd == NULL || evsel->mmap == NULL)
+		return;
+
+	for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) {
+		for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) {
+			int fd = FD(evsel, cpu, thread);
+			struct perf_mmap *map = MMAP(evsel, cpu, thread);
+
+			if (fd < 0)
+				continue;
+
+			perf_mmap__munmap(map);
+		}
+	}
+
+	xyarray__delete(evsel->mmap);
+	evsel->mmap = NULL;
+}
+
+int perf_evsel__mmap(struct perf_evsel *evsel, int pages)
+{
+	int ret, cpu, thread;
+	struct perf_mmap_param mp = {
+		.prot = PROT_READ | PROT_WRITE,
+		.mask = (pages * page_size) - 1,
+	};
+
+	if (evsel->fd == NULL || evsel->mmap)
+		return -EINVAL;
+
+	if (perf_evsel__alloc_mmap(evsel, xyarray__max_x(evsel->fd), xyarray__max_y(evsel->fd)) < 0)
+		return -ENOMEM;
+
+	for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) {
+		for (thread = 0; thread < xyarray__max_y(evsel->fd); thread++) {
+			int fd = FD(evsel, cpu, thread);
+			struct perf_mmap *map = MMAP(evsel, cpu, thread);
+
+			if (fd < 0)
+				continue;
+
+			perf_mmap__init(map, NULL, false, NULL);
+
+			ret = perf_mmap__mmap(map, &mp, fd, cpu);
+			if (ret) {
+				perf_evsel__munmap(evsel);
+				return ret;
+			}
+		}
+	}
+
+	return 0;
+}
+
+void *perf_evsel__mmap_base(struct perf_evsel *evsel, int cpu, int thread)
+{
+	if (FD(evsel, cpu, thread) < 0 || MMAP(evsel, cpu, thread) == NULL)
+		return NULL;
+
+	return MMAP(evsel, cpu, thread)->base;
+}
+
 int perf_evsel__read_size(struct perf_evsel *evsel)
 {
 	u64 read_format = evsel->attr.read_format;
@@ -191,6 +267,10 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
 	if (FD(evsel, cpu, thread) < 0)
 		return -EINVAL;
 
+	if (MMAP(evsel, cpu, thread) &&
+	    !perf_mmap__read_self(MMAP(evsel, cpu, thread), count))
+		return 0;
+
 	if (readn(FD(evsel, cpu, thread), count->values, size) <= 0)
 		return -errno;
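
`perf_evsel__mmap()` above derives the ring-buffer mask as `(pages * page_size) - 1`. The sketch below illustrates what that mask implies, under one assumption that is not visible in this hunk: that the mapping helper sizes the mapping as `mask + 1 + page_size`, meaning the data area plus one control page. If that holds, `pages = 0` would map only the control page. All function names are hypothetical, and `size_t` stands in for the `int` and `unsigned int` types used in the source.

```c
#include <assert.h>
#include <stddef.h>

/* Mask as computed in perf_evsel__mmap(): (pages * page_size) - 1. */
static inline size_t ring_mask(size_t pages, size_t page_size)
{
	/* A mask only works for a power-of-two (or empty) data area. */
	assert((pages & (pages - 1)) == 0);

	/* For pages == 0 this wraps to all ones, as -1 does in the source. */
	return pages * page_size - 1;
}

/* ASSUMPTION: mapping length is the data area plus one control page. */
static inline size_t mapping_len(size_t mask, size_t page_size)
{
	return mask + 1 + page_size;
}

/* Wrap a free-running position into the data area. */
static inline size_t ring_offset(size_t pos, size_t mask)
{
	return pos & mask;
}
```

With 4096-byte pages, `mapping_len(ring_mask(8, 4096), 4096)` is nine pages, and `mapping_len(ring_mask(0, 4096), 4096)` is a single page because the unsigned arithmetic wraps around.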

tools/lib/perf/include/internal/evsel.h

Lines changed: 1 addition & 0 deletions
@@ -41,6 +41,7 @@ struct perf_evsel {
 	struct perf_cpu_map	*own_cpus;
 	struct perf_thread_map	*threads;
 	struct xyarray		*fd;
+	struct xyarray		*mmap;
 	struct xyarray		*sample_id;
 	u64			*id;
 	u32			 ids;
