Commit 7e55b95

sesse authored and acmel committed
perf intel-pt: Synthesize cycle events
There is no good reason why we cannot synthesize "cycle" events from Intel PT just as we can synthesize "instruction" events, in particular when CYC packets are available. This enables using PT to get much more accurate cycle profiles than regular sampling (record -e cycles) when the work lasts for very short periods (<10 ms). Thus, add support for this, based off of the existing IPC calculation framework. The new option to --itrace is "y" (for cYcles), as c was taken for calls. Cycle and instruction events can be synthesized together, and are by default.

The only real caveat is that CYC packets are only emitted whenever some other packet is, which in practice is when a branch instruction is encountered (and not even all branches). Thus, even at no subsampling (e.g. --itrace=y0ns), it is impossible to get more accuracy than a single basic block, and all cycles spent executing that block will get attributed to the branch instruction that ends the packet. As a result, one cannot know whether the cycles came from e.g. a specific load, a mispredicted branch, or something else. When subsampling (which is the default), the cycle events will get smeared out even more, but will still be generally useful to attribute cycle counts to functions.

Reviewed-by: Adrian Hunter <[email protected]>
Signed-off-by: Steinar H. Gunderson <[email protected]>
Cc: Alexander Shishkin <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
1 parent 1470a10 commit 7e55b95

5 files changed, +101 -21 lines changed


tools/perf/Documentation/itrace.txt

Lines changed: 2 additions & 1 deletion
@@ -1,4 +1,5 @@
 i synthesize instructions events
+y synthesize cycles events
 b synthesize branches events (branch misses for Arm SPE)
 c synthesize branches events (calls only)
 r synthesize branches events (returns only)
@@ -25,7 +26,7 @@
 A approximate IPC
 Z prefer to ignore timestamps (so-called "timeless" decoding)
 
-The default is all events i.e. the same as --itrace=ibxwpe,
+The default is all events i.e. the same as --itrace=iybxwpe,
 except for perf script where it is --itrace=ce
 
 In addition, the period (default 100000, except for perf script where it is 1)

tools/perf/Documentation/perf-intel-pt.txt

Lines changed: 24 additions & 12 deletions
@@ -101,12 +101,12 @@ data is available you can use the 'perf script' tool with all itrace sampling
 options, which will list all the samples.
 
 perf record -e intel_pt//u ls
-perf script --itrace=ibxwpe
+perf script --itrace=iybxwpe
 
 An interesting field that is not printed by default is 'flags' which can be
 displayed as follows:
 
-perf script --itrace=ibxwpe -F+flags
+perf script --itrace=iybxwpe -F+flags
 
 The flags are "bcrosyiABExghDt" which stand for branch, call, return, conditional,
 system, asynchronous, interrupt, transaction abort, trace begin, trace end,
@@ -147,16 +147,17 @@ displayed as follows:
 There are two ways that instructions-per-cycle (IPC) can be calculated depending
 on the recording.
 
-If the 'cyc' config term (see config terms section below) was used, then IPC is
-calculated using the cycle count from CYC packets, otherwise MTC packets are
-used - refer to the 'mtc' config term. When MTC is used, however, the values
-are less accurate because the timing is less accurate.
+If the 'cyc' config term (see config terms section below) was used, then IPC
+and cycle events are calculated using the cycle count from CYC packets, otherwise
+MTC packets are used - refer to the 'mtc' config term. When MTC is used, however,
+the values are less accurate because the timing is less accurate.
 
 Because Intel PT does not update the cycle count on every branch or instruction,
 the values will often be zero. When there are values, they will be the number
 of instructions and number of cycles since the last update, and thus represent
-the average IPC since the last IPC for that event type. Note IPC for "branches"
-events is calculated separately from IPC for "instructions" events.
+the average IPC cycle count since the last IPC for that event type.
+Note IPC for "branches" events is calculated separately from IPC for "instructions"
+events.
 
 Even with the 'cyc' config term, it is possible to produce IPC information for
 every change of timestamp, but at the expense of accuracy. That is selected by
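The IPC paragraph above relies on each event type keeping its own "last seen" counters (last_in_*, last_br_* and, with this commit, last_cy_* in struct intel_pt_queue). The C sketch below models that bookkeeping under simplifying assumptions: struct ipc_counters, struct ipc_cursor and ipc_delta() are hypothetical names, and the rule that a report with a zero cycle delta leaves the cursor untouched is modeled on how the existing instruction-sample code appears to behave, not copied from perf.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cumulative counters, as a decoder might maintain them. */
struct ipc_counters {
	uint64_t insn_cnt; /* instructions counted so far */
	uint64_t cyc_cnt;  /* cycles so far; only advances on a CYC update */
};

/* Hypothetical per-event-type cursor (cf. last_in_*, last_br_*, last_cy_*). */
struct ipc_cursor {
	uint64_t last_insn_cnt;
	uint64_t last_cyc_cnt;
};

/*
 * Compute the instruction and cycle deltas since this event type last
 * reported, then advance its cursor. If no cycles elapsed (no new CYC
 * information), report zero and keep the cursor, so the instructions are
 * carried over into the next non-zero report.
 */
static int ipc_delta(const struct ipc_counters *now, struct ipc_cursor *cur,
		     uint64_t *insn, uint64_t *cyc)
{
	/* Cumulative counters never go backwards. */
	assert(now->cyc_cnt >= cur->last_cyc_cnt);
	*cyc = now->cyc_cnt - cur->last_cyc_cnt;
	if (*cyc == 0) {
		*insn = 0;
		return 0;
	}
	*insn = now->insn_cnt - cur->last_insn_cnt;
	cur->last_insn_cnt = now->insn_cnt;
	cur->last_cyc_cnt = now->cyc_cnt;
	return 1;
}
```

Because every event type owns a separate cursor, an "instructions" sample and a "cycles" sample taken at the same point can report different deltas, which is why the documentation stresses that IPC is tracked per event type.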
@@ -900,11 +901,12 @@ Having no option is the same as
 
 which, in turn, is the same as
 
---itrace=cepwx
+--itrace=cepwxy
 
 The letters are:
 
 i synthesize "instructions" events
+y synthesize "cycles" events
 b synthesize "branches" events
 x synthesize "transactions" events
 w synthesize "ptwrite" events
@@ -927,16 +929,26 @@ The letters are:
 "Instructions" events look like they were recorded by "perf record -e
 instructions".
 
+"Cycles" events look like they were recorded by "perf record -e cycles"
+(ie., the default). Note that even with CYC packets enabled and no sampling,
+these are not fully accurate, since CYC packets are not emitted for each
+instruction, only when some other event (like an indirect branch, or a
+TNT packet representing multiple branches) happens causes a packet to
+be emitted. Thus, it is more effective for attributing cycles to functions
+(and possibly basic blocks) than to individual instructions, although it
+is not even perfect for functions (although it becomes better if the noretcomp
+option is active).
+
 "Branches" events look like they were recorded by "perf record -e branches". "c"
 and "r" can be combined to get calls and returns.
 
 "Transactions" events correspond to the start or end of transactions. The
 'flags' field can be used in perf script to determine whether the event is a
 transaction start, commit or abort.
 
-Note that "instructions", "branches" and "transactions" events depend on code
-flow packets which can be disabled by using the config term "branch=0". Refer
-to the config terms section above.
+Note that "instructions", "cycles", "branches" and "transactions" events
+depend on code flow packets which can be disabled by using the config term
+"branch=0". Refer to the config terms section above.
 
 "ptwrite" events record the payload of the ptwrite instruction and whether
 "fup_on_ptw" was used. "ptwrite" events depend on PTWRITE packets which are

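To illustrate why "cycles" events suit function-level rather than instruction-level attribution, here is a small C sketch: each synthesized sample carries its whole cycle delta on the IP of the branch that ended the packet, and summing those deltas per function still gives a usable profile even though the per-IP distribution is coarse. The structs, the symbol ranges and both helpers are hypothetical; this is not perf's symbol-resolution code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol-table entry: a function covering [start, end). */
struct func_range {
	uint64_t start;
	uint64_t end;
	uint64_t cycles; /* cycles attributed to this function so far */
};

/* Hypothetical "cycles" sample: the whole delta lands on one IP. */
struct cyc_sample {
	uint64_t ip;     /* IP of the branch that ended the packet */
	uint64_t period; /* cycles since the previous cycles sample */
};

/* Binary search over sorted, non-overlapping ranges; NULL if unmapped. */
static struct func_range *find_func(struct func_range *f, size_t n, uint64_t ip)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ip < f[mid].start)
			hi = mid;
		else if (ip >= f[mid].end)
			lo = mid + 1;
		else
			return &f[mid];
	}
	return NULL;
}

/* Add each sample's period to its function; return unattributed cycles. */
static uint64_t attribute_cycles(struct func_range *f, size_t nf,
				 const struct cyc_sample *s, size_t ns)
{
	uint64_t unknown = 0;

	assert(f != NULL || nf == 0);
	for (size_t i = 0; i < ns; i++) {
		struct func_range *fr = find_func(f, nf, s[i].ip);

		if (fr)
			fr->cycles += s[i].period;
		else
			unknown += s[i].period;
	}
	return unknown;
}
```

A delta that spans a call or return is credited entirely to whichever function contains the ending IP, which is one way per-function totals can still be skewed, as the documentation notes.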
tools/perf/util/auxtrace.c

Lines changed: 7 additions & 2 deletions
@@ -1394,6 +1394,7 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
 		synth_opts->calls = true;
 	} else {
 		synth_opts->instructions = true;
+		synth_opts->cycles = true;
 		synth_opts->period_type = PERF_ITRACE_DEFAULT_PERIOD_TYPE;
 		synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
 	}
@@ -1482,7 +1483,11 @@ int itrace_do_parse_synth_opts(struct itrace_synth_opts *synth_opts,
 	for (p = str; *p;) {
 		switch (*p++) {
 		case 'i':
-			synth_opts->instructions = true;
+		case 'y':
+			if (p[-1] == 'y')
+				synth_opts->cycles = true;
+			else
+				synth_opts->instructions = true;
 			while (*p == ' ' || *p == ',')
 				p += 1;
 			if (isdigit(*p)) {
@@ -1641,7 +1646,7 @@ int itrace_do_parse_synth_opts(struct itrace_synth_opts *synth_opts,
 		}
 	}
 out:
-	if (synth_opts->instructions) {
+	if (synth_opts->instructions || synth_opts->cycles) {
 		if (!period_type_set)
 			synth_opts->period_type =
 					PERF_ITRACE_DEFAULT_PERIOD_TYPE;
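The parser change shares one case body between 'i' and 'y'. Because the switch expression is *p++, the pointer has already moved past the letter, so p[-1] recovers which of the two matched. A trimmed-down, self-contained C sketch of the same idiom follows; struct mini_opts and parse_mini() are hypothetical, and unlike perf the sketch accepts only a plain decimal period (no ns/us/ms/i/t suffixes) and only the letters i, y and b.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for struct itrace_synth_opts. */
struct mini_opts {
	bool instructions;
	bool cycles;
	bool branches;
	unsigned long period; /* 0 means "not given"; shared by i and y */
};

/* Parse option letters; returns 0 on success, -1 on an unknown letter. */
static int parse_mini(struct mini_opts *o, const char *str)
{
	const char *p = str;
	char *end;

	assert(o != NULL && str != NULL);
	while (*p) {
		switch (*p++) {
		case 'i':
		case 'y':
			/* p was already advanced, so p[-1] is the matched letter. */
			if (p[-1] == 'y')
				o->cycles = true;
			else
				o->instructions = true;
			while (*p == ' ' || *p == ',')
				p += 1;
			if (isdigit((unsigned char)*p)) {
				o->period = strtoul(p, &end, 10);
				p = end;
			}
			break;
		case 'b':
			o->branches = true;
			break;
		case ' ':
		case ',':
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Sharing the body keeps the optional-period parsing in one place, which is consistent with the help text's "same period as i" for y: both letters write the same period field.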

tools/perf/util/auxtrace.h

Lines changed: 6 additions & 1 deletion
@@ -71,6 +71,9 @@ enum itrace_period_type {
  * @inject: indicates the event (not just the sample) must be fully synthesized
  *          because 'perf inject' will write it out
  * @instructions: whether to synthesize 'instructions' events
+ * @cycles: whether to synthesize 'cycles' events
+ *          (not fully accurate, since CYC packets are only emitted
+ *          together with other events, such as branches)
  * @branches: whether to synthesize 'branches' events
  *            (branch misses only for Arm SPE)
  * @transactions: whether to synthesize events for transactions
@@ -119,6 +122,7 @@ struct itrace_synth_opts {
 	bool			default_no_sample;
 	bool			inject;
 	bool			instructions;
+	bool			cycles;
 	bool			branches;
 	bool			transactions;
 	bool			ptwrites;
@@ -643,6 +647,7 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
 
 #define ITRACE_HELP \
 " i[period]: synthesize instructions events\n" \
+" y[period]: synthesize cycles events (same period as i)\n" \
 " b: synthesize branches events (branch misses for Arm SPE)\n" \
 " c: synthesize branches events (calls only)\n" \
 " r: synthesize branches events (returns only)\n" \
@@ -674,7 +679,7 @@ bool auxtrace__evsel_is_auxtrace(struct perf_session *session,
 " A: approximate IPC\n" \
 " Z: prefer to ignore timestamps (so-called \"timeless\" decoding)\n" \
 " PERIOD[ns|us|ms|i|t]: specify period to sample stream\n" \
-" concatenate multiple options. Default is ibxwpe or cewp\n"
+" concatenate multiple options. Default is iybxwpe or cewp\n"
 
 static inline
 void itrace_synth_opts__set_time_range(struct itrace_synth_opts *opts,

tools/perf/util/intel-pt.c

Lines changed: 62 additions & 5 deletions
@@ -5,6 +5,7 @@
  */
 
 #include <inttypes.h>
+#include <linux/perf_event.h>
 #include <stdio.h>
 #include <stdbool.h>
 #include <errno.h>
@@ -98,6 +99,10 @@ struct intel_pt {
 	u64 instructions_sample_type;
 	u64 instructions_id;
 
+	bool sample_cycles;
+	u64 cycles_sample_type;
+	u64 cycles_id;
+
 	bool sample_branches;
 	u32 branches_filter;
 	u64 branches_sample_type;
@@ -214,6 +219,8 @@ struct intel_pt_queue {
 	u64 ipc_cyc_cnt;
 	u64 last_in_insn_cnt;
 	u64 last_in_cyc_cnt;
+	u64 last_cy_insn_cnt;
+	u64 last_cy_cyc_cnt;
 	u64 last_br_insn_cnt;
 	u64 last_br_cyc_cnt;
 	unsigned int cbr_seen;
@@ -1319,7 +1326,7 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 	if (pt->filts.cnt > 0)
 		params.pgd_ip = intel_pt_pgd_ip;
 
-	if (pt->synth_opts.instructions) {
+	if (pt->synth_opts.instructions || pt->synth_opts.cycles) {
 		if (pt->synth_opts.period) {
 			switch (pt->synth_opts.period_type) {
 			case PERF_ITRACE_PERIOD_INSTRUCTIONS:
@@ -1830,6 +1837,33 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
 					    pt->instructions_sample_type);
 }
 
+static int intel_pt_synth_cycle_sample(struct intel_pt_queue *ptq)
+{
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+	u64 period = 0;
+
+	if (ptq->sample_ipc)
+		period = ptq->ipc_cyc_cnt - ptq->last_cy_cyc_cnt;
+
+	if (!period || intel_pt_skip_event(pt))
+		return 0;
+
+	intel_pt_prep_sample(pt, ptq, event, &sample);
+
+	sample.id = ptq->pt->cycles_id;
+	sample.stream_id = ptq->pt->cycles_id;
+	sample.period = period;
+
+	sample.cyc_cnt = period;
+	sample.insn_cnt = ptq->ipc_insn_cnt - ptq->last_cy_insn_cnt;
+	ptq->last_cy_insn_cnt = ptq->ipc_insn_cnt;
+	ptq->last_cy_cyc_cnt = ptq->ipc_cyc_cnt;
+
+	return intel_pt_deliver_synth_event(pt, event, &sample, pt->cycles_sample_type);
+}
+
 static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
 {
 	struct intel_pt *pt = ptq->pt;
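intel_pt_synth_cycle_sample() uses the cycle delta since the last cycles sample as the sample period and returns early when that delta is zero. The C sketch below replays that rule over a hypothetical list of decoder snapshots (struct snap, struct cy_sample and synth_cycles() are illustrative names; the sketch ignores the sample_ipc gating and event skipping). It shows one consequence of the design: because each period is the delta itself, the periods of the emitted samples add up to the cycles observed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder snapshot taken at each INTEL_PT_INSTRUCTION state. */
struct snap {
	uint64_t ip;          /* IP reported for the sample */
	uint64_t ipc_cyc_cnt; /* cumulative cycles as of the last CYC update */
};

/* Hypothetical synthesized "cycles" sample. */
struct cy_sample {
	uint64_t ip;
	uint64_t period; /* cycles since the previous cycles sample */
};

/*
 * Emit one cycles sample per snapshot whose cycle count advanced since the
 * last emitted sample; 'out' must have room for n entries. Returns the count.
 */
static size_t synth_cycles(const struct snap *s, size_t n, struct cy_sample *out)
{
	uint64_t last_cy_cyc_cnt = 0; /* plays the role of ptq->last_cy_cyc_cnt */
	size_t k = 0;

	for (size_t i = 0; i < n; i++) {
		/* The cumulative counter never goes backwards. */
		assert(s[i].ipc_cyc_cnt >= last_cy_cyc_cnt);
		uint64_t period = s[i].ipc_cyc_cnt - last_cy_cyc_cnt;

		if (!period)
			continue; /* no new cycle information: no sample */
		out[k].ip = s[i].ip;
		out[k].period = period;
		k++;
		last_cy_cyc_cnt = s[i].ipc_cyc_cnt;
	}
	return k;
}
```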
@@ -2598,10 +2632,17 @@ static int intel_pt_sample(struct intel_pt_queue *ptq)
 		}
 	}
 
-	if (pt->sample_instructions && (state->type & INTEL_PT_INSTRUCTION)) {
-		err = intel_pt_synth_instruction_sample(ptq);
-		if (err)
-			return err;
+	if (state->type & INTEL_PT_INSTRUCTION) {
+		if (pt->sample_instructions) {
+			err = intel_pt_synth_instruction_sample(ptq);
+			if (err)
+				return err;
+		}
+		if (pt->sample_cycles) {
+			err = intel_pt_synth_cycle_sample(ptq);
+			if (err)
+				return err;
+		}
 	}
 
 	if (pt->sample_transactions && (state->type & INTEL_PT_TRANSACTION)) {
@@ -3731,6 +3772,22 @@ static int intel_pt_synth_events(struct intel_pt *pt,
 		id += 1;
 	}
 
+	if (pt->synth_opts.cycles) {
+		attr.config = PERF_COUNT_HW_CPU_CYCLES;
+		if (pt->synth_opts.period_type == PERF_ITRACE_PERIOD_NANOSECS)
+			attr.sample_period =
+				intel_pt_ns_to_ticks(pt, pt->synth_opts.period);
+		else
+			attr.sample_period = pt->synth_opts.period;
+		err = intel_pt_synth_event(session, "cycles", &attr, id);
+		if (err)
+			return err;
+		pt->sample_cycles = true;
+		pt->cycles_sample_type = attr.sample_type;
+		pt->cycles_id = id;
+		id += 1;
+	}
+
 	attr.sample_type &= ~(u64)PERF_SAMPLE_PERIOD;
 	attr.sample_period = 1;
 
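When the itrace period is given in nanoseconds, the new "cycles" event stores intel_pt_ns_to_ticks(pt, period) in attr.sample_period. That function's body is not part of this diff; the C sketch below shows one overflow-conscious way to invert the TSC-to-nanoseconds formula documented for perf_event_mmap_page (ns = quot * time_mult + ((rem * time_mult) >> time_shift)). struct tsc_conv and both helpers are hypothetical names, and the sketch assumes time_mult is non-zero and time_shift is small enough (say, at most 32) that the shifts cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical copy of the conversion parameters: ns ~= (ticks * mult) >> shift. */
struct tsc_conv {
	uint32_t time_mult;
	uint16_t time_shift;
};

/* ticks -> ns, split into quotient and remainder to avoid 64-bit overflow. */
static uint64_t ticks_to_ns(const struct tsc_conv *tc, uint64_t ticks)
{
	uint64_t quot = ticks >> tc->time_shift;
	uint64_t rem = ticks & (((uint64_t)1 << tc->time_shift) - 1);

	return quot * tc->time_mult + ((rem * tc->time_mult) >> tc->time_shift);
}

/* ns -> ticks ~= (ns << shift) / mult, again split into quotient and remainder. */
static uint64_t ns_to_ticks(const struct tsc_conv *tc, uint64_t ns)
{
	assert(tc->time_mult != 0);
	uint64_t quot = ns / tc->time_mult;
	uint64_t rem = ns % tc->time_mult;

	return (quot << tc->time_shift) + (rem << tc->time_shift) / tc->time_mult;
}
```

Splitting ns into quotient and remainder keeps ns << time_shift from being formed directly, and (quot * mult) << shift is divisible by mult, so only the remainder term is rounded down.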