@@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
store_filter=1 - collect stores only (PMSFCR.ST)
ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
+ discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)

+++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
than only the execution latency.
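Whether a given system accepts the `discard=1` option can be probed before recording. The following is a minimal sketch, not part of the patch. It assumes the SPE PMU is named `arm_spe_0` and that the driver exposes the option as a sysfs format attribute called `discard`; both the helper name `spe_has_discard` and the overridable sysfs root are hypothetical conveniences.

```shell
#!/bin/sh
# Sketch: succeed if the SPE PMU advertises a "discard" format attribute.
# Assumptions (not stated in the patch): the PMU device is named arm_spe_0
# and the option shows up as
#   <sysfs>/bus/event_source/devices/arm_spe_0/format/discard
spe_has_discard() {
    sysfs_root="${1:-/sys}"   # overridable so the check can be exercised offline
    pmu="${2:-arm_spe_0}"     # assumed device name; pass another if yours differs
    [ -e "$sysfs_root/bus/event_source/devices/$pmu/format/discard" ]
}
```

A caller might guard the recording step with `if spe_has_discard; then ...; fi` and fall back to a normal SPE session otherwise.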
@@ -220,6 +221,31 @@ Common errors

Increase sampling interval (see above)

+ PMU events
+ ~~~~~~~~~~
+
+ SPE has events that can be counted on core PMUs. These are prefixed with
+ SAMPLE_, for example SAMPLE_POP, SAMPLE_FEED, SAMPLE_COLLISION and
+ SAMPLE_FEED_BR.
+
+ These events only count while an SPE event is running on the same core that
+ the PMU event is opened on; otherwise they read as 0. There are various ways to
+ ensure that the PMU event and SPE event are scheduled together, depending on
+ how the event is opened. One way is to open both as per-process events on the
+ same process, although the PMU event is not guaranteed to be enabled first
+ when context switching. For that reason it may be better to open the PMU
+ event as a systemwide event and then open SPE on the process of interest.
+
+ Discard mode
+ ~~~~~~~~~~~~
+
+ SPE-related (SAMPLE_* etc.) core PMU events can be used without the overhead of
+ collecting sample data if discard mode is supported (optional from Armv8.6).
+ First run a systemwide SPE session (or on the core of interest) using options
+ to minimize output. Then run perf stat:
+
+ perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
+ perf stat -e SAMPLE_FEED_LD

SEE ALSO
--------
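The 'Discard mode' steps added in this hunk could be wrapped in a small script that also stops the background SPE session afterwards. This is a sketch under stated assumptions, not something the patch provides: the function name `spe_discard_stat`, the `DRY_RUN` switch, the one-second startup delay, and passing a workload command to `perf stat` after `--` are all illustrative choices. The two perf command lines themselves are taken from the documentation text.

```shell
#!/bin/sh
# Sketch: count an SPE-related core PMU event while SPE runs in discard mode.
# Usage: spe_discard_stat EVENT COMMAND [ARGS...]
# With DRY_RUN=1 the commands are printed instead of executed (hypothetical
# switch, added only so the sketch can be checked without SPE hardware).
spe_discard_stat() {
    event="$1"; shift
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &"
        echo "perf stat -e $event -- $*"
        return 0
    fi

    # Systemwide SPE session with sample data discarded and output minimized.
    perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
    spe_pid=$!
    sleep 1   # crude wait for the session to start; an assumption, tune as needed

    # Count the chosen event for the workload (workload argument is an assumption).
    perf stat -e "$event" -- "$@"
    status=$?

    # Stop the background session and reap it.
    kill "$spe_pid" 2>/dev/null
    wait "$spe_pid" 2>/dev/null
    return "$status"
}
```

For example, `spe_discard_stat SAMPLE_FEED_LD ./workload` would run a hypothetical `./workload` binary under `perf stat` while the discard-mode session is active. Running the session systemwide sidesteps the scheduling-order concern described under 'PMU events', at the cost of requiring the privileges that `-a` needs.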