@@ -150,6 +150,7 @@ arm_spe/load_filter=1,min_latency=10/'
  pct_enable=1 - collect physical timestamp instead of virtual timestamp (PMSCR.PCT) - requires privilege
  store_filter=1 - collect stores only (PMSFCR.ST)
  ts_enable=1 - enable timestamping with value of generic timer (PMSCR.TS)
+ discard=1 - enable SPE PMU events but don't collect sample data - see 'Discard mode' (PMBLIMITR.FM = DISCARD)
 
 +++*+++ Latency is the total latency from the point at which sampling started on that instruction, rather
 than only the execution latency.
@@ -220,6 +221,31 @@ Common errors
 
  Increase sampling interval (see above)
 
+ PMU events
+ ~~~~~~~~~~
+
+ SPE has events that can be counted on core PMUs. These are prefixed with
+ SAMPLE_, for example SAMPLE_POP, SAMPLE_FEED, SAMPLE_COLLISION and
+ SAMPLE_FEED_BR.
+
+ These events will only count when an SPE event is running on the same core that
+ the PMU event is opened on; otherwise they read as 0. There are various ways to
+ ensure that the PMU event and SPE event are scheduled together, depending on the
+ way the events are opened. For example, both events can be opened as per-process
+ events on the same process, although it's not guaranteed that the PMU event is
+ enabled first when context switching. For that reason it may be better to open
+ the PMU event as a systemwide event and then open SPE on the process of interest.
+
+ Discard mode
+ ~~~~~~~~~~~~
+
+ SPE-related core PMU events (SAMPLE_* etc.) can be used without the overhead of
+ collecting sample data if discard mode is supported (optional from Armv8.6).
+ First run a systemwide SPE session (or one on the core of interest) using
+ options that minimize output. Then run perf stat:
+
+ perf record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
+ perf stat -e SAMPLE_FEED_LD
 
 SEE ALSO
 --------
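
Review note: the two commands added under 'Discard mode' can be combined into a small wrapper so that the background SPE session is stopped once counting finishes. The following is a minimal sketch, assuming discard mode is supported on the machine and that the requested SAMPLE_* event is exposed by the core PMU. The function name count_with_discard, the PERF variable, and the one-second startup delay are hypothetical choices made for illustration; the perf record options are the ones shown in the patch.

```shell
#!/bin/sh
# Sketch only: count one SPE-related core PMU event for a workload while a
# systemwide SPE session runs in discard mode.
# Assumptions: discard mode is supported and the event exists on this core.

# PERF is a hypothetical override so the perf binary can be substituted.
PERF=${PERF:-perf}

# count_with_discard EVENT COMMAND [ARGS...]   (hypothetical helper name)
count_with_discard() {
    event=$1
    shift

    # Systemwide SPE session in discard mode, using the options from the
    # patch to minimize output; whatever perf record writes is thrown away.
    $PERF record -e arm_spe/discard/ -a -N -B --no-bpf-event -o - > /dev/null &
    spe_pid=$!

    # Crude assumption: one second is enough for the SPE session to start.
    sleep 1

    # Count the PMU event for the given workload.
    status=0
    $PERF stat -e "$event" -- "$@" || status=$?

    # Stop the background SPE session with SIGTERM and reap it.
    kill "$spe_pid" 2>/dev/null || true
    wait "$spe_pid" 2>/dev/null || true

    return "$status"
}
```

A call could look like `count_with_discard SAMPLE_FEED_LD ./workload`, where ./workload is a placeholder for the program of interest. The systemwide SPE session will usually need elevated privileges, and the fixed delay is a simplification that a real script may want to replace with something more robust.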