Commit 8bd09b4

Merge branch 'for-next/perf-user-counter-access' into for-next/perf
* for-next/perf-user-counter-access:
  Documentation: arm64: Document PMU counters access from userspace
  arm64: perf: Enable PMU counter userspace access for perf event
  arm64: perf: Add userspace counter access disable switch
  perf: Add a counter for number of user access events in context
  x86: perf: Move RDPMC event flag to a common definition
2 parents 1879a61 + aa1005d commit 8bd09b4

7 files changed: 236 additions and 13 deletions

Documentation/admin-guide/sysctl/kernel.rst

Lines changed: 11 additions & 0 deletions
@@ -905,6 +905,17 @@ enabled, otherwise writing to this file will return ``-EBUSY``.
 The default value is 8.
 
 
+perf_user_access (arm64 only)
+=================================
+
+Controls user space access for reading perf event counters. When set to 1,
+user space can read performance monitor counter registers directly.
+
+The default value is 0 (access disabled).
+
+See Documentation/arm64/perf.rst for more information.
+
+
 pid_max
 =======
 
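
The knob added above goes through the regular procfs sysctl interface, so a privileged self-monitoring process can also enable it at startup before configuring its events. A minimal sketch (the helper name and error handling are illustrative, not part of this commit):

    #include <fcntl.h>
    #include <unistd.h>

    /* Returns 0 if /proc/sys/kernel/perf_user_access was set to 1
     * (writing the sysctl requires appropriate privileges). */
    static int enable_perf_user_access(void)
    {
        int fd = open("/proc/sys/kernel/perf_user_access", O_WRONLY);
        ssize_t n;

        if (fd < 0)
            return -1;
        n = write(fd, "1", 1);
        close(fd);
        return n == 1 ? 0 : -1;
    }
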
Documentation/arm64/perf.rst

Lines changed: 77 additions & 1 deletion
@@ -2,7 +2,10 @@
 
 .. _perf_index:
 
-=====================
+====
+Perf
+====
+
 Perf Event Attributes
 =====================
 
@@ -88,3 +91,76 @@ exclude_host. However when using !exclude_hv there is a small blackout
 window at the guest entry/exit where host events are not captured.
 
 On VHE systems there are no blackout windows.
+
+Perf Userspace PMU Hardware Counter Access
+==========================================
+
+Overview
+--------
+The perf userspace tool relies on the PMU to monitor events. It offers an
+abstraction layer over the hardware counters since the underlying
+implementation is cpu-dependent.
+Arm64 allows userspace tools to have access to the registers storing the
+hardware counters' values directly.
+
+This targets specifically self-monitoring tasks in order to reduce the overhead
+by directly accessing the registers without having to go through the kernel.
+
+How-to
+------
+The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu
+registers is enabled and that the userspace has access to the relevant
+information in order to use them.
+
+In order to have access to the hardware counters, the global sysctl
+kernel/perf_user_access must first be enabled:
+
+.. code-block:: sh
+
+  echo 1 > /proc/sys/kernel/perf_user_access
+
+It is necessary to open the event using the perf tool interface with config1:1
+attr bit set: the sys_perf_event_open syscall returns a fd which can
+subsequently be used with the mmap syscall in order to retrieve a page of memory
+containing information about the event. The PMU driver uses this page to expose
+to the user the hardware counter's index and other necessary data. Using this
+index enables the user to access the PMU registers using the `mrs` instruction.
+Access to the PMU registers is only valid while the sequence lock is unchanged.
+In particular, the PMSELR_EL0 register is zeroed each time the sequence lock is
+changed.
+
+The userspace access is supported in libperf using the perf_evsel__mmap()
+and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for
+an example.
+
+About heterogeneous systems
+---------------------------
+On heterogeneous systems such as big.LITTLE, userspace PMU counter access can
+only be enabled when the tasks are pinned to a homogeneous subset of cores and
+the corresponding PMU instance is opened by specifying the 'type' attribute.
+The use of generic event types is not supported in this case.
+
+Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It
+can be run using the perf tool to check that the access to the registers works
+correctly from userspace:
+
+.. code-block:: sh
+
+  perf test -v user
+
+About chained events and counter sizes
+--------------------------------------
+The user can request either a 32-bit (config1:0 == 0) or 64-bit (config1:0 == 1)
+counter along with userspace access. The sys_perf_event_open syscall will fail
+if a 64-bit counter is requested and the hardware doesn't support 64-bit
+counters. Chained events are not supported in conjunction with userspace counter
+access. If a 32-bit counter is requested on hardware with 64-bit counters, then
+userspace must treat the upper 32-bits read from the counter as UNKNOWN. The
+'pmc_width' field in the user page will indicate the valid width of the counter
+and should be used to mask the upper bits as needed.
+
+.. Links
+.. _tools/perf/arch/arm64/tests/user-events.c:
+   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
+.. _tools/lib/perf/tests/test-evsel.c:
+   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c
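
To make the documented flow concrete: open a task-bound event with config1 bit 1 set, mmap the first page of the returned fd, then read the counter inside the page's sequence-lock loop and mask the value to pmc_width. The sketch below assumes a homogeneous system (on big.LITTLE the specific PMU's 'type' would be used instead of PERF_TYPE_HARDWARE); read_counter() is a placeholder for the architecture-specific register read, and error handling is omitted. libperf's perf_evsel__mmap()/perf_evsel__read() wrap the same steps.

    #include <linux/perf_event.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Placeholder for the architecture-specific read of the counter whose
     * index is published in the mmap page (an mrs on arm64, rdpmc on x86). */
    extern uint64_t read_counter(uint32_t index);

    int main(void)
    {
        struct perf_event_attr attr = {
            .type = PERF_TYPE_HARDWARE,   /* on big.LITTLE: the PMU's sysfs 'type' */
            .size = sizeof(attr),
            .config = PERF_COUNT_HW_CPU_CYCLES,
            .config1 = 0x2,               /* bit 1: request userspace counter access */
            .exclude_kernel = 1,
        };
        /* Task-bound event on the calling thread, any CPU. */
        int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        struct perf_event_mmap_page *pc = mmap(NULL, sysconf(_SC_PAGESIZE),
                                               PROT_READ, MAP_SHARED, fd, 0);
        uint32_t seq, idx;
        int64_t count;
        uint64_t pmc;

        /* ... workload to measure ... */

        do {
            seq = pc->lock;
            __sync_synchronize();
            idx = pc->index;              /* 0 means no direct access */
            count = pc->offset;
            if (pc->cap_user_rdpmc && idx) {
                pmc = read_counter(idx);
                /* Only the low pmc_width bits are valid; mask the rest. */
                if (pc->pmc_width < 64)
                    pmc &= ((uint64_t)1 << pc->pmc_width) - 1;
                count += pmc;
            }
            __sync_synchronize();
        } while (pc->lock != seq);

        printf("%lld cycles\n", (long long)count);
        return 0;
    }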

arch/arm64/kernel/perf_event.c

Lines changed: 128 additions & 6 deletions
@@ -285,15 +285,24 @@ static const struct attribute_group armv8_pmuv3_events_attr_group = {
 
 PMU_FORMAT_ATTR(event, "config:0-15");
 PMU_FORMAT_ATTR(long, "config1:0");
+PMU_FORMAT_ATTR(rdpmc, "config1:1");
+
+static int sysctl_perf_user_access __read_mostly;
 
 static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
 {
     return event->attr.config1 & 0x1;
 }
 
+static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
+{
+    return event->attr.config1 & 0x2;
+}
+
 static struct attribute *armv8_pmuv3_format_attrs[] = {
     &format_attr_event.attr,
     &format_attr_long.attr,
+    &format_attr_rdpmc.attr,
     NULL,
 };
 
@@ -362,7 +371,7 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = {
  */
 #define ARMV8_IDX_CYCLE_COUNTER 0
 #define ARMV8_IDX_COUNTER0 1
-
+#define ARMV8_IDX_CYCLE_COUNTER_USER 32
 
 /*
  * We unconditionally enable ARMv8.5-PMU long event counter support
@@ -374,18 +383,22 @@ static bool armv8pmu_has_long_event(struct arm_pmu *cpu_pmu)
     return (cpu_pmu->pmuver >= ID_AA64DFR0_PMUVER_8_5);
 }
 
+static inline bool armv8pmu_event_has_user_read(struct perf_event *event)
+{
+    return event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT;
+}
+
 /*
  * We must chain two programmable counters for 64 bit events,
  * except when we have allocated the 64bit cycle counter (for CPU
- * cycles event). This must be called only when the event has
- * a counter allocated.
+ * cycles event) or when user space counter access is enabled.
  */
 static inline bool armv8pmu_event_is_chained(struct perf_event *event)
 {
     int idx = event->hw.idx;
     struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 
-    return !WARN_ON(idx < 0) &&
+    return !armv8pmu_event_has_user_read(event) &&
            armv8pmu_event_is_64bit(event) &&
            !armv8pmu_has_long_event(cpu_pmu) &&
            (idx != ARMV8_IDX_CYCLE_COUNTER);
@@ -718,6 +731,28 @@ static inline u32 armv8pmu_getreset_flags(void)
     return value;
 }
 
+static void armv8pmu_disable_user_access(void)
+{
+    write_sysreg(0, pmuserenr_el0);
+}
+
+static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
+{
+    int i;
+    struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
+
+    /* Clear any unused counters to avoid leaking their contents */
+    for_each_clear_bit(i, cpuc->used_mask, cpu_pmu->num_events) {
+        if (i == ARMV8_IDX_CYCLE_COUNTER)
+            write_sysreg(0, pmccntr_el0);
+        else
+            armv8pmu_write_evcntr(i, 0);
+    }
+
+    write_sysreg(0, pmuserenr_el0);
+    write_sysreg(ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_CR, pmuserenr_el0);
+}
+
 static void armv8pmu_enable_event(struct perf_event *event)
 {
     /*
@@ -761,6 +796,14 @@ static void armv8pmu_disable_event(struct perf_event *event)
 
 static void armv8pmu_start(struct arm_pmu *cpu_pmu)
 {
+    struct perf_event_context *task_ctx =
+        this_cpu_ptr(cpu_pmu->pmu.pmu_cpu_context)->task_ctx;
+
+    if (sysctl_perf_user_access && task_ctx && task_ctx->nr_user)
+        armv8pmu_enable_user_access(cpu_pmu);
+    else
+        armv8pmu_disable_user_access();
+
     /* Enable all counters */
     armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
 }
@@ -878,13 +921,16 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
     if (evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) {
         if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask))
             return ARMV8_IDX_CYCLE_COUNTER;
+        else if (armv8pmu_event_is_64bit(event) &&
+                 armv8pmu_event_want_user_access(event) &&
+                 !armv8pmu_has_long_event(cpu_pmu))
+            return -EAGAIN;
     }
 
     /*
      * Otherwise use events counters
      */
-    if (armv8pmu_event_is_64bit(event) &&
-        !armv8pmu_has_long_event(cpu_pmu))
+    if (armv8pmu_event_is_chained(event))
         return armv8pmu_get_chain_idx(cpuc, cpu_pmu);
     else
         return armv8pmu_get_single_idx(cpuc, cpu_pmu);
@@ -900,6 +946,22 @@ static void armv8pmu_clear_event_idx(struct pmu_hw_events *cpuc,
     clear_bit(idx - 1, cpuc->used_mask);
 }
 
+static int armv8pmu_user_event_idx(struct perf_event *event)
+{
+    if (!sysctl_perf_user_access || !armv8pmu_event_has_user_read(event))
+        return 0;
+
+    /*
+     * We remap the cycle counter index to 32 to
+     * match the offset applied to the rest of
+     * the counter indices.
+     */
+    if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER)
+        return ARMV8_IDX_CYCLE_COUNTER_USER;
+
+    return event->hw.idx;
+}
+
 /*
  * Add an event filter to a given event.
  */
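
For reference, the value returned here is what the core perf code publishes in the 'index' field of the event's mmap page, so a userspace reader dispatches on it before issuing the mrs. A rough sketch of one way to do that read (this variant selects the counter through PMSELR_EL0/PMXEVCNTR_EL0; libperf instead reads the numbered PMEVCNTR<n>_EL0 registers directly):

    #include <stdint.h>

    #define ARMV8_IDX_CYCLE_COUNTER_USER 32

    /* Read the counter whose (1-based) index was published in
     * perf_event_mmap_page::index. Indices 1..31 are event counters,
     * 32 is the cycle counter as remapped by armv8pmu_user_event_idx().
     * PMUSERENR_EL0.{CR,ER}, set by the kernel when user access is
     * enabled, permit these EL0 accesses. */
    static uint64_t armv8_read_user_counter(uint32_t index)
    {
        uint64_t val;

        if (index == ARMV8_IDX_CYCLE_COUNTER_USER) {
            asm volatile("mrs %0, pmccntr_el0" : "=r" (val));
            return val;
        }

        /* Select the event counter, then read it. PMSELR_EL0 is zeroed by
         * the kernel whenever the mmap page's sequence lock changes, so this
         * is only meaningful while the seqlock stays unchanged. */
        asm volatile("msr pmselr_el0, %0" : : "r" ((uint64_t)(index - 1)));
        asm volatile("isb");
        asm volatile("mrs %0, pmxevcntr_el0" : "=r" (val));
        return val;
    }
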
@@ -996,6 +1058,25 @@ static int __armv8_pmuv3_map_event(struct perf_event *event,
     if (armv8pmu_event_is_64bit(event))
         event->hw.flags |= ARMPMU_EVT_64BIT;
 
+    /*
+     * User events must be allocated into a single counter, and so
+     * must not be chained.
+     *
+     * Most 64-bit events require long counter support, but 64-bit
+     * CPU_CYCLES events can be placed into the dedicated cycle
+     * counter when this is free.
+     */
+    if (armv8pmu_event_want_user_access(event)) {
+        if (!(event->attach_state & PERF_ATTACH_TASK))
+            return -EINVAL;
+        if (armv8pmu_event_is_64bit(event) &&
+            (hw_event_id != ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
+            !armv8pmu_has_long_event(armpmu))
+            return -EOPNOTSUPP;
+
+        event->hw.flags |= PERF_EVENT_FLAG_USER_READ_CNT;
+    }
+
     /* Only expose micro/arch events supported by this PMU */
     if ((hw_event_id > 0) && (hw_event_id < ARMV8_PMUV3_MAX_COMMON_EVENTS)
         && test_bit(hw_event_id, armpmu->pmceid_bitmap)) {
@@ -1104,6 +1185,35 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
     return probe.present ? 0 : -ENODEV;
 }
 
+static void armv8pmu_disable_user_access_ipi(void *unused)
+{
+    armv8pmu_disable_user_access();
+}
+
+static int armv8pmu_proc_user_access_handler(struct ctl_table *table, int write,
+        void *buffer, size_t *lenp, loff_t *ppos)
+{
+    int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+    if (ret || !write || sysctl_perf_user_access)
+        return ret;
+
+    on_each_cpu(armv8pmu_disable_user_access_ipi, NULL, 1);
+    return 0;
+}
+
+static struct ctl_table armv8_pmu_sysctl_table[] = {
+    {
+        .procname     = "perf_user_access",
+        .data         = &sysctl_perf_user_access,
+        .maxlen       = sizeof(unsigned int),
+        .mode         = 0644,
+        .proc_handler = armv8pmu_proc_user_access_handler,
+        .extra1       = SYSCTL_ZERO,
+        .extra2       = SYSCTL_ONE,
+    },
+    { }
+};
+
 static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
               int (*map_event)(struct perf_event *event),
               const struct attribute_group *events,
@@ -1127,6 +1237,8 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
     cpu_pmu->set_event_filter = armv8pmu_set_event_filter;
     cpu_pmu->filter_match = armv8pmu_filter_match;
 
+    cpu_pmu->pmu.event_idx = armv8pmu_user_event_idx;
+
     cpu_pmu->name = name;
     cpu_pmu->map_event = map_event;
     cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = events ?
@@ -1136,6 +1248,8 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
     cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_CAPS] = caps ?
                  caps : &armv8_pmuv3_caps_attr_group;
 
+    register_sysctl("kernel", armv8_pmu_sysctl_table);
+
     return 0;
 }
 
@@ -1301,6 +1415,14 @@ void arch_perf_update_userpage(struct perf_event *event,
     userpg->cap_user_time = 0;
     userpg->cap_user_time_zero = 0;
     userpg->cap_user_time_short = 0;
+    userpg->cap_user_rdpmc = armv8pmu_event_has_user_read(event);
+
+    if (userpg->cap_user_rdpmc) {
+        if (event->hw.flags & ARMPMU_EVT_64BIT)
+            userpg->pmc_width = 64;
+        else
+            userpg->pmc_width = 32;
+    }
 
     do {
         rd = sched_clock_read_begin(&seq);

arch/x86/events/core.c

Lines changed: 5 additions & 5 deletions
@@ -2476,7 +2476,7 @@ static int x86_pmu_event_init(struct perf_event *event)
 
     if (READ_ONCE(x86_pmu.attr_rdpmc) &&
         !(event->hw.flags & PERF_X86_EVENT_LARGE_PEBS))
-        event->hw.flags |= PERF_X86_EVENT_RDPMC_ALLOWED;
+        event->hw.flags |= PERF_EVENT_FLAG_USER_READ_CNT;
 
     return err;
 }
@@ -2510,7 +2510,7 @@ void perf_clear_dirty_counters(void)
 
 static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 {
-    if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
+    if (!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT))
         return;
 
     /*
@@ -2531,7 +2531,7 @@ static void x86_pmu_event_mapped(struct perf_event *event, struct mm_struct *mm)
 
 static void x86_pmu_event_unmapped(struct perf_event *event, struct mm_struct *mm)
 {
-    if (!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED))
+    if (!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT))
         return;
 
     if (atomic_dec_and_test(&mm->context.perf_rdpmc_allowed))
@@ -2542,7 +2542,7 @@ static int x86_pmu_event_idx(struct perf_event *event)
 {
     struct hw_perf_event *hwc = &event->hw;
 
-    if (!(hwc->flags & PERF_X86_EVENT_RDPMC_ALLOWED))
+    if (!(hwc->flags & PERF_EVENT_FLAG_USER_READ_CNT))
         return 0;
 
     if (is_metric_idx(hwc->idx))
@@ -2725,7 +2725,7 @@ void arch_perf_update_userpage(struct perf_event *event,
     userpg->cap_user_time = 0;
     userpg->cap_user_time_zero = 0;
     userpg->cap_user_rdpmc =
-        !!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED);
+        !!(event->hw.flags & PERF_EVENT_FLAG_USER_READ_CNT);
     userpg->pmc_width = x86_pmu.cntval_bits;
 
     if (!using_native_sched_clock() || !sched_clock_stable())
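
On x86 the renamed flag feeds the same userpage fields (cap_user_rdpmc, pmc_width, index), with RDPMC as the read instruction. The per-architecture read used inside the sequence-lock loop sketched earlier would look roughly like this:

    #include <stdint.h>

    /* x86 counterpart of the arm64 read above: RDPMC takes the counter
     * number in ECX, i.e. the mmap page's 1-based index minus one. */
    static inline uint64_t x86_read_user_counter(uint32_t index)
    {
        uint32_t lo, hi;

        asm volatile("rdpmc" : "=a" (lo), "=d" (hi) : "c" (index - 1));
        return ((uint64_t)hi << 32) | lo;
    }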
