Commit 10162e7

Merge branch 'for-next/perf' into for-next/core
* for-next/perf: (21 commits)
  arm_pmu: Drop redundant armpmu->map_event() in armpmu_event_init()
  drivers/perf: hisi: Add TLP filter support
  Documentation: perf: Indent filter options list of hisi-pcie-pmu
  docs: perf: Fix PMU instance name of hisi-pcie-pmu
  drivers/perf: hisi: Fix some event id for hisi-pcie-pmu
  arm64/perf: Replace PMU version number '0' with ID_AA64DFR0_EL1_PMUVer_NI
  perf/amlogic: Remove unused header inclusions of <linux/version.h>
  perf/amlogic: Fix build error for x86_64 allmodconfig
  dt-binding: perf: Add Amlogic DDR PMU
  docs/perf: Add documentation for the Amlogic G12 DDR PMU
  perf/amlogic: Add support for Amlogic meson G12 SoC DDR PMU driver
  MAINTAINERS: Update HiSilicon PMU maintainers
  perf: arm_cspmu: Fix module cyclic dependency
  perf: arm_cspmu: Fix build failure on x86_64
  perf: arm_cspmu: Fix modular builds due to missing MODULE_LICENSE()s
  perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute
  perf: arm_cspmu: Add support for ARM CoreSight PMU driver
  perf/smmuv3: Fix hotplug callback leak in arm_smmu_pmu_init()
  perf/arm_dmc620: Fix hotplug callback leak in dmc620_pmu_init()
  drivers: perf: marvell_cn10k: Fix hotplug callback leak in tad_pmu_init()
  ...
2 parents c947948 + 4361251 commit 10162e7

26 files changed, +3479 -58 lines changed

Documentation/admin-guide/perf/hisi-pcie-pmu.rst

Lines changed: 68 additions & 44 deletions
@@ -15,10 +15,10 @@ HiSilicon PCIe PMU driver
 The PCIe PMU driver registers a perf PMU with the name of its sicl-id and PCIe
 Core id.::
 
-/sys/bus/event_source/hisi_pcie<sicl>_<core>
+/sys/bus/event_source/hisi_pcie<sicl>_core<core>
 
 PMU driver provides description of available events and filter options in sysfs,
-see /sys/bus/event_source/devices/hisi_pcie<sicl>_<core>.
+see /sys/bus/event_source/devices/hisi_pcie<sicl>_core<core>.
 
 The "format" directory describes all formats of the config (events) and config1
 (filter options) fields of the perf_event_attr structure. The "events" directory
@@ -33,13 +33,13 @@ monitored by PMU.
 Example usage of perf::
 
 $# perf list
-hisi_pcie0_0/rx_mwr_latency/ [kernel PMU event]
-hisi_pcie0_0/rx_mwr_cnt/ [kernel PMU event]
+hisi_pcie0_core0/rx_mwr_latency/ [kernel PMU event]
+hisi_pcie0_core0/rx_mwr_cnt/ [kernel PMU event]
 ------------------------------------------
 
-$# perf stat -e hisi_pcie0_0/rx_mwr_latency/
-$# perf stat -e hisi_pcie0_0/rx_mwr_cnt/
-$# perf stat -g -e hisi_pcie0_0/rx_mwr_latency/ -e hisi_pcie0_0/rx_mwr_cnt/
+$# perf stat -e hisi_pcie0_core0/rx_mwr_latency/
+$# perf stat -e hisi_pcie0_core0/rx_mwr_cnt/
+$# perf stat -g -e hisi_pcie0_core0/rx_mwr_latency/ -e hisi_pcie0_core0/rx_mwr_cnt/
 
 The current driver does not support sampling. So "perf record" is unsupported.
 Also attach to a task is unsupported for PCIe PMU.
@@ -48,59 +48,83 @@ Filter options
 --------------
 
 1. Target filter
-PMU could only monitor the performance of traffic downstream target Root Ports
-or downstream target Endpoint. PCIe PMU driver support "port" and "bdf"
-interfaces for users, and these two interfaces aren't supported at the same
-time.
 
--port
-"port" filter can be used in all PCIe PMU events, target Root Port can be
-selected by configuring the 16-bits-bitmap "port". Multi ports can be selected
-for AP-layer-events, and only one port can be selected for TL/DL-layer-events.
+PMU could only monitor the performance of traffic downstream target Root
+Ports or downstream target Endpoint. PCIe PMU driver support "port" and
+"bdf" interfaces for users, and these two interfaces aren't supported at the
+same time.
 
-For example, if target Root Port is 0000:00:00.0 (x8 lanes), bit0 of bitmap
-should be set, port=0x1; if target Root Port is 0000:00:04.0 (x4 lanes),
-bit8 is set, port=0x100; if these two Root Ports are both monitored, port=0x101.
+- port
 
-Example usage of perf::
+"port" filter can be used in all PCIe PMU events, target Root Port can be
+selected by configuring the 16-bits-bitmap "port". Multi ports can be
+selected for AP-layer-events, and only one port can be selected for
+TL/DL-layer-events.
 
-$# perf stat -e hisi_pcie0_0/rx_mwr_latency,port=0x1/ sleep 5
+For example, if target Root Port is 0000:00:00.0 (x8 lanes), bit0 of
+bitmap should be set, port=0x1; if target Root Port is 0000:00:04.0 (x4
+lanes), bit8 is set, port=0x100; if these two Root Ports are both
+monitored, port=0x101.
 
--bdf
+Example usage of perf::
 
-"bdf" filter can only be used in bandwidth events, target Endpoint is selected
-by configuring BDF to "bdf". Counter only counts the bandwidth of message
-requested by target Endpoint.
+$# perf stat -e hisi_pcie0_core0/rx_mwr_latency,port=0x1/ sleep 5
 
-For example, "bdf=0x3900" means BDF of target Endpoint is 0000:39:00.0.
+- bdf
 
-Example usage of perf::
+"bdf" filter can only be used in bandwidth events, target Endpoint is
+selected by configuring BDF to "bdf". Counter only counts the bandwidth of
+message requested by target Endpoint.
+
+For example, "bdf=0x3900" means BDF of target Endpoint is 0000:39:00.0.
+
+Example usage of perf::
 
-$# perf stat -e hisi_pcie0_0/rx_mrd_flux,bdf=0x3900/ sleep 5
+$# perf stat -e hisi_pcie0_core0/rx_mrd_flux,bdf=0x3900/ sleep 5
 
 2. Trigger filter
-Event statistics start when the first time TLP length is greater/smaller
-than trigger condition. You can set the trigger condition by writing "trig_len",
-and set the trigger mode by writing "trig_mode". This filter can only be used
-in bandwidth events.
 
-For example, "trig_len=4" means trigger condition is 2^4 DW, "trig_mode=0"
-means statistics start when TLP length > trigger condition, "trig_mode=1"
-means start when TLP length < condition.
+Event statistics start when the first time TLP length is greater/smaller
+than trigger condition. You can set the trigger condition by writing
+"trig_len", and set the trigger mode by writing "trig_mode". This filter can
+only be used in bandwidth events.
 
-Example usage of perf::
+For example, "trig_len=4" means trigger condition is 2^4 DW, "trig_mode=0"
+means statistics start when TLP length > trigger condition, "trig_mode=1"
+means start when TLP length < condition.
+
+Example usage of perf::
 
-$# perf stat -e hisi_pcie0_0/rx_mrd_flux,trig_len=0x4,trig_mode=1/ sleep 5
+$# perf stat -e hisi_pcie0_core0/rx_mrd_flux,trig_len=0x4,trig_mode=1/ sleep 5
 
 3. Threshold filter
-Counter counts when TLP length within the specified range. You can set the
-threshold by writing "thr_len", and set the threshold mode by writing
-"thr_mode". This filter can only be used in bandwidth events.
 
-For example, "thr_len=4" means threshold is 2^4 DW, "thr_mode=0" means
-counter counts when TLP length >= threshold, and "thr_mode=1" means counts
-when TLP length < threshold.
+Counter counts when TLP length within the specified range. You can set the
+threshold by writing "thr_len", and set the threshold mode by writing
+"thr_mode". This filter can only be used in bandwidth events.
 
-Example usage of perf::
+For example, "thr_len=4" means threshold is 2^4 DW, "thr_mode=0" means
+counter counts when TLP length >= threshold, and "thr_mode=1" means counts
+when TLP length < threshold.
+
+Example usage of perf::
+
+$# perf stat -e hisi_pcie0_core0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5
+
+4. TLP Length filter
+
+When counting bandwidth, the data can be composed of certain parts of TLP
+packets. You can specify it through "len_mode":
+
+- 2'b00: Reserved (Do not use this since the behaviour is undefined)
+- 2'b01: Bandwidth of TLP payloads
+- 2'b10: Bandwidth of TLP headers
+- 2'b11: Bandwidth of both TLP payloads and headers
+
+For example, "len_mode=2" means only counting the bandwidth of TLP headers
+and "len_mode=3" means the final bandwidth data is composed of both TLP
+headers and payloads. Default value if not specified is 2'b11.
+
+Example usage of perf::
 
-$# perf stat -e hisi_pcie0_0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5
+$# perf stat -e hisi_pcie0_core0/rx_mrd_flux,len_mode=0x1/ sleep 5
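The renamed instances and the filter encodings in this diff can be exercised with a small Python sketch. All function names below are hypothetical helpers, not part of perf or the driver. It assumes instance names follow exactly the two patterns shown (hisi_pcie<sicl>_<core> before, hisi_pcie<sicl>_core<core> after) and packs "bdf" with the standard PCI layout (bus in bits 15:8, device in 7:3, function in 2:0), which agrees with the "bdf=0x3900" example. The Root Port bit numbers are passed in by the caller (bit0 and bit8 in the document's examples), because the diff does not state a general port-to-bit mapping.

```python
import re

# Assumed naming patterns, taken from the diff: old hisi_pcie<sicl>_<core>,
# new hisi_pcie<sicl>_core<core>. Only names followed by "/" are rewritten.
_OLD_NAME = re.compile(r"\bhisi_pcie(\d+)_(\d+)(?=/)")


def rename_old_pmu_names(cmdline: str) -> str:
    """Rewrite old-style hisi_pcie PMU names; new-style names are left alone."""
    return _OLD_NAME.sub(r"hisi_pcie\1_core\2", cmdline)


def encode_bdf(bus: int, dev: int, fn: int) -> int:
    """Pack bus/device/function as bus << 8 | dev << 3 | fn (standard PCI layout)."""
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= fn <= 0x7):
        raise ValueError("BDF component out of range")
    return (bus << 8) | (dev << 3) | fn


def port_bitmap(*bits: int) -> int:
    """OR together bits of the 16-bit "port" bitmap, one bit per Root Port."""
    mask = 0
    for bit in bits:
        if not 0 <= bit < 16:
            raise ValueError("port is a 16-bit bitmap")
        mask |= 1 << bit
    return mask


def len_field_bytes(exp: int) -> int:
    """trig_len/thr_len = n means 2^n DW; one PCIe DW is 4 bytes."""
    return (1 << exp) * 4


def event(sicl: int, core: int, name: str, **terms: int) -> str:
    """Build a perf event string such as hisi_pcie0_core0/rx_mrd_flux,bdf=0x3900/."""
    parts = [name] + [f"{key}={value:#x}" for key, value in terms.items()]
    return f"hisi_pcie{sicl}_core{core}/{','.join(parts)}/"


if __name__ == "__main__":
    print(event(0, 0, "rx_mrd_flux", bdf=encode_bdf(0x39, 0x00, 0)))
    print(event(0, 0, "rx_mwr_latency", port=port_bitmap(0, 8)))
    print(event(0, 0, "rx_mrd_flux", len_mode=0b10))  # headers only
    print(len_field_bytes(4))
    print(rename_old_pmu_names("perf stat -e hisi_pcie0_0/rx_mwr_cnt/"))
```

Term values are emitted in hex, matching the style of the diff's own examples (port=0x1, trig_len=0x4); the regex only matches a digit after the second underscore, so already-renamed names pass through unchanged.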

Documentation/admin-guide/perf/index.rst

Lines changed: 2 additions & 0 deletions
@@ -19,3 +19,5 @@ Performance monitor support
 arm_dsu_pmu
 thunderx2-pmu
 alibaba_pmu
+nvidia-pmu
+meson-ddr-pmu
Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===========================================================
+Amlogic SoC DDR Bandwidth Performance Monitoring Unit (PMU)
+===========================================================
+
+The Amlogic Meson G12 SoC contains a bandwidth monitor inside DRAM controller.
+The monitor includes 4 channels. Each channel can count the request accessing
+DRAM. The channel can count up to 3 AXI port simultaneously. It can be helpful
+to show if the performance bottleneck is on DDR bandwidth.
+
+Currently, this driver supports the following 5 perf events:
+
++ meson_ddr_bw/total_rw_bytes/
++ meson_ddr_bw/chan_1_rw_bytes/
++ meson_ddr_bw/chan_2_rw_bytes/
++ meson_ddr_bw/chan_3_rw_bytes/
++ meson_ddr_bw/chan_4_rw_bytes/
+
+meson_ddr_bw/chan_{1,2,3,4}_rw_bytes/ events are channel-specific events.
+Each channel support filtering, which can let the channel to monitor
+individual IP module in SoC.
+
+Below are DDR access request event filter keywords:
+
++ arm - from CPU
++ vpu_read1 - from OSD + VPP read
++ gpu - from 3D GPU
++ pcie - from PCIe controller
++ hdcp - from HDCP controller
++ hevc_front - from HEVC codec front end
++ usb3_0 - from USB3.0 controller
++ hevc_back - from HEVC codec back end
++ h265enc - from HEVC encoder
++ vpu_read2 - from DI read
++ vpu_write1 - from VDIN write
++ vpu_write2 - from di write
++ vdec - from legacy codec video decoder
++ hcodec - from H264 encoder
++ ge2d - from ge2d
++ spicc1 - from SPI controller 1
++ usb0 - from USB2.0 controller 0
++ dma - from system DMA controller 1
++ arb0 - from arb0
++ sd_emmc_b - from SD eMMC b controller
++ usb1 - from USB2.0 controller 1
++ audio - from Audio module
++ sd_emmc_c - from SD eMMC c controller
++ spicc2 - from SPI controller 2
++ ethernet - from Ethernet controller
+
+
+Examples:
+
++ Show the total DDR bandwidth per seconds:
+
+.. code-block:: bash
+
+perf stat -a -e meson_ddr_bw/total_rw_bytes/ -I 1000 sleep 10
+
+
++ Show individual DDR bandwidth from CPU and GPU respectively, as well as
+sum of them:
+
+.. code-block:: bash
+
+perf stat -a -e meson_ddr_bw/chan_1_rw_bytes,arm=1/ -I 1000 sleep 10
+perf stat -a -e meson_ddr_bw/chan_2_rw_bytes,gpu=1/ -I 1000 sleep 10
+perf stat -a -e meson_ddr_bw/chan_3_rw_bytes,arm=1,gpu=1/ -I 1000 sleep 10
+
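The channel-specific events and filter keywords in this new document combine in a regular way, so the commands in its examples can be generated and checked before running them. Below is a minimal Python sketch; chan_event and mib_per_second are hypothetical helper names. The keyword set is copied from the list above, channels are limited to 1..4 as the document states, and the bandwidth conversion assumes perf prints raw byte counts per -I interval for these "rw_bytes" events, which may not hold if the driver exports its own scale or unit.

```python
# Filter keywords copied from the documentation's keyword list.
KEYWORDS = frozenset({
    "arm", "vpu_read1", "gpu", "pcie", "hdcp", "hevc_front", "usb3_0",
    "hevc_back", "h265enc", "vpu_read2", "vpu_write1", "vpu_write2", "vdec",
    "hcodec", "ge2d", "spicc1", "usb0", "dma", "arb0", "sd_emmc_b", "usb1",
    "audio", "sd_emmc_c", "spicc2", "ethernet",
})


def chan_event(channel: int, *sources: str) -> str:
    """Build meson_ddr_bw/chan_<n>_rw_bytes,<kw>=1,.../ for channels 1..4."""
    if channel not in (1, 2, 3, 4):
        raise ValueError("the monitor has 4 channels")
    unknown = [s for s in sources if s not in KEYWORDS]
    if unknown:
        raise ValueError(f"unknown filter keyword(s): {unknown}")
    terms = "".join(f",{s}=1" for s in sources)
    return f"meson_ddr_bw/chan_{channel}_rw_bytes{terms}/"


def mib_per_second(byte_count: int, interval_ms: int = 1000) -> float:
    """Convert a raw byte count from one perf stat -I interval to MiB/s."""
    return byte_count / (interval_ms / 1000) / (1024 * 1024)


if __name__ == "__main__":
    # Reproduces the three per-channel commands from the examples.
    for args in ((1, "arm"), (2, "gpu"), (3, "arm", "gpu")):
        print(f"perf stat -a -e {chan_event(*args)} -I 1000 sleep 10")
```

The sketch deliberately does not enforce the "up to 3 AXI port" limit, since the document does not say how filter keywords map onto AXI ports.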
