@@ -56,14 +56,14 @@ Example usage of perf::
For HiSilicon uncore PMU v2 whose identifier is 0x30, the topology is the same
as PMU v1, but some new functions are added to the hardware.

- (a) L3C PMU supports filtering by core/thread within the cluster which can be
+ 1. L3C PMU supports filtering by core/thread within the cluster which can be
specified as a bitmap::

$# perf stat -a -e hisi_sccl3_l3c0/config=0x02,tt_core=0x3/ sleep 5

This will only count the operations from core/thread 0 and 1 in this cluster.

- (b) Tracetag allow the user to chose to count only read, write or atomic
+ 2. Tracetag allows the user to choose to count only read, write or atomic
operations via the tt_req parameter in perf. The default value counts all
operations. tt_req is 3bits, 3'b100 represents read operations, 3'b101
represents write operations, 3'b110 represents atomic store operations and
@@ -73,38 +73,42 @@ represents write operations, 3'b110 represents atomic store operations and

This will only count the read operations in this cluster.

- (c) Datasrc allows the user to check where the data comes from. It is 5 bits.
+ 3. Datasrc allows the user to check where the data comes from. It is 5 bits.
Some important codes are as follows:
- 5'b00001: comes from L3C in this die;
- 5'b01000: comes from L3C in the cross-die;
- 5'b01001: comes from L3C which is in another socket;
- 5'b01110: comes from the local DDR;
- 5'b01111: comes from the cross-die DDR;
- 5'b10000: comes from cross-socket DDR;
+
+ - 5'b00001: comes from L3C in this die;
+ - 5'b01000: comes from L3C in the cross-die;
+ - 5'b01001: comes from L3C which is in another socket;
+ - 5'b01110: comes from the local DDR;
+ - 5'b01111: comes from the cross-die DDR;
+ - 5'b10000: comes from cross-socket DDR;
+
etc, it is mainly helpful to find that the data source is nearest from the CPU
cores. If datasrc_cfg is used in the multi-chips, the datasrc_skt shall be
configured in perf command::

$# perf stat -a -e hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xE/,
hisi_sccl3_l3c0/config=0xb9,datasrc_cfg=0xF/ sleep 5

- (d) Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
+ 4. Some HiSilicon SoCs encapsulate multiple CPU and IO dies. Each CPU die
contains several Compute Clusters (CCLs). The I/O dies are called Super I/O
clusters (SICL) containing multiple I/O clusters (ICLs). Each CCL/ICL in the
SoC has a unique ID. Each ID is 11bits, include a 6-bit SCCL-ID and 5-bit
CCL/ICL-ID. For I/O die, the ICL-ID is followed by:
- 5'b00000: I/O_MGMT_ICL;
- 5'b00001: Network_ICL;
- 5'b00011: HAC_ICL;
- 5'b10000: PCIe_ICL;

- (e) uring_channel: UC PMU events 0x47~0x59 supports filtering by tx request
+ - 5'b00000: I/O_MGMT_ICL;
+ - 5'b00001: Network_ICL;
+ - 5'b00011: HAC_ICL;
+ - 5'b10000: PCIe_ICL;
+
+ 5. uring_channel: UC PMU events 0x47~0x59 support filtering by tx request
uring channel. It is 2 bits. Some important codes are as follows:
- 2'b11: count the events which sent to the uring_ext (MATA) channel;
- 2'b01: is the same as 2'b11;
- 2'b10: count the events which sent to the uring (non-MATA) channel;
- 2'b00: default value, count the events which sent to the both uring and
- uring_ext channel;
+
+ - 2'b11: count the events which are sent to the uring_ext (MATA) channel;
+ - 2'b01: is the same as 2'b11;
+ - 2'b10: count the events which are sent to the uring (non-MATA) channel;
+ - 2'b00: default value, count the events which are sent to both the uring and
+ uring_ext channels;

Users could configure IDs to count data come from specific CCL/ICL, by setting
srcid_cmd & srcid_msk, and data destined for specific CCL/ICL by setting