Commit 95f1fa5

initial commit
0 parents  commit 95f1fa5

30 files changed

+8507
-0
lines changed

LICENSE

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
1+
*Copyright (C) ARM Limited, 2014-2024. All rights reserved.*
2+
3+
Licensed under the Apache License, Version 2.0 (the "License");
4+
you may not use this file except in compliance with the License.
5+
You may obtain a copy of the License at:
6+
7+
http://www.apache.org/licenses/LICENSE-2.0
8+
9+
Unless required by applicable law or agreed to in writing, software
10+
distributed under the License is distributed on an "AS IS" BASIS,
11+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12+
See the License for the specific language governing permissions and
13+
limitations under the License.

README-cmn.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
1+
CMN background
2+
--------------
3+
The CMN interconnect is a rectangular grid composed of
4+
"crosspoints" (XPs). Nodes of various types are attached to
5+
crosspoints. Some systems may have more than one CMN
6+
interconnect - for instance a multi-socket system would
7+
have at least one per socket.
8+
9+
Important CMN node types include:
10+
11+
- requesters (RN-F): CPUs are attached to these
12+
13+
- memory home nodes (HN-F): these handle all memory requests
14+
from the CPU and also contain slices of system cache
15+
16+
- subordinate nodes (SN-F): these interface to memory
17+
controllers, which manage DDR modules
18+
19+
- chip-to-chip gateways (CCG): these act as bidirectional
20+
interfaces to other CMN interconnects.
21+
22+
Full details of CMN can be found in Arm's product documentation.
23+

README-discovery.md

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
1+
CMN mesh discovery
2+
==================
3+
4+
The other tools in this collection depend on knowing the CMN
5+
interconnect topology. The CMN tools include a script to discover
6+
the topology and save it in a JSON file for future reference.
7+
8+
It is expected that this discovery procedure only needs to be
9+
run once per system type.
10+
11+
12+
What this script does
13+
---------------------
14+
For background on CMN topology, see README-cmn.md.
15+
16+
This script first tries to discover the number and location
17+
of CMN interconnects in the system memory space.
18+
19+
It then accesses each CMN memory space to discover properties
20+
of the interconnect:
21+
22+
- the specific interconnect version (e.g. CMN-600, CMN-700)
23+
24+
- the X/Y dimensions of the rectangular mesh; there are
25+
X*Y crosspoints (XPs), one at each connection point
26+
27+
- the number of device ports on each XP; generally this is
28+
0, 1 or 2
29+
30+
- the type of device attached to each device port, e.g.
31+
RN-F, HN-F, RN-I etc.
32+
33+
The script does not discover where CPUs are located in the
34+
interconnect; this is done by a separate script. (See README.md.)
35+
36+
37+
Prerequisites for running CMN mesh discovery
38+
--------------------------------------------
39+
40+
- The system must use Arm's CMN family interconnect.
41+
42+
- The system must have CMN in the memory map. This generally
43+
implies a bare-metal server or "metal" instance.
44+
If "perf list" shows the CMN events, the CMN is visible.
45+
46+
- The kernel must be built with CONFIG_DEVMEM, so that
47+
``/dev/mem`` is visible in the file system
48+
49+
- The user must have sufficient privilege to open ``/dev/mem``.
50+
Generally this requires root privilege.
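The ``/dev/mem`` prerequisites above can be checked before running discovery. The following is a minimal sketch, not part of the tools: the function name ``check_discovery_prereqs`` and the result keys are hypothetical, and it only tests whether the device file exists and is readable by the current user.

```python
import os


def check_discovery_prereqs(dev_mem="/dev/mem"):
    """Return basic checks for running CMN mesh discovery (illustrative only)."""
    exists = os.path.exists(dev_mem)
    return {
        # Present only if the kernel exposes /dev/mem (CONFIG_DEVMEM).
        "dev_mem_present": exists,
        # Whether the current user is permitted to read the device file.
        "dev_mem_readable": exists and os.access(dev_mem, os.R_OK),
        # Root is generally needed; geteuid() is absent on non-POSIX systems.
        "is_root": hasattr(os, "geteuid") and os.geteuid() == 0,
    }


if __name__ == "__main__":
    for name, ok in check_discovery_prereqs().items():
        print("%-18s %s" % (name, "ok" if ok else "not satisfied"))
```

A passing result does not guarantee that discovery will succeed; it only rules out the most common permission problems.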
51+
52+
53+
Running the CMN mesh discovery script
54+
-------------------------------------
55+
56+
The script can be run as follows:
57+
58+
python cmn_discover.py
59+
60+
This will create a file ``cmn-system.json`` with details of the
61+
CMN mesh topology. By default, this is saved in
62+
63+
~/.cache/arm/cmn-system.json
64+
65+
It will print summary details of the CMN mesh.
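To confirm that the file was written, it can be loaded with the standard ``json`` module. This sketch assumes nothing about the file's schema, which is not described here; the helper name ``load_topology`` is hypothetical and it only reports the top-level structure.

```python
import json
import os

DEFAULT_TOPOLOGY = "~/.cache/arm/cmn-system.json"


def load_topology(path=DEFAULT_TOPOLOGY):
    """Load the cached topology description, or return None if it is absent."""
    path = os.path.expanduser(path)
    if not os.path.isfile(path):
        return None
    with open(path) as f:
        return json.load(f)


if __name__ == "__main__":
    topo = load_topology()
    if topo is None:
        print("no topology file found; run cmn_discover.py first")
    elif isinstance(topo, dict):
        print("top-level keys:", sorted(topo))
    else:
        print("top-level JSON type:", type(topo).__name__)
```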
66+
67+
68+
Discovering the CPU locations
69+
-----------------------------
70+
This step is optional, but allows tools to refer to CPUs under
71+
their Linux identities rather than physical request ports.
72+
73+
This step takes the topology description JSON file as input and
74+
generates traffic to discover CPU locations. The goal is to
75+
detect which request port (RN-F) the CPU is attached to, and also
76+
the logical id (LPID) by which it is identified in requests.
77+
78+
The system must be reasonably free of other load. If successful,
79+
the discovery script will update the cached JSON file.
80+
81+
Depending on system design, the mapping of CPUs to interconnect
82+
locations may be universal across instances, or it may vary from
83+
instance to instance (i.e. from chip to chip).
84+
85+
To discover the CPU locations, run:
86+
87+
python cmn_detect_cpu.py
88+
89+
Depending on the interconnect design, there are three possible
90+
outcomes, which affect later analysis:
91+
92+
- at most one CPU per request port
93+
94+
- several CPUs per request port, distinguished by LPID
95+
96+
- several CPUs per request port, not distinguished by LPID
97+
98+
In the last case, CPU-centric analysis may be more approximate,
99+
as traffic can only be associated with a group of CPUs.
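The three outcomes can be told apart mechanically once each CPU's request port and LPID are known. The sketch below assumes a hypothetical in-memory mapping of Linux CPU number to a ``(request_port, lpid)`` pair; this is not the format used by ``cmn_detect_cpu.py`` or the JSON file, and the returned labels are invented for illustration.

```python
from collections import defaultdict


def classify_cpu_mapping(cpu_locations):
    """Classify a {cpu: (request_port, lpid)} mapping into one of three cases."""
    lpids_by_port = defaultdict(list)
    for _cpu, (port, lpid) in cpu_locations.items():
        lpids_by_port[port].append(lpid)

    # Case 1: no request port is shared between CPUs.
    if all(len(lpids) <= 1 for lpids in lpids_by_port.values()):
        return "one-cpu-per-port"
    # Case 2: ports are shared, but every CPU on a port has its own LPID.
    if all(len(set(lpids)) == len(lpids) for lpids in lpids_by_port.values()):
        return "shared-port-distinct-lpid"
    # Case 3: at least two CPUs share both a port and an LPID.
    return "shared-port-same-lpid"
```

For example, ``classify_cpu_mapping({0: (0x4c, 0), 1: (0x4c, 1)})`` falls into the second case, so traffic can still be attributed to individual CPUs.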

README.md

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
1+
CMN System Investigation Tools
2+
==============================
3+
4+
These tools help developers understand system performance on
5+
systems based on Arm's CMN family interconnects (CMN-600, CMN-650,
6+
CMN-700, CI-700 etc.).
7+
8+
The tools are aimed at developers of complex multithreaded
9+
applications and middleware, and at system administrators and
10+
others who need to understand whole-node performance.
11+
12+
The tools generally assume root privilege, and direct access
13+
to a server or a "metal" instance.
14+
15+
Tools are provided to:
16+
17+
- discover the CMN mesh topology and record it in a JSON file
18+
19+
- discover the mapping of Linux CPUs to mesh nodes
20+
21+
- visualize the mesh topology as a 2-D diagram
22+
23+
- construct PMU event specifiers (including complex watchpoints)
24+
for use with "perf" tools
25+
26+
- collect histograms and metrics to understand system traffic
27+
behaviors associated with common scenarios
28+
29+
The tools are mostly written in Python. Any recent version of
30+
Python3 should be sufficient. Some tools may work with Python2.
31+
32+
33+
Setup
34+
=====
35+
36+
Some of the tools need a system topology description, to
37+
identify the specific configuration, topology and CPU locations
38+
for your system. This may already be available for your system.
39+
If not, it can be created using the discovery tools (see below).
40+
41+
The default location for this file in your home directory is:
42+
43+
~/.cache/arm/cmn-system.json
44+
45+
Discovery scripts and other tools will create, update or use
46+
this file as appropriate.
47+
48+
49+
Creating the topology description file
50+
--------------------------------------
51+
This step should only need to be done once, but needs a
52+
significant level of system privilege. See README-discovery.md.
53+
54+
55+
Visualizing a CMN interconnect
56+
------------------------------
57+
58+
The interconnect can be visualized as a text diagram. Run:
59+
60+
python cmn_diagram.py
61+
62+
This will print a text diagram like this:
63+
64+
                0c:RN-D              2c:RN-D              4c:RN-F:#0,#1        6c:SN-F
65+
              /                    /                    /                    /
66+
            08(0,1)--------------28(1,1)--------------48(2,1)--------------68(3,1)
67+
           /|                   /|                   /|                   /|
68+
    08:RN-D |            28:HN-F |            48:HN-F |            68:HN-D |
69+
            |                    |                    |                    |
70+
            |   04:RN-D          |   24:HN-F          |   44:HN-F          |   64:SBSX
71+
            |/                   |/                   |/                   |/
72+
            00(0,0)--------------20(1,0)--------------40(2,0)--------------60(3,0)
73+
           /                    /                    /                    /
74+
    00:CXRH              20:RN-F:#2,#3        40:RN-D              60:SN-F
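The hexadecimal node IDs in the diagram encode the crosspoint coordinates and the device port. The sketch below decodes them using a layout inferred from this 4x2 example only: two bits each for X and Y, one port bit and two device bits. These widths are an assumption, not a statement of the CMN specification; larger meshes and other CMN versions may use different widths, so ``coord_bits`` is left as a parameter.

```python
def decode_node_id(node_id, coord_bits=2):
    """Split a node ID into (x, y, port, device) under the assumed layout.

    Assumed bit layout, low to high: 2 device bits, 1 port bit,
    coord_bits of Y, coord_bits of X.
    """
    mask = (1 << coord_bits) - 1
    device = node_id & 0x3
    port = (node_id >> 2) & 0x1
    y = (node_id >> 3) & mask
    x = (node_id >> (3 + coord_bits)) & mask
    return x, y, port, device


if __name__ == "__main__":
    for node_id in (0x00, 0x04, 0x28, 0x4c, 0x64):
        print("%02x -> x=%d y=%d port=%d device=%d"
              % ((node_id,) + decode_node_id(node_id)))
```

Under this assumption, ``0x4c`` decodes to crosspoint (2,1), port 1, which agrees with the position of ``4c:RN-F`` in the diagram.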
75+
76+
77+
Recap - discovering the mesh topology
78+
-------------------------------------
79+
80+
Let's recap the CMN discovery process:
81+
82+
sudo python cmn_discover.py
83+
python cmn_detect_cpu.py
84+
python cmn_diagram.py
85+
86+
If this succeeds, you should have a cached CMN configuration file in
87+
``~/.cache/arm/cmn-system.json``, and a diagram of the mesh will
88+
appear on the console.
89+
90+
If problems occur, see the "Troubleshooting" section.
91+
92+
93+
Top-down analysis
94+
-----------------
95+
96+
Top-down performance analysis aims at finding the significant
97+
contributors to system bandwidth. It analyzes system usage, rather
98+
than specific applications.
99+
100+
The ``cmn_topdown.py`` script provides several levels of top-down
101+
performance analysis, using CMN PMU events. Currently three
102+
levels are featured:
103+
104+
- Level 1 identifies which requesters are dominant (CPU vs. I/O)
105+
106+
- Level 2, for multi-die or multi-socket systems, measures local
107+
versus remote access
108+
109+
- Level 3 further characterizes memory accesses into system
110+
cache hits and misses.
111+
112+
The exact process of top-down analysis may vary across different
113+
systems, depending on CMN version and configuration.
114+
115+
Top-down analysis is currently at an experimental stage and will
116+
be significantly enhanced in upcoming releases of these tools.
117+
118+
119+
Constructing CHI watchpoint strings
120+
-----------------------------------
121+
122+
If the Linux CMN PMU driver is installed, CMN perf events are
123+
available through the ``perf_event_open`` interface and the ``perf``
124+
userspace tools. These should be sufficient for many purposes.
125+
126+
In some cases it may be useful to construct CMN watchpoints to
127+
match and count certain types of interconnect traffic. This generally
128+
requires some level of knowledge of the CHI architecture.
129+
The ``cmnwatch.py`` script can be used to generate strings that
130+
match CHI flits. The strings can be passed to the ``perf`` command.
131+
132+
perf stat -e `python cmnwatch.py up:req:opcode=Evict` ...
133+
134+
will expand into one or more CMN ``watchpoint_up`` events,
135+
that will count all flits (interconnect packets) matching the
136+
selected fields.
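The same substitution can be done from a Python script instead of shell backticks. This is a sketch under assumptions: the helper name ``perf_stat_argv`` is hypothetical, ``cmnwatch.py`` is assumed to be in the current directory, and its output is assumed to be a single event string with no embedded whitespace (the shell form would split on whitespace, this version does not).

```python
import subprocess
import sys


def perf_stat_argv(watch_spec, workload, run=subprocess.check_output):
    """Build a 'perf stat' argument list from a cmnwatch.py watchpoint spec.

    'run' is injectable so the function can be exercised without the tools.
    """
    # Equivalent of `python cmnwatch.py <spec>` in the shell example.
    events = run([sys.executable, "cmnwatch.py", watch_spec], text=True).strip()
    return ["perf", "stat", "-e", events] + list(workload)


if __name__ == "__main__":
    # Print the command rather than running it; the workload is an example.
    try:
        print(" ".join(perf_stat_argv("up:req:opcode=Evict", ["sleep", "1"])))
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run cmnwatch.py:", exc)
```

The resulting list can be passed to ``subprocess.run`` to start the measurement.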
137+
138+
Watchpoints can refer to a subset of CHI fields. Not all fields
139+
can be matched.
140+
141+
142+
Troubleshooting
143+
===============
144+
145+
It is difficult to cover all possible problems that might be
146+
encountered but we can cover some common issues:
147+
148+
- the system might not be based on an Arm CMN interconnect.
149+
cmn_discover.py will report this.
150+
151+
- the Linux CMN PMU driver might not be installed and enabled.
152+
Check for /sys/devices/arm_cmn_0. (A future version of this
153+
guide might explain how to enable the CMN PMU driver.)
154+
155+
- insufficient privilege to see CMN PMU events. Try this:
156+
``sysctl kernel.perf_event_paranoid=0``
157+
158+
- due to security settings, some systems do not provide
159+
visibility of the RSP and DAT channels in the interconnect.
160+
Some use cases for advanced watchpoints will not be
161+
available.
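The driver and privilege checks above can be scripted. The following is a minimal sketch with a hypothetical helper name, ``cmn_pmu_status``; it lists ``arm_cmn_*`` entries under ``/sys/devices`` and reads the current ``perf_event_paranoid`` setting, and changes nothing on the system.

```python
import glob
import os


def cmn_pmu_status(sys_devices="/sys/devices",
                   paranoid_file="/proc/sys/kernel/perf_event_paranoid"):
    """Report visible CMN PMUs and the perf_event_paranoid level."""
    pmus = sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(sys_devices, "arm_cmn_*")))
    try:
        with open(paranoid_file) as f:
            paranoid = int(f.read().strip())
    except (OSError, ValueError):
        paranoid = None  # file missing or unreadable on this system
    return {"pmus": pmus, "perf_event_paranoid": paranoid}


if __name__ == "__main__":
    status = cmn_pmu_status()
    print("CMN PMUs:", ", ".join(status["pmus"]) or "none found")
    print("perf_event_paranoid:", status["perf_event_paranoid"])
```

An empty PMU list suggests the CMN PMU driver is not loaded. The names are sorted alphabetically for display only; as noted in TODO.md, PMU numbering is not guaranteed to follow CMN base address order.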
162+
163+
Please also see TODO.md which lists some known limitations that
164+
may be addressed in future releases.
165+
166+
167+
License Information
168+
===================
169+
170+
*Copyright (C) ARM Limited, 2024. All rights reserved.*
171+
172+
Licensed under the Apache License, Version 2.0 (the "License");
173+
you may not use this file except in compliance with the License.
174+
You may obtain a copy of the License at:
175+
176+
http://www.apache.org/licenses/LICENSE-2.0
177+
178+
Unless required by applicable law or agreed to in writing, software
179+
distributed under the License is distributed on an "AS IS" BASIS,
180+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
181+
See the License for the specific language governing permissions and
182+
limitations under the License.
183+

TODO.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
Known limitations and assumptions in the CMN tools
2+
==================================================
3+
4+
This note describes some known limitations in the CMN toolkit
5+
which may be addressed in future releases.
6+
7+
- The tools currently assume that the Linux PMUs "arm_cmn_<n>"
8+
are ordered according to CMN base physical address. In fact this
9+
is not guaranteed, and the PMU ordering can vary between reboots.
10+
The correspondence between PMUs and meshes (and their CPU maps)
11+
should be established at least once per reboot. This will likely
12+
need to be done empirically, similar to CPU discovery.
13+
14+
