Commit 4922add

Merge pull request #2399 from jasonrandrews/review
Update topdown-tool output
2 parents b9dc153 + ca803aa commit 4922add

1 file changed: +83 -35 lines changed
content/learning-paths/cross-platform/topdown-compare/2-code-examples.md

Lines changed: 83 additions & 35 deletions
@@ -153,11 +153,17 @@ The output is similar to:
 
 ```output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 2 (uarch metrics)
-=======================
-[General]
-Instructions Per Cycle 0.355 per cycle
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── General (General)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━┩
+│ Instructions Per Cycle │ 0.324 │ per cycle │
+└────────────────────────┴───────┴───────────┘
 ```
 
 Collect the Stage 1 topdown metrics using Arm's cycle accounting:
@@ -170,12 +176,18 @@ The output is similar to:
 
 ```output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 1 (Topdown metrics)
-=========================
-[Cycle Accounting]
-Frontend Stalled Cycles 0.04% cycles
-Backend Stalled Cycles. 88.15% cycles
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── Cycle Accounting (Cycle_Accounting)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━┩
+│ Backend Stalled Cycles │ 93.22 │ % │
+│ Frontend Stalled Cycles │ 0.03 │ % │
+└─────────────────────────┴───────┴──────┘
 ```
 
 This confirms the example has high backend stalls, equivalent to x86's Backend_Bound category. Notice how Arm's Stage 1 uses percentage of cycles rather than Intel's slot-based accounting.
@@ -192,12 +204,20 @@ The output is similar to:
 
 ```output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 2 (uarch metrics)
-=======================
-[L1 Data Cache Effectiveness]
-L1D Cache MPKI............... 0.023 misses per 1,000 instructions
-L1D Cache Miss Ratio......... 0.000 per cache access
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── L1 Data Cache Effectiveness (L1D_Cache_Effectiveness)
+├── Follows
+│ └── Backend Bound (backend_bound)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+│ L1D Cache Miss Ratio │ 0.000 │ per cache access │
+│ L1D Cache MPKI │ 0.129 │ misses per 1,000 instructions │
+└──────────────────────┴───────┴───────────────────────────────┘
 ```
 
 For L1 instruction cache effectiveness:
@@ -210,12 +230,20 @@ The output is similar to:
 
 ```output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 2 (uarch metrics)
-=======================
-[L1 Instruction Cache Effectiveness]
-L1I Cache MPKI............... 0.022 misses per 1,000 instructions
-L1I Cache Miss Ratio......... 0.000 per cache access
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── L1 Instruction Cache Effectiveness (L1I_Cache_Effectiveness)
+├── Follows
+│ └── Frontend Bound (frontend_bound)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+│ L1I Cache Miss Ratio │ 0.003 │ per cache access │
+│ L1I Cache MPKI │ 0.474 │ misses per 1,000 instructions │
+└──────────────────────┴───────┴───────────────────────────────┘
 ```
 
 For last level cache:
@@ -228,13 +256,22 @@ The output is similar to:
 
 ```output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 2 (uarch metrics)
-=======================
-[Last Level Cache Effectiveness]
-LL Cache Read MPKI.............. 0.017 misses per 1,000 instructions
-LL Cache Read Miss Ratio........ 0.802 per cache access
-LL Cache Read Hit Ratio......... 0.198 per cache access
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── Last Level Cache Effectiveness (LL_Cache_Effectiveness)
+├── Follows
+│ ├── Backend Bound (backend_bound)
+│ └── Frontend Bound (frontend_bound)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+│ LL Cache Read Hit Ratio │ nan │ per cache access │
+│ LL Cache Read Miss Ratio │ nan │ per cache access │
+│ LL Cache Read MPKI │ 0.000 │ misses per 1,000 instructions │
+└──────────────────────────┴───────┴───────────────────────────────┘
 ```
 
 For operation mix:
@@ -247,17 +284,28 @@ The output is similar to:
 
 ```output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 2 (uarch metrics)
-=======================
-[Speculative Operation Mix]
-Load Operations Percentage.......... 16.70% operations
-Store Operations Percentage......... 16.59% operations
-Integer Operations Percentage....... 33.61% operations
-Advanced SIMD Operations Percentage. 0.00% operations
-Floating Point Operations Percentage 16.45% operations
-Branch Operations Percentage........ 16.65% operations
-Crypto Operations Percentage........ 0.00% operations
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── Speculative Operation Mix (Operation_Mix)
+├── Follows
+│ ├── Backend Bound (backend_bound)
+│ └── Retiring (retiring)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━┩
+│ Barrier Operations Percentage │ ❌ │ % │
+│ Branch Operations Percentage │ ❌ │ % │
+│ Crypto Operations Percentage │ 0.00 │ % │
+│ Integer Operations Percentage │ 33.52 │ % │
+│ Load Operations Percentage │ 16.69 │ % │
+│ Floating Point Operations Percentage │ 16.51 │ % │
+│ Advanced SIMD Operations Percentage │ 0.00 │ % │
+│ Store Operations Percentage │ 16.58 │ % │
+│ SVE Operations (Load/Store Inclusive) Percentage │ 0.00 │ % │
+└──────────────────────────────────────────────────┴───────┴──────┘
 ```
 
 

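Reviewer's sketch (not part of the commit): the diff replaces the dotted `name.... value unit` lines with a tree of box-drawn tables, so any script that scraped the old output would need a different approach. The Python below is a minimal sketch of one way to read the new layout, assuming data rows are the only lines with four light vertical bars (`│`), as in the rows shown in the diff. `parse_metric_rows` and `SAMPLE` are hypothetical names introduced here. Mapping the `❌` marker to `None` is an assumption about how a consumer might treat an unavailable value, not documented topdown-tool behavior.

```python
import math

# Lines copied from two of the tables in the diff, including a tree line,
# a border, and the heavy-bordered header row that the parser should skip.
SAMPLE = """\
├── Follows
│ └── Backend Bound (backend_bound)
└── ┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Metric ┃ Value ┃ Unit ┃
│ LL Cache Read Hit Ratio │ nan │ per cache access │
│ LL Cache Read MPKI │ 0.000 │ misses per 1,000 instructions │
│ Branch Operations Percentage │ ❌ │ % │
"""


def parse_metric_rows(text):
    """Return {metric name: (value, unit)} for each three-column data row."""
    metrics = {}
    for line in text.splitlines():
        parts = line.split("│")
        # A data row "│ a │ b │ c │" splits into 5 parts; tree guides,
        # borders, and the header row (which uses "┃") produce fewer.
        if len(parts) < 5:
            continue
        name, raw, unit = (p.strip() for p in parts[-4:-1])
        try:
            value = float(raw)  # the literal "nan" parses to float("nan")
        except ValueError:
            value = None  # assumption: non-numeric markers mean "unavailable"
        metrics[name] = (value, unit)
    return metrics


metrics = parse_metric_rows(SAMPLE)
for name, (value, unit) in metrics.items():
    if value is None or math.isnan(value):
        print(f"{name}: unavailable")
    else:
        print(f"{name}: {value} {unit}")
```

Treating `nan` and `❌` as "unavailable" rather than as zero keeps a missing counter from being mistaken for a measured 0.000, which matters for the last level cache table where the MPKI is 0.000 but both ratios are `nan`.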