@@ -153,11 +153,17 @@ The output is similar to:
 
 ``` output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 2 (uarch metrics)
-=======================
-[General]
-Instructions Per Cycle 0.355 per cycle
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── General (General)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━┩
+│ Instructions Per Cycle │ 0.324 │ per cycle │
+└────────────────────────┴───────┴───────────┘
 ```
 
 Collect the Stage 1 topdown metrics using Arm's cycle accounting:
@@ -170,12 +176,18 @@ The output is similar to:
 
 ``` output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 1 (Topdown metrics)
-=========================
-[Cycle Accounting]
-Frontend Stalled Cycles 0.04% cycles
-Backend Stalled Cycles. 88.15% cycles
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── Cycle Accounting (Cycle_Accounting)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━┩
+│ Backend Stalled Cycles │ 93.22 │ % │
+│ Frontend Stalled Cycles │ 0.03 │ % │
+└─────────────────────────┴───────┴──────┘
 ```
 
 This confirms the example has high backend stalls, equivalent to x86's Backend_Bound category. Notice how Arm's Stage 1 uses percentage of cycles rather than Intel's slot-based accounting.
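Note on the hunk above: the cycle accounting figures are percentages of total cycles. The sketch below shows that arithmetic in isolation. It is not how the tool is implemented. The function name `stalled_cycle_percent` and the counter values are hypothetical, chosen only so the result matches the 93.22 % and 0.03 % shown in the new output.

```python
def stalled_cycle_percent(stalled_cycles: int, total_cycles: int) -> float:
    """Return stalled cycles as a percentage of all cycles (cycle-based accounting)."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return 100.0 * stalled_cycles / total_cycles


# Hypothetical counter readings, not measured values.
total = 1_000_000
backend = stalled_cycle_percent(932_200, total)
frontend = stalled_cycle_percent(300, total)

print(f"Backend Stalled Cycles  {backend:.2f} %")
print(f"Frontend Stalled Cycles {frontend:.2f} %")
```

A slot-based method would instead divide by the number of issue slots (cycles multiplied by pipeline width), which is why the two sets of figures are comparable in spirit but not numerically identical.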
@@ -192,12 +204,20 @@ The output is similar to:
 
 ``` output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 2 (uarch metrics)
-=======================
-[L1 Data Cache Effectiveness]
-L1D Cache MPKI............... 0.023 misses per 1,000 instructions
-L1D Cache Miss Ratio......... 0.000 per cache access
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── L1 Data Cache Effectiveness (L1D_Cache_Effectiveness)
+├── Follows
+│ └── Backend Bound (backend_bound)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+│ L1D Cache Miss Ratio │ 0.000 │ per cache access │
+│ L1D Cache MPKI │ 0.129 │ misses per 1,000 instructions │
+└──────────────────────┴───────┴───────────────────────────────┘
 ```
 
 For L1 instruction cache effectiveness:
@@ -210,12 +230,20 @@ The output is similar to:
 
 ``` output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 2 (uarch metrics)
-=======================
-[L1 Instruction Cache Effectiveness]
-L1I Cache MPKI............... 0.022 misses per 1,000 instructions
-L1I Cache Miss Ratio......... 0.000 per cache access
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── L1 Instruction Cache Effectiveness (L1I_Cache_Effectiveness)
+├── Follows
+│ └── Frontend Bound (frontend_bound)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+│ L1I Cache Miss Ratio │ 0.003 │ per cache access │
+│ L1I Cache MPKI │ 0.474 │ misses per 1,000 instructions │
+└──────────────────────┴───────┴───────────────────────────────┘
 ```
 
 For last level cache:
@@ -228,13 +256,22 @@ The output is similar to:
 
 ``` output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 2 (uarch metrics)
-=======================
-[Last Level Cache Effectiveness]
-LL Cache Read MPKI.............. 0.017 misses per 1,000 instructions
-LL Cache Read Miss Ratio........ 0.802 per cache access
-LL Cache Read Hit Ratio......... 0.198 per cache access
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── Last Level Cache Effectiveness (LL_Cache_Effectiveness)
+├── Follows
+│ ├── Backend Bound (backend_bound)
+│ └── Frontend Bound (frontend_bound)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
+│ LL Cache Read Hit Ratio │ nan │ per cache access │
+│ LL Cache Read Miss Ratio │ nan │ per cache access │
+│ LL Cache Read MPKI │ 0.000 │ misses per 1,000 instructions │
+└──────────────────────────┴───────┴───────────────────────────────┘
 ```
 
 For operation mix:
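Note on the last level cache hunk above: the new output reports `nan` for both read ratios alongside a read MPKI of 0.000. One plausible reading, stated here as an assumption and not confirmed by the diff, is that no last level cache reads were counted, so each ratio becomes 0 divided by 0. The sketch below illustrates the usual definitions (misses per 1,000 instructions, and misses per access). The helper names `mpki` and `ratio` and all counter values are hypothetical.

```python
import math


def mpki(misses: int, instructions: int) -> float:
    """Misses per 1,000 instructions."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return 1000.0 * misses / instructions


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or nan when nothing was counted."""
    if denominator == 0:
        return math.nan  # assumption: mirrors how an undefined ratio might be reported
    return numerator / denominator


# Hypothetical counter readings, not measured values.
instructions = 8_000_000
ll_reads = 0          # no last level cache read accesses counted
ll_read_misses = 0

print(f"LL Cache Read MPKI       {mpki(ll_read_misses, instructions):.3f}")
print(f"LL Cache Read Miss Ratio {ratio(ll_read_misses, ll_reads):.3f}")
print(f"L1D Cache MPKI           {mpki(1032, instructions):.3f}")
```

With a zero access count the MPKI stays well defined, because its denominator is the instruction count, while any per-access ratio is undefined.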
@@ -247,17 +284,28 @@ The output is similar to:
 
 ``` output
 Performing 1000000000 dependent floating-point divisions...
+Monitoring command: test. Hit Ctrl-C to stop.
+Run 1
 Done. Final result: 0.000056
-Stage 2 (uarch metrics)
-=======================
-[Speculative Operation Mix]
-Load Operations Percentage.......... 16.70% operations
-Store Operations Percentage......... 16.59% operations
-Integer Operations Percentage....... 33.61% operations
-Advanced SIMD Operations Percentage. 0.00% operations
-Floating Point Operations Percentage 16.45% operations
-Branch Operations Percentage........ 16.65% operations
-Crypto Operations Percentage........ 0.00% operations
+CPU Neoverse V2 metrics
+└── Stage 2 (uarch metrics)
+└── Speculative Operation Mix (Operation_Mix)
+├── Follows
+│ ├── Backend Bound (backend_bound)
+│ └── Retiring (retiring)
+└── ┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━┓
+┃ Metric ┃ Value ┃ Unit ┃
+┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━┩
+│ Barrier Operations Percentage │ ❌ │ % │
+│ Branch Operations Percentage │ ❌ │ % │
+│ Crypto Operations Percentage │ 0.00 │ % │
+│ Integer Operations Percentage │ 33.52 │ % │
+│ Load Operations Percentage │ 16.69 │ % │
+│ Floating Point Operations Percentage │ 16.51 │ % │
+│ Advanced SIMD Operations Percentage │ 0.00 │ % │
+│ Store Operations Percentage │ 16.58 │ % │
+│ SVE Operations (Load/Store Inclusive) Percentage │ 0.00 │ % │
+└──────────────────────────────────────────────────┴───────┴──────┘
 ```
 
 