content/learning-paths/mobile-graphics-and-gaming/measure-kleidiai-kernel-performance-on-executorch/08-analyze-etdump.md (13 additions, 2 deletions)

weight: 9
layout: learningpathall
---

You will use the ExecuTorch Inspector to correlate runtime events from the `.etdump` file with the lowered graph and backend mapping stored in the `.etrecord` file. This lets you confirm that a node was delegated to XNNPACK and, when eligible, accelerated by KleidiAI micro-kernels.

The Inspector analyzes the runtime data from the ETDump file and maps it to the corresponding operators in the Edge Dialect Graph.
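
The following is a minimal sketch of that flow. It is not the `inspect.py` script used in this step. It assumes a recent ExecuTorch release that exposes `Inspector` under `executorch.devtools` (older releases used `executorch.sdk`). It also assumes that the `.etdump` and `.etrecord` files share the model's file stem and directory. The helper names `derive_artifact_paths` and `write_inspector_table` are hypothetical.

```python
import sys
from pathlib import Path


def derive_artifact_paths(pte_path):
    """Derive sibling .etrecord, .etdump and .csv paths from a .pte path.

    Assumption: every artifact shares the model's file stem and directory,
    e.g. model.pte -> model.etrecord, model.etdump, model.csv.
    """
    pte = Path(pte_path)
    return (
        pte.with_suffix(".etrecord"),
        pte.with_suffix(".etdump"),
        pte.with_suffix(".csv"),
    )


def write_inspector_table(pte_path):
    """Correlate ETDump events with the ETRecord graph and save them as CSV."""
    # Imported lazily so the path helper above works without ExecuTorch installed.
    from executorch.devtools import Inspector

    etrecord_path, etdump_path, csv_path = derive_artifact_paths(pte_path)
    inspector = Inspector(etdump_path=str(etdump_path), etrecord=str(etrecord_path))
    # Human-readable table of every runtime event, printed to stdout.
    inspector.print_data_tabular()
    # The same data as a pandas DataFrame, saved as CSV for later comparison.
    inspector.to_dataframe().to_csv(csv_path, index=False)
    return csv_path


if __name__ == "__main__" and len(sys.argv) > 1:
    print(write_inspector_table(sys.argv[1]))
```

The import is deferred into `write_inspector_table`, so the path logic can be reused or tested on a machine without ExecuTorch or pandas installed.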

### Inspector script

Save the following code in a file named `inspect.py` and run it with the path to a `.pte` model. The script derives the `.etrecord` and `.etdump` paths from the model path and writes an output `.csv` file. All of these files sit next to the `.pte` file.

```python
...
with open(csvfile, "w", encoding="utf-8") as f:
    ...
```
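
Once the CSV exists, a short filter can list the events that ran inside the XNNPACK delegate. This sketch assumes the CSV has `event_name` and `delegate_backend_name` columns, as in the Inspector's DataFrame export. Check the header of your own file and adjust the names if they differ. `delegated_events` is a hypothetical helper. It matches the backend name case-insensitively on a substring, because the exact backend identifier string is also an assumption here.

```python
import csv


def delegated_events(csv_path, backend_substring="xnnpack"):
    """Return (event_name, backend) pairs for rows delegated to a matching backend.

    Assumed column names: "event_name" and "delegate_backend_name".
    """
    matches = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Non-delegated rows leave the backend column empty.
            backend = row.get("delegate_backend_name") or ""
            if backend_substring.lower() in backend.lower():
                matches.append((row.get("event_name", ""), backend))
    return matches
```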

### Run the script

Run the script, for example with the `linear_model_pf32_gemm.pte` model:

You can now iterate over FP32 vs FP16 vs INT8 vs INT4 models, confirm the exact GEMM variant used, and quantify the latency savings attributable to KleidiAI micro-kernels on your Arm device.
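
To compare variants, one approach is to read the latency of the same event from each model's CSV and divide the FP32 baseline by it. This sketch assumes an `event_name` column and a latency column named `avg (ms)`. Both names are assumptions to check against your CSV header. Pick a substring that identifies the GEMM event itself rather than an enclosing event such as whole-method execution. Otherwise the sum can count nested time more than once. `event_latency` and `speedups` are hypothetical helpers.

```python
import csv


def event_latency(csv_path, event_substring, column="avg (ms)"):
    """Sum the latency column over rows whose event name contains a substring.

    Assumption: the CSV has an "event_name" column and a numeric latency column.
    """
    total = 0.0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = row.get("event_name") or ""
            value = row.get(column) or ""
            if event_substring.lower() in name.lower() and value:
                total += float(value)
    return total


def speedups(latencies, baseline="FP32"):
    """Return each variant's speedup over the baseline (baseline / variant latency)."""
    base = latencies[baseline]
    return {name: base / value for name, value in latencies.items() if value > 0}
```

For example, collect `event_latency(path, ...)` for each variant's CSV into a dict keyed by `"FP32"`, `"INT8"`, and so on, then pass that dict to `speedups`.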

You can experiment with different models and matrix sizes to see how performance changes.