Commit 75c98ee

Update 08-analyze-etdump.md
1 parent 379b062 commit 75c98ee

1 file changed

+13
-2
lines changed
  • content/learning-paths/mobile-graphics-and-gaming/measure-kleidiai-kernel-performance-on-executorch

content/learning-paths/mobile-graphics-and-gaming/measure-kleidiai-kernel-performance-on-executorch/08-analyze-etdump.md

@@ -6,11 +6,13 @@ weight: 9
 layout: learningpathall
 ---
 
-In the final step, we create an Inspector instance by providing the paths to the generated ETDump and ETRecord.
+You will use the ExecuTorch Inspector to correlate runtime events from the .etdump with the lowered graph and backend mapping from the .etrecord. This lets you confirm that a node was delegated to XNNPACK and, when eligible, accelerated by KleidiAI micro-kernels.
+
 The Inspector analyzes the runtime data from the ETDump file and maps it to the corresponding operators in the Edge Dialect Graph.
 
+### Inspector script
 
-To visualize all runtime events in a tabular format, simply call:
+Save the following code in a file named `inspect.py` and run it with the path to a .pte model. The script derives the .etrecord, .etdump, and output .csv paths from the model path and places them next to the model.
 
 ```python
 
@@ -38,6 +40,14 @@ with open(csvfile, "w", encoding="utf-8") as f:
 
 ```
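
The body of the script is elided in this diff, so how it derives the companion file names is not visible here. As a rough sketch only, assuming the three files share the model's stem and differ only in suffix, the derivation could look like the following. The helper name `derive_paths` is hypothetical and is not taken from the commit.

```python
from pathlib import Path


def derive_paths(pte_path):
    """Map a .pte path to sibling .etrecord, .etdump, and .csv paths.

    Assumption: the companion files sit next to the model and share its stem.
    """
    pte = Path(pte_path)
    if pte.suffix != ".pte":
        raise ValueError(f"expected a .pte file, got {pte.name!r}")
    return {
        "etrecord": pte.with_suffix(".etrecord"),
        "etdump": pte.with_suffix(".etdump"),
        "csv": pte.with_suffix(".csv"),
    }


# Example with the model path used in this learning path.
paths = derive_paths("model/linear_model_pf32_gemm.pte")
for kind, path in paths.items():
    print(f"{kind}: {path}")
```

Using `Path.with_suffix` keeps the directory and stem intact, so the derived files land beside the model regardless of where the script is invoked from.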
 
+### Run the script
+
+Run the script, for example with the linear_model_pf32_gemm.pte model:
+
+```bash
+python3 inspect.py model/linear_model_pf32_gemm.pte
+```
+
 Next, you can examine the generated CSV file to view the execution time information for each node in the model.
 
 Below is an example showing the runtime data corresponding to the Fully Connected node.
@@ -51,5 +61,6 @@ Below is an example showing the runtime data corresponding to the Fully Connecte
 | Execute | DELEGATE_CALL | 0.04136 | 0.04464 | 0.04792 | 0.046082053 | 0.03372 | 4.390585 | ['aten.linear.default'] | FALSE | XnnpackBackend |
 | Execute | Method::execute | 0.04848 | 0.0525595 | 0.05756 | 0.0540658046 | 0.03944 | 4.404385 | [] | FALSE | |
 
+You can now compare FP32, FP16, INT8, and INT4 models, confirm the exact GEMM variant used, and quantify the latency savings attributable to KleidiAI micro-kernels on your Arm device.
 
 You can experiment with different models and matrix sizes to obtain various performance results.
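
When comparing runs, it can help to pull only the delegated rows out of the generated CSV and express the difference between two latencies as a percentage. The sketch below uses only the standard library. The table header is not visible in this diff, so the column names (`event_name`, `p50`, `delegate_backend_name`) are assumptions; adjust them to match your CSV. The helper names are hypothetical, and the two sample rows reuse values from the example table above.

```python
import csv
import io


def delegated_rows(csv_text, backend_col="delegate_backend_name"):
    """Return the rows whose backend column is non-empty (delegated calls)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if (row.get(backend_col) or "").strip()]


def saving_percent(baseline, optimized):
    """Relative latency saving of `optimized` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline latency must be positive")
    return (baseline - optimized) / baseline * 100.0


# Two rows modelled on the example table; the header names are assumed.
SAMPLE = """event_block_name,event_name,p50,delegate_backend_name
Execute,DELEGATE_CALL,0.04464,XnnpackBackend
Execute,Method::execute,0.0525595,
"""

rows = delegated_rows(SAMPLE)
for row in rows:
    print(row["event_name"], row["p50"], row["delegate_backend_name"])
```

To compare two model variants, read the same column from each run's CSV and pass the two values to `saving_percent`, with the slower run as the baseline.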
