Commit 125d49b

Merge pull request #1533 from BmanClark/main
Adding ExecuTorch profiling instructions
2 parents: 900701d + 41ea4f4

File tree: 2 files changed, +93 −2 lines


content/learning-paths/mobile-graphics-and-gaming/profiling-ml-on-arm/app-profiling-streamline.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -128,8 +128,8 @@ Now add the code below to the `build.gradle` file of the Module you wish to profile:
 ```gradle
 externalNativeBuild {
     cmake {
-        path file('src/main/cpp/CMakeLists.txt')
-        version '3.22.1'
+        path = file("src/main/cpp/CMakeLists.txt")
+        version = "3.22.1"
     }
 }
 ```
````
Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
---
title: ML Profiling of an ExecuTorch model
weight: 7

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## ExecuTorch Profiling Tools

[ExecuTorch](https://pytorch.org/executorch/stable/index.html) can be used to run PyTorch models on constrained devices such as mobile. Because so many models are developed in PyTorch, this is a useful way to deploy them quickly to mobile devices, without needing conversion tools such as Google's [ai-edge-torch](https://github.com/google-ai-edge/ai-edge-torch) to turn them into TFLite format.

To get started with ExecuTorch, follow the instructions on the [PyTorch website](https://pytorch.org/executorch/stable/getting-started-setup). To then deploy on Android, follow the instructions [here](https://pytorch.org/executorch/stable/demo-apps-android.html). If you haven't already got ExecuTorch running on Android, work through those instructions first.

ExecuTorch comes with a set of profiling tools, but they are currently aimed at Linux rather than Android, where you will want to deploy. The instructions for profiling on Linux are [here](https://pytorch.org/executorch/main/tutorials/devtools-integration-tutorial.html); the sections below show how to adapt them for Android.

## Profiling on Android

To profile on Android, the steps are the same as on [Linux](https://pytorch.org/executorch/main/tutorials/devtools-integration-tutorial.html), except that you need to generate the ETDump file on an Android device.

To start, generate the ETRecord exactly as described in the Linux instructions.

Next, follow the instructions to create the ExecuTorch bundled program that you'll need to generate the ETDump. You'll copy this to your Android device together with the runner program you're about to compile.

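As a rough sketch, the ETRecord and bundled-program steps from the Linux tutorial look something like the code below. This assumes `executorch` is installed; the tiny `Net` module and the input shape are placeholders for your own model, and exact APIs can differ between ExecuTorch versions, so treat the tutorial as authoritative:

```python
# Sketch of ETRecord + bundled-program generation, based on the ExecuTorch
# devtools tutorial. Requires the executorch Python package; Net and the
# input shape below are hypothetical stand-ins for your own model.
import copy

import torch
from torch.export import export
from executorch.devtools import BundledProgram, generate_etrecord
from executorch.devtools.bundled_program.config import MethodTestCase, MethodTestSuite
from executorch.devtools.bundled_program.serialize import (
    serialize_from_bundled_program_to_flatbuffer,
)
from executorch.exir import to_edge


class Net(torch.nn.Module):  # placeholder model
    def forward(self, x):
        return torch.nn.functional.relu(x)


model = Net()
example_inputs = (torch.randn(1, 8),)  # hypothetical input shape

edge_program = to_edge(export(model, example_inputs))
edge_copy = copy.deepcopy(edge_program)  # keep a copy: to_executorch() consumes it
et_program = edge_program.to_executorch()

# The ETRecord links the compiled program back to source-level operators.
generate_etrecord("etrecord.bin", edge_copy, et_program)

# Bundle the program with test inputs so the on-device runner can execute it.
suites = [
    MethodTestSuite(
        method_name="forward",
        test_cases=[
            MethodTestCase(
                inputs=example_inputs,
                expected_outputs=model(*example_inputs),
            )
        ],
    )
]
bundled = BundledProgram(et_program, suites)
with open("bundled_program.bp", "wb") as f:
    f.write(serialize_from_bundled_program_to_flatbuffer(bundled))
```

The resulting `etrecord.bin` stays on your host for the Inspector step later, while `bundled_program.bp` is what you push to the device.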
To compile the runner program, you'll need to adapt the `build_example_runner.sh` script from the instructions (located in the `examples/devtools` subfolder of the ExecuTorch repository) so that it compiles for Android. Copy the script and rename the copy to `build_android_example_runner.sh`, ready for editing. Remove all lines containing `coreml`, along with the options that depend on them, as these are not needed for Android.

You'll need to set the `ANDROID_NDK` environment variable to point to your Android NDK installation. At the top of the `main()` function, add:

```bash
export ANDROID_NDK=~/Android/Sdk/ndk/28.0.12674087 # replace this with the correct path for your NDK installation
export ANDROID_ABI=arm64-v8a
```

Next, add Android options to the first `cmake` configuration command in `main()`, which configures the build of the ExecuTorch library. Change it to:

```bash
cmake -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" \
    -DANDROID_ABI="${ANDROID_ABI}" \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_DEVTOOLS=ON \
    -DEXECUTORCH_ENABLE_EVENT_TRACER=ON \
    -Bcmake-out .
```

The `cmake` build step for the ExecuTorch library stays the same, as do the next lines setting up local variables.

Next, adapt the second `cmake` configuration command, which configures the build of the runner, for Android. It becomes:

```bash
cmake -DCMAKE_PREFIX_PATH="${cmake_prefix_path}" \
    -Dexecutorch_DIR="${PWD}/cmake-out/lib/cmake/ExecuTorch" \
    -Dgflags_DIR="${PWD}/cmake-out/third-party/gflags" \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_TOOLCHAIN_FILE="${ANDROID_NDK}/build/cmake/android.toolchain.cmake" \
    -DANDROID_ABI="${ANDROID_ABI}" \
    -B"${build_dir}" \
    "${example_dir}"
```

With the configuration commands changed, run `./build_android_example_runner.sh` to build the runner program. Once it has compiled, you can find the executable `example_runner` in `cmake-out/examples/devtools/`.

Copy `example_runner` and the ExecuTorch bundled program to your Android device with `adb`:

```bash
adb push example_runner /data/local/tmp/
adb push bundled_program.bp /data/local/tmp/
adb shell
cd /data/local/tmp
chmod 777 example_runner
./example_runner --bundled_program_path="bundled_program.bp"
exit
adb pull /data/local/tmp/etdump.etdp .
```

You now have the ETDump file, ready to analyse with an ExecuTorch Inspector as per the Linux instructions.

To get a full display of the operators and their timings, you can simply do:

```python
from executorch.devtools import Inspector

etrecord_path = "etrecord.bin"
etdump_path = "etdump.etdp"
inspector = Inspector(etdump_path=etdump_path, etrecord=etrecord_path)
inspector.print_data_tabular()
```

However, as the [ExecuTorch profiling page](https://pytorch.org/executorch/main/tutorials/devtools-integration-tutorial.html) explains, there are further data-analysis options available. These let you quickly find the slowest layer, group operators, and so on. Both the `EventBlock` and `DataFrame` approaches work well. Note, however, that at the time of writing the `find_total_for_module()` function has a [bug](https://github.com/pytorch/executorch/issues/7200) and returns incorrect values; hopefully this will be fixed soon.
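For example, one quick way to rank the slowest operators is to pull an event block into a pandas `DataFrame` and sort it. The sketch below uses synthetic data in place of a real run; with a real ETDump you would obtain the frame from an event block's `to_dataframe()` method, and the `event_name` and `avg` column names are assumptions based on the tabular output shown above:

```python
import pandas as pd


def slowest_ops(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    # Sort events by average runtime, slowest first, and keep the top n.
    return df.sort_values("avg", ascending=False).head(n)


# Synthetic stand-in for an event block's DataFrame; with a real ETDump you
# would iterate inspector.event_blocks and call to_dataframe() on each block.
df = pd.DataFrame({
    "event_name": ["native_call_convolution", "native_call_relu", "native_call_addmm"],
    "avg": [1.92, 0.11, 0.87],  # hypothetical average times in ms
})
print(slowest_ops(df, 2))  # convolution first, then addmm
```

Sorting on the aggregated `avg` column rather than raw samples keeps the comparison stable across runs with different iteration counts.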
