1- ## ExecuTorch on ARM Cortex-M55 + Ethos-U55
1+ ## ExecuTorch for Arm backends Ethos-U, VGF and Cortex-M
22
3- This dir contains scripts to help you prepare setup needed to run a PyTorch
4- model on an ARM Corstone-300 platform via ExecuTorch. Corstone-300 platform
5- contains the Cortex-M55 CPU and Ethos-U55 NPU.
3+ This project contains scripts to help you set up and run a PyTorch
4+ model on an Arm backend via ExecuTorch. This backend supports Ethos-U and VGF as
5+ targets (using TOSA), but you can also use the Ethos-U example runner as an example
6+ on Cortex-M if you do not delegate the model.
7+
8+ The main scripts are `setup.sh`, `run.sh` and `aot_arm_compiler.py`.
9+
10+ `setup.sh` installs the needed tools. With `--root-dir <FOLDER>`
11+ you can change the path to the scratch folder where it downloads and generates build
12+ artifacts. If supplied, you must also supply the same folder to `run.sh` with
13+ `--scratch-dir=<FOLDER>`. If not supplied, both scripts use `examples/arm/ethos-u-scratch`.
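
The scratch-folder rule above can be sketched in Python as a small helper that builds both command lines from one folder value, so the two scripts cannot drift apart. This is only an illustration: the helper name `paired_commands` and the folder `/tmp/et-scratch` are hypothetical, and the only flags used are ones that appear in this README.

```python
import shlex


def paired_commands(scratch_dir=None, model_name="mv2", target="ethos-u55-128"):
    """Return (setup_cmd, run_cmd) argument lists that agree on the scratch folder."""
    setup_cmd = ["./examples/arm/setup.sh", "--i-agree-to-the-contained-eula"]
    run_cmd = [
        "./examples/arm/run.sh",
        f"--model_name={model_name}",
        f"--target={target}",
    ]
    if scratch_dir is not None:
        # The same folder must be given to both scripts.
        setup_cmd += ["--root-dir", scratch_dir]
        run_cmd.append(f"--scratch-dir={scratch_dir}")
    return setup_cmd, run_cmd


if __name__ == "__main__":
    # Hypothetical folder, printed as copy-pasteable shell commands.
    for cmd in paired_commands("/tmp/et-scratch"):
        print(shlex.join(cmd))
```

Calling `paired_commands()` without a folder leaves both flags out, which matches the default behaviour described above.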
14+
15+ `run.sh` builds, runs and tests a model and calls cmake for you;
16+ if you want to run on a simulator, it starts that as well. The script calls `aot_arm_compiler.py`
17+ to convert a model and include it in the build/run.
18+
19+ Build and test artifacts are placed under the `arm_test` folder by default;
20+ this can be changed with `--et_build_root=<FOLDER>`.
21+
22+ `aot_arm_compiler.py` converts a Python model or a saved .pt model to a PTE file. It is used by `run.sh`
23+ and other test scripts but can also be used directly.
24+
25+ If you prefer to use the ExecuTorch API, there is also the `ethos_u_minimal_example.ipynb` notebook example.
26+ It shows the workflow for integrating a Python torch.export and ExecuTorch flow directly into your
27+ model codebase. This is particularly useful if you want to perform more complex training, such as
28+ quantization-aware training using the ArmQuantizer.
29+
30+ ## Create a PTE file for Arm backends
31+
32+ `aot_arm_compiler.py` is an easy-to-use example flow that compiles your PyTorch model to a PTE file for the Arm backend.
33+ It can generate PTE files for the supported targets (`-t`) or even non-delegated (Cortex-M) PTE files
34+ using different memory modes, and it can take either a Python file as input or the models from examples/models with `--model_input`.
35+ It also supports generating Devtools artifacts such as BundleIO BPTE files and ETRecords. Run it with `--help` to check its capabilities.
36+
37+ You select the model to convert with `--model_name=<MODELNAME/FILE>`. It supports models from examples/models or models
38+ from a Python file, provided the file defines `ModelUnderTest` and `ModelInputs`.
39+
40+ ```
41+ $ python3 -m examples.arm.aot_arm_compiler --help
42+ ```
43+
44+ This is how you generate a BundleIO BPTE of a simple add example:
45+
46+ ```
47+ $ python3 -m examples.arm.aot_arm_compiler --model_name=examples/arm/example_modules/add.py --target=ethos-u55-128 --bundleio
48+ ```
49+
50+ The example model defines two extra variables that the script picks up to make this work:
51+
52+ `ModelUnderTest` should be a `torch.nn.Module` instance.
53+
54+ `ModelInputs` should be a tuple of inputs to the forward function.
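
As a rough pre-check before invoking the compiler, the sketch below uses Python's standard `ast` module to confirm that a model file assigns both names at module level. The function `missing_model_names` and the `EXAMPLE_SOURCE` string are hypothetical illustrations (the string is not the contents of `add.py`), and this check is not part of `aot_arm_compiler.py`. The file is only parsed, never imported, so PyTorch does not need to be installed to run the check.

```python
import ast

# The two module-level names the README says a model file must provide.
REQUIRED_NAMES = {"ModelUnderTest", "ModelInputs"}


def missing_model_names(source: str) -> set:
    """Return the required names that are not assigned at module top level."""
    assigned = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue
        for target in targets:
            if isinstance(target, ast.Name):
                assigned.add(target.id)
    return REQUIRED_NAMES - assigned


# Hypothetical model file in the shape described above.
EXAMPLE_SOURCE = '''
import torch

class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y

ModelUnderTest = Add()
ModelInputs = (torch.ones(5), torch.ones(5))
'''

if __name__ == "__main__":
    # An empty set means both names were found.
    print(missing_model_names(EXAMPLE_SOURCE))
```

In practice you would pass the text of your own model file, for example `missing_model_names(open(path).read())`.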
55+
56+
57+ You can also use the models from examples/models directly by using their short name, e.g.
58+
59+ ```
60+ $ python3 -m examples.arm.aot_arm_compiler --model_name=mv2 --target=ethos-u55-64
61+ ```
62+
63+
64+ `aot_arm_compiler.py` is called from the scripts below, so you don't need to call it yourself, but it can be useful to run it by hand in some cases.
65+
66+
67+ ## ExecuTorch on Arm Ethos-U55/U65 and U85
68+
69+ This example code will help you get going with the Corstone&trade;-300/320 platforms and
70+ run on the FVP, and can be used as a starting guide when porting to your board/HW.
671
772We will start from a PyTorch model in Python, export it, and convert it to a `.pte`
873file, a binary format adopted by ExecuTorch. Then we will take the `.pte`
974model file and embed it in a baremetal application, executor_runner. The resulting
1075executor_runner file contains not only the `.pte` binary but
1176also the software components needed to run standalone on a baremetal system.
12- Lastly, we will run the executor_runner binary on a Corstone-300 FVP Simulator
13- platform.
77+ The build flow picks up the non-delegated ops from the generated PTE file and
78+ adds CPU implementations of them.
79+ Lastly, we will run the executor_runner binary on a Corstone&trade;-300/320 FVP Simulator platform.
80+
1481
1582### Example workflow
1683
17- There are two main scripts, setup.sh and run.sh. Each takes one optional,
18- positional argument. It is a path to a scratch dir to download and generate
19- build artifacts. If supplied, the same argument must be supplied to both the scripts.
84+ Below is an example workflow to build an application for Ethos-U55/85. The scripts below require an internet connection:
2085
21- To run these scripts. On a Linux system, in a terminal, with a working internet connection,
2286```
2387# Step [1] - setup necessary tools
2488$ cd <EXECUTORCH-ROOT-FOLDER>
25- $ executorch/examples/arm/setup.sh --i-agree-to-the-contained-eula [optional-scratch-dir]
89+ $ ./examples/arm/setup.sh --i-agree-to-the-contained-eula
90+
91+ # Step [2] - Set up the path to the tools. The `setup.sh` script has generated a script that you need to source every time you restart your shell.
92+ $ source examples/arm/ethos-u-scratch/setup_path.sh
93+
94+ # Step [3] - build and run ExecuTorch and executor_runner baremetal example application
95+ # on a Corstone(TM)-320 FVP to run a simple PyTorch model from a file.
96+ $ ./examples/arm/run.sh --model_name=examples/arm/example_modules/add.py --target=ethos-u85-128
97+ ```
98+
99+ The argument `--model_name=<MODEL>` is passed to `aot_arm_compiler.py`, so you can use it in the same way,
100+ e.g. you can also use the models from examples/models directly, as above.
101+
102+ ```
103+ $ ./examples/arm/run.sh --model_name=mv2 --target=ethos-u55-64
104+ ```
105+
106+ By default the runner sets all inputs to "1". You are expected to add or change the code
107+ that handles input for your hardware target so the model gets proper input, for example from your camera
108+ or microphone hardware.
109+
110+ While testing you can use the `--bundleio` flag to use the input from the Python model file and
111+ generate a .bpte instead of a .pte file. This embeds the example input data and reference output
112+ in the bpte file/data, which is used to verify the model's output. You can also use `--etdump` to generate
113+ an ETRecord and ETDump trace files from your target (they are printed as base64 strings in the serial log).
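
Because the trace data arrives as base64 text in the serial log, turning it back into bytes is a one-step decode. The sketch below assumes you have already copied the base64 text out of the log by hand; how the payload is delimited in the log is not described here, so no log parsing is attempted. The function name `decode_base64_block` is hypothetical, and the sample payload is made-up bytes rather than a real ETDump.

```python
import base64
import binascii


def decode_base64_block(text: str) -> bytes:
    """Decode a base64 payload that may be wrapped over several log lines."""
    # Drop newlines and spaces introduced by the terminal or log file.
    compact = "".join(text.split())
    try:
        return base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        raise ValueError("text is not a clean base64 payload") from exc


if __name__ == "__main__":
    # Made-up payload standing in for text copied from a serial log.
    sample = base64.b64encode(b"example trace bytes").decode("ascii")
    wrapped = sample[:10] + "\n" + sample[10:]
    print(decode_base64_block(wrapped))
```

The decoded bytes can then be written to a file with `open(path, "wb")` for use with the Devtools.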
26114
27- # Step [2] - Setup Patch to tools, The `setup.sh` script has generated a script that you need to source everytime you restart you shell.
28- $ source executorch/examples/arm/ethos-u-scratch/setup_path.sh
115+ Keep in mind that CPU cycles are NOT accurate on the FVP simulator, so it cannot be used for
116+ performance measurements; you need to run on an FPGA or actual ASIC to get good results from `--etdump`.
117+ Note that the printed NPU cycle numbers are still usable and closer to real values if the timing
118+ adaptor is set up correctly.
29119
30- # Step [3] - build + run ExecuTorch and executor_runner baremetal application
31- # suited for Corstone FVP's to run a simple PyTorch model.
32- $ executorch/examples/arm/run.sh --model_name=mv2 --target=ethos-u85-128 [--scratch-dir=same-optional-scratch-dir-as-before]
33120```
121+ # Build + run with BundleIO and ETDump
122+ $ ./examples/arm/run.sh --model_name=lstm --target=ethos-u85-128 --bundleio --etdump
123+ ```
124+
34125
35126### Ethos-U minimal example
36127
@@ -42,6 +133,19 @@ pip install jupyter
42133jupyter notebook ethos_u_minimal_example.ipynb
43134```
44135
136+ ## ExecuTorch on Arm Cortex-M
137+
138+ For Cortex-M you run the script without delegating, e.g. with `--no_delegate`. The build flow already supports picking up
139+ the non-delegated ops from the generated PTE file and adding CPU implementations of them, so this will work out of the box in
140+ most cases.
141+
142+ To run mobilenet_v2 on the Cortex-M55 only, without using the Ethos-U, try this:
143+
144+ ```
145+ $ ./examples/arm/run.sh --model_name=mv2 --target=ethos-u55-128 --no_delegate
146+ ```
147+
148+
45149### Online Tutorial
46150
47151We also have a [tutorial](https://pytorch.org/executorch/main/backends-arm-ethos-u) explaining the steps performed in these