 18 |  18 |   "source": [
 19 |  19 |   "# Ethos-U delegate flow example\n",
 20 |  20 |   "\n",
 21 |     | - "This guide demonstrates the full flow for running a module on Arm Ethos-U using ExecuTorch. \n",
    |  21 | + "This guide demonstrates the full flow for running a module on Arm Ethos-U55 using ExecuTorch.\n",
 22 |  22 |   "Tested on Linux x86_64 and macOS aarch64. If something is not working for you, please raise a GitHub issue and tag Arm.\n",
 23 |  23 |   "\n",
 24 |  24 |   "Before you begin:\n",
 25 |  25 |   "1. (In a clean virtual environment with a compatible Python version) Install executorch using `./install_executorch.sh`\n",
 26 |  26 |   "2. Install the Arm cross-compilation toolchain and simulators using `./examples/arm/setup.sh --i-agree-to-the-contained-eula`\n",
 27 |     | - "3. Add Arm cross-compilation toolchain and simulators to PATH using `./examples/arm/ethos-u-scratch/setup_path.sh` \n",
 28 |  27 |   "\n",
 29 |  28 |   "Run all of the above commands from the base `executorch` folder.\n",
 30 |  29 |   "\n",
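Step 3 is removed in this change; instead, each `%%bash` cell sources `ethos-u-scratch/setup_path.sh` itself. IPython's `%%bash` magic starts a fresh bash process for every cell, so that `source` only affects the cell it is in and never changes the Python kernel's own PATH. If you want to check from Python which tools the kernel can see (for example before calling `subprocess` directly), a minimal sketch follows. `check_tools` is a hypothetical helper, and the executable names are assumptions (the GNU Arm Embedded compiler and the Corstone-300 FVP binary for Ethos-U55).

```python
import os
import shutil

# Assumed executable names: the GNU Arm Embedded compiler and the
# Corstone-300 FVP binary for Ethos-U55. Adjust them to match your setup.
REQUIRED_TOOLS = ("arm-none-eabi-gcc", "FVP_Corstone_SSE-300_Ethos-U55")


def check_tools(tools=REQUIRED_TOOLS, path=None):
    """Map each tool name to whether it resolves on the given search path.

    `path` defaults to this Python process's PATH, i.e. what a subprocess
    launched from the notebook kernel would inherit.
    """
    search_path = os.environ.get("PATH", "") if path is None else path
    return {tool: shutil.which(tool, path=search_path) is not None for tool in tools}


for tool, found in check_tools().items():
    print(f"{tool}: {'found' if found else 'missing'}")
```

A "missing" result here does not mean the `%%bash` cells will fail; it only means the kernel was started without the toolchain on its PATH.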
 70 |  69 |   "source": [
 71 |  70 |   "To run on Ethos-U, the `graph_module` must be quantized using the `arm_quantizer`. Quantization can be done in multiple ways and can be customized for different parts of the graph; shown here is the recommended path for the EthosUBackend. Quantization also requires calibrating the module with example inputs.\n",
 72 |  71 |   "\n",
 73 |     | - "Again printing the module, it can be seen that the quantization wraps the node in quantization/dequantization nodes which contain the computed quanitzation parameters.",
    |  72 | + "Printing the module again shows that quantization wraps the node in quantize/dequantize nodes, which contain the computed quantization parameters.\n",
 74 |  73 |   "\n",
 75 |  74 |   "With the default passes for the Arm Ethos-U backend, and assuming the model lowers fully to the Ethos-U, the exported program is composed of a Quantize node, an Ethos-U custom delegate, and a Dequantize node. In some circumstances, you may want to feed quantized input to the neural network directly, e.g. if you have a camera sensor outputting (u)int8 data and want to keep all the arithmetic of the application in the int8 domain. For these cases, you can apply `exir/passes/quantize_io_pass.py`. See the unit test in `backends/arm/test/passes/test_ioquantization_pass.py` for an example of how to feed quantized inputs and obtain quantized outputs.\n"
 76 |  75 |   ]
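The quantize/dequantize nodes apply affine quantization: a float x is stored as the integer q = clamp(round(x / scale) + zero_point, -128, 127) and read back as (q - zero_point) * scale. The pure-Python sketch below shows that arithmetic for single values, plus the zero-point shift that relates uint8 and int8 data sharing the same scale, which is the kind of conversion a uint8 camera pipeline may need before feeding int8 inputs. The scale and zero point are made-up values, not the ones the `arm_quantizer` computes, and the helper names are hypothetical.

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """Affine-quantize a float to an integer in [qmin, qmax] (int8 by default)."""
    q = round(x / scale) + zero_point  # Python's round() rounds half to even
    return max(qmin, min(qmax, q))     # saturate instead of wrapping


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to the float value it represents."""
    return (q - zero_point) * scale


def uint8_to_int8(q_u8, zero_point_u8):
    """Re-express a uint8 value and its zero point as int8 with the same scale.

    Subtracting 128 from both leaves (q - zero_point) unchanged, so the
    represented real value is identical.
    """
    return q_u8 - 128, zero_point_u8 - 128


# Hypothetical parameters for a tensor whose values lie in [0.0, 2.0].
scale, zero_point = 2.0 / 255, -128
q = quantize(0.5, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

The round trip is exact only up to half a quantization step (scale / 2), which is the error the calibrated scale trades against range.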
 88 |  87 |   ")\n",
 89 |  88 |   "from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e\n",
 90 |  89 |   "\n",
 91 |     | - "target = \"ethos-u55-128\"\n",
 92 |     | - "\n",
 93 |  90 |   "# Create a compilation spec describing the target for configuring the quantizer\n",
 94 |  91 |   "# Some args are used by the Arm Vela graph compiler later in the example. Refer to the Arm Vela documentation for an\n",
 95 |  92 |   "# explanation of its flags: https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md\n",
 96 |  93 |   "spec_builder = ArmCompileSpecBuilder().ethosu_compile_spec(\n",
 97 |     | - " target,\n",
    |  94 | + " target=\"ethos-u55-128\",\n",
 98 |  95 |   " system_config=\"Ethos_U55_High_End_Embedded\",\n",
 99 |  96 |   " memory_mode=\"Shared_Sram\",\n",
100 |  97 |   " extra_flags=\"--output-format=raw --debug-force-regor\"\n",
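The target string passed to the compile spec (and repeated later as `-DETHOSU_TARGET_NPU_CONFIG` and `run_fvp.sh --target`) names the NPU and its MAC configuration: `ethos-u55-128` is an Ethos-U55 with 128 MACs per cycle. A sketch of a hypothetical helper (`parse_target` is not part of ExecuTorch) that splits such a string and rejects unknown combinations is shown below. The table of valid MAC counts is an assumption based on the published Ethos-U55/U65/U85 configurations; check it against the Vela documentation linked in the comment above.

```python
import re
from typing import NamedTuple

# Assumed MAC configurations (MACs per cycle) per Ethos-U generation.
# Verify against the Vela documentation before relying on this table.
KNOWN_MACS = {
    "u55": {32, 64, 128, 256},
    "u65": {256, 512},
    "u85": {128, 256, 512, 1024, 2048},
}


class EthosUTarget(NamedTuple):
    npu: str   # e.g. "u55"
    macs: int  # MACs per cycle, e.g. 128


def parse_target(target: str) -> EthosUTarget:
    """Split a target string such as "ethos-u55-128" into NPU and MAC count."""
    match = re.fullmatch(r"ethos-(u\d+)-(\d+)", target)
    if match is None:
        raise ValueError(f"unrecognised target string: {target!r}")
    npu, macs = match.group(1), int(match.group(2))
    if macs not in KNOWN_MACS.get(npu, set()):
        raise ValueError(f"{macs} MACs is not a known configuration for Ethos-{npu.upper()}")
    return EthosUTarget(npu, macs)


print(parse_target("ethos-u55-128"))  # EthosUTarget(npu='u55', macs=128)
```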
139 | 136 |   "metadata": {},
140 | 137 |   "outputs": [],
141 | 138 |   "source": [
142 |     | - "import os\n",
143 | 139 |   "from executorch.backends.arm.ethosu import EthosUPartitioner\n",
144 | 140 |   "from executorch.exir import (\n",
145 | 141 |   " EdgeCompileConfig,\n",
165 | 161 |   " config=ExecutorchBackendConfig(extract_delegate_segments=False)\n",
166 | 162 |   " )\n",
167 | 163 |   "\n",
168 |     | - "executorch_program_manager.exported_program().module().print_readable()\n",
    | 164 | + "_ = executorch_program_manager.exported_program().module().print_readable()\n",
169 | 165 |   "\n",
170 | 166 |   "# Save pte file\n",
171 |     | - "cwd_dir = os.getcwd()\n",
172 |     | - "pte_base_name = \"simple_example\"\n",
173 |     | - "pte_name = pte_base_name + \".pte\"\n",
174 |     | - "pte_path = os.path.join(cwd_dir, pte_name)\n",
175 |     | - "save_pte_program(executorch_program_manager, pte_name)\n",
176 |     | - "assert os.path.exists(pte_path), \"Build failed; no .pte-file found\""
    | 167 | + "save_pte_program(executorch_program_manager, \"ethos_u_minimal_example.pte\")"
177 | 168 |   ]
178 | 169 |   },
179 | 170 |   {
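This change drops the `os.path.exists` assertion that used to follow `save_pte_program`. If you still want the notebook to fail fast before the cross-compilation steps, a small check such as the hypothetical `check_pte` below (not an ExecuTorch API) could be called right after saving. It is a sketch that only verifies the file exists and is non-empty; it does not validate the .pte contents.

```python
from pathlib import Path


def check_pte(path) -> Path:
    """Fail early if the .pte file is missing or empty; return its resolved path."""
    pte = Path(path)
    if not pte.is_file():
        raise FileNotFoundError(f"Build failed; no .pte file found at {pte.resolve()}")
    if pte.stat().st_size == 0:
        raise ValueError(f"{pte} is empty")
    return pte.resolve()


# Usage, right after save_pte_program(...) in the cell above:
# check_pte("ethos_u_minimal_example.pte")
```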
183 | 174 |   "## Build executor runtime\n",
184 | 175 |   "\n",
185 | 176 |   "After the AOT compilation flow is done, the runtime can be cross-compiled and linked to the produced .pte-file using the Arm cross-compilation toolchain. This is done in two steps:\n",
186 |     | - "1. Build and install the executorch library and EthosUDelegate.\n",
    | 177 | + "1. Build and install the executorch libraries and the EthosUDelegate.\n",
187 | 178 |   "2. Build and link the `arm_executor_runner` and generate kernel bindings for any non-delegated ops."
188 | 179 |   ]
189 | 180 |   },
193 | 184 |   "metadata": {},
194 | 185 |   "outputs": [],
195 | 186 |   "source": [
196 |     | - "import subprocess\n",
197 |     | - "\n",
198 |     | - "# Setup paths\n",
199 |     | - "et_dir = os.path.join(cwd_dir, \"..\", \"..\")\n",
200 |     | - "et_dir = os.path.abspath(et_dir)\n",
201 |     | - "script_dir = os.path.join(et_dir, \"backends\", \"arm\", \"scripts\")\n",
202 |     | - "\n",
203 |     | - "# Cross-compile executorch \n",
204 |     | - "subprocess.run(os.path.join(script_dir, \"build_executorch.sh\"), shell=True, cwd=et_dir)\n",
205 |     | - "\n",
206 |     | - "# Cross-compile executorch runner\n",
207 |     | - "args = f\"--pte={pte_path} --target={target}\"\n",
208 |     | - "subprocess.run(os.path.join(script_dir, \"build_executor_runner.sh\") + \" \" + args, shell=True, cwd=et_dir)\n",
209 |     | - "\n",
210 |     | - "elf_path = os.path.join(cwd_dir, pte_base_name, \"cmake-out\", \"arm_executor_runner\")\n",
211 |     | - "assert os.path.exists(elf_path), \"Build failed; no .elf-file found\""
    | 187 | + "%%bash\n",
    | 188 | + "# Ensure the arm-none-eabi-gcc toolchain and FVPs are available on $PATH\n",
    | 189 | + "source ethos-u-scratch/setup_path.sh\n",
    | 190 | + "\n",
    | 191 | + "# Build the executorch libraries, cross-compiled for Arm bare-metal, into executorch/cmake-out-arm\n",
    | 192 | + "cmake --preset arm-baremetal \\\n",
    | 193 | + "-DCMAKE_BUILD_TYPE=Release \\\n",
    | 194 | + "-B../../cmake-out-arm ../..\n",
    | 195 | + "cmake --build ../../cmake-out-arm --target install -j$(nproc)"
    | 196 | + ]
    | 197 | + },
    | 198 | + {
    | 199 | + "cell_type": "code",
    | 200 | + "execution_count": null,
    | 201 | + "metadata": {},
    | 202 | + "outputs": [],
    | 203 | + "source": [
    | 204 | + "%%bash\n",
    | 205 | + "source ethos-u-scratch/setup_path.sh\n",
    | 206 | + "\n",
    | 207 | + "# Build the example executor runner application into examples/arm/ethos_u_minimal_example\n",
    | 208 | + "cmake -DCMAKE_TOOLCHAIN_FILE=$(pwd)/ethos-u-setup/arm-none-eabi-gcc.cmake \\\n",
    | 209 | + " -DCMAKE_BUILD_TYPE=Release \\\n",
    | 210 | + " -DET_PTE_FILE_PATH=ethos_u_minimal_example.pte \\\n",
    | 211 | + " -DTARGET_CPU=cortex-m55 \\\n",
    | 212 | + " -DETHOSU_TARGET_NPU_CONFIG=ethos-u55-128 \\\n",
    | 213 | + " -DMEMORY_MODE=Shared_Sram \\\n",
    | 214 | + " -DSYSTEM_CONFIG=Ethos_U55_High_End_Embedded \\\n",
    | 215 | + " -Bethos_u_minimal_example \\\n",
    | 216 | + " executor_runner\n",
    | 217 | + "cmake --build ethos_u_minimal_example -j$(nproc) -- arm_executor_runner"
212 | 218 |   ]
213 | 219 |   },
214 | 220 |   {
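With the `target` variable gone, the NPU configuration, system config and memory mode are now spelled out literally in three places: the compile spec, the runner's cmake configure call, and the `run_fvp.sh --target` argument. They have to describe the same hardware. A minimal sketch of keeping them in one record and deriving each command as an argument list follows; `RunnerConfig` and the `*_cmd` helpers are hypothetical, and the flags are copied from the cells in this notebook rather than from any other ExecuTorch documentation.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RunnerConfig:
    # Defaults mirror the literals used in this example. Keep them in sync
    # with the ArmCompileSpecBuilder().ethosu_compile_spec(...) arguments.
    target: str = "ethos-u55-128"
    system_config: str = "Ethos_U55_High_End_Embedded"
    memory_mode: str = "Shared_Sram"
    target_cpu: str = "cortex-m55"
    pte_file: str = "ethos_u_minimal_example.pte"
    build_dir: str = "ethos_u_minimal_example"


def configure_cmd(cfg: RunnerConfig, toolchain_file: str) -> list[str]:
    """Runner configure step as an argument list (same flags as the %%bash cell)."""
    return [
        "cmake",
        f"-DCMAKE_TOOLCHAIN_FILE={toolchain_file}",
        "-DCMAKE_BUILD_TYPE=Release",
        f"-DET_PTE_FILE_PATH={cfg.pte_file}",
        f"-DTARGET_CPU={cfg.target_cpu}",
        f"-DETHOSU_TARGET_NPU_CONFIG={cfg.target}",
        f"-DMEMORY_MODE={cfg.memory_mode}",
        f"-DSYSTEM_CONFIG={cfg.system_config}",
        f"-B{cfg.build_dir}",
        "executor_runner",  # source directory, relative to the notebook's folder
    ]


def build_cmd(cfg: RunnerConfig) -> list[str]:
    """Build only the arm_executor_runner target, one job per CPU."""
    jobs = str(os.cpu_count() or 1)
    return ["cmake", "--build", cfg.build_dir, "--parallel", jobs, "--", "arm_executor_runner"]


def run_fvp_cmd(cfg: RunnerConfig, script: str) -> list[str]:
    """Arguments for backends/arm/scripts/run_fvp.sh, reusing the same target."""
    return [script, f"--elf={cfg.build_dir}/arm_executor_runner", f"--target={cfg.target}"]


if __name__ == "__main__":
    cfg = RunnerConfig()
    toolchain = os.path.abspath("ethos-u-setup/arm-none-eabi-gcc.cmake")
    for cmd in (configure_cmd(cfg, toolchain), build_cmd(cfg),
                run_fvp_cmd(cfg, "../../backends/arm/scripts/run_fvp.sh")):
        print(" ".join(cmd))
        # To execute instead of print (toolchain and FVP must be on this process's PATH):
        # import subprocess; subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids the `shell=True` string concatenation used by the removed cells, at the cost of needing the toolchain on the kernel's PATH rather than sourcing `setup_path.sh` per cell.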
217 | 223 |   "source": [
218 | 224 |   "# Run on a simulated model\n",
219 | 225 |   "\n",
220 |     | - "We can finally use the `backends/arm/scripts/run_fvp.sh` utility script to run the .elf-file on simulated Arm hardware. This Script runs the model with an input of ones, so the expected result of the addition should be close to 2."
    | 226 | + "Finally, we can use the `backends/arm/scripts/run_fvp.sh` utility script to run the .elf-file on simulated Arm hardware. By default, the example application is built with an input of ones, so the result of the quantized addition should be close to 2."
221 | 227 |   ]
222 | 228 |   },
223 | 229 |   {
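The result is "close to 2" rather than exactly 2 because the addition runs on int8 values and the output is requantized with its own scale, so the dequantized result can only land on that output grid. Below is a sketch of the reference semantics the Q/DQ graph expresses (dequantize both inputs, add in float, requantize), using hypothetical quantization parameters chosen so that 2.0 does not fall exactly on an output step. The Ethos-U implements the rescale with integer arithmetic, so its low-order bits may differ from this float model.

```python
def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # Affine int8 quantization with saturation; round() rounds half to even.
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale


def quantized_add(qa, a_params, qb, b_params, out_params):
    """Reference semantics of an int8 add: dequantize, add in float, requantize.

    Each *_params argument is a (scale, zero_point) tuple.
    """
    real_sum = dequantize(qa, *a_params) + dequantize(qb, *b_params)
    return quantize(real_sum, *out_params)


# Hypothetical parameters, not the ones the Arm quantizer picks:
in_params = (1.0 / 127, 0)      # inputs covering [0, 1]
out_params = (2.1 / 255, -128)  # output range slightly wider than [0, 2]

q_one = quantize(1.0, *in_params)
q_out = quantized_add(q_one, in_params, q_one, in_params, out_params)
result = dequantize(q_out, *out_params)
print(q_out, result)  # result is within half an output step of 2.0
```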
226 | 232 |   "metadata": {},
227 | 233 |   "outputs": [],
228 | 234 |   "source": [
229 |     | - "args = f\"--elf={elf_path} --target={target}\"\n",
230 |     | - "subprocess.run(os.path.join(script_dir, \"run_fvp.sh\") + \" \" + args, shell=True, cwd=et_dir)"
    | 235 | + "%%bash\n",
    | 236 | + "source ethos-u-scratch/setup_path.sh\n",
    | 237 | + "\n",
    | 238 | + "# Run the example\n",
    | 239 | + "../../backends/arm/scripts/run_fvp.sh --elf=ethos_u_minimal_example/arm_executor_runner --target=ethos-u55-128"
231 | 240 |   ]
232 | 241 |   }
233 | 242 |   ],