From 2a38798a1c8276eff1f479e9e6a0fd1813136bc9 Mon Sep 17 00:00:00 2001
From: Adrian Lundell
Date: Mon, 17 Feb 2025 13:55:58 +0100
Subject: [PATCH] Arm backend: Add ethos_u_minimal_example jupyter notebook

This example provides a good base understanding of all steps needed to
lower and run a PyTorch module through the EthosUBackend. It also works
as a starting point for creating scripts for lowering a custom network.

Change-Id: I1a70dec8f81dd7be6c78c5c3a8b62cd1b12ac9c1
---
 examples/arm/README.md                     |  10 +
 examples/arm/ethos_u_minimal_example.ipynb | 284 +++++++++++++++++++++
 2 files changed, 294 insertions(+)
 create mode 100644 examples/arm/ethos_u_minimal_example.ipynb

diff --git a/examples/arm/README.md b/examples/arm/README.md
index c74eda7ae2b..8762e7ccdd1 100644
--- a/examples/arm/README.md
+++ b/examples/arm/README.md
@@ -32,6 +32,16 @@ $ source executorch/examples/arm/ethos-u-scratch/setup_path.sh
 $ executorch/examples/arm/run.sh --model_name=mv2 --target=ethos-u85-128 [--scratch-dir=same-optional-scratch-dir-as-before]
 ```
 
+### Ethos-U minimal example
+
+See the Jupyter notebook `ethos_u_minimal_example.ipynb` for an explained minimal example of the full flow for running a
+PyTorch module on the EthosUDelegate. The notebook runs directly in some IDEs such as VS Code; otherwise, it can be run
+in your browser using
+```
+pip install jupyter
+jupyter notebook ethos_u_minimal_example.ipynb
+```
+
 ### Online Tutorial
 
 We also have a [tutorial](https://pytorch.org/executorch/stable/executorch-arm-delegate-tutorial.html) explaining the steps performed in these
diff --git a/examples/arm/ethos_u_minimal_example.ipynb b/examples/arm/ethos_u_minimal_example.ipynb
new file mode 100644
index 00000000000..d73695e9d48
--- /dev/null
+++ b/examples/arm/ethos_u_minimal_example.ipynb
@@ -0,0 +1,284 @@
+{
+ "cells": [
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Copyright 2025 Arm Limited and/or its affiliates.\n",
+    "#\n",
+    "# This source code is licensed under the BSD-style license found in the\n",
+    "# LICENSE file in the root directory of this source tree."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Ethos-U delegate flow example\n",
+    "\n",
+    "This guide demonstrates the full flow for running a module on Arm Ethos-U using ExecuTorch. \n",
+    "It has been tested on Linux x86_64 and macOS aarch64. If something is not working for you, please raise a GitHub issue and tag Arm.\n",
+    "\n",
+    "Before you begin:\n",
+    "1. In a clean virtual environment with a compatible Python version, install ExecuTorch using `./install_executorch.sh`\n",
+    "2. Install the Arm cross-compilation toolchain and simulators using `examples/arm/setup.sh --i-agree-to-the-contained-eula`\n",
+    "3. Add the Arm cross-compilation toolchain and simulators to PATH using `examples/arm/ethos-u-scratch/setup_path.sh`\n",
+    "\n",
+    "All commands should be executed from the base `executorch` folder.\n",
+    "\n",
+    "*Some scripts in this notebook produce long output logs: configuring the 'Customizing Notebook Layout' settings to enable 'Output: scrolling' and setting 'Output: Text Line Limit' makes this more manageable.*"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## AOT Flow\n",
+    "\n",
+    "The first step is creating the PyTorch module and exporting it. Exporting converts the Python code in the module into a graph structure. \n",
+    "The result is still runnable Python code, which can be displayed by printing the `graph_module` of the exported program."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import torch\n",
+    "\n",
+    "class Add(torch.nn.Module):\n",
+    "    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:\n",
+    "        return x + y\n",
+    "\n",
+    "example_inputs = (torch.ones(1, 1, 1, 1), torch.ones(1, 1, 1, 1))\n",
+    "\n",
+    "model = Add()\n",
+    "model = model.eval()\n",
+    "exported_program = torch.export.export_for_training(model, example_inputs)\n",
+    "graph_module = exported_program.module()\n",
+    "\n",
+    "_ = graph_module.print_readable()"
+   ]
+  },
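+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Since the exported `graph_module` is runnable, it can be called directly to sanity-check the export. The following cell is an optional minimal check, not part of the required flow: it runs both the eager module and the exported graph module on the example inputs and asserts that they agree."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional sanity check: the exported graph module should behave like the eager module\n",
+    "eager_output = model(*example_inputs)\n",
+    "exported_output = graph_module(*example_inputs)\n",
+    "assert torch.allclose(eager_output, exported_output), \"Export changed the module output\"\n",
+    "print(f\"exported_output: {exported_output}\")"
+   ]
+  },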
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To run on Ethos-U, the `graph_module` must be quantized using the `arm_quantizer`. Quantization can be done in multiple ways and can be customized for different parts of the graph; shown here is the recommended path for the EthosUBackend. Quantization also requires calibrating the module with example inputs.\n",
+    "\n",
+    "Printing the module again shows that quantization wraps the node in quantization/dequantization nodes which contain the computed quantization parameters."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from executorch.backends.arm.arm_backend import ArmCompileSpecBuilder\n",
+    "from executorch.backends.arm.quantizer.arm_quantizer import (\n",
+    "    EthosUQuantizer,\n",
+    "    get_symmetric_quantization_config,\n",
+    ")\n",
+    "from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e\n",
+    "\n",
+    "target = \"ethos-u55-128\"\n",
+    "\n",
+    "# Create a compilation spec describing the target for configuring the quantizer.\n",
+    "# Some args are used by the Arm Vela graph compiler later in the example. Refer to the Arm Vela documentation for an\n",
+    "# explanation of its flags: https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md\n",
+    "spec_builder = ArmCompileSpecBuilder().ethosu_compile_spec(\n",
+    "    target,\n",
+    "    system_config=\"Ethos_U55_High_End_Embedded\",\n",
+    "    memory_mode=\"Shared_Sram\",\n",
+    "    extra_flags=\"--output-format=raw --debug-force-regor\",\n",
+    ")\n",
+    "compile_spec = spec_builder.build()\n",
+    "\n",
+    "# Create and configure the quantizer to use a symmetric quantization config globally on all nodes\n",
+    "quantizer = EthosUQuantizer(compile_spec)\n",
+    "operator_config = get_symmetric_quantization_config(is_per_channel=False)\n",
+    "quantizer.set_global(operator_config)\n",
+    "\n",
+    "# Post-training quantization\n",
+    "quantized_graph_module = prepare_pt2e(graph_module, quantizer)\n",
+    "quantized_graph_module(*example_inputs)  # Calibrate the graph module with the example input\n",
+    "quantized_graph_module = convert_pt2e(quantized_graph_module)\n",
+    "\n",
+    "_ = quantized_graph_module.print_readable()\n",
+    "\n",
+    "# Create a new exported program using the quantized_graph_module\n",
+    "quantized_exported_program = torch.export.export_for_training(quantized_graph_module, example_inputs)"
+   ]
+  },
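+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As an optional check of quantization accuracy, the converted module can still be run in Python and compared against the floating-point module. The cell below is a minimal sketch of such a comparison: for this trivial addition of ones the outputs should agree closely, while real networks will show a small quantization error."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Optional: compare the FP32 and quantized outputs on the example inputs\n",
+    "fp32_output = model(*example_inputs)\n",
+    "int8_output = quantized_graph_module(*example_inputs)\n",
+    "print(f\"FP32 output:      {fp32_output}\")\n",
+    "print(f\"Quantized output: {int8_output}\")"
+   ]
+  },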
" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import subprocess \n", + "import os \n", + "\n", + "# Setup paths\n", + "cwd_dir = os.getcwd()\n", + "et_dir = os.path.join(cwd_dir, \"..\", \"..\")\n", + "et_dir = os.path.abspath(et_dir)\n", + "script_dir = os.path.join(et_dir, \"backends\", \"arm\", \"scripts\")\n", + "\n", + "# Run build_quantized_ops_aot_lib.sh\n", + "subprocess.run(os.path.join(script_dir, \"build_quantized_ops_aot_lib.sh\"), shell=True, cwd=et_dir)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The lowering in the EthosUBackend happens in five steps:\n", + "\n", + "1. **Lowering to core Aten operator set**: Transform module to use a subset of operators applicable to edge devices. \n", + "2. **Partitioning**: Find subgraphs which are supported for running on Ethos-U\n", + "3. **Lowering to TOSA compatible operator set**: Perform transforms to make the Ethos-U subgraph(s) compatible with TOSA \n", + "4. **Serialization to TOSA**: Compiles the graph module into a TOSA graph \n", + "5. **Compilation to NPU**: Compiles the TOSA graph into an EthosU command stream using the Arm Vela graph compiler. This makes use of the `compile_spec` created earlier.\n", + "Step 5 also prints a Network summary for each processed subgraph.\n", + "\n", + "All of this happens behind the scenes in `to_edge_transform_and_lower`. Printing the graph module shows that what is left in the graph is two quantization nodes for `x` and `y` going into an `executorch_call_delegate` node, followed by a dequantization node." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from executorch.backends.arm.ethosu_partitioner import EthosUPartitioner\n", + "from executorch.exir import (\n", + " EdgeCompileConfig,\n", + " ExecutorchBackendConfig,\n", + " to_edge_transform_and_lower,\n", + ")\n", + "from executorch.extension.export_util.utils import save_pte_program\n", + "import platform \n", + "\n", + "# Create partitioner from compile spec \n", + "partitioner = EthosUPartitioner(compile_spec)\n", + "\n", + "# Lower the exported program to the Ethos-U backend\n", + "edge_program_manager = to_edge_transform_and_lower(\n", + " quantized_exported_program,\n", + " partitioner=[partitioner],\n", + " compile_config=EdgeCompileConfig(\n", + " _check_ir_validity=False,\n", + " ),\n", + " )\n", + "\n", + "# Load quantization ops library\n", + "os_aot_lib_names = {\"Darwin\" : \"libquantized_ops_aot_lib.dylib\", \n", + " \"Linux\" : \"libquantized_ops_aot_lib.so\", \n", + " \"Windows\": \"libquantized_ops_aot_lib.dll\"}\n", + "aot_lib_name = os_aot_lib_names[platform.system()]\n", + "\n", + "libquantized_ops_aot_lib_path = os.path.join(et_dir, \"cmake-out-aot-lib\", \"kernels\", \"quantized\", aot_lib_name)\n", + "torch.ops.load_library(libquantized_ops_aot_lib_path)\n", + "\n", + "# Convert edge program to executorch\n", + "executorch_program_manager = edge_program_manager.to_executorch(\n", + " config=ExecutorchBackendConfig(extract_delegate_segments=False)\n", + " )\n", + "\n", + "executorch_program_manager.exported_program().module().print_readable()\n", + "\n", + "# Save pte file\n", + "pte_base_name = \"simple_example\"\n", + "pte_name = pte_base_name + \".pte\"\n", + "pte_path = os.path.join(cwd_dir, pte_name)\n", + "save_pte_program(executorch_program_manager, pte_name)\n", + "assert os.path.exists(pte_path), \"Build failed; no .pte-file found\"" + ] + 
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Build executor runtime\n",
+    "\n",
+    "After the AOT compilation flow is done, the runtime can be cross-compiled and linked to the produced .pte-file using the Arm cross-compilation toolchain. This is done in three steps:\n",
+    "1. Build the executorch library and EthosUDelegate.\n",
+    "2. Build any external kernels required. In this example this is not needed as the graph is fully delegated, but it is included for completeness.\n",
+    "3. Build and link the `arm_executor_runner`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Build executorch\n",
+    "subprocess.run(os.path.join(script_dir, \"build_executorch.sh\"), shell=True, cwd=et_dir)\n",
+    "\n",
+    "# Build portable kernels\n",
+    "subprocess.run(os.path.join(script_dir, \"build_portable_kernels.sh\"), shell=True, cwd=et_dir)\n",
+    "\n",
+    "# Build the executorch runner\n",
+    "args = f\"--pte={pte_path} --target={target}\"\n",
+    "subprocess.run(os.path.join(script_dir, \"build_executorch_runner.sh\") + \" \" + args, shell=True, cwd=et_dir)\n",
+    "\n",
+    "elf_path = os.path.join(cwd_dir, pte_base_name, \"cmake-out\", \"arm_executor_runner\")\n",
+    "assert os.path.exists(elf_path), \"Build failed; no .elf-file found\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Run on simulated model\n",
+    "\n",
+    "We can finally use the `backends/arm/scripts/run_fvp.sh` utility script to run the .elf-file on simulated Arm hardware. This script runs the model with an input of ones, so the expected result of the addition should be close to 2."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "args = f\"--elf={elf_path} --target={target}\"\n",
+    "subprocess.run(os.path.join(script_dir, \"run_fvp.sh\") + \" \" + args, shell=True, cwd=et_dir)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "venv",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.15"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}