diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 1edbacd598a..71e097042d7 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -199,8 +199,7 @@ We use [`lintrunner`](https://pypi.org/project/lintrunner/) to help make sure th code follows our standards. Set it up with: ``` -pip install lintrunner==0.12.7 -pip install lintrunner-adapters==0.12.4 +./install_requirements.sh # (automatically run by install_executorch.sh) lintrunner init ``` diff --git a/README.md b/README.md index 17327990a1d..66be37bedc8 100644 --- a/README.md +++ b/README.md @@ -1,72 +1,250 @@
- Logo -

ExecuTorch: A powerful on-device AI Framework

+ ExecuTorch logo mark +

ExecuTorch

+

On-device AI inference powered by PyTorch

-
- Contributors - Stargazers - Join our Discord community - Check out the documentation -
+ PyPI - Version + GitHub - Contributors + GitHub - Stars + Discord - Chat with Us + Documentation
-**ExecuTorch** is an end-to-end solution for on-device inference and training. It powers much of Meta's on-device AI experiences across Facebook, Instagram, Meta Quest, Ray-Ban Meta Smart Glasses, WhatsApp, and more. +**ExecuTorch** is PyTorch's unified solution for deploying AI models on-device—from smartphones to microcontrollers—built for privacy, performance, and portability. It powers Meta's on-device AI across **Instagram, WhatsApp, Quest 3, Ray-Ban Meta Smart Glasses**, and [more](https://docs.pytorch.org/executorch/main/success-stories.html). + +Deploy **LLMs, vision, speech, and multimodal models** with the same PyTorch APIs you already know—accelerating research to production with seamless model export, optimization, and deployment. No manual C++ rewrites. No format conversions. No vendor lock-in. + +
+ 📘 Table of Contents + +- [Why ExecuTorch?](#why-executorch) +- [How It Works](#how-it-works) +- [Quick Start](#quick-start) + - [Installation](#installation) + - [Export and Deploy in 3 Steps](#export-and-deploy-in-3-steps) + - [Run on Device](#run-on-device) + - [LLM Example: Llama](#llm-example-llama) +- [Platform & Hardware Support](#platform--hardware-support) +- [Production Deployments](#production-deployments) +- [Examples & Models](#examples--models) +- [Key Features](#key-features) +- [Documentation](#documentation) +- [Community & Contributing](#community--contributing) +- [License](#license) + +
+ +## Why ExecuTorch? + +- **🔒 Native PyTorch Export** — Direct export from PyTorch. No .onnx, .tflite, or intermediate format conversions. Preserve model semantics. +- **⚡ Production-Proven** — Powers billions of users at [Meta with real-time on-device inference](https://engineering.fb.com/2025/07/28/android/executorch-on-device-ml-meta-family-of-apps/). +- **💾 Tiny Runtime** — 50KB base footprint. Runs on microcontrollers to high-end smartphones. +- **🚀 [12+ Hardware Backends](https://docs.pytorch.org/executorch/main/backends-overview.html)** — Open-source acceleration for Apple, Qualcomm, ARM, MediaTek, Vulkan, and more. +- **🎯 One Export, Multiple Backends** — Switch hardware targets with a single line change. Deploy the same model everywhere. + +## How It Works + +ExecuTorch uses **ahead-of-time (AOT) compilation** to prepare PyTorch models for edge deployment: + +1. **🧩 Export** — Capture your PyTorch model graph with `torch.export()` +2. **⚙️ Compile** — Quantize, optimize, and partition to hardware backends → `.pte` +3. **🚀 Execute** — Load `.pte` on-device via lightweight C++ runtime + +Models use a standardized [Core ATen operator set](https://docs.pytorch.org/executorch/main/compiler-ir-advanced.html#intermediate-representation). [Partitioners](https://docs.pytorch.org/executorch/main/compiler-delegate-and-partitioner.html) delegate subgraphs to specialized hardware (NPU/GPU) with CPU fallback. + +Learn more: [How ExecuTorch Works](https://docs.pytorch.org/executorch/main/intro-how-it-works.html) • [Architecture Guide](https://docs.pytorch.org/executorch/main/getting-started-architecture.html) + +## Quick Start + +### Installation + +```bash +pip install executorch +``` + +For platform-specific setup (Android, iOS, embedded systems), see the [Quick Start](https://docs.pytorch.org/executorch/main/quick-start-section.html) documentation for additional info. + +### Export and Deploy in 3 Steps + +```python +import torch +from executorch.exir import to_edge_transform_and_lower +from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner + +# 1. Export your PyTorch model +model = MyModel().eval() +example_inputs = (torch.randn(1, 3, 224, 224),) +exported_program = torch.export.export(model, example_inputs) + +# 2. Optimize for target hardware (switch backends with one line) +program = to_edge_transform_and_lower( + exported_program, + partitioner=[XnnpackPartitioner()] # CPU | CoreMLPartitioner() for iOS | QnnPartitioner() for Qualcomm +).to_executorch() + +# 3. 
Save for deployment +with open("model.pte", "wb") as f: + f.write(program.buffer) + +# Test locally via ExecuTorch runtime's pybind API (optional) +from executorch.runtime import Runtime +runtime = Runtime.get() +method = runtime.load_program("model.pte").load_method("forward") +outputs = method.execute([torch.randn(1, 3, 224, 224)]) +``` + +### Run on Device + +**[C++](https://docs.pytorch.org/executorch/main/using-executorch-cpp.html)** +```cpp +#include <executorch/extension/module/module.h> +#include <executorch/extension/tensor/tensor.h> + +Module module("model.pte"); +auto tensor = make_tensor_ptr({2, 2}, {1.0f, 2.0f, 3.0f, 4.0f}); +auto outputs = module.forward({tensor}); +``` + +**[Swift (iOS)](https://docs.pytorch.org/executorch/main/ios-section.html)** +```swift +let module = Module(filePath: "model.pte") +let input = Tensor([1.0, 2.0, 3.0, 4.0]) +let outputs: [Value] = try module.forward([input]) +``` + +**[Kotlin (Android)](https://docs.pytorch.org/executorch/main/android-section.html)** +```kotlin +val module = Module.load("model.pte") +val inputTensor = Tensor.fromBlob(floatArrayOf(1.0f, 2.0f, 3.0f, 4.0f), longArrayOf(2, 2)) +val outputs = module.forward(EValue.from(inputTensor)) +``` + +### LLM Example: Llama + +Export Llama models using the [`export_llm`](https://docs.pytorch.org/executorch/main/llm/export-llm.html) script or [Optimum-ExecuTorch](https://github.com/huggingface/optimum-executorch): + +```bash +# Using export_llm +python -m executorch.extension.llm.export.export_llm --model llama3_2 --output llama.pte + +# Using Optimum-ExecuTorch +optimum-cli export executorch \ + --model meta-llama/Llama-3.2-1B \ + --task text-generation \ + --recipe xnnpack \ + --output_dir llama_model +``` -It supports a wide range of models including LLMs (Large Language Models), CV (Computer Vision), ASR (Automatic Speech Recognition), and TTS (Text to Speech). +Run on-device with the LLM runner API: -Platform Support: -- Operating Systems: - - iOS - - MacOS (ARM64) - - Android - - Linux - - Microcontrollers +**[C++](https://docs.pytorch.org/executorch/main/llm/run-with-c-plus-plus.html)** +```cpp +#include -- Hardware Acceleration: - - Apple - - Arm - - Cadence - - MediaTek - - NXP - - OpenVINO - - Qualcomm - - Vulkan - - XNNPACK +auto runner = create_llama_runner("llama.pte", "tiktoken.bin"); +executorch::extension::llm::GenerationConfig config{ + .seq_len = 128, .temperature = 0.8f}; +runner->generate("Hello, how are you?", config); +``` -Key value propositions of ExecuTorch are: +**[Swift (iOS)](https://docs.pytorch.org/executorch/main/llm/run-on-ios.html)** +```swift +let runner = TextRunner(modelPath: "llama.pte", tokenizerPath: "tiktoken.bin") +try runner.generate("Hello, how are you?", Config { + $0.sequenceLength = 128 +}) { token in + print(token, terminator: "") +} +``` -- **Portability:** Compatibility with a wide variety of computing platforms, - from high-end mobile phones to highly constrained embedded systems and - microcontrollers. -- **Productivity:** Enabling developers to use the same toolchains and Developer - Tools from PyTorch model authoring and conversion, to debugging and deployment - to a wide variety of platforms. -- **Performance:** Providing end users with a seamless and high-performance - experience due to a lightweight runtime and utilizing full hardware - capabilities such as CPUs, NPUs, and DSPs. 
+**Kotlin (Android)** — [API Docs](https://docs.pytorch.org/executorch/main/javadoc/org/pytorch/executorch/extension/llm/package-summary.html) • [Demo App](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/android/LlamaDemo) +```kotlin +val llmModule = LlmModule("llama.pte", "tiktoken.bin", 0.8f) +llmModule.load() +llmModule.generate("Hello, how are you?", 128, object : LlmCallback { + override fun onResult(result: String) { print(result) } + override fun onStats(stats: String) { } +}) +``` -## Getting Started -To get started you can: +For multimodal models (vision, audio), use the [MultiModal runner API](extension/llm/runner) which extends the LLM runner to handle image and audio inputs alongside text. See [Llava](examples/models/llava/README.md) and [Voxtral](examples/models/voxtral/README.md) examples. -- Visit the [Step by Step Tutorial](https://pytorch.org/executorch/stable/getting-started.html) to get things running locally and deploy a model to a device -- Use this [Colab Notebook](https://colab.research.google.com/drive/1qpxrXC3YdJQzly3mRg-4ayYiOjC6rue3?usp=sharing) to start playing around right away -- Jump straight into LLM use cases by following specific instructions for popular open-source models such as [Llama](examples/models/llama/README.md), [Qwen 3](examples/models/qwen3/README.md), [Phi-4-mini](examples/models/phi_4_mini/README.md), [Llava](examples/models/llava/README.md), [Voxtral](examples/models/voxtral/README.md), and [LFM2](examples/models/lfm2/README.md). +See [examples/models/llama](examples/models/llama/README.md) for complete workflow including quantization, mobile deployment, and advanced options. -## Feedback and Engagement +**Next Steps:** +- 📖 [Step-by-step tutorial](https://docs.pytorch.org/executorch/main/getting-started.html) — Complete walkthrough for your first model +- ⚡ [Colab notebook](https://colab.research.google.com/drive/1qpxrXC3YdJQzly3mRg-4ayYiOjC6rue3?usp=sharing) — Try ExecuTorch instantly in your browser +- 🤖 [Deploy Llama models](examples/models/llama/README.md) — LLM workflow with quantization and mobile demos -We welcome any feedback, suggestions, and bug reports from the community to help -us improve our technology. Check out the [Discussion Board](https://github.com/pytorch/executorch/discussions) or chat real time with us on [Discord](https://discord.gg/Dh43CKSAdc) +## Platform & Hardware Support -## Contributing +| **Platform** | **Supported Backends** | +|------------------|----------------------------------------------------------| +| Android | XNNPACK, Vulkan, Qualcomm, MediaTek, Samsung Exynos | +| iOS | XNNPACK, MPS, CoreML (Neural Engine) | +| Linux / Windows | XNNPACK, OpenVINO, CUDA *(experimental)* | +| macOS | XNNPACK, MPS, Metal *(experimental)* | +| Embedded / MCU | XNNPACK, ARM Ethos-U, NXP, Cadence DSP | -We welcome contributions. To get started review the [guidelines](CONTRIBUTING.md) and chat with us on [Discord](https://discord.gg/Dh43CKSAdc) +See [Backend Documentation](https://docs.pytorch.org/executorch/main/backends-overview.html) for detailed hardware requirements and optimization guides. +## Production Deployments -## Directory Structure +ExecuTorch powers on-device AI at scale across Meta's family of apps, VR/AR devices, and partner deployments. [View success stories →](https://docs.pytorch.org/executorch/main/success-stories.html) -Please refer to the [Codebase structure](CONTRIBUTING.md#codebase-structure) section of the [Contributing Guidelines](CONTRIBUTING.md) for more details. 
+## Examples & Models + +**LLMs:** [Llama 3.2/3.1/3](examples/models/llama/README.md), [Qwen 3](examples/models/qwen3/README.md), [Phi-4-mini](examples/models/phi_4_mini/README.md), [LiquidAI LFM2](examples/models/lfm2/README.md) + +**Multimodal:** [Llava](examples/models/llava/README.md) (vision-language), [Voxtral](examples/models/voxtral/README.md) (audio-language) + +**Vision/Speech:** [MobileNetV2](https://github.com/meta-pytorch/executorch-examples/tree/main/mv2), [DeepLabV3](https://github.com/meta-pytorch/executorch-examples/tree/main/dl3), [Whisper](https://github.com/meta-pytorch/executorch-examples/tree/main/whisper/android/WhisperApp) + +**Resources:** [`examples/`](examples/) directory • [executorch-examples](https://github.com/meta-pytorch/executorch-examples) out-of-tree demos • [Optimum-ExecuTorch](https://github.com/huggingface/optimum-executorch) for HuggingFace models + +## Key Features + +ExecuTorch provides advanced capabilities for production deployment: + +- **Quantization** — Built-in support via [torchao](https://docs.pytorch.org/ao) for 8-bit, 4-bit, and dynamic quantization +- **Memory Planning** — Optimize memory usage with ahead-of-time allocation strategies +- **Developer Tools** — ETDump profiler, ETRecord inspector, and model debugger +- **Selective Build** — Strip unused operators to minimize binary size +- **Custom Operators** — Extend with domain-specific kernels +- **Dynamic Shapes** — Support variable input sizes with bounded ranges + +See [Advanced Topics](https://docs.pytorch.org/executorch/main/advanced-topics-section.html) for quantization techniques, custom backends, and compiler passes. + +## Documentation + +- [**Documentation Home**](https://docs.pytorch.org/executorch/main/index.html) — Complete guides and tutorials +- [**API Reference**](https://docs.pytorch.org/executorch/main/api-section.html) — Python, C++, Java/Kotlin APIs +- [**Backend Integration**](https://docs.pytorch.org/executorch/main/backend-delegates-integration.html) — Build custom hardware backends +- [**Troubleshooting**](https://docs.pytorch.org/executorch/main/support-section.html) — Common issues and solutions + +## Community & Contributing + +We welcome contributions from the community! + +- 💬 [**GitHub Discussions**](https://github.com/pytorch/executorch/discussions) — Ask questions and share ideas +- 🎮 [**Discord**](https://discord.gg/Dh43CKSAdc) — Chat with the team and community +- 🐛 [**Issues**](https://github.com/pytorch/executorch/issues) — Report bugs or request features +- 🤝 [**Contributing Guide**](CONTRIBUTING.md) — Guidelines and codebase structure ## License -ExecuTorch is BSD licensed, as found in the LICENSE file. + +ExecuTorch is BSD licensed, as found in the [LICENSE](LICENSE) file. + +
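To make the **Dynamic Shapes** entry in the Key Features list above concrete, here is a minimal sketch of exporting with a bounded dynamic dimension. It reuses the `to_edge_transform_and_lower` and `XnnpackPartitioner` APIs from the Quick Start; the toy `Classifier` module, the `batch` bound, and the output filename are illustrative assumptions, not part of the ExecuTorch API.

```python
import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner


class Classifier(torch.nn.Module):  # toy model, for illustration only
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(64, 8)

    def forward(self, x):
        return self.linear(x)


# Declare a bounded dynamic batch dimension so memory can be planned ahead of time.
batch = torch.export.Dim("batch", min=1, max=32)
exported = torch.export.export(
    Classifier().eval(),
    (torch.randn(4, 64),),
    dynamic_shapes={"x": {0: batch}},
)

# Same lowering flow as the Quick Start; any batch size within the bound is accepted on device.
program = to_edge_transform_and_lower(
    exported, partitioner=[XnnpackPartitioner()]
).to_executorch()
with open("classifier.pte", "wb") as f:
    f.write(program.buffer)
```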

+ +--- + +
+

Part of the PyTorch ecosystem

+

+ GitHub • + Documentation +

+
diff --git a/backends/test/suite/README.md b/backends/test/suite/README.md index 564f44362ad..901cd461dbe 100644 --- a/backends/test/suite/README.md +++ b/backends/test/suite/README.md @@ -5,37 +5,71 @@ This directory contains tests that validate correctness and coverage of backends These tests are intended to ensure that backends are robust and provide a smooth, "out-of-box" experience for users across the full span of input patterns. They are not intended to be a replacement for backend-specific tests, as they do not attempt to validate performance or that backends delegate operators that they expect to. ## Running Tests and Interpreting Output -Tests can be run from the command line, either using the runner.py entry point or the standard Python unittest runner. When running through runner.py, the test runner will report test statistics, including the number of tests with each result type. +Tests can be run from the command line using pytest. When generating a JSON test report, the runner will report detailed test statistics, including output accuracy, delegated nodes, lowering timing, and more. -Backends can be specified with the `ET_TEST_ENABLED_BACKENDS` environment variable. By default, all available backends are enabled. Note that backends such as Core ML or Vulkan may require specific hardware or software to be available. See the documentation for each backend for information on requirements. +Each backend and test flow (recipe) registers a pytest [marker](https://docs.pytest.org/en/stable/example/markers.html) that can be passed to pytest with the `-m marker` argument to filter execution. -Example: +To run all XNNPACK backend operator tests: ``` -ET_TEST_ENABLED_BACKENDS=xnnpack python -m executorch.backends.test.suite.runner +pytest -c /dev/null backends/test/suite/operators/ -m backend_xnnpack -n auto ``` +To run all model tests for the CoreML static int8 lowering flow: +``` +pytest -c /dev/null backends/test/suite/models/ -m flow_coreml_static_int8 -n auto ``` -2465 Passed / 2494 -16 Failed -13 Skipped -[Success] -736 Delegated -1729 Undelegated +To run a specific test: +``` +pytest -c /dev/null backends/test/suite/ -k "test_prelu_f32_custom_init[xnnpack]" ``` -[Failure] -5 Lowering Fail -3 PTE Run Fail -8 Output Mismatch Fail +To generate a JSON report: +``` +pytest -c /dev/null backends/test/suite/operators/ -n auto --json-report --json-report-file="test_report.json" ``` -Outcomes can be interpreted as follows: - * Success (delegated): The test passed and at least one op was delegated by the backend. - * Success (undelegated): The test passed with no ops delegated by the backend. This is a pass, as the partitioner works as intended. - * Skipped: test fails in eager or export (indicative of a test or dynamo issue). - * Lowering fail: The test fails in to_edge_transform_and_lower. - * PTE run failure: The test errors out when loading or running the method. - * Output mismatch failure: Output delta (vs eager) exceeds the configured tolerance. +See [pytest-json-report](https://pypi.org/project/pytest-json-report/) for information on the report format. The test logic in this repository attaches additional metadata to each test entry under the `metadata`/`subtests` keys. One entry is created for each call to `test_runner.lower_and_run_model`. + +Here is an excerpt from a test run, showing a successful run of the `test_add_f32_bcast_first[xnnpack]` test. 
+```json +"tests": [ + { + "nodeid": "operators/test_add.py::test_add_f32_bcast_first[xnnpack]", + "lineno": 38, + "outcome": "passed", + "keywords": [ + "test_add_f32_bcast_first[xnnpack]", + "flow_xnnpack", + "backend_xnnpack", + ... + ], + "metadata": { + "subtests": [ + { + "Test ID": "test_add_f32_bcast_first[xnnpack]", + "Test Case": "test_add_f32_bcast_first", + "Subtest": 0, + "Flow": "xnnpack", + "Result": "Pass", + "Result Detail": "", + "Error": "", + "Delegated": "True", + "Quantize Time (s)": null, + "Lower Time (s)": "2.881", + "Output 0 Error Max": "0.000", + "Output 0 Error MAE": "0.000", + "Output 0 SNR": "inf", + "Delegated Nodes": 1, + "Undelegated Nodes": 0, + "Delegated Ops": { + "aten::add.Tensor": 1 + }, + "PTE Size (Kb)": "1.600" + } + ] + } +``` ## Backend Registration @@ -43,11 +77,11 @@ To plug into the test framework, each backend should provide an implementation o At a minimum, the backend will likely need to provide a custom implementation of the Partition and ToEdgeTransformAndLower stages using the appropriate backend partitioner. See backends/xnnpack/test/tester/tester.py for an example implementation. -Once a tester is available, the backend flow(s) can be added in __init__.py in this directory by adding an entry to `ALL_TESTER_FLOWS`. Each flow entry consists of a name (used in the test case naming) and a function to instantiate a tester for a given model and input tuple. +Once a tester is available, the backend flow(s) can be added under flows/ and registered in flow.py. It is intended that this will be unified with the lowering recipes under executorch/export in the near future. ## Test Cases -Operator test cases are defined under the operators/ directory. Tests are written in a backend-independent manner, and each test is programmatically expanded to generate a variant for each registered backend flow. The `@operator_test` decorator is applied to each test class to trigger this behavior. Tests can also be tagged with an appropriate type specifier, such as `@dtype_test`, to generate variants for each dtype. The decorators and "magic" live in __init__.py in this directory. +Operator test cases are defined under the operators/ directory. Model tests are under models/. Tests are written in a backend-independent manner, and each test is programmatically expanded to generate a variant for each registered backend flow by use of the `test_runner` fixture parameter. Tests can additionally be parameterized using standard pytest decorators. Parameterizing over dtype is a common use case. 
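As a rough illustration of the pattern described above, the sketch below shows what an operator test built on the `test_runner` fixture might look like. The toy `Add` module, the test name, and the dtype list are illustrative assumptions, and the exact signature of `lower_and_run_model` is defined by this suite's harness rather than by this sketch.

```python
import pytest
import torch


class Add(torch.nn.Module):  # toy module under test, for illustration only
    def forward(self, x, y):
        return x + y


# The test_runner fixture expands this test once per registered backend flow;
# dtype is parameterized with a standard pytest decorator, as described above.
@pytest.mark.parametrize("dtype", [torch.float32, torch.float16])
def test_add_dtype(test_runner, dtype):
    inputs = (
        torch.randn(2, 8).to(dtype),
        torch.randn(2, 8).to(dtype),
    )
    # Each call records one metadata entry (accuracy, delegation, timing) in the report.
    test_runner.lower_and_run_model(Add(), inputs)
```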
## Evolution of this Test Suite diff --git a/docs/source/_static/img/ExecuTorch-Logo-cropped.svg b/docs/source/_static/img/ExecuTorch-Logo-cropped.svg deleted file mode 100644 index 9e0ef52fbd8..00000000000 --- a/docs/source/_static/img/ExecuTorch-Logo-cropped.svg +++ /dev/null @@ -1,57 +0,0 @@ - - - - - - - - - - - diff --git a/docs/source/_static/img/executorch-chip-logo-circle-16.png b/docs/source/_static/img/executorch-chip-logo-circle-16.png new file mode 100644 index 00000000000..a3966ae27db Binary files /dev/null and b/docs/source/_static/img/executorch-chip-logo-circle-16.png differ diff --git a/docs/source/_static/img/executorch-chip-logo-circle-32.png b/docs/source/_static/img/executorch-chip-logo-circle-32.png new file mode 100644 index 00000000000..83f1018a76c Binary files /dev/null and b/docs/source/_static/img/executorch-chip-logo-circle-32.png differ diff --git a/docs/source/_static/img/executorch-chip-logo.svg b/docs/source/_static/img/executorch-chip-logo.svg new file mode 100644 index 00000000000..11e5ed60956 --- /dev/null +++ b/docs/source/_static/img/executorch-chip-logo.svg @@ -0,0 +1,205 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/docs/source/conf.py b/docs/source/conf.py index f1869d38a46..ac9d7458222 100644 --- a/docs/source/conf.py +++ b/docs/source/conf.py @@ -74,7 +74,7 @@ "xml", # {repo_root}/docs/cpp/build/xml ) -html_favicon = "_static/img/ExecuTorch-Logo-cropped.svg" +html_favicon = "_static/img/executorch-chip-logo.svg" # Get ET_VERSION_DOCS during the build. et_version_docs = os.environ.get("ET_VERSION_DOCS", None) diff --git a/examples/models/voxtral/README.md b/examples/models/voxtral/README.md index 5f4eeb2ff95..8cac4264bba 100644 --- a/examples/models/voxtral/README.md +++ b/examples/models/voxtral/README.md @@ -18,11 +18,6 @@ cd optimum-executorch python install_dev.py ``` -We are currently working on a Transformers pin bump for Optimum. In the meantime, manually override the Transformers dep to the earliest compatible version. -``` -pip install git+https://github.com/huggingface/transformers@6121e9e46c4fc4e5c91d9f927aef5490691850cf#egg=transformers -``` - ## Using the export CLI We export Voxtral using the Optimum CLI, which will export `model.pte` to the `voxtral` output directory: ``` @@ -32,7 +27,9 @@ optimum-cli export executorch \ --recipe "xnnpack" \ --use_custom_sdpa \ --use_custom_kv_cache \ + --max_seq_len 2048 \ --qlinear 8da4w \ + --qlinear_encoder 8da4w \ --qembedding 4w \ --output_dir="voxtral" ``` @@ -49,7 +46,7 @@ The Voxtral runner will do the following things: - Feed the formatted inputs to the multimodal modal runner. -# [Option A] Exporting the audio preprocessor +## Exporting the audio preprocessor The exported model takes in a mel spectrogram input tensor as its audio inputs. We provide a simple way to transform raw audio data into a mel spectrogram by exporting a version of Voxtral's audio preprocessor used directly by Transformers.
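For context on the preprocessing step described above, the following is a purely illustrative sketch of the raw-waveform-to-mel-spectrogram transform using torchaudio. It is not the exported Voxtral preprocessor, and the filter-bank parameters are placeholders rather than Voxtral's actual configuration.

```python
# Illustration only: shows the kind of transform the exported audio
# preprocessor performs (raw waveform -> mel spectrogram); parameter values
# below are placeholders, not Voxtral's real configuration.
import torchaudio

waveform, sample_rate = torchaudio.load("prompt.wav")  # hypothetical input clip
to_mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=sample_rate,
    n_fft=400,
    hop_length=160,
    n_mels=128,
)
mel = to_mel(waveform)  # shape: (channels, n_mels, frames)
```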