diff --git a/README.md b/README.md
index 17327990a1d..d2d115e32d2 100644
--- a/README.md
+++ b/README.md
@@ -1,72 +1,250 @@
- Logo -

ExecuTorch: A powerful on-device AI Framework

+ ExecuTorch logo mark +

ExecuTorch

+

On-device AI inference powered by PyTorch

-
- Contributors
- Stargazers
- Join our Discord community
- Check out the documentation
-
+ PyPI - Version
+ GitHub - Contributors
+ GitHub - Stars
+ Discord - Chat with Us
+ Documentation
-**ExecuTorch** is an end-to-end solution for on-device inference and training. It powers much of Meta's on-device AI experiences across Facebook, Instagram, Meta Quest, Ray-Ban Meta Smart Glasses, WhatsApp, and more.
+**ExecuTorch** is PyTorch's unified solution for deploying AI models on-device—from smartphones to microcontrollers—built for privacy, performance, and portability. It powers Meta's on-device AI across **Instagram, WhatsApp, Quest 3, Ray-Ban Meta Smart Glasses**, and [more](https://docs.pytorch.org/executorch/main/success-stories.html).
+
+Deploy **LLMs, vision, speech, and multimodal models** with the same PyTorch APIs you already know—accelerating research to production with seamless model export, optimization, and deployment. No manual C++ rewrites. No format conversions. No vendor lock-in.
+
+
+📘 Table of Contents
+
+- [Why ExecuTorch?](#why-executorch)
+- [How It Works](#how-it-works)
+- [Quick Start](#quick-start)
+  - [Installation](#installation)
+  - [Export and Deploy in 3 Steps](#export-and-deploy-in-3-steps)
+  - [Run on Device](#run-on-device)
+  - [LLM Example: Llama](#llm-example-llama)
+- [Platform & Hardware Support](#platform--hardware-support)
+- [Production Deployments](#production-deployments)
+- [Examples & Models](#examples--models)
+- [Key Features](#key-features)
+- [Documentation](#documentation)
+- [Community & Contributing](#community--contributing)
+- [License](#license)
+
+
+
+## Why ExecuTorch?
+
+- **🔒 Native PyTorch Export** — Direct export from PyTorch. No .onnx, .tflite, or intermediate format conversions. Model semantics are preserved.
+- **⚡ Production-Proven** — Runs real-time on-device inference for billions of users across [Meta's family of apps](https://engineering.fb.com/2025/07/28/android/executorch-on-device-ml-meta-family-of-apps/).
+- **💾 Tiny Runtime** — 50KB base footprint. Runs on everything from microcontrollers to high-end smartphones.
+- **🚀 [12+ Hardware Backends](https://docs.pytorch.org/executorch/main/backends-overview.html)** — Open-source acceleration for Apple, Qualcomm, ARM, MediaTek, Vulkan, and more.
+- **🎯 One Export, Multiple Backends** — Switch hardware targets with a single-line change. Deploy the same model everywhere.
+
+## How It Works
+
+ExecuTorch uses **ahead-of-time (AOT) compilation** to prepare PyTorch models for edge deployment:
+
+1. **🧩 Export** — Capture your PyTorch model graph with `torch.export.export()`
+2. **⚙️ Compile** — Quantize, optimize, and partition to hardware backends → `.pte`
+3. **🚀 Execute** — Load the `.pte` file on-device via the lightweight C++ runtime
+
+Models use a standardized [Core ATen operator set](https://docs.pytorch.org/executorch/main/concepts.html#core-aten-operators). [Partitioners](https://docs.pytorch.org/executorch/main/compiler-delegate-and-partitioner.html) delegate subgraphs to specialized hardware (NPU/GPU) with CPU fallback.
+
+Learn more: [How ExecuTorch Works](https://docs.pytorch.org/executorch/main/intro-how-it-works.html) • [Architecture Guide](https://docs.pytorch.org/executorch/main/getting-started-architecture.html)
+
+## Quick Start
+
+### Installation
+
+```bash
+pip install executorch
+```
+
+For platform-specific setup (Android, iOS, embedded systems), see the [Quick Start](https://docs.pytorch.org/executorch/main/quick-start-section.html) documentation.
+
+### Export and Deploy in 3 Steps
+
+```python
+import torch
+from executorch.exir import to_edge_transform_and_lower
+from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
+
+# 1. Export your PyTorch model
+model = MyModel().eval()
+example_inputs = (torch.randn(1, 3, 224, 224),)
+exported_program = torch.export.export(model, example_inputs)
+
+# 2. Optimize for target hardware (switch backends with one line)
+program = to_edge_transform_and_lower(
+    exported_program,
+    partitioner=[XnnpackPartitioner()]  # CPU | CoreMLPartitioner() for iOS | QnnPartitioner() for Qualcomm
+).to_executorch()
+
+# 3. Save for deployment
+with open("model.pte", "wb") as f:
+    f.write(program.buffer)
+
+# Test locally via the ExecuTorch runtime's pybind API (optional)
+from executorch.runtime import Runtime
+runtime = Runtime.get()
+method = runtime.load_program("model.pte").load_method("forward")
+outputs = method.execute([torch.randn(1, 3, 224, 224)])
+```
+
+### Run on Device
+
+**[C++](https://docs.pytorch.org/executorch/main/using-executorch-cpp.html)**
+```cpp
+#include
+#include
+
+Module module("model.pte");
+auto tensor = make_tensor_ptr({2, 2}, {1.0f, 2.0f, 3.0f, 4.0f});
+auto outputs = module.forward({tensor});
+```
+
+**[Swift (iOS)](https://docs.pytorch.org/executorch/main/ios-section.html)**
+```swift
+let module = Module(filePath: "model.pte")
+let input = Tensor([1.0, 2.0, 3.0, 4.0])
+let outputs: [Value] = try module.forward([input])
+```
+
+**[Kotlin (Android)](https://docs.pytorch.org/executorch/main/android-section.html)**
+```kotlin
+val module = Module.load("model.pte")
+val inputTensor = Tensor.fromBlob(floatArrayOf(1.0f, 2.0f, 3.0f, 4.0f), longArrayOf(2, 2))
+val outputs = module.forward(EValue.from(inputTensor))
+```
+
+### LLM Example: Llama
+
+Export Llama models using the [`export_llm`](https://docs.pytorch.org/executorch/main/llm/export-llm.html) script or [Optimum-ExecuTorch](https://github.com/huggingface/optimum-executorch):
+
+```bash
+# Using export_llm
+python -m executorch.extension.llm.export.export_llm --model llama3_2 --output llama.pte
+
+# Using Optimum-ExecuTorch
+optimum-cli export executorch \
+  --model meta-llama/Llama-3.2-1B \
+  --task text-generation \
+  --recipe xnnpack \
+  --output_dir llama_model
+```

-It supports a wide range of models including LLMs (Large Language Models), CV (Computer Vision), ASR (Automatic Speech Recognition), and TTS (Text to Speech).
+Run on-device with the LLM runner API:

-Platform Support:
-- Operating Systems:
-  - iOS
-  - MacOS (ARM64)
-  - Android
-  - Linux
-  - Microcontrollers
+**[C++](https://docs.pytorch.org/executorch/main/llm/run-with-c-plus-plus.html)**
+```cpp
+#include

-- Hardware Acceleration:
-  - Apple
-  - Arm
-  - Cadence
-  - MediaTek
-  - NXP
-  - OpenVINO
-  - Qualcomm
-  - Vulkan
-  - XNNPACK
+auto runner = create_llama_runner("llama.pte", "tiktoken.bin");
+executorch::extension::llm::GenerationConfig config{
+    .seq_len = 128, .temperature = 0.8f};
+runner->generate("Hello, how are you?", config);
+```

-Key value propositions of ExecuTorch are:
+**[Swift (iOS)](https://docs.pytorch.org/executorch/main/llm/run-on-ios.html)**
+```swift
+let runner = TextRunner(modelPath: "llama.pte", tokenizerPath: "tiktoken.bin")
+try runner.generate("Hello, how are you?", Config {
+  $0.sequenceLength = 128
+}) { token in
+  print(token, terminator: "")
+}
+```

-- **Portability:** Compatibility with a wide variety of computing platforms,
-  from high-end mobile phones to highly constrained embedded systems and
-  microcontrollers.
-- **Productivity:** Enabling developers to use the same toolchains and Developer
-  Tools from PyTorch model authoring and conversion, to debugging and deployment
-  to a wide variety of platforms.
-- **Performance:** Providing end users with a seamless and high-performance
-  experience due to a lightweight runtime and utilizing full hardware
-  capabilities such as CPUs, NPUs, and DSPs.
+**Kotlin (Android)** — [API Docs](https://docs.pytorch.org/executorch/main/javadoc/org/pytorch/executorch/extension/llm/package-summary.html) • [Demo App](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/android/LlamaDemo)
+```kotlin
+val llmModule = LlmModule("llama.pte", "tiktoken.bin", 0.8f)
+llmModule.load()
+llmModule.generate("Hello, how are you?", 128, object : LlmCallback {
+  override fun onResult(result: String) { print(result) }
+  override fun onStats(stats: String) { }
+})
+```

-## Getting Started
-To get started you can:
+For multimodal models (vision, audio), use the [MultiModal runner API](extension/llm/runner), which extends the LLM runner to handle image and audio inputs alongside text. See the [Llava](examples/models/llava/README.md) and [Voxtral](examples/models/voxtral/README.md) examples.

-- Visit the [Step by Step Tutorial](https://pytorch.org/executorch/stable/getting-started.html) to get things running locally and deploy a model to a device
-- Use this [Colab Notebook](https://colab.research.google.com/drive/1qpxrXC3YdJQzly3mRg-4ayYiOjC6rue3?usp=sharing) to start playing around right away
-- Jump straight into LLM use cases by following specific instructions for popular open-source models such as [Llama](examples/models/llama/README.md), [Qwen 3](examples/models/qwen3/README.md), [Phi-4-mini](examples/models/phi_4_mini/README.md), [Llava](examples/models/llava/README.md), [Voxtral](examples/models/voxtral/README.md), and [LFM2](examples/models/lfm2/README.md).
+See [examples/models/llama](examples/models/llama/README.md) for the complete workflow, including quantization, mobile deployment, and advanced options.

-## Feedback and Engagement
+**Next Steps:**
+- 📖 [Step-by-step tutorial](https://docs.pytorch.org/executorch/main/getting-started.html) — Complete walkthrough for your first model
+- ⚡ [Colab notebook](https://colab.research.google.com/drive/1qpxrXC3YdJQzly3mRg-4ayYiOjC6rue3?usp=sharing) — Try ExecuTorch instantly in your browser
+- 🤖 [Deploy Llama models](examples/models/llama/README.md) — LLM workflow with quantization and mobile demos

-We welcome any feedback, suggestions, and bug reports from the community to help
-us improve our technology. Check out the [Discussion Board](https://github.com/pytorch/executorch/discussions) or chat real time with us on [Discord](https://discord.gg/Dh43CKSAdc)
+## Platform & Hardware Support

-## Contributing
+| **Platform**     | **Supported Backends**                                |
+|------------------|----------------------------------------------------------|
+| Android          | XNNPACK, Vulkan, Qualcomm, MediaTek, Samsung Exynos   |
+| iOS              | XNNPACK, MPS, CoreML (Neural Engine)                  |
+| Linux / Windows  | XNNPACK, OpenVINO, CUDA *(experimental)*              |
+| macOS            | XNNPACK, MPS, Metal *(experimental)*                  |
+| Embedded / MCU   | XNNPACK, ARM Ethos-U, NXP, Cadence DSP                |

-We welcome contributions. To get started review the [guidelines](CONTRIBUTING.md) and chat with us on [Discord](https://discord.gg/Dh43CKSAdc)
+See [Backend Documentation](https://docs.pytorch.org/executorch/main/backends-overview.html) for detailed hardware requirements and optimization guides.

+## Production Deployments

-## Directory Structure
+ExecuTorch powers on-device AI at scale across Meta's family of apps, VR/AR devices, and partner deployments. [View success stories →](https://docs.pytorch.org/executorch/main/success-stories.html)

-Please refer to the [Codebase structure](CONTRIBUTING.md#codebase-structure) section of the [Contributing Guidelines](CONTRIBUTING.md) for more details.
+## Examples & Models
+
+**LLMs:** [Llama 3.2/3.1/3](examples/models/llama/README.md), [Qwen 3](examples/models/qwen3/README.md), [Phi-4-mini](examples/models/phi_4_mini/README.md), [LiquidAI LFM2](examples/models/lfm2/README.md)
+
+**Multimodal:** [Llava](examples/models/llava/README.md) (vision-language), [Voxtral](examples/models/voxtral/README.md) (audio-language)
+
+**Vision/Speech:** [MobileNetV2](https://github.com/meta-pytorch/executorch-examples/tree/main/mv2), [DeepLabV3](https://github.com/meta-pytorch/executorch-examples/tree/main/dl3)
+
+**Resources:** [`examples/`](examples/) directory • [executorch-examples](https://github.com/meta-pytorch/executorch-examples) mobile demos • [Optimum-ExecuTorch](https://github.com/huggingface/optimum-executorch) for Hugging Face models
+
+## Key Features
+
+ExecuTorch provides advanced capabilities for production deployment:
+
+- **Quantization** — Built-in support via [torchao](https://docs.pytorch.org/ao) for 8-bit, 4-bit, and dynamic quantization
+- **Memory Planning** — Optimize memory usage with ahead-of-time allocation strategies
+- **Developer Tools** — ETDump profiler, ETRecord inspector, and model debugger
+- **Selective Build** — Strip unused operators to minimize binary size
+- **Custom Operators** — Extend with domain-specific kernels
+- **Dynamic Shapes** — Support variable input sizes with bounded ranges
+
+See [Advanced Topics](https://docs.pytorch.org/executorch/main/advanced-topics-section.html) for quantization techniques, custom backends, and compiler passes.
+
+## Documentation
+
+- [**Documentation Home**](https://docs.pytorch.org/executorch/main/index.html) — Complete guides and tutorials
+- [**API Reference**](https://docs.pytorch.org/executorch/main/api-section.html) — Python, C++, Java/Kotlin APIs
+- [**Backend Integration**](https://docs.pytorch.org/executorch/main/backend-delegates-integration.html) — Build custom hardware backends
+- [**Troubleshooting**](https://docs.pytorch.org/executorch/main/using-executorch-troubleshooting.html) — Common issues and solutions
+
+## Community & Contributing
+
+We welcome contributions from the community!
+
+- 💬 [**GitHub Discussions**](https://github.com/pytorch/executorch/discussions) — Ask questions and share ideas
+- 🎮 [**Discord**](https://discord.gg/Dh43CKSAdc) — Chat with the team and community
+- 🐛 [**Issues**](https://github.com/pytorch/executorch/issues) — Report bugs or request features
+- 🤝 [**Contributing Guide**](CONTRIBUTING.md) — Guidelines and codebase structure

 ## License
-ExecuTorch is BSD licensed, as found in the LICENSE file.
+
+ExecuTorch is BSD licensed, as found in the [LICENSE](LICENSE) file.
+
+

+ +--- + +
+

Part of the PyTorch ecosystem

+

+ GitHub •
+ Documentation

+
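
The "How It Works" text in this diff says partitioners delegate subgraphs to specialized hardware (NPU/GPU) with CPU fallback. A minimal sketch of that idea follows, under simplifying assumptions: the model graph is treated as a flat, topologically ordered list of operator names rather than a real exported graph, and the `partition` function, `Segment` class, and operator sets are hypothetical illustrations, not ExecuTorch's partitioner API. Maximal runs of backend-supported operators are grouped into delegated segments, and every other operator stays in a CPU fallback segment.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """A contiguous run of operators assigned to a single executor."""
    target: str                                 # "delegate" (accelerator) or "cpu" (fallback)
    ops: list = field(default_factory=list)     # operator names in execution order


def partition(ops, supported):
    """Group a topologically ordered op list into delegate/CPU segments.

    Hypothetical helper, not an ExecuTorch API. Each maximal run of ops
    found in `supported` becomes one delegated segment; any other op is
    placed in a CPU fallback segment.
    """
    segments = []
    for op in ops:
        target = "delegate" if op in supported else "cpu"
        if segments and segments[-1].target == target:
            # Same executor as the previous op: extend the current run.
            segments[-1].ops.append(op)
        else:
            # Executor changes here, so a new segment starts.
            segments.append(Segment(target, [op]))
    return segments


if __name__ == "__main__":
    # Illustrative Core ATen-style op names, listed in execution order.
    graph = [
        "aten.convolution.default",
        "aten.relu.default",
        "aten.topk.default",
        "aten.addmm.default",
        "aten._softmax.default",
    ]
    # Hypothetical set of ops that an accelerator backend claims to support.
    npu_ops = {"aten.convolution.default", "aten.relu.default", "aten.addmm.default"}
    for seg in partition(graph, npu_ops):
        print(seg.target, seg.ops)
```

Running it prints four segments: two delegated runs, separated and followed by CPU fallbacks for `aten.topk.default` and `aten._softmax.default`. A real partitioner works on a dependency graph rather than a list, so it would also need to ensure that fusing a candidate subgraph does not introduce a cycle, which the flat-list simplification sidesteps.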