<div align="center">
<img src="docs/source/_static/img/et-logo.png" alt="Logo" width="200">
<h1 align="center">ExecuTorch: A powerful on-device AI Framework</h1>
<img src="docs/source/_static/img/et-logo.png" alt="ExecuTorch logo mark" width="200">
<h1>ExecuTorch</h1>
<p><strong>On-device AI inference powered by PyTorch</strong></p>
</div>


<div align="center">
<a href="https://github.com/pytorch/executorch/graphs/contributors"><img src="https://img.shields.io/github/contributors/pytorch/executorch?style=for-the-badge&color=blue" alt="Contributors"></a>
<a href="https://github.com/pytorch/executorch/stargazers"><img src="https://img.shields.io/github/stars/pytorch/executorch?style=for-the-badge&color=blue" alt="Stargazers"></a>
<a href="https://discord.gg/Dh43CKSAdc"><img src="https://img.shields.io/badge/Discord-Join%20Us-purple?logo=discord&logoColor=white&style=for-the-badge" alt="Join our Discord community"></a>
<a href="https://pytorch.org/executorch/main/index"><img src="https://img.shields.io/badge/Documentation-000?logo=googledocs&logoColor=FFE165&style=for-the-badge" alt="Check out the documentation"></a>
<hr>
<a href="https://pypi.org/project/executorch/"><img src="https://img.shields.io/pypi/v/executorch?style=for-the-badge&color=blue" alt="PyPI - Version"></a>
<a href="https://github.com/pytorch/executorch/graphs/contributors"><img src="https://img.shields.io/github/contributors/pytorch/executorch?style=for-the-badge&color=blue" alt="GitHub - Contributors"></a>
<a href="https://github.com/pytorch/executorch/stargazers"><img src="https://img.shields.io/github/stars/pytorch/executorch?style=for-the-badge&color=blue" alt="GitHub - Stars"></a>
<a href="https://discord.gg/Dh43CKSAdc"><img src="https://img.shields.io/badge/Discord-Join%20Us-blue?logo=discord&logoColor=white&style=for-the-badge" alt="Discord - Chat with Us"></a>
<a href="https://docs.pytorch.org/executorch/main/index.html"><img src="https://img.shields.io/badge/Documentation-blue?logo=googledocs&logoColor=white&style=for-the-badge" alt="Documentation"></a>
</div>

**ExecuTorch** is PyTorch's unified solution for deploying AI models on-device—from smartphones to microcontrollers—built for privacy, performance, and portability. It powers Meta's on-device AI across **Instagram, WhatsApp, Quest 3, Ray-Ban Meta Smart Glasses**, and [more](https://docs.pytorch.org/executorch/main/success-stories.html).

Deploy **LLMs, vision, speech, and multimodal models** with the same PyTorch APIs you already know—accelerating research to production with seamless model export, optimization, and deployment. No manual C++ rewrites. No format conversions. No vendor lock-in.

<details>
<summary><strong>📘 Table of Contents</strong></summary>

- [Why ExecuTorch?](#why-executorch)
- [How It Works](#how-it-works)
- [Quick Start](#quick-start)
- [Installation](#installation)
- [Export and Deploy in 3 Steps](#export-and-deploy-in-3-steps)
- [Run on Device](#run-on-device)
- [LLM Example: Llama](#llm-example-llama)
- [Platform & Hardware Support](#platform--hardware-support)
- [Production Deployments](#production-deployments)
- [Examples & Models](#examples--models)
- [Key Features](#key-features)
- [Documentation](#documentation)
- [Community & Contributing](#community--contributing)
- [License](#license)

</details>

## Why ExecuTorch?

- **🔒 Native PyTorch Export** — Direct export from PyTorch. No .onnx, .tflite, or intermediate format conversions. Preserve model semantics.
- **⚡ Production-Proven** — Powers billions of users at [Meta with real-time on-device inference](https://engineering.fb.com/2025/07/28/android/executorch-on-device-ml-meta-family-of-apps/).
- **💾 Tiny Runtime** — 50KB base footprint. Runs on everything from microcontrollers to high-end smartphones.
- **🚀 [12+ Hardware Backends](https://docs.pytorch.org/executorch/main/backends-overview.html)** — Open-source acceleration for Apple, Qualcomm, ARM, MediaTek, Vulkan, and more.
- **🎯 One Export, Multiple Backends** — Switch hardware targets with a single line change. Deploy the same model everywhere.

## How It Works

ExecuTorch uses **ahead-of-time (AOT) compilation** to prepare PyTorch models for edge deployment:

1. **🧩 Export** — Capture your PyTorch model graph with `torch.export()`
2. **⚙️ Compile** — Quantize, optimize, and partition to hardware backends → `.pte`
3. **🚀 Execute** — Load `.pte` on-device via lightweight C++ runtime

Models use a standardized [Core ATen operator set](https://docs.pytorch.org/executorch/main/concepts.html#core-aten-operators). [Partitioners](https://docs.pytorch.org/executorch/main/compiler-delegate-and-partitioner.html) delegate subgraphs to specialized hardware (NPU/GPU) with CPU fallback.

Learn more: [How ExecuTorch Works](https://docs.pytorch.org/executorch/main/intro-how-it-works.html) • [Architecture Guide](https://docs.pytorch.org/executorch/main/getting-started-architecture.html)

## Quick Start

### Installation

```bash
pip install executorch
```
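
To verify the install, a quick version check (a standard-library call, nothing ExecuTorch-specific):

```python
import importlib.metadata

# Print the installed ExecuTorch package version.
print(importlib.metadata.version("executorch"))
```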

For platform-specific setup (Android, iOS, embedded systems), see the [Quick Start](https://docs.pytorch.org/executorch/main/quick-start-section.html) documentation.

### Export and Deploy in 3 Steps

```python
import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner

# 1. Export your PyTorch model
model = MyModel().eval()  # MyModel is your own torch.nn.Module
example_inputs = (torch.randn(1, 3, 224, 224),)
exported_program = torch.export.export(model, example_inputs)

# 2. Optimize for target hardware (switch backends with one line)
program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()]  # CPU | CoreMLPartitioner() for iOS | QnnPartitioner() for Qualcomm
).to_executorch()

# 3. Save for deployment
with open("model.pte", "wb") as f:
    f.write(program.buffer)

# Test locally via ExecuTorch runtime's pybind API (optional)
from executorch.runtime import Runtime
runtime = Runtime.get()
method = runtime.load_program("model.pte").load_method("forward")
outputs = method.execute([torch.randn(1, 3, 224, 224)])
```
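
To check how much of the model the partitioner actually delegated, you can split the lowering into two steps and inspect the edge program before converting it. A minimal sketch using ExecuTorch's devtools; `get_delegation_info` has lived under `executorch.devtools.backend_debug` in recent releases, so verify the import path against your version:

```python
from executorch.devtools.backend_debug import get_delegation_info

# Lower first, inspect, then convert to an ExecuTorch program.
edge_manager = to_edge_transform_and_lower(
    exported_program,
    partitioner=[XnnpackPartitioner()],
)
info = get_delegation_info(edge_manager.exported_program().graph_module)
print(info.get_summary())  # delegated vs. non-delegated operator counts
program = edge_manager.to_executorch()
```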

### Run on Device

**[C++](https://docs.pytorch.org/executorch/main/using-executorch-cpp.html)**
```cpp
#include <executorch/extension/module/module.h>
#include <executorch/extension/tensor/tensor.h>

using namespace ::executorch::extension;

Module module("model.pte");
auto tensor = make_tensor_ptr({2, 2}, {1.0f, 2.0f, 3.0f, 4.0f});
auto outputs = module.forward({tensor});
```

**[Swift (iOS)](https://docs.pytorch.org/executorch/main/ios-section.html)**
```swift
let module = Module(filePath: "model.pte")
let input = Tensor<Float>([1.0, 2.0, 3.0, 4.0])
let outputs: [Value] = try module.forward([input])
```

**[Kotlin (Android)](https://docs.pytorch.org/executorch/main/android-section.html)**
```kotlin
val module = Module.load("model.pte")
val inputTensor = Tensor.fromBlob(floatArrayOf(1.0f, 2.0f, 3.0f, 4.0f), longArrayOf(2, 2))
val outputs = module.forward(EValue.from(inputTensor))
```

### LLM Example: Llama

Export Llama models using the [`export_llm`](https://docs.pytorch.org/executorch/main/llm/export-llm.html) script or [Optimum-ExecuTorch](https://github.com/huggingface/optimum-executorch):

```bash
# Using export_llm
python -m executorch.extension.llm.export.export_llm --model llama3_2 --output llama.pte

# Using Optimum-ExecuTorch
optimum-cli export executorch \
    --model meta-llama/Llama-3.2-1B \
    --task text-generation \
    --recipe xnnpack \
    --output_dir llama_model
```
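
The Optimum-ExecuTorch path is also scriptable from Python. A sketch following that project's documented usage; treat the class and method names (`ExecuTorchModelForCausalLM`, `text_generation`) as Optimum-ExecuTorch APIs that may change between releases:

```python
from transformers import AutoTokenizer
from optimum.executorch import ExecuTorchModelForCausalLM

model_id = "meta-llama/Llama-3.2-1B"
# Exports the checkpoint to an ExecuTorch program with the XNNPACK recipe.
model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quick sanity-check generation on the exported model.
print(model.text_generation(
    tokenizer=tokenizer,
    prompt="Hello, how are you?",
    max_seq_len=128,
))
```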

Run on-device with the LLM runner API:

**[C++](https://docs.pytorch.org/executorch/main/llm/run-with-c-plus-plus.html)**
```cpp
#include <executorch/extension/llm/runner/text_llm_runner.h>

auto runner = create_llama_runner("llama.pte", "tiktoken.bin");
executorch::extension::llm::GenerationConfig config{
    .seq_len = 128, .temperature = 0.8f};
runner->generate("Hello, how are you?", config);
```

**[Swift (iOS)](https://docs.pytorch.org/executorch/main/llm/run-on-ios.html)**
```swift
let runner = TextRunner(modelPath: "llama.pte", tokenizerPath: "tiktoken.bin")
try runner.generate("Hello, how are you?", Config {
  $0.sequenceLength = 128
}) { token in
  print(token, terminator: "")
}
```

**Kotlin (Android)** — [API Docs](https://docs.pytorch.org/executorch/main/javadoc/org/pytorch/executorch/extension/llm/package-summary.html) • [Demo App](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/android/LlamaDemo)
```kotlin
val llmModule = LlmModule("llama.pte", "tiktoken.bin", 0.8f)
llmModule.load()
llmModule.generate("Hello, how are you?", 128, object : LlmCallback {
    override fun onResult(result: String) { print(result) }
    override fun onStats(stats: String) { }
})
```

For multimodal models (vision, audio), use the [MultiModal runner API](extension/llm/runner) which extends the LLM runner to handle image and audio inputs alongside text. See [Llava](examples/models/llava/README.md) and [Voxtral](examples/models/voxtral/README.md) examples.

See [examples/models/llama](examples/models/llama/README.md) for the complete workflow, including quantization, mobile deployment, and advanced options.

**Next Steps:**
- 📖 [Step-by-step tutorial](https://docs.pytorch.org/executorch/main/getting-started.html) — Complete walkthrough for your first model
- ⚡ [Colab notebook](https://colab.research.google.com/drive/1qpxrXC3YdJQzly3mRg-4ayYiOjC6rue3?usp=sharing) — Try ExecuTorch instantly in your browser
- 🤖 [Deploy Llama models](examples/models/llama/README.md) — LLM workflow with quantization and mobile demos

## Platform & Hardware Support

| **Platform** | **Supported Backends** |
|------------------|----------------------------------------------------------|
| Android | XNNPACK, Vulkan, Qualcomm, MediaTek, Samsung Exynos |
| iOS | XNNPACK, MPS, CoreML (Neural Engine) |
| Linux / Windows | XNNPACK, OpenVINO, CUDA *(experimental)* |
| macOS | XNNPACK, MPS, Metal *(experimental)* |
| Embedded / MCU | XNNPACK, ARM Ethos-U, NXP, Cadence DSP |

See [Backend Documentation](https://docs.pytorch.org/executorch/main/backends-overview.html) for detailed hardware requirements and optimization guides.

## Production Deployments

ExecuTorch powers on-device AI at scale across Meta's family of apps, VR/AR devices, and partner deployments. [View success stories →](https://docs.pytorch.org/executorch/main/success-stories.html)

## Examples & Models

**LLMs:** [Llama 3.2/3.1/3](examples/models/llama/README.md), [Qwen 3](examples/models/qwen3/README.md), [Phi-4-mini](examples/models/phi_4_mini/README.md), [LiquidAI LFM2](examples/models/lfm2/README.md)

**Multimodal:** [Llava](examples/models/llava/README.md) (vision-language), [Voxtral](examples/models/voxtral/README.md) (audio-language)

**Vision/Speech:** [MobileNetV2](https://github.com/meta-pytorch/executorch-examples/tree/main/mv2), [DeepLabV3](https://github.com/meta-pytorch/executorch-examples/tree/main/dl3)

**Resources:** [`examples/`](examples/) directory • [executorch-examples](https://github.com/meta-pytorch/executorch-examples) mobile demos • [Optimum-ExecuTorch](https://github.com/huggingface/optimum-executorch) for HuggingFace models

## Key Features

ExecuTorch provides advanced capabilities for production deployment:

- **Quantization** — Built-in support via [torchao](https://docs.pytorch.org/ao) for 8-bit, 4-bit, and dynamic quantization (sketched below)
- **Memory Planning** — Optimize memory usage with ahead-of-time allocation strategies
- **Developer Tools** — ETDump profiler, ETRecord inspector, and model debugger
- **Selective Build** — Strip unused operators to minimize binary size
- **Custom Operators** — Extend with domain-specific kernels
- **Dynamic Shapes** — Support variable input sizes with bounded ranges (sketched below)

See [Advanced Topics](https://docs.pytorch.org/executorch/main/advanced-topics-section.html) for quantization techniques, custom backends, and compiler passes.
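
As a concrete illustration of two of these features, the sketch below marks a batch dimension as dynamic at export time and applies a torchao quantization config. `torch.export.Dim` is standard PyTorch; the torchao config class name has moved between releases, so treat `Int8DynamicActivationInt8WeightConfig` as an assumption to check against your installed version:

```python
import torch
from torch.export import Dim, export

model = torch.nn.Linear(16, 4).eval()  # toy stand-in for your model
example_inputs = (torch.randn(2, 16),)

# Dynamic shapes: let the batch dimension vary between 2 and 32 at runtime.
batch = Dim("batch", min=2, max=32)
exported = export(model, example_inputs, dynamic_shapes=({0: batch},))
# `exported` flows into to_edge_transform_and_lower() exactly as in Quick Start.

# Quantization: torchao rewrites weights/activations in place. Depending on the
# torchao version, exporting a subclass-quantized model may additionally require
# torchao's unwrap_tensor_subclass() before torch.export.
from torchao.quantization import quantize_, Int8DynamicActivationInt8WeightConfig
quantize_(model, Int8DynamicActivationInt8WeightConfig())
```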

## Documentation

- [**Documentation Home**](https://docs.pytorch.org/executorch/main/index.html) — Complete guides and tutorials
- [**API Reference**](https://docs.pytorch.org/executorch/main/api-section.html) — Python, C++, Java/Kotlin APIs
- [**Backend Integration**](https://docs.pytorch.org/executorch/main/backend-delegates-integration.html) — Build custom hardware backends
- [**Troubleshooting**](https://docs.pytorch.org/executorch/main/using-executorch-troubleshooting.html) — Common issues and solutions

## Community & Contributing

We welcome contributions from the community!

- 💬 [**GitHub Discussions**](https://github.com/pytorch/executorch/discussions) — Ask questions and share ideas
- 🎮 [**Discord**](https://discord.gg/Dh43CKSAdc) — Chat with the team and community
- 🐛 [**Issues**](https://github.com/pytorch/executorch/issues) — Report bugs or request features
- 🤝 [**Contributing Guide**](CONTRIBUTING.md) — Guidelines and codebase structure

## License

ExecuTorch is BSD licensed, as found in the [LICENSE](LICENSE) file.

<br><br>

---

<div align="center">
<p><strong>Part of the PyTorch ecosystem</strong></p>
<p>
<a href="https://github.com/pytorch/executorch">GitHub</a> •
<a href="https://docs.pytorch.org/executorch">Documentation</a>
</p>
</div>