# ExecuTorch: Inference on Consumer Desktops/Laptops with GPUs

## Overview

ExecuTorch is a lightweight, flexible runtime designed for efficient AI inference, historically focused on mobile and embedded devices. With growing demand for local inference on personal desktops and laptops, especially those equipped with consumer GPUs (e.g., gaming PCs with NVIDIA hardware), ExecuTorch is experimenting with expanding its capabilities to support these platforms.

## Historical Context

- **Mobile and Embedded Focus**: ExecuTorch's initial target market was mobile and embedded devices.
- **Desktop/Laptop Support**: Previously, desktop and laptop ("AI PC") inference was enabled through backends such as XNNPACK, OpenVINO, and the Qualcomm NPU backend.
- **No CUDA Support**: For a long time, ExecuTorch did not offer a CUDA backend, limiting GPU acceleration on NVIDIA hardware.

## Recent Developments

With increased demand for local inference on consumer desktops and laptops, exemplified by popular runtimes like llama.cpp and MLX, ExecuTorch is now experimenting with CUDA and Metal support. This is achieved by leveraging Inductor compiler technology from PyTorch, specifically Ahead-of-Time Inductor ([AOTI](https://docs.pytorch.org/docs/stable/torch.compiler_aot_inductor.html)), to avoid reinventing the wheel.

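For context, the underlying AOTI flow in plain PyTorch looks roughly like the sketch below: a model is captured with `torch.export` and compiled ahead of time into a self-contained artifact that can later run without a Python interpreter. This is an illustrative sketch of the upstream PyTorch entry points described in the AOTI docs linked above, not the ExecuTorch-specific workflow, and the exact API may vary across PyTorch versions.

```python
# Illustrative sketch of the upstream PyTorch AOTI flow (not the ExecuTorch entry
# point); API names such as aoti_compile_and_package may vary by PyTorch version.
import torch


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.nn.functional.relu(self.linear(x))


model = TinyModel().eval().cuda()
example_inputs = (torch.randn(2, 16, device="cuda"),)

with torch.no_grad():
    # Capture the model as an ExportedProgram, then compile it ahead of time
    # into a package that can be loaded and executed without Python.
    ep = torch.export.export(model, example_inputs)
    package_path = torch._inductor.aoti_compile_and_package(
        ep, package_path="tiny_model.pt2"
    )
print(f"AOTI package written to {package_path}")
```
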
## Key Benefits

- **Model Agnostic**: Validated on models such as [Voxtral](../examples/models/voxtral), [Gemma3-4b](../examples/models/gemma3), ResNet, and Whisper (WIP). In principle, any model exportable via `torch.export` is supported.
- **PyTorch Ecosystem Integration**: Enables fine-tuning, quantization, and compilation workflows within the PyTorch ecosystem.
- **No Python Runtime During Inference**: Ideal for native applications (e.g., written in C++) that embed AI capabilities.
- **No libtorch Dependency**: Reduces binary size, making deployment easier for resource-constrained applications.
- **Efficient GPU Support**: Uses the AOTI-powered CUDA backend for efficient inference on NVIDIA GPUs.

## Backends

The following backends leverage AOTI; a lowering sketch follows the list.

- [CUDA backend](../backends/cuda)
- [Metal backend](../backends/apple/metal)

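As a rough sketch of how a model might be lowered to one of these backends, the snippet below follows the general ExecuTorch export flow (`torch.export`, `to_edge_transform_and_lower`, then serialization to a `.pte` file). The `CudaPartitioner` import used here is a placeholder assumption, not a confirmed API; consult the linked backend READMEs for the actual entry point and any backend-specific options.

```python
# Hypothetical lowering sketch. The CUDA partitioner import below is a placeholder
# assumption; see the CUDA backend README for the real module and class name.
import torch
from executorch.exir import to_edge_transform_and_lower
from executorch.backends.cuda import CudaPartitioner  # hypothetical import path


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return self.linear(x)


model = TinyModel().eval()
example_inputs = (torch.randn(2, 16),)

# Standard ExecuTorch flow: export the model, lower it to the target backend,
# and serialize the resulting program to a .pte file for the native runtime.
exported = torch.export.export(model, example_inputs)
executorch_program = to_edge_transform_and_lower(
    exported, partitioner=[CudaPartitioner()]
).to_executorch()

with open("tiny_model.pte", "wb") as f:
    f.write(executorch_program.buffer)
```
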
## Roadmap & Limitations

- **Experimental Status**: The CUDA and Metal backends via AOTI are currently experimental. Contributions and feedback are welcome!
- **Model Compatibility**: While most models exportable via `torch.export` should work, validation is ongoing for broader model support.
- **Portability**: We are still working out the balance and trade-offs between performance, portability, and model file size.
- **Native Windows Support (WIP)**: On Windows, only WSL is currently supported; native Windows support is work in progress.