
Commit 259f1e6
Add desktop README (#15347)
1 parent 82611e9

1 file changed: desktop/README.md (+25 -11 lines)
# ExecuTorch: Inference on Consumer Desktops and Laptops with GPUs

## Overview

ExecuTorch is a lightweight, flexible runtime designed for efficient AI inference, historically focused on mobile and embedded devices. With the growing demand for local inference on personal desktops and laptops, especially those equipped with consumer GPUs (e.g., gaming PCs with NVIDIA hardware), ExecuTorch is experimenting with expanding its capabilities to support these platforms.

## Historical Context

- **Mobile and Embedded Focus**: ExecuTorch’s initial target market was mobile and embedded devices.
- **Desktop/Laptop Support**: Previously, desktop and laptop ("AI PC") inference was enabled through backends such as XNNPACK, OpenVINO, and Qualcomm NPUs.
- **No CUDA Support**: For a long time, ExecuTorch did not offer a CUDA backend, limiting GPU acceleration on NVIDIA hardware.

## Recent Developments

With increased demand for local inference on consumer desktops and laptops, exemplified by popular runtimes like llama.cpp and MLX, ExecuTorch is now experimenting with CUDA and Metal support. This is achieved by leveraging Inductor compiler technology from PyTorch, specifically Ahead-of-Time Inductor ([AOTI](https://docs.pytorch.org/docs/stable/torch.compiler_aot_inductor.html)), to avoid reinventing the wheel.

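As a rough illustration of the AOTI flow these experiments build on, the sketch below exports a small placeholder model with `torch.export` and packages it ahead of time with Inductor. It uses the generic PyTorch API rather than the ExecuTorch-specific export path; the model, device, and output path are placeholders.

```python
# Minimal sketch of the underlying AOTI flow (generic PyTorch API, not the
# ExecuTorch-specific export path); model, device, and paths are placeholders.
import torch
import torch._inductor  # provides aoti_compile_and_package


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyModel().cuda().eval()
example_inputs = (torch.randn(2, 16, device="cuda"),)

# 1. Capture the model as an ExportedProgram via torch.export.
exported = torch.export.export(model, example_inputs)

# 2. Compile ahead of time with Inductor and package kernels plus weights into
#    a .pt2 artifact that can later be executed without a Python interpreter.
torch._inductor.aoti_compile_and_package(exported, package_path="tiny_model_cuda.pt2")
```

The same capture-then-package flow applies to larger models; only the module and example inputs change.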
## Key Benefits

- **Model Agnostic**: Validated on models such as [Voxtral](../examples/models/voxtral), [Gemma3-4b](../examples/models/gemma3), ResNet, and Whisper (WIP). In principle, any model exportable via `torch.export` is supported (a quick way to sanity-check a packaged model is sketched after this list).
- **PyTorch Ecosystem Integration**: Enables fine-tuning, quantization, and compilation workflows within the PyTorch ecosystem.
- **No Python Runtime During Inference**: Ideal for native applications (e.g., written in C++) that embed AI capabilities.
- **No libtorch Dependency**: Reduces binary size, making deployment easier for resource-constrained applications.
- **Efficient GPU Support**: Uses the AOTI-powered CUDA backend for efficient inference on NVIDIA GPUs.

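A minimal sanity check, assuming the placeholder `tiny_model_cuda.pt2` produced in the sketch above: the package can be loaded back from Python and exercised before it is embedded in a native application. Deployment itself does not require Python.

```python
# Quick Python-side check of an AOTI package; deployment does not need Python.
# "tiny_model_cuda.pt2" is the placeholder artifact from the sketch above.
import torch
import torch._inductor  # provides aoti_load_package

compiled = torch._inductor.aoti_load_package("tiny_model_cuda.pt2")

x = torch.randn(2, 16, device="cuda")
out = compiled(x)  # runs the ahead-of-time compiled CUDA kernels
print(out.shape)   # expected (2, 4) for the placeholder model above
```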
## Backends

Backends leveraging AOTI:

- [CUDA backend](../backends/cuda)
- [Metal backend](../backends/apple/metal)

## Roadmap & Limitations

- **Experimental Status**: The CUDA and Metal backends via AOTI are currently experimental. Contributions and feedback are welcome!
- **Model Compatibility**: While most models exportable via `torch.export` should work, validation is ongoing for broader model support.
- **Portability**: We are still working out the balance and trade-offs between performance, portability, and model file size.
- **Native Windows Support (WIP)**: On Windows, only WSL is supported right now; native Windows support is work in progress.
