# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

ExecuTorch is an end-to-end solution for on-device inference powering AI experiences across Meta's products. It provides a portable runtime for executing PyTorch models on resource-constrained devices including mobile phones, embedded systems, and microcontrollers.

## Key Development Commands

### Build Commands

```bash
# Configure CMake build (one-time setup)
rm -rf cmake-out && mkdir cmake-out && cd cmake-out && cmake ..

# Build the project (use core count + 1 for -j value)
cmake --build cmake-out -j9

# Build with common extensions enabled
cmake . \
  -DEXECUTORCH_BUILD_XNNPACK=ON \
  -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
  -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
  -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
  -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
  -Bcmake-out
```

### Testing Commands

```bash
# Run all C++ tests
./test/run_oss_cpp_tests.sh

# Run Python tests
pytest

# Run a specific test directory
./test/run_oss_cpp_tests.sh runtime/core/test/

# Build and run size tests
./test/build_size_test.sh
```

### Code Quality

```bash
# Set up lintrunner
pip install lintrunner==0.12.7 lintrunner-adapters==0.12.4
lintrunner init

# Run linter
lintrunner

# Auto-fix linting issues
lintrunner -a

# Run specific linters
lintrunner --take CLANGFORMAT
```

### Python Environment Setup

```bash
# Install requirements
./install_requirements.sh

# Install in development mode
pip install -e .
```

## Architecture Overview

### Core Components

**Export Pipeline (exir/)**
- Captures PyTorch models via torch.export
- Lowers models through EXIR dialects (ATen/Core ATen → Edge → Backend)
- Performs optimizations like memory planning and operator fusion
- Serializes models to the .pte ExecuTorch program format
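
A minimal end-to-end sketch of this pipeline in Python (the module and file names are illustrative; exact APIs can shift between ExecuTorch releases):

```python
import torch
from torch.export import export

from executorch.exir import to_edge


class MulModel(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * x


# 1. Capture the eager model as an exported graph (ATen dialect).
example_inputs = (torch.randn(2, 2),)
exported_program = export(MulModel().eval(), example_inputs)

# 2. Lower to the Edge dialect, then to an ExecuTorch program;
#    memory planning and other passes run inside to_executorch().
et_program = to_edge(exported_program).to_executorch()

# 3. Serialize the flatbuffer-backed program to a .pte file.
with open("mul.pte", "wb") as f:
    f.write(et_program.buffer)
```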

**Runtime (runtime/)**
- Loads and executes .pte files
- Core structures: Tensor, EValue, Program, Method
- Memory management with static allocation
- Delegate interface for hardware acceleration

**Kernels (kernels/)**
- `portable/`: Reference C++ implementations for all operators
- `optimized/`: SIMD-optimized implementations
- `quantized/`: Quantized operator implementations
- Operators registered via static initialization

**Backends (backends/)**
- Hardware-specific accelerators (Apple CoreML/MPS, ARM, Qualcomm, etc.)
- Each backend provides:
  - Partitioner: Identifies subgraphs for delegation
  - Preprocess: Converts subgraphs to backend format
  - Runtime: Executes delegated operations
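
Delegation is driven from the export script. A hedged sketch using the XNNPACK backend (the partitioner import path reflects current ExecuTorch sources and may differ in older releases; the model and output filename are illustrative):

```python
import torch
from torch.export import export

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge

model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU()).eval()
exported_program = export(model, (torch.randn(1, 8),))

# The partitioner tags XNNPACK-compatible subgraphs; to_backend() replaces
# them with delegate calls that the XNNPACK runtime executes on-device.
edge_program = to_edge(exported_program).to_backend(XnnpackPartitioner())

with open("linear_xnnpack.pte", "wb") as f:
    f.write(edge_program.to_executorch().buffer)
```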

### Key Design Patterns

**Memory Management**
- No dynamic allocation in core runtime
- Static memory planning at export time
- MemoryAllocator interface for custom allocation strategies

**Error Handling**
- Result<T, Error> type for fallible operations
- No exceptions (disabled for portability)
- ET_CHECK macros for assertions

**Backend Delegation**
- Subgraphs marked with delegation tags during export
- Backend runtimes register with the core runtime and initialize their delegated payloads when the program is loaded
- Operators that are not delegated fall back to the portable CPU kernels

### Model Export Flow

1. Capture PyTorch model with torch.export
2. Lower through EXIR passes (quantization, delegation, optimization)
3. Run memory planning to determine buffer sizes
4. Emit to flatbuffer format (.pte file)
5. Load in runtime and execute

### Extension Modules

**Module (extension/module/)**
- Simplified C++ API for loading and running models
- Handles method execution and tensor I/O

**LLM Support (extension/llm/)**
- Specialized runtime for large language models
- Custom operators for KV caching, rotary embeddings
- Token sampling strategies

**Platform Bindings**
- Android: JNI wrappers in extension/android/
- iOS: Objective-C++ wrappers in extension/apple/
- Python: pybind11 bindings in extension/pybindings/
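
For quick verification of an exported model, the pybind11 bindings let you run a .pte file directly from Python. A minimal sketch (the `portable_lib` module and `_load_for_executorch` entry point come from the pybindings extension and may change between releases; the .pte path is illustrative):

```python
import torch

from executorch.extension.pybindings.portable_lib import _load_for_executorch

# Load the serialized program and invoke its default "forward" method.
module = _load_for_executorch("model.pte")  # path to a previously exported .pte
outputs = module.forward([torch.randn(2, 2)])  # inputs must match the exported shapes
print(outputs[0])
```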

## Working with Models

Example models are in `examples/models/`. Each model directory typically contains:
- `export_<model>.py`: Exports PyTorch model to .pte
- `model.py`: Model definition
- `runner/`: C++ runner implementation
- `README.md`: Model-specific instructions

Common workflow:
1. Export model: `python export_llama.py`
2. Build runner: `cmake --build cmake-out --target llama_runner`
3. Run inference: `./cmake-out/examples/models/llama/llama_runner --model_path=model.pte`

## Code Style Guidelines

- C++ follows Google style with modifications:
  - Functions use `lower_snake_case()`
  - Files use `lower_snake_case.cpp`
  - Headers use `#pragma once`
  - All includes use `<angle_brackets>`
- Python follows PyTorch style
- No dynamic memory allocation in runtime
- Avoid templates to minimize binary size
- Document public APIs with Doxygen comments