We're excited to announce the release of ExecuTorch 0.7! This release builds upon the foundation established in the 0.6 release with significant improvements across various components of the framework. ExecuTorch continues to be a powerful solution for on-device AI, powering experiences across a wide range of platforms.
Highlights
- New export_llm API: Unified CLI and API to export supported LLM models, replacing the old export_llama script
- Generic Text LLM Runner: Support for all text-only decoder-only LLM models with full Android & iOS integration
- Program-Data Separation: Experimental APIs to export program and data files separately with new '.ptd' file extension
- KleidiAI Integration: Now enabled by default in XNNPACK backend for optimized low-bit matrix multiplication
- LLM Model Support: Added support for Qwen3, SmolLM3, and Gemma3 models
- Developer Tools: New numerical debugging capabilities and inspector APIs
API Changes
- New export_llm API, repackaged from the old export_llama script, which provides a unified CLI and API for exporting supported LLM models with custom configurations (a short usage sketch follows this list)
- New generic text LLM runner that supports all text-only, decoder-only LLM models. Fully integrated with the Android & iOS demo apps.
- Experimental program-data separation APIs to export program and data files separately, introducing the new '.ptd' file extension for data-only files. Supported for portable operators and the XNNPACK backend.
- Added a new runtime API for PAL overrides and log routing.
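For orientation, here is a minimal sketch of driving the new unified export entry point from a Python script. The module path and the --config option are assumptions based on this release's LLM export documentation, so treat them as illustrative rather than authoritative:

```python
# Sketch only: invoke the unified export_llm entry point from a Python script.
# The module path and the "--config" option below are assumptions based on this
# release's LLM export documentation; consult the docs for the exact entry point
# and the available configuration options on your install.
import subprocess
import sys

subprocess.run(
    [
        sys.executable,
        "-m",
        "executorch.extension.llm.export.export_llm",  # assumed module path
        "--config",
        "my_llm_export_config.yaml",  # hypothetical config: model class, quantization, backend, etc.
    ],
    check=True,  # raise CalledProcessError if the export fails
)
```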
Deprecated
- The constant buffer will be deprecated in the runtime in upcoming releases. Please re-export any PTE files generated before ExecuTorch 0.4.
Build
- CMake
  - Data-type selective build is now available via the model API
  - Bumped the minimum CMake requirement to 3.29
- Pip Wheel Build: Various improvements to the wheel building process and CI infrastructure (link1, link2, link3); XNNPACK is now enabled by default for wheels
- Build Presets: Introduced CMake presets for common build configurations (iOS, macOS, Linux, pybind, LLM)
Backend Delegates
Arm
- Cortex-M Support: Initial commit for Cortex-M backend with scalar C++ ops for quantize/dequantize operations
- TOSA 1.0 Support: Updated to support TOSA 1.0 specification with improved operator coverage
- New Operators: Added `alias_copy`, `arange.default`, `aten.mul.Tensor` (int32 BI/INT), `aten.ne`, `atan`, `BatchNorm2D`, `cosine_similarity`, `embedding.default`, `eq.Scalar`, `erf`, `ge.Scalar`, `gelu`, `GroupNorm`, `grouped_convolution`, `gt.Scalar`, `index.Tensor`, `index_select`, `linalg_vector_norm` (DecomposeLinalgVectorNorm), `lt.Scalar`, `neg.default`, `aten.round`, `scaled_dot_product_attention` (SDPA), `sinh`, `sqrt`, `upsample_bilinear2d`, and `where.self`
- Quantization: Enhanced per-channel quantization support and QAT (Quantization Aware Training) improvements
- Performance: Improved pooling operations, broadcasting, and memory allocation strategies
- 5D Tensor Support: Extended backend to handle 5D tensors for more complex models
CoreML
- Model Support: Enhanced transformer support using the original RoPE and RMSNorm definitions
- Cross-platform: Improved CoreML export support on Linux platforms
- Documentation: Updated documentation with corrected file names and commands
Qualcomm
- AI Engine Direct: GA enablement for multiple models including Albert, Bert, DistilBert, Eurobert, DeiT, EfficientNet, MobileViT, PVT, CvT, DIT, FocalNet, Swin Transformer, and Roberta
- Multi-method Support: Enhanced support for multiple methods in QNN backend
- Lookahead Decoding: Enabled lookahead decoding capabilities
- Custom Operators: Support for custom operator integration
- QAIRT Visualizer: Integration with QAIRT visualization tools
MediaTek
- CI Integration: Added MediaTek backend CI support
- Buffer Allocator: Decoupled Neuron buffer allocator from ExecuTorch framework
- Documentation: Updated documents for Express SDK and buffer allocator usage
- Platform Config: Introduced platform-config in CompileSpec for better hardware targeting
MPS
- Documentation: Updated MPS documentation with correct file names and commands
- Example Fixes: Fixed MPS example for non-LLM models
NXP
- Added initial version of eIQ Neutron Backend and runtime
- Support for basic NN operators
- Quantization and conversion support for MobileNets and convolution-based networks
- Complete end-to-end support for MobileNetV2 and CifarNet with examples
Vulkan
- Operator Support: New implementations for `native_group_norm`, `permute`, `var.dim`, `tan`, and quantization/dequantization ops
- Performance Optimizations: Improved conv2d operations with specialized shaders and reduced memory usage
- Dynamic Shape Support: Enhanced support for dynamic shapes and symbolic operations
- Testing Infrastructure: Improved operator test codegen system and validation
XNNPACK
- KleidiAI is now enabled by default on the XNNPACK Backend! This provides optimized low-bit matrix multiplication kernels that boost Llama prefill and decode performance
- New SDOT kernels by KleidiAI are introduced for platforms lacking the i8mm extension, ensuring improved compatibility and performance for quantized models on a wider range of devices.
- Weight-sharing support between different methods within the same PTE file.
- Program-data separation support. Export and run an XNNPACK-lowered model with program and data in separate files (a minimal export sketch follows this list).
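As a point of reference, here is a minimal sketch of the export-and-lower flow the items above refer to. The model and input shapes are placeholders, and the KleidiAI kernels are selected inside the backend at run time rather than in this script; the program-data separation flow additionally emits a '.ptd' file alongside the '.pte', which this sketch does not cover.

```python
# Minimal sketch: lower a small eager-mode model to XNNPACK and save a .pte file.
# The model and input shapes are placeholders; the KleidiAI kernels mentioned above
# are picked inside the XNNPACK backend at runtime, not in this export script.
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower


class TinyMLP(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(16, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 4),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


example_inputs = (torch.randn(1, 16),)
exported = torch.export.export(TinyMLP().eval(), example_inputs)

# Delegate supported subgraphs to XNNPACK, then serialize to the ExecuTorch format.
et_program = to_edge_transform_and_lower(
    exported, partitioner=[XnnpackPartitioner()]
).to_executorch()

with open("tiny_mlp_xnnpack.pte", "wb") as f:
    f.write(et_program.buffer)
```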
Android / iOS
- LLM Integration: Enhanced Android demo app with generic LLM runner support
- Java APIs: New runtime APIs for registered ops and backends
- Dynamic Image Loading: Added support for loading images from the /sdcard and /data partitions, with user-granted permissions, for image segmentation use cases. See the DeepLab V3 demo in the executorch-examples project for the latest updates
- Logging: Added Android log implementation for better debugging
- Build Improvements: Updated Android CMake configuration and artifact generation
Devtools
- Numerical Debugging: New `calculate_numeric_gap` API to compare logged operator-level intermediate outputs from the exported graph (ETRecord) with runtime outputs (ETDump); a usage sketch follows this list
- Inspector Enhancements: Integrated debug handle to operator name mapping and intermediate output capturing
- Numerical Comparators: Added L1, MSE, and SNR numerical comparators for model validation
- ETDump Integration: Enhanced ETDump support in module APIs
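A minimal sketch of exercising the new numerical debugging flow with the Inspector is below. The file paths are placeholders, and the `distance` argument name is an assumption suggested by the L1/MSE/SNR comparators above; check the Inspector documentation for the exact signature.

```python
# Sketch: compare export-time intermediate outputs (ETRecord) with runtime
# outputs (ETDump) using the Inspector's new numerical-gap API.
# Paths are placeholders; the `distance` keyword is an assumption based on the
# L1/MSE/SNR comparators listed above.
from executorch.devtools import Inspector

inspector = Inspector(
    etdump_path="etdump.etdp",  # trace collected from a device/runtime run
    etrecord="etrecord.bin",    # record saved at export time
)

# Per-operator numerical gap between the exported graph and the runtime outputs.
gap_report = inspector.calculate_numeric_gap(distance="MSE")
print(gap_report)
```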
Model Support
- Enabled Qwen3 (0.6B, 1.7B, 4B), SmolLM3, and Gemma3 1B with high out-of-the-box performance
- LLaVA: Improved multimodal model support with enhanced image and text processing
- Quantization: Enhanced quantization support across various models with improved precision and performance
Ops and kernels
- Portable Kernels: Added `randn` and `rand` kernel implementations and `native_dropout` support
- Scalar Overflow Protection: Enhanced scalar overflow checking across multiple operators
- Vectorization: Improved vectorized math operations and elementwise utilities
- Memory Operations: Enhanced copy utilities and memory planning algorithms
- Broadcasting: Improved broadcast support for various binary operations
- Complex Number Support: Added complex dtype support for `mul`, `sum`, `bmm`, and `diagonal_copy` operations
- New Operators: Added support for `leaky_relu`, `hardtanh`, `amax`, `view_as_real_copy`, and FFT operations (a small export sketch follows this list)
- Optimized Implementations: Enhanced ELU implementation and improved elementwise utilities
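As a small illustration, the sketch below exports a toy model through the portable (non-delegated) flow using a few of the ops named above; the model itself is a placeholder, and the exact lowering can vary with decompositions.

```python
# Sketch: export a tiny model that exercises several of the ops listed above
# (leaky_relu, hardtanh, amax) without delegating to any backend, so the
# resulting program runs on the portable CPU kernels.
import torch
from executorch.exir import to_edge


class NewOpsModule(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.nn.functional.leaky_relu(x, negative_slope=0.1)
        y = torch.nn.functional.hardtanh(y, min_val=-1.0, max_val=1.0)
        # Reduce over the last dimension; expected to map to the amax kernel.
        return torch.amax(y, dim=-1)


example_inputs = (torch.randn(2, 8),)
exported = torch.export.export(NewOpsModule(), example_inputs)

# No partitioner: keep everything on the portable operator library.
et_program = to_edge(exported).to_executorch()

with open("new_ops_portable.pte", "wb") as f:
    f.write(et_program.buffer)
```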
Contributors
@abeakkas @abhinaykukkadapu @andreanicastro @anijain2305 @jainapurva @desertfire @BujSet @chenweng-quic @Conarnar @ethansfng @fumchin @HDCharles @IshanAryendu @Juntian777 @skywall @mchien512 @leafs1 @michaelmaitland @NeelakshSharma @nil-is-all @nitish2112 @Burton2000 @rmaz @rascani @robert-kalmar @rohansjoshi @psiddh @suvadeep89 @tamird @tbergkvist @phaiting @tobbeebbot @wl1026sun @ynimmaga @ydwu4 @cyyever @keyprocedure
Full Changelog: v0.6.0...v0.7.0