We're excited to announce the release of ExecuTorch 0.7! This release builds upon the foundation established in the 0.6 release with significant improvements across various components of the framework. ExecuTorch continues to be a powerful solution for on-device AI, powering experiences across a wide range of platforms.
Highlights
- New export_llm API: Unified CLI and API to export supported LLM models, replacing the old export_llama script
- Generic Text LLM Runner: Support for all text-only decoder-only LLM models with full Android & iOS integration
- Program-Data Separation: Experimental APIs to export program and data files separately with new '.ptd' file extension
- KleidiAI Integration: Now enabled by default in XNNPACK backend for optimized low-bit matrix multiplication
- LLM Model Support: Added support for Qwen3, SmolLM3, and Gemma3 models
- Developer Tools: New numerical debugging capabilities and inspector APIs
API Changes
- New export_llm API, repackaged from the old export_llama script, which provides a unified CLI and API for exporting supported LLM models with custom configurations (a short usage sketch follows this list)
- New generic text LLM runner that supports all text-only, decoder-only LLM models. Fully integrated with the Android & iOS demo apps.
- Experimental program-data separation APIs to export program and data files separately, introducing the new '.ptd' file extension for data-only files. Supported for portable operators and the XNNPACK backend.
- Added a new runtime API for PAL overrides and log routing.
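For orientation, here is a minimal sketch of driving the new unified export entry point from a Python script. The module path and the --config option are assumptions based on this release's LLM export documentation, so treat them as illustrative rather than authoritative:

```python
# Sketch only: invoke the unified export_llm entry point from a Python script.
# The module path and the "--config" option below are assumptions based on this
# release's LLM export documentation; consult the docs for the exact entry point
# and the available configuration options on your install.
import subprocess
import sys

subprocess.run(
    [
        sys.executable,
        "-m",
        "executorch.extension.llm.export.export_llm",  # assumed module path
        "--config",
        "my_llm_export_config.yaml",  # hypothetical config: model class, quantization, backend, etc.
    ],
    check=True,  # raise CalledProcessError if the export fails
)
```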
Deprecated
- The constant buffer will be deprecated in the runtime in upcoming releases. Please re-export any PTE files generated before ExecuTorch 0.4.
Build
- CMake
  - Data-type selective build is now available via the model API
  - Bumped the minimum CMake requirement to 3.29
- Pip Wheel Build: Various improvements to the wheel building process and CI infrastructure (link1, link2, link3); XNNPACK is now enabled by default for wheels
- Build Presets: Introduced CMake presets for common build configurations (iOS, macOS, Linux, pybind, LLM)
Backend Delegates
Arm
- Cortex-M Support: Initial commit for Cortex-M backend with scalar C++ ops for quantize/dequantize operations
- TOSA 1.0 Support: Updated to support TOSA 1.0 specification with improved operator coverage
- New Operators: Added `alias_copy`, `arange.default`, `aten.mul.Tensor` (int32 BI/INT), `aten.ne`, `atan`, `BatchNorm2D`, `cosine_similarity`, `embedding.default`, `eq.Scalar`, `erf`, `ge.Scalar`, `gelu`, `GroupNorm`, `grouped_convolution`, `gt.Scalar`, `index.Tensor`, `index_select`, `linalg_vector_norm` (DecomposeLinalgVectorNorm), `lt.Scalar`, `neg.default`, `aten.round`, `scaled_dot_product_attention` (SDPA), `sinh`, `sqrt`, `upsample_bilinear2d`, and `where.self`
- Quantization: Enhanced per-channel quantization support and QAT (Quantization Aware Training) improvements
- Performance: Improved pooling operations, broadcasting, and memory allocation strategies
- 5D Tensor Support: Extended backend to handle 5D tensors for more complex models
CoreML
- Model Support: Enhanced transformer support using the original RoPE and RMSNorm definitions
- Cross-platform: Improved CoreML export support on Linux platforms
- Documentation: Updated documentation with corrected file names and commands
Qualcomm
- AI Engine Direct: GA enablement for multiple models including Albert, Bert, DistilBert, Eurobert, DeiT, EfficientNet, MobileViT, PVT, CvT, DIT, FocalNet, Swin Transformer, and Roberta
- Multi-method Support: Enhanced support for multiple methods in QNN backend
- Lookahead Decoding: Enabled lookahead decoding capabilities
- Custom Operators: Support for custom operator integration
- QAIRT Visualizer: Integration with QAIRT visualization tools
MediaTek
- CI Integration: Added MediaTek backend CI support
- Buffer Allocator: Decoupled Neuron buffer allocator from ExecuTorch framework
- Documentation: Updated documents for Express SDK and buffer allocator usage
- Platform Config: Introduced platform-config in CompileSpec for better hardware targeting
MPS
- Documentation: Updated MPS documentation with correct file names and commands
- Example Fixes: Fixed MPS example for non-LLM models
NXP
- Added initial version of eIQ Neutron Backend and runtime
- Support for basic NN operators
- Quantization and conversion support for MobileNets and convolution-based networks
- Complete end-to-end support for MobileNetV2 and CifarNet with examples
Vulkan
- Operator Support: New implementations for `native_group_norm`, `permute`, `var.dim`, `tan`, and quantization/dequantization ops
- Performance Optimizations: Improved conv2d operations with specialized shaders and reduced memory usage
- Dynamic Shape Support: Enhanced support for dynamic shapes and symbolic operations
- Testing Infrastructure: Improved operator test codegen system and validation
XNNPACK
- KleidiAI is now enabled by default on the XNNPACK Backend! This provides optimized low-bit matrix multiplication kernels that boost Llama prefill and decode performance
- New SDOT kernels by KleidiAI are introduced for platforms lacking the i8mm extension, ensuring improved compatibility and performance for quantized models on a wider range of devices.
- Weight-sharing support between different methods within the same PTE file.
- Program-data separation support. Export and run an XNNPACK-lowered model with program and data in separate files (a minimal export sketch follows this list).
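As a point of reference, here is a minimal sketch of the export-and-lower flow the items above refer to. The model and input shapes are placeholders, and the KleidiAI kernels are selected inside the backend at run time rather than in this script; the program-data separation flow additionally emits a '.ptd' file alongside the '.pte', which this sketch does not cover.

```python
# Minimal sketch: lower a small eager-mode model to XNNPACK and save a .pte file.
# The model and input shapes are placeholders; the KleidiAI kernels mentioned above
# are picked inside the XNNPACK backend at runtime, not in this export script.
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower


class TinyMLP(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(16, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, 4),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


example_inputs = (torch.randn(1, 16),)
exported = torch.export.export(TinyMLP().eval(), example_inputs)

# Delegate supported subgraphs to XNNPACK, then serialize to the ExecuTorch format.
et_program = to_edge_transform_and_lower(
    exported, partitioner=[XnnpackPartitioner()]
).to_executorch()

with open("tiny_mlp_xnnpack.pte", "wb") as f:
    f.write(et_program.buffer)
```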
Android / iOS
- LLM Integration: Enhanced Android demo app with generic LLM runner support
- Java APIs: New runtime APIs for registered ops and backends
- Dynamic Image Loading: Added support for loading images from the /sdcard and /data partitions, with user-granted permissions, for image segmentation use cases. See the DeepLab V3 demo in the executorch-examples project for the latest updates
- Logging: Added Android log implementation for better debugging
- Build Improvements: Updated Android CMake configuration and artifact generation
Devtools
- Numerical Debugging: New `calculate_numeric_gap` API to compare logged operator-level intermediate outputs from the exported graph (ETRecord) with runtime outputs (ETDump); a usage sketch follows this list
- Inspector Enhancements: Integrated debug handle to operator name mapping and intermediate output capturing
- Numerical Comparators: Added L1, MSE, and SNR numerical comparators for model validation
- ETDump Integration: Enhanced ETDump support in module APIs
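A minimal sketch of exercising the new numerical debugging flow with the Inspector is below. The file paths are placeholders, and the `distance` argument name is an assumption suggested by the L1/MSE/SNR comparators above; check the Inspector documentation for the exact signature.

```python
# Sketch: compare export-time intermediate outputs (ETRecord) with runtime
# outputs (ETDump) using the Inspector's new numerical-gap API.
# Paths are placeholders; the `distance` keyword is an assumption based on the
# L1/MSE/SNR comparators listed above.
from executorch.devtools import Inspector

inspector = Inspector(
    etdump_path="etdump.etdp",  # trace collected from a device/runtime run
    etrecord="etrecord.bin",    # record saved at export time
)

# Per-operator numerical gap between the exported graph and the runtime outputs.
gap_report = inspector.calculate_numeric_gap(distance="MSE")
print(gap_report)
```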
Model Support
- Enabled Qwen3 (0.6B, 1.7B, 4B), SmolLM3, and Gemma3 1B with high out-of-the-box performance
- LLaVA: Improved multimodal model support with enhanced image and text processing
- Quantization: Enhanced quantization support across various models with improved precision and performance
Ops and kernels
- Portable Kernels: Added `randn` and `rand` kernel implementations and `native_dropout` support
- Scalar Overflow Protection: Enhanced scalar overflow checking across multiple operators
- Vectorization: Improved vectorized math operations and elementwise utilities
- Memory Operations: Enhanced copy utilities and memory planning algorithms
- Broadcasting: Improved broadcast support for various binary operations
- Complex Number Support: Added complex dtype support for `mul`, `sum`, `bmm`, and `diagonal_copy` operations
- New Operators: Added support for `leaky_relu`, `hardtanh`, `amax`, `view_as_real_copy`, and FFT operations (a small export sketch follows this list)
- Optimized Implementations: Enhanced ELU implementation and improved elementwise utilities
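As a small illustration, the sketch below exports a toy model through the portable (non-delegated) flow using a few of the ops named above; the model itself is a placeholder, and the exact lowering can vary with decompositions.

```python
# Sketch: export a tiny model that exercises several of the ops listed above
# (leaky_relu, hardtanh, amax) without delegating to any backend, so the
# resulting program runs on the portable CPU kernels.
import torch
from executorch.exir import to_edge


class NewOpsModule(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.nn.functional.leaky_relu(x, negative_slope=0.1)
        y = torch.nn.functional.hardtanh(y, min_val=-1.0, max_val=1.0)
        # Reduce over the last dimension; expected to map to the amax kernel.
        return torch.amax(y, dim=-1)


example_inputs = (torch.randn(2, 8),)
exported = torch.export.export(NewOpsModule(), example_inputs)

# No partitioner: keep everything on the portable operator library.
et_program = to_edge(exported).to_executorch()

with open("new_ops_portable.pte", "wb") as f:
    f.write(et_program.buffer)
```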
Contributors
@abeakkas @abhinaykukkadapu @andreanicastro @anijain2305 @jainapurva @desertfire @BujSet @chenweng-quic @Conarnar @ethansfng @fumchin @HDCharles @IshanAryendu @Juntian777 @skywall @mchien512 @leafs1 @michaelmaitland @NeelakshSharma @nil-is-all @nitish2112 @Burton2000 @rmaz @rascani @robert-kalmar @rohansjoshi @psiddh @suvadeep89 @tamird @tbergkvist @phaiting @tobbeebbot @wl1026sun @ynimmaga @ydwu4 @cyyever @keyprocedure
Full Changelog: v0.6.0...v0.7.0