v0.7.0

@larryliu0820 released this 11 Aug 23:46
4b9d442

We're excited to announce the release of ExecuTorch 0.7! This release builds upon the foundation established in the 0.6 release with significant improvements across various components of the framework. ExecuTorch continues to be a powerful solution for on-device AI, powering experiences across a wide range of platforms.

Highlights

  • New export_llm API: Unified CLI and API to export supported LLM models, replacing the old export_llama script
  • Generic Text LLM Runner: Support for all text-only decoder-only LLM models with full Android & iOS integration
  • Program-Data Separation: Experimental APIs to export program and data files separately with new '.ptd' file extension
  • KleidiAI Integration: Now enabled by default in XNNPACK backend for optimized low-bit matrix multiplication
  • LLM Model Support: Added support for Qwen3, SmolLM3, and Gemma3 models
  • Developer Tools: New numerical debugging capabilities and inspector APIs

API Changes

  • New export_llm API, repackaged from the old export_llama script, providing a unified CLI and API for exporting and customizing supported LLM models
  • New generic text LLM runner that supports all text-only, decoder-only LLM models, fully integrated with the Android & iOS demo apps.
  • Experimental program-data separation APIs to export program and data as separate files, introducing the new ‘.ptd’ file extension for data-only files. Supported with portable operators and the XNNPACK backend (see the sketch after this list).
  • Added a new runtime API for PAL overrides and log routing.
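
A minimal sketch of the experimental program-data separation flow, under stated assumptions: the torch.export → to_edge → to_executorch path is the standard export pipeline, while the external_constants config field and the write_tensor_data_to_file helper are assumptions about the experimental API and may differ in name or signature.

```python
import torch
from executorch.exir import ExecutorchBackendConfig, to_edge

class TinyLinear(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(8, 4)

    def forward(self, x):
        return self.linear(x)

model = TinyLinear().eval()
example_inputs = (torch.randn(1, 8),)

# Standard export path: torch.export -> Edge dialect -> ExecuTorch program.
exported = torch.export.export(model, example_inputs)
edge = to_edge(exported)

# Assumption: external_constants=True asks ExecuTorch to keep constant
# tensors out of the program so they can be written to a separate file.
et_program = edge.to_executorch(ExecutorchBackendConfig(external_constants=True))

# Write the program (.pte) and, assuming this helper exists, the separated
# constant data (.ptd) into the current directory.
with open("tiny_linear.pte", "wb") as f:
    f.write(et_program.buffer)
et_program.write_tensor_data_to_file(".")
```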

Build

  • CMake
    • Data type selective build is now available via the model API.
    • Bumped minimum CMake requirement to 3.29
  • Pip Wheel Build: Various improvements to the wheel building process and CI infrastructure; XNNPACK is now enabled by default for wheels
  • Build Presets: Introduced CMake presets for common build configurations (iOS, macOS, Linux, pybind, LLM)

Backend Delegates

Arm

  • Cortex-M Support: Initial commit for Cortex-M backend with scalar C++ ops for quantize/dequantize operations
  • TOSA 1.0 Support: Updated to support TOSA 1.0 specification with improved operator coverage
  • New Operators: Added alias_copy, arange.default, aten.mul.Tensor (int32 BI/INT), aten.ne, atan, BatchNorm2D, cosine_similarity, embedding.default, eq.Scalar, erf, ge.Scalar, gelu, GroupNorm, grouped_convolution, gt.Scalar, index.Tensor, index_select, linalg_vector_norm (DecomposeLinalgVectorNorm), lt.Scalar, neg.default, aten.round, scaled_dot_product_attention (SDPA), sinh, sqrt, upsample_bilinear2d, where.self
  • Quantization: Enhanced per-channel quantization support and QAT (Quantization Aware Training) improvements
  • Performance: Improved pooling operations, broadcasting, and memory allocation strategies
  • 5D Tensor Support: Extended backend to handle 5D tensors for more complex models

CoreML

  • Model Support: Enhanced transformer support using the original RoPE and RMSNorm definitions
  • Cross-platform: Improved CoreML export support on Linux platforms
  • Documentation: Updated documentation with corrected file names and commands

Qualcomm

  • AI Engine Direct: GA enablement for multiple models including Albert, Bert, DistilBert, Eurobert, DeiT, EfficientNet, MobileViT, PVT, CvT, DIT, FocalNet, Swin Transformer, and Roberta
  • Multi-method Support: Enhanced support for multiple methods in QNN backend
  • Lookahead Decoding: Enabled lookahead decoding capabilities
  • Custom Operators: Support for custom operator integration
  • QAIRT Visualizer: Integration with QAIRT visualization tools

MediaTek

  • CI Integration: Added MediaTek backend CI support
  • Buffer Allocator: Decoupled Neuron buffer allocator from ExecuTorch framework
  • Documentation: Updated documentation for the Express SDK and buffer allocator usage
  • Platform Config: Introduced platform-config in CompileSpec for better hardware targeting

MPS

  • Documentation: Updated MPS documentation with correct file names and commands
  • Example Fixes: Fixed MPS example for non-LLM models

NXP

  • Added initial version of eIQ Neutron Backend and runtime
  • Support for basic NN operators
  • Quantization and conversion support for MobileNets and convolution-based networks
  • Complete end-to-end support for MobileNetV2 and CifarNet with examples

Vulkan

  • Operator Support: New implementations for native_group_norm, permute, var.dim, tan, quantization/dequantization ops
  • Performance Optimizations: Improved conv2d operations with specialized shaders and reduced memory usage
  • Dynamic Shape Support: Enhanced support for dynamic shapes and symbolic operations
  • Testing Infrastructure: Improved operator test codegen system and validation

XNNPACK

  • KleidiAI is now enabled by default in the XNNPACK backend! This provides optimized low-bit matrix multiplication kernels that boost Llama prefill and decode performance; a minimal lowering sketch follows this list.
  • New SDOT kernels by KleidiAI are introduced for platforms lacking the i8mm extension, ensuring improved compatibility and performance for quantized models on a wider range of devices.
  • Weight-sharing support between different methods within the same PTE file.
  • Program-data separation support. Export and run an XNNPACK-lowered model with program and data in separate files.
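
For reference, below is a minimal sketch of lowering a model to the XNNPACK backend via to_edge_transform_and_lower and XnnpackPartitioner; the module and inputs are illustrative. With KleidiAI now on by default, quantized models lowered this way should pick up the optimized kernels automatically on supported Arm CPUs, with no export-time changes.

```python
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

class SmallMLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(64, 128)
        self.fc2 = torch.nn.Linear(128, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = SmallMLP().eval()
example_inputs = (torch.randn(1, 64),)

# Export, partition the graph for XNNPACK, and serialize to a .pte file.
exported = torch.export.export(model, example_inputs)
et_program = to_edge_transform_and_lower(
    exported, partitioner=[XnnpackPartitioner()]
).to_executorch()

with open("small_mlp_xnnpack.pte", "wb") as f:
    f.write(et_program.buffer)
```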

Android / iOS

  • LLM Integration: Enhanced Android demo app with generic LLM runner support
  • Java APIs: New runtime APIs for registered ops and backends
  • Dynamic Image Loading: Added support for loading images from the /sdcard and /data partitions, with user-granted permissions, for image segmentation use cases. See the DeepLab V3 demo in the executorch-examples project for the latest updates
  • Logging: Added Android log implementation for better debugging
  • Build Improvements: Updated Android CMake configuration and artifact generation

Devtools

  • Numerical Debugging: New calculate_numeric_gap API to compare logged operator-level intermediate outputs from the exported graph (ETRecord) with runtime outputs (ETDump); see the sketch after this list
  • Inspector Enhancements: Integrated debug handle to operator name mapping and intermediate output capturing
  • Numerical Comparators: Added L1, MSE, and SNR numerical comparators for model validation
  • ETDump Integration: Enhanced ETDump support in module APIs
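
A rough sketch of how these pieces fit together, assuming an ETRecord saved at export time and an ETDump collected from an instrumented runtime run; the exact signature of calculate_numeric_gap (here assumed to accept a comparator name) may differ.

```python
from executorch.devtools import Inspector

# Assumed artifact paths: the ETRecord is saved at export time and the
# ETDump is collected from an on-device run with event tracing enabled.
inspector = Inspector(etdump_path="etdump.etdp", etrecord="etrecord.bin")

# Print per-operator runtime events in a tabular view.
inspector.print_data_tabular()

# Assumption: calculate_numeric_gap compares intermediate outputs recorded
# at export time against runtime outputs using the chosen comparator
# (L1, MSE, or SNR per the notes above); the signature may differ.
gap_report = inspector.calculate_numeric_gap(distance="MSE")
print(gap_report)
```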

Model Support

  • Enabled Qwen3 (0.6B, 1.7B, 4B), SmolLM3, and Gemma3 1B with high out-of-the-box performance
  • LLaVA: Improved multimodal model support with enhanced image and text processing
  • Quantization: Enhanced quantization support across various models with improved precision and performance

Ops and kernels

  • Portable Kernels: Added randn and rand kernel implementations and native_dropout support
  • Scalar Overflow Protection: Enhanced scalar overflow checking across multiple operators
  • Vectorization: Improved vectorized math operations and elementwise utilities
  • Memory Operations: Enhanced copy utilities and memory planning algorithms
  • Broadcasting: Improved broadcast support for various binary operations
  • Complex Number Support: Added complex dtype support for mul, sum, bmm, and diagonal_copy operations
  • New Operators: Added support for leaky_relu, hardtanh, amax, view_as_real_copy, and FFT operations
  • Optimized Implementations: Enhanced ELU implementation and improved elementwise utilities

Contributors

@abeakkas @abhinaykukkadapu @andreanicastro @anijain2305 @jainapurva @desertfire @BujSet @chenweng-quic @Conarnar @ethansfng @fumchin @HDCharles @IshanAryendu @Juntian777 @skywall @mchien512 @leafs1 @michaelmaitland @NeelakshSharma @nil-is-all @nitish2112 @Burton2000 @rmaz @rascani @robert-kalmar @rohansjoshi @psiddh @suvadeep89 @tamird @tbergkvist @phaiting @tobbeebbot @wl1026sun @ynimmaga @ydwu4 @cyyever @keyprocedure

Full Changelog: v0.6.0...v0.7.0