From efb55bb98064b3c65d42f97df81a5245ffc4bb1f Mon Sep 17 00:00:00 2001 From: Mergen Nachin Date: Tue, 14 Oct 2025 17:16:49 -0400 Subject: [PATCH] New landing page prototype --- website/index.html | 1753 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1753 insertions(+) create mode 100644 website/index.html diff --git a/website/index.html b/website/index.html new file mode 100644 index 00000000000..1235f4cf215 --- /dev/null +++ b/website/index.html @@ -0,0 +1,1753 @@ + + + + + + ExecuTorch - PyTorch's On-Device AI Framework + + + + + + + + +
+
+

+ + ExecuTorch +

+

Deploy PyTorch models directly to edge devices. Text, vision, and audio AI with privacy-preserving, real-time inference. No cloud required.

+ +
+
+
12+
+
hardware backends supported
+
+
+
Billions
+
of users in production at Meta
+
+
+
50KB
+
base runtime footprint
+
+
+ +
+ Get Started + View on GitHub +
+
+
+ + +
+
+

Why On-Device AI Matters

+

The future of AI is at the edge, where privacy meets performance

+ +
+
+
🔒
+

Enhanced Privacy

+

Data never leaves the device. Process personal content, conversations, and media locally without cloud exposure.

+
+ +
+
⚡
+

Real-Time Response

+

Instant inference with no network round-trips. Perfect for AR/VR experiences, multimodal AI interactions, and responsive conversational agents.

+
+ +
+
🌐
+

Offline & Low-Bandwidth Ready

+

Zero network dependency for inference. Works seamlessly in low-bandwidth regions, remote areas, or completely offline.

+
+ +
+
💰
+

Cost Efficient

+

No cloud compute bills. No API rate limits. Scale to billions of users without infrastructure costs growing linearly.

+
+
+
+
+ + +
+
+

Models Are Getting Smaller & Smarter

+

The convergence of efficient architectures and edge hardware creates new opportunities

+ +
+
+
Dramatically Smaller
+
Modern LLMs achieve high quality at a fraction of historical sizes
+
+
+
Edge-Ready Performance
+
Real-time inference on consumer smartphones
+
+
+
Quantization Benefits
+
Significant size reduction while preserving accuracy
+
+
+ +

+ The opportunity is now: Foundation models have crossed the efficiency threshold. + Deploy sophisticated AI directly where data lives. +

+
+
+ + +
+
+

Why On-Device AI Was Hard

+

The technical challenges that made edge deployment complex... until now

+ +
+
+
🔋
+

Power Constraints

+

From battery-powered phones to energy-harvesting sensors, edge devices have strict power budgets. Microcontrollers may run on milliwatts, requiring extreme efficiency.

+
+ +
+
🌡️
+

Thermal Management

+

Sustained inference generates heat without active cooling. From smartphones to industrial IoT devices, thermal throttling limits continuous AI workloads.

+
+ +
+
💾
+

Memory Limitations

+

Edge devices range from high-end phones to tiny microcontrollers. Beyond capacity, limited memory bandwidth creates bottlenecks when moving tensors between compute units.

+
+ +
+
🔧
+

Hardware Heterogeneity

+

From microcontrollers to smartphone NPUs to embedded GPUs. Each architecture demands unique optimizations, making broad deployment across diverse form factors extremely challenging.

+
+
+
+
+ + +
+
+

PyTorch Powers 92% of AI Research

+

But deploying PyTorch models to edge devices meant losing everything that made PyTorch great

+ +
+
+

Research & Training

+

PyTorch's intuitive APIs and eager execution power breakthrough research

+
+ +
→
+ +
+

The Conversion Nightmare

+

Multiple intermediate formats, custom runtimes, C++ rewrites

+
+
+ +
+

The Hidden Costs of Conversion (Status Quo)

+
+
+
+ ❌ + Lost Semantics +
+

PyTorch operations don't map 1:1 to other formats

+
+
+
+ ❌ + Debugging Nightmare +
+

Can't trace errors back to original PyTorch code

+
+
+
+ ❌ + Vendor-Specific Formats +
+

Locked into proprietary formats with limited operator support

+
+
+
+ ❌ + Language Barriers +
+

Teams spend months rewriting Python models in C++ for production

+
+
+
+ +
+
+ + +
+
+

ExecuTorch
PyTorch's On-Device AI Framework

+ +
+
+
🎯
+

No Conversions

+

Direct export from PyTorch to edge. Core ATen operators preserved. No intermediate formats, no vendor lock-in.

+
+ +
+
⚙️
+

Ahead-of-Time Compilation

+

Optimize models offline for target device capabilities. Hardware-specific performance tuning before deployment.

+
+ +
+
🔧
+

Modular by Design

+

Pick and choose optimization steps. Composable at both compile-time and runtime for maximum flexibility.

+
+ +
+
🚀
+

Hardware Ecosystem

+

Fully open source with hardware partner contributions. Built on PyTorch's standardized IR and operator set.

+
+ +
+
💾
+

Embedded-Friendly Runtime

+

Portable C++ runtime scales from microcontrollers to smartphones.

+
+ +
+
🔗
+

PyTorch Ecosystem

+

Native integration with PyTorch ecosystem, including torchao for quantization. Stay in familiar tools throughout.

+
+
+
+
+ + +
+
+

Simple as 1-2-3

+

Export, optimize, and run PyTorch models on edge devices with just a few lines of code

+ +
+
+ +
+

+ 1. Export Your PyTorch Model +

+
import torch
+from torch.export import export
+
+# Your existing PyTorch model
+model = MyModel().eval()
+example_inputs = (torch.randn(1, 3, 224, 224),)
+
+# Creates semantically equivalent graph representation
+exported_program = export(model, example_inputs)
+
+ + +
+

+ 2. Optimize for Target Hardware +

+

+ Switch between backends with a single line change +

+ +
+
Choose your target hardware to see the corresponding code:
+
+
+
CPU Optimization
+
XNNPACK with KleidiAI
+
+
+
Apple Devices
+
Core ML partitioner
+
+
+
Qualcomm Chips
+
Hexagon NPU support
+
+
+
+ 9 More
+
Vulkan, MediaTek, Samsung...
+
+
+ +
+
+
from executorch.exir import to_edge_transform_and_lower
+from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
+
+program = to_edge_transform_and_lower(
+    exported_program,
+    partitioner=[XnnpackPartitioner()]
+).to_executorch()
+
+
+ +
+
+
from executorch.exir import to_edge_transform_and_lower
+from executorch.backends.apple.coreml.partition import CoreMLPartitioner
+
+program = to_edge_transform_and_lower(
+    exported_program,
+    partitioner=[CoreMLPartitioner()]
+).to_executorch()
+
+
+ +
+
+
from executorch.exir import to_edge_transform_and_lower
+from executorch.backends.qualcomm.partition.qnn_partitioner import QnnPartitioner
+
+program = to_edge_transform_and_lower(
+    exported_program,
+    partitioner=[QnnPartitioner()]
+).to_executorch()
+
+
+
+ +
+
# Save to .pte file
+with open("model.pte", "wb") as f:
+    f.write(program.buffer)
+
+
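+
+ Optionally, sanity-check the saved .pte on your desktop before deploying. A minimal sketch using ExecuTorch's Python runtime bindings (assumes the executorch pip package is installed and `torch`/`example_inputs` from step 1 are in scope):
+
+# Host-side check: load and execute the .pte with the Python runtime
+from executorch.runtime import Runtime
+
+runtime = Runtime.get()
+et_program = runtime.load_program("model.pte")
+method = et_program.load_method("forward")
+outputs = method.execute([torch.randn(1, 3, 224, 224)])
+print(outputs[0].shape)  # should match the eager model's output shape
+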
+ + +
+

+ 3. Run on Any Platform +

+
+
Choose your platform to see the native API:
+
+
+
C++
+
+
+
Swift
+
+
+
Kotlin
+
+
+
Objective-C
+
+
+
WebAssembly
+
+
+ +
+
+
// Load the model
+Module module("model.pte");
+// Run inference; forward() returns a Result wrapping the output values
+const auto outputs = module.forward(input_tensor);
+// Access the first result tensor
+const auto result = outputs->at(0).toTensor();
+
+
+ +
+
+
// Initialize ExecuTorch module
+let module = try ETModule(path: "model.pte")
+// Run inference with tensors
+let outputs = try module.forward([inputTensor])
+// Process results
+let result = outputs[0]
+
+
+ +
+
+
// Load model from assets
+val module = Module.load(assetFilePath("model.pte"))
+// Execute with the input tensor wrapped in an EValue
+val outputs = module.forward(EValue.from(inputTensor))
+// Extract prediction results from the output tensor
+val prediction = outputs[0].toTensor().dataAsFloatArray
+
+
+ +
+
+
// Initialize ExecuTorch module
+ETModule *module = [[ETModule alloc] initWithPath:@"model.pte" error:nil];
+// Run inference with tensors
+NSArray *outputs = [module forwardWithInputs:@[inputTensor] error:nil];
+// Process results
+ETTensor *result = outputs[0];
+
+
+ +
+
+
// Load model from ArrayBuffer
+const module = et.Module.load(buffer);
+// Create input tensor from data
+const inputTensor = et.Tensor.fromIter(tensorData, shape);
+// Run inference
+const output = module.forward(inputTensor);
+
+
+
+

+ Available on Android, iOS, Linux, Windows, macOS, and embedded targets such as DSPs and Cortex-M microcontrollers +

+
+
+ +
+

+ Need advanced features? ExecuTorch supports memory planning, quantization, profiling, and custom compiler passes. +

+ + Try the Full Tutorial → + +
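+
+ For instance, quantization can be folded into the same export flow. A minimal PT2E sketch using the XNNPACK quantizer (these import paths have moved between releases, so treat them as assumptions and check the current docs):
+
+# PT2E quantization sketch: annotate, calibrate, convert, then lower as in step 2
+from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
+from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
+    XNNPACKQuantizer,
+    get_symmetric_quantization_config,
+)
+
+quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
+gm = torch.export.export_for_training(model, example_inputs).module()
+prepared = prepare_pt2e(gm, quantizer)   # insert observers
+prepared(*example_inputs)                # calibrate on representative inputs
+quantized = convert_pt2e(prepared)       # produce the quantized graph
+# Re-export `quantized` and lower with to_edge_transform_and_lower as in step 2
+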
+
+
+
+ + +
+
+

High-Level Multimodal APIs

+

Run complex multimodal LLMs through simple native APIs in C++, Swift, and Kotlin

+ +
+
+
+

+ Multimodal Runner: Text + Vision + Audio in One API +

+

+ Choose your platform to see the multimodal API supporting text, images, and audio: +

+ +
+
Unified API across mobile platforms:
+
+
+
C++
+
Cross-platform
+
+
+
Swift
+
iOS native
+
+
+
Kotlin
+
Android native
+
+
+ +
+
+
#include "executorch/extension/llm/runner/multimodal_runner.h"
+
+// Initialize multimodal model (e.g., Voxtral, LLaVA)
+auto runner = MultimodalRunner::create(
+    "model.pte",     // Text model
+    "vision.pte",   // Vision encoder
+    "audio.pte",    // Audio encoder
+    tokenizer_path,
+    temperature
+);
+
+// Run inference with audio + image + text
+auto result = runner->generate_multimodal(
+    "Describe what you hear and see",
+    audio_tensor,   // Audio input
+    image_tensor,   // Image input
+    max_tokens
+);
+
+// Stream response tokens
+for (const auto& token : result.tokens) {
+    std::cout << token << std::flush;
+}
+
+
+ +
+
+
import ExecuTorch
+import AVFoundation
+
+// Initialize multimodal runner with audio support
+let runner = try MultimodalRunner(
+    modelPath: "model.pte",
+    visionPath: "vision.pte",
+    audioPath: "audio.pte",
+    tokenizerPath: tokenizerPath,
+    temperature: 0.7
+)
+
+// Process audio and image inputs
+let audioTensor = AudioProcessor.preprocess(audioURL)
+let imageTensor = ImageProcessor.preprocess(uiImage)
+
+// Generate with audio + vision + text
+let result = try runner.generateMultimodal(
+    prompt: "Describe what you hear and see",
+    audio: audioTensor,
+    image: imageTensor,
+    maxTokens: 512
+)
+
+// Stream tokens to UI
+result.tokens.forEach { token in
+    DispatchQueue.main.async {
+        responseText += token
+    }
+}
+
+
+ +
+
+
import org.pytorch.executorch.MultimodalRunner
+import android.media.MediaRecorder
+
+// Initialize multimodal runner with audio
+val runner = MultimodalRunner.create(
+    modelPath = "model.pte",
+    visionPath = "vision.pte",
+    audioPath = "audio.pte",
+    tokenizerPath = tokenizerPath,
+    temperature = 0.7f
+)
+
+// Process audio and image inputs
+val audioTensor = AudioProcessor.preprocess(audioFile)
+val imageTensor = ImageProcessor.preprocess(bitmap)
+
+// Generate with audio + vision + text
+val result = runner.generateMultimodal(
+    prompt = "Describe what you hear and see",
+    audio = audioTensor,
+    image = imageTensor,
+    maxTokens = 512
+)
+
+// Display streaming response
+result.tokens.forEach { token ->
+    runOnUiThread {
+        responseView.append(token)
+    }
+}
+
+
+
+
+
+ +
+

+ High-level APIs abstract away model complexity: just load, prompt, and get results +

+ + Explore LLM APIs → + +
+
+
+
+ + +
+
+

Universal AI Runtime

+
+
+
+ 💬 LLMs + 👁️ Computer Vision + 🎤 Speech AI + 🎯 Recommendations + 🧠 Multimodal + ⚡ Any PyTorch Model
+
+
+
+
+ + +
+
+

Comprehensive Hardware Ecosystem

+

Hardware acceleration contributed by industry partners via open source

+ +
+
+

XNNPACK with KleidiAI

+

CPU acceleration across ARM and x86 architectures

+
+ +
+

Apple Core ML

+

Neural Engine and Apple Silicon optimization

+
+ +
+

Qualcomm Snapdragon

+

Hexagon NPU support

+
+ +
+

ARM Ethos-U

+

Microcontroller NPU for ultra-low power

+
+ +
+

Vulkan GPU

+

Cross-platform graphics acceleration

+
+ +
+

Intel OpenVINO

+

x86 CPU and integrated GPU optimization

+
+ +
+

MediaTek NPU

+

Dimensity chipset acceleration

+
+ +
+

Samsung Exynos

+

Integrated NPU optimization

+
+ +
+

NXP Neutron

+

Automotive and IoT acceleration

+
+ +
+

Apple MPS

+

Metal Performance Shaders for GPU acceleration

+
+ +
+

ARM VGF

+

Vulkan Graph Format support for Arm's ML SDK for Vulkan

+
+ +
+

Cadence DSP

+

Digital signal processor optimization

+
+
+
+
+ + + +
+
+

Success Stories

+

Production deployments and strategic partnerships accelerating edge AI

+ +
+

Adoption

+ +
+ +
+

Ecosystem Integration

+
+
• Hugging Face: Optimum-ExecuTorch for direct transformer model deployment (see the sketch below)
• LiquidAI: Next-generation Liquid Foundation Models optimized for edge deployment
• Software Mansion: React Native ExecuTorch bringing edge AI to mobile apps
+
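+
+ The Hugging Face path, for example, is nearly a one-liner. A hedged sketch following the optimum-executorch README (model id and recipe are illustrative):
+
+# Export a transformers model to ExecuTorch and generate text
+from optimum.executorch import ExecuTorchModelForCausalLM
+from transformers import AutoTokenizer
+
+model_id = "HuggingFaceTB/SmolLM2-135M"  # illustrative model id
+model = ExecuTorchModelForCausalLM.from_pretrained(model_id, recipe="xnnpack")
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+print(model.text_generation(tokenizer=tokenizer, prompt="Hello!", max_seq_len=64))
+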
+ +
+

Demos

+
+
• Llama: Complete LLM implementation with quantization, KV caching, and mobile deployment
• Voxtral: Multimodal AI combining text, vision, and audio processing in one model
+
+ +
+ + → View all success stories + +
+
+
+ + +
+
+

Ready to Deploy AI at the Edge?

+

Join thousands of developers using ExecuTorch in production

+ Get Started Today +
+
+ + + + + + +