
Commit 4fed585

Streamline ExecuTorch optimum module to focus on runtime-specific optimizations
This commit implements the recommended architectural approach by removing duplicated functionality and keeping only ExecuTorch-specific components.

**Removed (now handled by transformers + optimum-executorch):**
- integrations.py - Exportable modules now in transformers.integrations.executorch
- image_text_to_text.py - Task handled by optimum-executorch recipe system
- modeling.py - General model wrappers not ExecuTorch-specific

**Kept (ExecuTorch runtime-specific):**
- custom_kv_cache.py - ETCustomStaticCache and ETCustomHybridCache
- custom_sdpa.py - Custom SDPA for ExecuTorch operators
- xnnpack.py - XNNPACK backend integration and optimization passes
- utils.py - ExecuTorch-specific configuration utilities

**Benefits:**
- No code duplication between repositories
- Leverages mature optimum-executorch infrastructure
- Focuses on true ExecuTorch runtime optimizations
- Maintains unified user experience through optimum-executorch CLI/API

**Usage:** Users should now use optimum-executorch for model export:

```bash
optimum-cli export executorch --model google/gemma-3-4b-it --task image-text-to-text --recipe xnnpack
```

This module provides ExecuTorch-specific XNNPACK optimizations that can be applied to models exported through optimum-executorch.
1 parent 02e746b commit 4fed585

File tree

6 files changed: +80 −870 lines changed

extension/llm/optimum/README.md

Lines changed: 66 additions & 40 deletions
@@ -1,70 +1,96 @@
 # ExecuTorch Optimum Module
 
-This module provides integration utilities for exporting and optimizing transformer models for the ExecuTorch runtime. It contains specialized wrapper classes and utilities that make pre-trained models from Hugging Face Transformers compatible with `torch.export` and ExecuTorch execution. Much of the code is forked from `optimum-executorch` and adapted from `transformers`. We keep it in ExecuTorch so that we can iterate quickly on the stack; eventually we want to upstream the changes to `transformers` and `optimum-executorch`.
+This module provides ExecuTorch-specific optimizations and integrations for transformer models. It focuses on runtime-specific features that are not available in the upstream transformers or optimum-executorch libraries.
 
 ## Overview
 
-The optimum module bridges the gap between Hugging Face Transformers models and ExecuTorch by providing:
+This streamlined module contains only ExecuTorch-specific components:
 
-- Exportable wrapper modules for different model types
-- Custom cache implementations for efficient inference
-- Utilities for model configuration and optimization
-- Integration with ExecuTorch's custom operators
+- Custom cache implementations optimized for the ExecuTorch runtime
+- Custom SDPA implementations for ExecuTorch operators
+- XNNPACK backend integration and optimization passes
+- ExecuTorch-specific utilities
+
+For general model export functionality, use `optimum-executorch`, which provides a comprehensive recipe system and CLI interface.
 
 ## Key Components
 
-### Exportable Modules
+### Custom Cache Implementations
 
-#### `TorchExportableModuleWithHybridCache`
-A wrapper module that makes decoder-only language models exportable with `torch.export` using `HybridCache`. This is a forked version of [`TorchExportableModuleForDecoderOnlyLM`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py#L391) with some modifications to support `inputs_embeds`.
+#### `ETCustomStaticCache` and `ETCustomHybridCache`
+Custom KV cache implementations that inherit from Hugging Face's caches but use ExecuTorch's `CustomKVCache` and `CustomRingKVCache` for optimal runtime performance.
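As background for the reader: the sliding-window behavior that a ring-buffer KV cache such as `CustomRingKVCache` provides can be illustrated with a minimal pure-Python sketch. This is not the actual ExecuTorch implementation (which operates on preallocated tensors); it only shows the wrap-around indexing idea.

```python
class RingKVCacheSketch:
    """Toy fixed-size ring buffer over per-step K/V entries.

    Illustrates sliding-window semantics only: once `window` tokens
    have been written, each new token overwrites the oldest slot.
    """

    def __init__(self, window: int):
        self.window = window
        self.keys = [None] * window
        self.values = [None] * window
        self.pos = 0  # total tokens seen so far

    def update(self, k, v):
        # Write into the slot for the current position, wrapping around.
        slot = self.pos % self.window
        self.keys[slot] = k
        self.values[slot] = v
        self.pos += 1

    def visible_keys(self):
        # Keys currently in the window, oldest first.
        n = min(self.pos, self.window)
        start = self.pos - n
        return [self.keys[(start + i) % self.window] for i in range(n)]


cache = RingKVCacheSketch(window=3)
for token in ["a", "b", "c", "d", "e"]:
    cache.update(token, token.upper())
print(cache.visible_keys())  # ['c', 'd', 'e'] -- only the last 3 remain
```

The payoff is a cache whose memory footprint is bounded by the attention window rather than the full sequence length.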

-**Note**: This class should be upstreamed to transformers. We keep it here so that we can iterate quickly.
+### Custom SDPA
 
-#### `TorchExportableModuleForImageTextLM`
-A wrapper for the text decoder in a vision-language model. It is very similar to [`TorchExportableModuleForDecoderOnlyLM`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py#L30), but it takes `inputs_embeds` instead of `input_ids` so that both token embeddings and image embeddings can be provided as inputs.
+#### `get_custom_sdpa_for_ring_kv_cache`
+A custom scaled dot-product attention implementation optimized for ExecuTorch's ring-buffer caches and sliding-window attention.
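For readers unfamiliar with the operation being specialized here: scaled dot-product attention computes `softmax(q·K / sqrt(d)) · V`. A minimal single-query reference in plain Python (the real ExecuTorch kernel is a fused operator that additionally handles cache indexing and masking):

```python
import math

def sdpa(q, ks, vs):
    """Reference scaled dot-product attention for one query vector.

    Scores each key against the query, scales by 1/sqrt(d),
    softmaxes, then returns the weighted sum of value vectors.
    """
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in ks]
    m = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, vs)) for j in range(len(vs[0]))]

# With identical keys the weights are uniform, so the output is the
# mean of the value vectors.
out = sdpa([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [[0.0, 2.0], [4.0, 0.0]])
print(out)  # [2.0, 1.0]
```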

-**Note**: This class should be upstreamed to transformers. We keep it here so that we can iterate quickly.
+### XNNPACK Integration
 
-#### `ImageEncoderExportableModule`
-A wrapper for vision encoder models that projects vision features into the language model's embedding space. This is commonly implemented as `get_image_features()` in Hugging Face transformers; see, for example, [`Gemma3Model.get_image_features()`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma3/modeling_gemma3.py#L794).
+#### `export_to_executorch_with_xnnpack`
+ExecuTorch-specific XNNPACK backend integration with custom optimization passes:
+- `RemovePaddingIdxEmbeddingPass`: removes `padding_idx` from embedding operations
+- Memory planning and quantization optimizations
+- Backend delegation analysis and debugging
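A note on why dropping `padding_idx` is safe at inference time (a plausible rationale for the pass above, sketched under the assumption that the pass targets inference-only graphs): `padding_idx` only freezes one row's gradient during training; the forward pass of an embedding op is a plain row gather either way.

```python
def embedding_lookup(weight, ids):
    """Forward pass of an embedding op: a plain row gather.

    `padding_idx` does not change this computation -- it only zeroes
    the gradient for one row during training. An inference-only graph
    can therefore drop the attribute without changing outputs.
    """
    return [weight[i] for i in ids]

weight = [
    [0.0, 0.0],   # row 0: conventionally the padding row
    [0.1, 0.2],
    [0.3, 0.4],
]
# Same result whether or not padding_idx=0 was set on the original op.
print(embedding_lookup(weight, [2, 0, 1]))
```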

-#### `ImageTextToTextExportableModule`
-A `torch.nn.Module` wrapper for the `image-text-to-text` task. It provides an `export()` API that produces an `ExportedProgram`, which the `xnnpack.py` recipe consumes to generate an ExecuTorch program.
+### Utilities
 
-### Custom Implementations
-These are mostly copied from `optimum-executorch`. We keep them here so that they can be reused by `integrations.py` and the `xnnpack.py` recipe.
+- `save_config_to_constant_methods`: ExecuTorch-specific configuration utilities
+- Model metadata extraction for runtime optimization
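To make the "constant methods" idea concrete: an ExecuTorch program can embed scalar metadata (vocab size, max sequence length, special token IDs) that the runtime reads without running the model. A minimal sketch of mapping a config to such metadata — the field and method names below are hypothetical, chosen for illustration; the real `save_config_to_constant_methods` utility defines its own set:

```python
def config_to_constant_methods(config: dict) -> dict:
    """Map config fields to scalar metadata entries that could be
    exposed as constant methods on an exported program.
    (Hypothetical names, for illustration only.)
    """
    return {
        "get_vocab_size": config["vocab_size"],
        "get_max_seq_len": config["max_position_embeddings"],
        "get_bos_id": config["bos_token_id"],
        "get_eos_id": config["eos_token_id"],
    }

metadata = config_to_constant_methods({
    "vocab_size": 262144,
    "max_position_embeddings": 8192,
    "bos_token_id": 2,
    "eos_token_id": 1,
})
print(metadata["get_max_seq_len"])  # 8192
```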

-- **Custom KV Cache**: Optimized key-value cache implementations for ExecuTorch
-- **Custom SDPA**: Scaled dot-product attention optimizations
-- **XNNPACK Integration**: Lowering to the XNNPACK backend for optimized inference on CPU
+## Usage
 
-### Utilities
+For multimodal model export, use optimum-executorch:
 
-- Configuration saving and constant method generation
-- Model metadata extraction
-- Export helper functions
+```bash
+# Export with the optimum-executorch CLI
+optimum-cli export executorch \
+  --model google/gemma-3-4b-it \
+  --task image-text-to-text \
+  --recipe xnnpack \
+  --use_custom_sdpa \
+  --use_custom_kv_cache
+```

-## Usage
+```python
+# Or via the Python API
+from optimum.executorch import ExecuTorchModelForCausalLM
+
+model = ExecuTorchModelForCausalLM.from_pretrained(
+    "google/gemma-3-4b-it",
+    task="image-text-to-text",
+    recipe="xnnpack",
+    use_custom_sdpa=True,
+    use_custom_kv_cache=True,
+)
+```
+
+For ExecuTorch-specific XNNPACK optimizations:

 ```python
-from transformers import PretrainedConfig
-from executorch.extension.llm.optimum.image_text_to_text import load_image_text_to_text_model
+from optimum.exporters.executorch.integrations import ImageTextToTextExportableModule
 from executorch.extension.llm.optimum.xnnpack import export_to_executorch_with_xnnpack
-from executorch.extension.llm.optimum.modeling import ExecuTorchModelForImageTextToTextCausalLM
 
-model_id = "google/gemma-3-4b-it"
+# Load the model using optimum-executorch
+module = ImageTextToTextExportableModule(model, use_custom_kv_cache=True, use_custom_sdpa=True)
 
-module = load_image_text_to_text_model(
-    model_id,
-    use_custom_sdpa=True,
-    use_custom_kv_cache=True,
-    qlinear=True,
-    qembedding=True,
-)
-model = export_to_executorch_with_xnnpack(module)
-et_model = ExecuTorchModelForImageTextToTextCausalLM(model, PretrainedConfig.from_pretrained(model_id))
+# Apply ExecuTorch-specific XNNPACK optimizations
+executorch_program = export_to_executorch_with_xnnpack(module)
 ```

+## Architecture
+
+This module follows the recommended approach:
+1. **General export functionality**: use `optimum-executorch`
+2. **Multimodal support**: the enhanced `transformers.integrations.executorch`
+3. **ExecuTorch-specific optimizations**: this module
+
+This separation ensures:
+- No code duplication between repositories
+- Reuse of the mature optimum-executorch infrastructure
+- A focus on runtime-specific optimizations in the ExecuTorch module
+- A unified user experience through the optimum-executorch CLI/API
+
 ## Testing
 
 Run tests with:

extension/llm/optimum/image_text_to_text.py

Lines changed: 0 additions & 142 deletions
This file was deleted.
