
Commit 02e746b

Enable transformers multimodal export flow in extension/llm

- Add optimum integration for multimodal models
- Implement custom KV cache and SDPA operations
- Add image-text-to-text support
- Update dependencies and CI workflow

1 parent: 5d3550f

File tree

13 files changed: +1691 −1 lines changed

.github/workflows/trunk.yml

Lines changed: 2 additions & 0 deletions

```diff
@@ -797,6 +797,8 @@ jobs:
           --etdump_path ${OUTPUT_DIR}/etdump.etdp \
           --tsv_path ${TSV_PATH}
+        echo "::group::Run Multimodal Tests"
+        python3 -m unittest extension/llm/optimum/test/test_modeling_gemma3.py
         echo "::endgroup::"
```

extension/llm/optimum/README.md

Lines changed: 73 additions & 0 deletions
# ExecuTorch Optimum Module

This module provides integration utilities for exporting and optimizing transformer models for the ExecuTorch runtime. It contains specialized wrapper classes and utilities that make pre-trained models from Hugging Face Transformers compatible with `torch.export` and ExecuTorch execution. Much of the code is forked from `optimum-executorch` and adapted from `transformers`; it lives in ExecuTorch so that we can iterate on the stack quickly. Eventually we want to upstream these changes to `transformers` and `optimum-executorch`.
## Overview

The optimum module bridges the gap between Hugging Face Transformers models and ExecuTorch by providing:

- Exportable wrapper modules for different model types
- Custom cache implementations for efficient inference
- Utilities for model configuration and optimization
- Integration with ExecuTorch's custom operators
## Key Components

### Exportable Modules

#### `TorchExportableModuleWithHybridCache`

A wrapper module that makes decoder-only language models exportable with `torch.export` using `HybridCache`. This is a forked version of [`TorchExportableModuleForDecoderOnlyLM`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py#L391) with modifications to support `inputs_embeds`.

**Note**: This class should be upstreamed to transformers. We keep it here so that we can iterate quickly.
#### `TorchExportableModuleForImageTextLM`

A wrapper for the text decoder of a vision-language model. It is very similar to [`TorchExportableModuleForDecoderOnlyLM`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py#L30), but it takes `inputs_embeds` instead of `input_ids`, so that both token embeddings and image embeddings can be fed as inputs.

**Note**: This class should be upstreamed to transformers. We keep it here so that we can iterate quickly.
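As a hedged sketch of why the wrapper consumes `inputs_embeds`: the caller can embed the tokens first, then splice projected image features into the placeholder positions before invoking the decoder. The function and parameter names below are illustrative, not this module's API.

```python
import torch


def merge_text_and_image_embeds(
    input_ids: torch.Tensor,          # (seq_len,) token ids with placeholders
    token_embedding: torch.nn.Embedding,
    image_embeds: torch.Tensor,       # (num_image_tokens, hidden) projected features
    image_token_id: int,
) -> torch.Tensor:
    """Build the `inputs_embeds` a text decoder wrapper would consume:
    token embeddings everywhere, image embeddings at placeholder positions."""
    inputs_embeds = token_embedding(input_ids)       # (seq_len, hidden)
    image_mask = input_ids == image_token_id         # (seq_len,) bool
    # One image embedding per placeholder token, written in order.
    inputs_embeds[image_mask] = image_embeds.to(inputs_embeds.dtype)
    return inputs_embeds
```

Because the decoder only ever sees a dense `(seq_len, hidden)` tensor, the same exported program serves text-only and image-text prompts.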
#### `ImageEncoderExportableModule`

A wrapper for vision encoder models that projects vision features into the language model's embedding space. This is commonly implemented as `get_image_features()` in Hugging Face transformers, for example [`Gemma3Model.get_image_features()`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma3/modeling_gemma3.py#L794).
#### `ImageTextToTextExportableModule`

A `torch.nn.Module` wrapper for the `image-text-to-text` task. It provides an `export()` API that generates an `ExportedProgram`, which is then consumed by the `xnnpack.py` recipe to produce an ExecuTorch program.
### Custom Implementations

These are mostly copied from `optimum-executorch`. We put them here so that they can be reused by `integrations.py` and the `xnnpack.py` recipe.

- **Custom KV Cache**: Optimized key-value cache implementations for ExecuTorch
- **Custom SDPA**: Scaled dot-product attention optimizations
- **XNNPACK Integration**: Lowering to the XNNPACK backend for optimized inference on CPU
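At its core, the custom KV cache comes down to statically allocated key/value buffers that are updated in place at each decode step. A minimal sketch of that pattern (shapes and names are assumed for illustration, not the actual ExecuTorch operator):

```python
import torch


class StaticKVCacheSketch:
    """Toy single-layer static KV cache: fixed-size buffers updated in place,
    the pattern the custom cache ops lower to (shapes are assumed)."""

    def __init__(self, max_seq_len: int, n_heads: int, head_dim: int):
        self.k_cache = torch.zeros(max_seq_len, n_heads, head_dim)
        self.v_cache = torch.zeros(max_seq_len, n_heads, head_dim)

    def update(
        self,
        cache_position: torch.Tensor,  # (new_len,) positions being written
        k_new: torch.Tensor,           # (new_len, n_heads, head_dim)
        v_new: torch.Tensor,           # (new_len, n_heads, head_dim)
    ):
        # In-place writes keep memory usage constant across decode steps.
        self.k_cache.index_copy_(0, cache_position, k_new)
        self.v_cache.index_copy_(0, cache_position, v_new)
        return self.k_cache, self.v_cache
```

The custom SDPA op then attends over these fixed-size buffers, which is what allows the exported program to avoid dynamic shapes during generation.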
### Utilities

- Configuration saving and constant method generation
- Model metadata extraction
- Export helper functions
## Usage

```python
from transformers import PretrainedConfig

from executorch.extension.llm.optimum.image_text_to_text import load_image_text_to_text_model
from executorch.extension.llm.optimum.modeling import ExecuTorchModelForImageTextToTextCausalLM
from executorch.extension.llm.optimum.xnnpack import export_to_executorch_with_xnnpack

model_id = "google/gemma-3-4b-it"

module = load_image_text_to_text_model(
    model_id,
    use_custom_sdpa=True,
    use_custom_kv_cache=True,
    qlinear=True,
    qembedding=True,
)
model = export_to_executorch_with_xnnpack(module)
et_model = ExecuTorchModelForImageTextToTextCausalLM(model, PretrainedConfig.from_pretrained(model_id))
```
## Testing

Run the tests with:

```bash
python -m pytest extension/llm/optimum/test/
```

extension/llm/optimum/__init__.py

Whitespace-only changes.
