Commit 3972b97
Enable transformers multimodal export flow in extension/llm
See README.md for why we are doing this here instead of inside `optimum-executorch`. In short: we need to change code in `transformers` and also touch `xnnpack.py` and `custom_sdpa.py` in `optimum-executorch`, and those changes should be upstreamed to their respective repos. For now, let's put the code in ET so that we can iterate on it quickly.
1 parent bedce91 commit 3972b97

File tree

14 files changed: +1696 −1 lines

.github/workflows/_unittest.yml

Lines changed: 6 additions & 0 deletions

```diff
@@ -31,18 +31,22 @@ jobs:
       id-token: write
       contents: read
     with:
+      secrets-env: EXECUTORCH_HF_TOKEN
       runner: linux.2xlarge
       docker-image: ${{ inputs.docker-image }}
       submodules: 'recursive'
       ref: ${{ github.event_name == 'pull_request' && github.event.pull_request.head.sha || github.sha }}
       timeout: 90
       script: |
         set -eux
+        pip install -U "huggingface_hub[cli]"
+        huggingface-cli login --token $SECRET_EXECUTORCH_HF_TOKEN
         .ci/scripts/unittest-linux.sh --build-tool "${{ inputs.build-tool }}" --build-mode "${{ inputs.build-mode }}" --editable "${{ inputs.editable }}"

   macos:
     uses: pytorch/test-infra/.github/workflows/macos_job.yml@main
     with:
+      secrets-env: EXECUTORCH_HF_TOKEN
       runner: macos-m1-stable
       python-version: '3.11'
       submodules: 'recursive'
@@ -51,4 +55,6 @@ jobs:
         set -eux
         # This is needed to get the prebuilt PyTorch wheel from S3
         ${CONDA_RUN} --no-capture-output pip install awscli==1.37.21
+        ${CONDA_RUN} pip install -U "huggingface_hub[cli]"
+        huggingface-cli login --token $SECRET_EXECUTORCH_HF_TOKEN
         .ci/scripts/unittest-macos.sh --build-tool "${{ inputs.build-tool }}" --build-mode "${{ inputs.build-mode }}" --editable "${{ inputs.editable }}"
```

extension/llm/optimum/README.md

Lines changed: 73 additions & 0 deletions
# ExecuTorch Optimum Module

This module provides integration utilities for exporting and optimizing transformer models for the ExecuTorch runtime. It contains specialized wrapper classes and utilities that make pre-trained models from Hugging Face Transformers compatible with `torch.export` and ExecuTorch execution. Much of the code is forked from `optimum-executorch` and adapted from `transformers`. We keep it in ExecuTorch so that we can iterate on the stack quickly; eventually we want to upstream these changes to `transformers` and `optimum-executorch`.
## Overview

The optimum module bridges the gap between Hugging Face Transformers models and ExecuTorch by providing:

- Exportable wrapper modules for different model types
- Custom cache implementations for efficient inference
- Utilities for model configuration and optimization
- Integration with ExecuTorch's custom operators
## Key Components

### Exportable Modules

#### `TorchExportableModuleWithHybridCache`

A wrapper module that makes decoder-only language models exportable with `torch.export` using `HybridCache`. This is a forked version of [`TorchExportableModuleForDecoderOnlyLM`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py#L391) with some modifications to support `inputs_embeds`.

**Note**: This class should be upstreamed to transformers. We keep it here so that we can iterate quickly.
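To see why a `HybridCache`-style wrapper helps `torch.export`, note that export wants fixed-size cache buffers that are updated in place rather than dynamically growing lists of past keys. A minimal sketch of that pattern (toy shapes, not the transformers implementation):

```python
import torch

# Toy static KV cache: a fixed-size buffer that torch.export can treat as a
# constant-shape tensor, updated in place at the current token's position.
batch, heads, max_seq, head_dim = 1, 2, 8, 4
k_cache = torch.zeros(batch, heads, max_seq, head_dim)

new_k = torch.randn(batch, heads, 1, head_dim)  # key for the current token
cache_position = torch.tensor([3])              # slot this token occupies

# index_copy_ writes the new key into its slot without resizing the buffer.
k_cache.index_copy_(2, cache_position, new_k)
```

A dynamically sized cache would produce data-dependent shapes in the graph; the fixed buffer plus a `cache_position` index is what keeps the exported graph static.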
#### `TorchExportableModuleForImageTextLM`

A wrapper for the text decoder in a vision-language model. It is very similar to [`TorchExportableModuleForDecoderOnlyLM`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py#L30), but it takes `inputs_embeds` instead of `input_ids`, so the module can consume both token embeddings and image embeddings as inputs.

**Note**: This class should be upstreamed to transformers. We keep it here so that we can iterate quickly.
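The reason for the `inputs_embeds` interface can be shown with a toy decoder (hypothetical stand-in, not the real wrapper): once the decoder accepts embeddings directly, token and image embeddings can be concatenated into a single sequence before the forward pass.

```python
import torch
import torch.nn as nn

class ToyDecoderWithInputsEmbeds(nn.Module):
    """Minimal decoder that accepts embeddings rather than token ids."""
    def __init__(self, hidden: int = 16, vocab: int = 32):
        super().__init__()
        self.layer = nn.Linear(hidden, hidden)
        self.lm_head = nn.Linear(hidden, vocab)

    def forward(self, inputs_embeds: torch.Tensor, cache_position: torch.Tensor):
        # cache_position tells a KV cache where these tokens land; the toy
        # decoder ignores it, but the real wrapper threads it through.
        hidden = torch.relu(self.layer(inputs_embeds))
        return self.lm_head(hidden)

embed = nn.Embedding(32, 16)
token_embeds = embed(torch.tensor([[1, 2, 3]]))          # (1, 3, 16)
image_embeds = torch.randn(1, 4, 16)                     # projected image features
merged = torch.cat([image_embeds, token_embeds], dim=1)  # (1, 7, 16)

decoder = ToyDecoderWithInputsEmbeds()
logits = decoder(merged, cache_position=torch.arange(7))  # (1, 7, 32)
```

With an `input_ids`-only interface, the image embeddings would have no way into the sequence; this is the gap the wrapper closes.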
#### `ImageEncoderExportableModule`

A wrapper for vision encoder models that projects vision features into the language model's embedding space. This is commonly implemented as `get_image_features()` in Hugging Face transformers, for example [`Gemma3Model.get_image_features()`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma3/modeling_gemma3.py#L794).
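A hedged sketch of the encode-then-project shape flow (toy vision tower and projector, not the real module):

```python
import torch
import torch.nn as nn

class ImageEncoderSketch(nn.Module):
    """Hypothetical sketch: encode pixels, then project vision features
    into the language model's embedding space (cf. get_image_features())."""
    def __init__(self, vision_dim: int = 8, text_dim: int = 16):
        super().__init__()
        # Patchify-style conv as a stand-in for a real vision tower.
        self.vision_tower = nn.Conv2d(3, vision_dim, kernel_size=4, stride=4)
        self.multi_modal_projector = nn.Linear(vision_dim, text_dim)

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        feats = self.vision_tower(pixel_values)       # (B, C, H', W')
        feats = feats.flatten(2).transpose(1, 2)      # (B, num_patches, C)
        return self.multi_modal_projector(feats)      # (B, num_patches, text_dim)

enc = ImageEncoderSketch()
image_embeds = enc(torch.randn(1, 3, 8, 8))  # (1, 4, 16)
```

The output lives in the text embedding space, so it can be concatenated with token embeddings and fed to the `inputs_embeds` decoder wrapper above.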
#### `ImageTextToTextExportableModule`

A `torch.nn.Module` wrapper for the `image-text-to-text` task. It provides an `export()` API that produces an `ExportedProgram`, which the `xnnpack.py` recipe then consumes to generate the ExecuTorch program.
### Custom Implementations

These are mostly copied from `optimum-executorch`. We put them here so that they can be reused by `integrations.py` and the `xnnpack.py` recipe.

- **Custom KV Cache**: Optimized key-value cache implementations for ExecuTorch
- **Custom SDPA**: Scaled dot-product attention optimizations
- **XNNPACK Integration**: Lowering to the XNNPACK backend for optimized CPU inference
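Whatever the custom SDPA op does internally, it has to preserve the reference scaled dot-product attention math. A quick equivalence check of the reference op against a manual causal implementation (illustrative only, not the custom op itself):

```python
import math
import torch
import torch.nn.functional as F

q = torch.randn(1, 2, 4, 8)  # (batch, heads, seq_len, head_dim)
k = torch.randn(1, 2, 4, 8)
v = torch.randn(1, 2, 4, 8)

# Reference SDPA; ExecuTorch swaps in a custom op with the same semantics.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Manual causal attention spelling out the math the custom op must preserve.
scores = q @ k.transpose(-2, -1) / math.sqrt(q.shape[-1])
mask = torch.triu(torch.ones(4, 4, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))
ref = torch.softmax(scores, dim=-1) @ v

assert torch.allclose(out, ref, atol=1e-5)
```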
### Utilities

- Configuration saving and constant method generation
- Model metadata extraction
- Export helper functions
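As an illustration of metadata extraction, config values can be collected into named constants that get baked into the exported program so the runtime can query them without re-parsing the config. The constant names below follow the `get_*` convention used elsewhere in ExecuTorch's LLM stack, but treat this sketch as hypothetical:

```python
from dataclasses import dataclass

@dataclass
class ToyConfig:
    """Stand-in for a Hugging Face PretrainedConfig."""
    vocab_size: int = 32000
    max_position_embeddings: int = 8192

def extract_metadata(cfg: ToyConfig) -> dict:
    # Constant methods baked into the program let the runtime query
    # these values directly (names here are illustrative).
    return {
        "get_vocab_size": cfg.vocab_size,
        "get_max_seq_len": cfg.max_position_embeddings,
    }
```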
## Usage

```python
from transformers import PretrainedConfig

from executorch.extension.llm.optimum.image_text_to_text import load_image_text_to_text_model
from executorch.extension.llm.optimum.modeling import ExecuTorchModelForImageTextToTextCausalLM
from executorch.extension.llm.optimum.xnnpack import export_to_executorch_with_xnnpack

model_id = "google/gemma-3-4b-it"

# Load the model with custom SDPA and KV cache enabled, and with
# quantized linear (qlinear) and embedding (qembedding) layers.
module = load_image_text_to_text_model(
    model_id,
    use_custom_sdpa=True,
    use_custom_kv_cache=True,
    qlinear=True,
    qembedding=True,
)

# Lower to the XNNPACK backend, then wrap the resulting program for inference.
model = export_to_executorch_with_xnnpack(module)
et_model = ExecuTorchModelForImageTextToTextCausalLM(
    model, PretrainedConfig.from_pretrained(model_id)
)
```
## Testing

Run the tests with:

```bash
python -m pytest extension/llm/optimum/test/
```

extension/llm/optimum/__init__.py

Whitespace-only changes.
