# ExecuTorch Optimum Module

This module provides integration utilities for exporting and optimizing transformer models for the ExecuTorch runtime. It contains specialized wrapper classes and utilities that make pre-trained models from Hugging Face Transformers compatible with `torch.export` and ExecuTorch execution. Much of the code is forked from `optimum-executorch` and adapted from `transformers`; it lives in ExecuTorch so that we can iterate quickly on the stack. Eventually we want to upstream these changes to `transformers` and `optimum-executorch`.

## Overview

The optimum module bridges the gap between Hugging Face Transformers models and ExecuTorch by providing:

- Exportable wrapper modules for different model types
- Custom cache implementations for efficient inference
- Utilities for model configuration and optimization
- Integration with ExecuTorch's custom operators

## Key Components

### Exportable Modules

#### `TorchExportableModuleWithHybridCache`
A wrapper module that makes decoder-only language models exportable with `torch.export` using `HybridCache`. This is a forked version of [`TorchExportableModuleForDecoderOnlyLM`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py#L391) with some modifications to support `inputs_embeds`.

**Note**: This class should be upstreamed to transformers. We keep it here so that we can iterate quickly.
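
For orientation, here is a minimal sketch of how such a wrapper might be exported with `torch.export`. The import path, constructor signature, and input shapes below are assumptions for illustration, not the module's confirmed API.

```python
# Minimal sketch only: the import path and constructor/forward signatures
# are assumptions and may differ from the actual class in this module.
import torch
from transformers import AutoModelForCausalLM

from executorch.extension.llm.optimum.integrations import (  # assumed location
    TorchExportableModuleWithHybridCache,
)

model = AutoModelForCausalLM.from_pretrained("google/gemma-3-4b-it")
wrapper = TorchExportableModuleWithHybridCache(model)  # assumed constructor

# Export a single decode step; shapes are illustrative.
input_ids = torch.tensor([[1]], dtype=torch.long)
cache_position = torch.tensor([0], dtype=torch.long)
exported_program = torch.export.export(wrapper, args=(input_ids, cache_position))
```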

#### `TorchExportableModuleForImageTextLM`
A wrapper for the text decoder of a vision-language model. It is very similar to [`TorchExportableModuleForDecoderOnlyLM`](https://github.com/huggingface/transformers/blob/main/src/transformers/integrations/executorch.py#L30), but it takes `inputs_embeds` instead of `input_ids` so that both token embeddings and image embeddings can be fed to the decoder.

**Note**: This class should be upstreamed to transformers. We keep it here so that we can iterate quickly.
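
To illustrate why the decoder consumes `inputs_embeds`, the sketch below combines embedded text tokens with placeholder image embeddings into one sequence. The sizes and the simple concatenation are illustrative only, not the model's actual merging logic.

```python
# Illustrative only: stand-in sizes and a naive concatenation, not the
# model's real token/image merging logic.
import torch

hidden_size = 2048
embed_tokens = torch.nn.Embedding(32000, hidden_size)  # stand-in embedding table

token_ids = torch.tensor([[2, 15, 27]])                # a few text tokens
text_embeds = embed_tokens(token_ids)                  # (1, 3, hidden_size)
image_embeds = torch.zeros(1, 256, hidden_size)        # stand-in projected image features

# One combined sequence the text decoder can consume via `inputs_embeds`.
inputs_embeds = torch.cat([text_embeds, image_embeds], dim=1)  # (1, 259, hidden_size)
```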

#### `ImageEncoderExportableModule`
A wrapper for vision encoder models that projects vision features into the language model's embedding space. This is commonly implemented as `get_image_features()` in Hugging Face Transformers, for example [`Gemma3Model.get_image_features()`](https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma3/modeling_gemma3.py#L794).
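
Conceptually, this wrapper reduces to a small `torch.nn.Module` around `get_image_features()`. The sketch below shows that shape; the real class in this module may wire things differently.

```python
# Conceptual sketch; the actual wrapper in this module may differ.
import torch

class ImageEncoderSketch(torch.nn.Module):
    def __init__(self, vlm: torch.nn.Module):
        super().__init__()
        self.vlm = vlm

    def forward(self, pixel_values: torch.Tensor) -> torch.Tensor:
        # Vision tower + multimodal projector: returns image embeddings in the
        # language model's hidden size, ready to merge into `inputs_embeds`.
        return self.vlm.get_image_features(pixel_values=pixel_values)
```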

#### `ImageTextToTextExportableModule`
A `torch.nn.Module` wrapper for the `image-text-to-text` task. It provides an `export()` API that produces an `ExportedProgram`, which the `xnnpack.py` recipe consumes to generate the ExecuTorch program.
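
A short sketch of how this fits with the Usage section below. Treating the object returned by `load_image_text_to_text_model` as an instance of this class, and the exact return value of `export()`, are assumptions.

```python
# Assumes the loader returns an ImageTextToTextExportableModule and that
# export() returns the exported program(s) the xnnpack.py recipe consumes.
from executorch.extension.llm.optimum.image_text_to_text import load_image_text_to_text_model

module = load_image_text_to_text_model(
    "google/gemma-3-4b-it",
    use_custom_sdpa=True,
    use_custom_kv_cache=True,
    qlinear=True,
    qembedding=True,
)
exported_programs = module.export()  # consumed by the xnnpack.py recipe
```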

### Custom Implementations
These are mostly copied from `optimum-executorch`. We keep them here so that they can be reused by `integrations.py` and the `xnnpack.py` recipe.

- **Custom KV Cache**: Optimized key-value cache implementations for ExecuTorch
- **Custom SDPA**: Scaled Dot-Product Attention optimizations
- **XNNPACK Integration**: Lowering to the XNNPACK backend for optimized CPU inference (see the sketch after this list)
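
The lowering itself follows ExecuTorch's standard XNNPACK flow. Below is a generic, minimal sketch of that pattern on a toy module (not the recipe's actual code); the `xnnpack.py` recipe presumably applies a similar flow to the exported LLM graphs, together with the custom-op and quantization passes described above.

```python
# Generic XNNPACK lowering pattern in ExecuTorch, shown on a toy module.
# The xnnpack.py recipe is assumed to apply a similar flow to the LLM graphs.
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

class TinyModel(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.relu(x + 1.0)

exported = torch.export.export(TinyModel(), (torch.randn(2, 4),))
et_program = to_edge_transform_and_lower(
    exported, partitioner=[XnnpackPartitioner()]
).to_executorch()

with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```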

### Utilities

- Configuration saving and constant method generation
- Model metadata extraction
- Export helper functions

## Usage

```python
from transformers import PretrainedConfig
from executorch.extension.llm.optimum.image_text_to_text import load_image_text_to_text_model
from executorch.extension.llm.optimum.xnnpack import export_to_executorch_with_xnnpack
from executorch.extension.llm.optimum.modeling import ExecuTorchModelForImageTextToTextCausalLM

model_id = "google/gemma-3-4b-it"

# Load the model and wrap it for export, enabling the custom SDPA and KV cache
# ops as well as linear and embedding quantization.
module = load_image_text_to_text_model(
    model_id,
    use_custom_sdpa=True,
    use_custom_kv_cache=True,
    qlinear=True,
    qembedding=True,
)

# Export and lower to the XNNPACK backend.
model = export_to_executorch_with_xnnpack(module)

# Wrap the resulting program, together with the model config, for inference.
et_model = ExecuTorchModelForImageTextToTextCausalLM(model, PretrainedConfig.from_pretrained(model_id))
```

## Testing

Run tests with:
```bash
python -m pytest extension/llm/optimum/test/
```