
Commit b0a0276

Pr review
1 parent 55a4669 commit b0a0276

File tree

1 file changed: +26 −3 lines changed


docs/source/llm/export-llm-optimum.md

Lines changed: 26 additions & 3 deletions
@@ -1,10 +1,10 @@
-# Exporting LLMs with Optimum ExecuTorch
+# Exporting LLMs with HuggingFace's Optimum ExecuTorch
 
 [Optimum ExecuTorch](https://github.com/huggingface/optimum-executorch) provides a streamlined way to export Hugging Face transformer models to ExecuTorch format. It offers seamless integration with the Hugging Face ecosystem, making it easy to export models directly from the Hugging Face Hub.
 
 ## Overview
 
-Optimum ExecuTorch supports a much wider variety of model architectures compared to ExecuTorch's native `export_llm` API. While `export_llm` focuses on a limited set of highly optimized models (Llama, Qwen, Phi, and SmolLM) with advanced features like SpinQuant and attention sink, Optimum ExecuTorch can export diverse architectures including Gemma, Mistral, GPT-2, BERT, T5, Whisper, and many others.
+Optimum ExecuTorch supports a much wider variety of model architectures compared to ExecuTorch's native `export_llm` API. While `export_llm` focuses on a limited set of highly optimized models (Llama, Qwen, Phi, and SmolLM) with advanced features like SpinQuant and attention sink, Optimum ExecuTorch can export diverse architectures including Gemma, Mistral, GPT-2, BERT, T5, Whisper, Voxtral, and many others.
 
 ### Use Optimum ExecuTorch when:
 - You need to export models beyond the limited set supported by `export_llm`
@@ -130,7 +130,30 @@ For detailed examples of exporting each model type, see the [Optimum ExecuTorch
 
 ## Running Exported Models
 
-After exporting your model to a `.pte` file, you can run it on device:
+### Verifying Output with Python
+
+After exporting, you can verify the model output in Python before deploying to device using classes from `modeling.py`, such as the `ExecuTorchModelForCausalLM` class for LLMs:
+
+```python
+from optimum.executorch import ExecuTorchModelForCausalLM
+from transformers import AutoTokenizer
+
+# Load the exported model
+model = ExecuTorchModelForCausalLM.from_pretrained("./smollm2_exported")
+tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
+
+# Generate text
+generated_text = model.text_generation(
+    tokenizer=tokenizer,
+    prompt="Once upon a time",
+    max_seq_len=128,
+)
+print(generated_text)
+```
+
+### Running on Device
+
+After verifying your model works correctly, deploy it to device:
 
 - [Running with C++](run-with-c-plus-plus.md) - Run exported models using ExecuTorch's C++ runtime
 - [Running on Android](https://github.com/meta-pytorch/executorch-examples/tree/main/llm/android) - Deploy to Android devices
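The Python verification snippet added by this commit loads an already-exported model from `./smollm2_exported`. For context, a sketch of the export step that would produce that directory, assuming the `optimum-cli export executorch` entry point and XNNPACK recipe documented in the optimum-executorch repository (the flags here are illustrative, not taken from this commit):

```shell
# Hypothetical export invocation (assumes optimum-executorch is installed).
# Writes a .pte program into ./smollm2_exported for the Python snippet to load.
optimum-cli export executorch \
  --model HuggingFaceTB/SmolLM2-135M-Instruct \
  --task text-generation \
  --recipe xnnpack \
  --output_dir ./smollm2_exported
```

The model id and output directory match those used in the commit's snippet; the recipe choice is an assumption.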
