specific language governing permissions and limitations under the License.

To export your model to the [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) format with the CLI:

```bash
optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B ov_model/
```

The model argument can either be the model ID of a model hosted on the [Hub](https://huggingface.co/models) or the path to a model stored locally. For local models, you need to specify the task for which the model should be loaded before export, from the list of [supported tasks](https://huggingface.co/docs/optimum/main/en/exporters/task_manager).

```bash
optimum-cli export openvino --model local_llama --task text-generation-with-past ov_model/
```

Check out the help for more options:

```bash
optimum-cli export openvino --help
```

You can also apply fp16, 8-bit or 4-bit weight-only quantization on the Linear, Convolutional and Embedding layers when exporting your model, by setting `--weight-format` to `fp16`, `int8` or `int4` respectively:

```bash
optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B --weight-format int8 ov_model/
```

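For instance, the same export with 4-bit weight-only quantization only needs the weight format changed:

```bash
optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B --weight-format int4 ov_model/
```
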
For more information on the quantization parameters, check out the [documentation](inference#weight-only-quantization).

<Tip>

Models larger than 1 billion parameters are exported to the OpenVINO format with 8-bit weights by default.

</Tip>

### Decoder models

For models with a decoder, the re-use of past keys and values is enabled by default. This avoids recomputing the same intermediate activations at each generation step. To export the model without the key-value cache, remove the `-with-past` suffix when specifying the task (see the example after the table).

| With K-V cache                           | Without K-V cache              |
|------------------------------------------|--------------------------------|
| `text-generation-with-past`              | `text-generation`              |
| `text2text-generation-with-past`         | `text2text-generation`         |
| `automatic-speech-recognition-with-past` | `automatic-speech-recognition` |

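For example, to export the same model without the key-value cache, use the task name from the right-hand column:

```bash
optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B --task text-generation ov_model/
```
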
### Diffusion models

When Stable Diffusion models are exported to the OpenVINO format, they are decomposed into different components that are later combined during inference:

* Text encoder(s)
* U-Net
* VAE encoder
* VAE decoder

To export your Stable Diffusion XL model to the OpenVINO IR format with the CLI, run:

```bash
optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0 ov_sdxl/
```
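
The exported pipeline can then be reloaded for inference. A minimal sketch, assuming the `ov_sdxl/` directory produced above and using the `OVStableDiffusionXLPipeline` class from `optimum.intel` (the prompt is illustrative):

```python
from optimum.intel import OVStableDiffusionXLPipeline

# Load the exported components (text encoders, U-Net, VAE) as a single pipeline
pipeline = OVStableDiffusionXLPipeline.from_pretrained("ov_sdxl/")

image = pipeline("sailing ship in storm by Leonardo da Vinci").images[0]
image.save("ship.png")
```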

## When loading your model

You can also load your PyTorch checkpoint and convert it to the OpenVINO format on the fly, by setting `export=True` when loading your model.

To easily save the resulting model, you can use the `save_pretrained()` method:

```diff
- from transformers import AutoModelForCausalLM
+ from optimum.intel import OVModelForCausalLM
  from transformers import AutoTokenizer

  model_id = "meta-llama/Meta-Llama-3-8B"
- model = AutoModelForCausalLM.from_pretrained(model_id)
+ model = OVModelForCausalLM.from_pretrained(model_id, export=True)
  tokenizer = AutoTokenizer.from_pretrained(model_id)

  save_directory = "ov_model"
  model.save_pretrained(save_directory)
  tokenizer.save_pretrained(save_directory)
```
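
Once saved, the model can be reloaded from the export directory for generation. A minimal sketch, assuming the `ov_model` directory produced above (the prompt and generation length are illustrative):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Reload the exported OpenVINO model and its tokenizer
model = OVModelForCausalLM.from_pretrained("ov_model")
tokenizer = AutoTokenizer.from_pretrained("ov_model")

# Run a short generation to check the export
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```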

Alternatively, a model already loaded in memory can be exported with the `export_from_model` function:

```python
from transformers import AutoModelForCausalLM
from optimum.exporters.openvino import export_from_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
export_from_model(model, output="ov_model", task="text-generation-with-past")
```
