Commit 01b069f

Refactor openvino inference section (#794)
* refactor inference
* update code snippet
* add references
* add table
* add diffusers install
* documentation
* remove task
* Update docs/source/openvino/inference.mdx (Co-authored-by: Helena Kloosterman <[email protected]>)
* Update docs/source/openvino/inference.mdx (Co-authored-by: Helena Kloosterman <[email protected]>)
* Update docs/source/openvino/tutorials/diffusers.mdx (Co-authored-by: Helena Kloosterman <[email protected]>)
* Update docs/source/openvino/inference.mdx (Co-authored-by: Helena Kloosterman <[email protected]>)
* update export doc
* format

Co-authored-by: Helena Kloosterman <[email protected]>
1 parent eac1f6c commit 01b069f

File tree

8 files changed: +449 -351 lines

docs/Dockerfile
Lines changed: 1 addition & 1 deletion

@@ -25,4 +25,4 @@ RUN npm install [email protected] -g && \
 RUN python3 -m pip install --no-cache-dir --upgrade pip
 RUN python3 -m pip install --no-cache-dir git+https://github.com/huggingface/doc-builder.git
 RUN git clone $clone_url && cd optimum-intel && git checkout $commit_sha
-RUN python3 -m pip install --no-cache-dir ./optimum-intel[neural-compressor,openvino,nncf,quality]
+RUN python3 -m pip install --no-cache-dir ./optimum-intel[neural-compressor,openvino,diffusers,quality]

docs/source/_toctree.yml
Lines changed: 7 additions & 0 deletions

@@ -22,6 +22,13 @@
       title: Supported Models
     - local: openvino/reference
       title: Reference
+    - sections:
+        - local: openvino/tutorials/notebooks
+          title: Notebooks
+        - local: openvino/tutorials/diffusers
+          title: Generate images with Diffusion models
+      title: Tutorials
+      isExpanded: false
     title: OpenVINO
   title: Optimum Intel
   isExpanded: false

docs/source/openvino/export.mdx
Lines changed: 32 additions & 15 deletions

@@ -14,25 +14,15 @@ specific language governing permissions and limitations under the License.
 To export your model to the [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) format with the CLI:
 
 ```bash
-optimum-cli export openvino --model gpt2 ov_model/
+optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B ov_model/
 ```
 
 The model argument can either be the model ID of a model hosted on the [Hub](https://huggingface.co/models) or a path to a model hosted locally. For local models, you need to specify the task for which the model should be loaded before export, among the list of the [supported tasks](https://huggingface.co/docs/optimum/main/en/exporters/task_manager).
 
-
 ```bash
-optimum-cli export openvino --model local_model_dir --task text-generation-with-past ov_model/
+optimum-cli export openvino --model local_llama --task text-generation-with-past ov_model/
 ```
 
-The `-with-past` suffix enable the re-use of past keys and values. This allows to avoid recomputing the same intermediate activations during the generation. to export the model without, you will need to remove this suffix.
-
-| With K-V cache                           | Without K-V cache              |
-|------------------------------------------|--------------------------------|
-| `text-generation-with-past`              | `text-generation`              |
-| `text2text-generation-with-past`         | `text2text-generation`         |
-| `automatic-speech-recognition-with-past` | `automatic-speech-recognition` |
-
-
 Check out the help for more options:
 
 ```bash
@@ -97,7 +87,7 @@ Optional arguments:
 You can also apply fp16, 8-bit or 4-bit weight-only quantization on the Linear, Convolutional and Embedding layers when exporting your model by setting `--weight-format` to respectively `fp16`, `int8` or `int4`:
 
 ```bash
-optimum-cli export openvino --model gpt2 --weight-format int8 ov_model/
+optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B --weight-format int8 ov_model/
 ```
 
 For more information on the quantization parameters, check out the [documentation](inference#weight-only-quantization).
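For context, the same weight-only quantization can also be requested from the Python API rather than the CLI; the following is a minimal sketch (not part of this commit), assuming the `OVWeightQuantizationConfig` helper exposed by `optimum.intel`:

```python
# Sketch only (not from the diff): int8 weight-only quantization applied
# while converting the model to the OpenVINO format.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    export=True,  # convert the PyTorch checkpoint to OpenVINO IR on the fly
    quantization_config=OVWeightQuantizationConfig(bits=8),  # int8 weights
)
model.save_pretrained("ov_model_int8")
```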
@@ -109,6 +99,33 @@ Models larger than 1 billion parameters are exported to the OpenVINO format with
 
 </Tip>
 
+
+### Decoder models
+
+For models with a decoder, we enable the re-use of past keys and values by default. This avoids recomputing the same intermediate activations at each generation step. To export the model without the key-value cache, remove the `-with-past` suffix when specifying the task.
+
+| With K-V cache                           | Without K-V cache              |
+|------------------------------------------|--------------------------------|
+| `text-generation-with-past`              | `text-generation`              |
+| `text2text-generation-with-past`         | `text2text-generation`         |
+| `automatic-speech-recognition-with-past` | `automatic-speech-recognition` |
+
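The same choice is available from the Python API; the `use_cache` argument below is an assumption of this sketch, not something introduced by this commit:

```python
# Sketch only (not from the diff): exporting a decoder model without
# re-use of past keys and values by disabling the cache at load time.
from optimum.intel import OVModelForCausalLM

model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    export=True,
    use_cache=False,  # equivalent to dropping the -with-past suffix
)
```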
+
+### Diffusion models
+
+When Stable Diffusion models are exported to the OpenVINO format, they are decomposed into different components that are later combined during inference:
+
+* Text encoder(s)
+* U-Net
+* VAE encoder
+* VAE decoder
+
+To export your Stable Diffusion XL model to the OpenVINO IR format with the CLI, run:
+
+```bash
+optimum-cli export openvino --model stabilityai/stable-diffusion-xl-base-1.0 ov_sdxl/
+```
+
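As a complement to the export command, here is a minimal sketch (not part of this commit) of how the exported components are recombined at inference time, assuming the `OVStableDiffusionXLPipeline` class from `optimum.intel`:

```python
# Sketch only (not from the diff): load the exported IR directory and run
# inference; the pipeline wires the text encoder(s), U-Net and VAE back together.
from optimum.intel import OVStableDiffusionXLPipeline

pipeline = OVStableDiffusionXLPipeline.from_pretrained("ov_sdxl")
image = pipeline("sailing ship in storm by Leonardo da Vinci").images[0]
image.save("ship.png")
```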
 ## When loading your model
 
 You can also load your PyTorch checkpoint and convert it to the OpenVINO format on-the-fly by setting `export=True` when loading your model.
@@ -121,7 +138,7 @@ To easily save the resulting model, you can use the `save_pretrained()` method,
 + from optimum.intel import OVModelForCausalLM
   from transformers import AutoTokenizer
 
-  model_id = "gpt2"
+  model_id = "meta-llama/Meta-Llama-3-8B"
 - model = AutoModelForCausalLM.from_pretrained(model_id)
 + model = OVModelForCausalLM.from_pretrained(model_id, export=True)
   tokenizer = AutoTokenizer.from_pretrained(model_id)
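Pieced together from the diff context above, the resulting snippet would read roughly as follows; the `save_pretrained()` calls follow the note in the hunk header and are included here as a sketch:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"
# export=True converts the PyTorch checkpoint to the OpenVINO format on the fly.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Save the converted model and tokenizer so the export only happens once.
model.save_pretrained("ov_model")
tokenizer.save_pretrained("ov_model")
```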
@@ -137,7 +154,7 @@ To easily save the resulting model, you can use the `save_pretrained()` method,
 from transformers import AutoModelForCausalLM
 from optimum.exporters.openvino import export_from_model
 
-model = AutoModelForCausalLM.from_pretrained("gpt2")
+model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
 export_from_model(model, output="ov_model", task="text-generation-with-past")
 ```
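Once exported by either method, the IR directory can be reloaded without any conversion step; a minimal sketch (not part of this commit), assuming a tokenizer was saved alongside the model as in the previous example:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer, pipeline

# Loading an already-exported directory skips the conversion entirely.
model = OVModelForCausalLM.from_pretrained("ov_model")
tokenizer = AutoTokenizer.from_pretrained("ov_model")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("OpenVINO is")[0]["generated_text"])
```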
