
Commit 856c429

echarlaix authored and mvafin committed
Clean README code snippet (#1396)
* clean README
* fix doc
1 parent 92f51a1 commit 856c429

File tree

2 files changed: +16 -55 lines changed


README.md

Lines changed: 16 additions & 54 deletions
@@ -40,31 +40,22 @@ or to install from source including dependencies:
 python -m pip install "optimum-intel[extras]"@git+https://github.com/huggingface/optimum-intel.git
 ```
 
-where `extras` can be one or more of `ipex`, `neural-compressor`, `openvino`, `nncf`.
+where `extras` can be one or more of `ipex`, `neural-compressor`, `openvino`.
 
 # Quick tour
 
 ## Neural Compressor
 
-Dynamic quantization can be used through the Optimum command-line interface:
+Dynamic quantization can be used through the Optimum CLI:
 
 ```bash
 optimum-cli inc quantize --model distilbert-base-cased-distilled-squad --output ./quantized_distilbert
 ```
 Note that quantization is currently only supported for CPUs (only CPU backends are available), so we will not be utilizing GPUs / CUDA in this example.
 
-To load a quantized model hosted locally or on the 🤗 hub, you can do as follows :
-```python
-from optimum.intel import INCModelForSequenceClassification
-
-model_id = "Intel/distilbert-base-uncased-finetuned-sst-2-english-int8-dynamic"
-model = INCModelForSequenceClassification.from_pretrained(model_id)
-```
-
 You can load many more quantized models hosted on the hub under the Intel organization [`here`](https://huggingface.co/Intel).
 
-For more details on the supported compression techniques, please refer to the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_inc).
-
+For more details on the supported compression techniques, please refer to the [documentation](https://huggingface.co/docs/optimum-intel/en/neural_compressor/optimization).
 
 ## OpenVINO
 
@@ -75,28 +66,27 @@ Below are examples of how to use OpenVINO and its [NNCF](https://docs.openvino.a
 It is also possible to export your model to the [OpenVINO IR](https://docs.openvino.ai/2024/documentation/openvino-ir-format.html) format with the CLI :
 
 ```plain
-optimum-cli export openvino --model gpt2 ov_model
+optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B ov_llama/
 ```
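A minimal follow-up sketch (assuming the export above has completed, written the `ov_llama/` directory, and saved the tokenizer files alongside the IR): the exported model can be loaded back with `OVModelForCausalLM` for generation.

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

# Load the OpenVINO IR produced by the export command above
model = OVModelForCausalLM.from_pretrained("ov_llama/")
tokenizer = AutoTokenizer.from_pretrained("ov_llama/")

inputs = tokenizer("The weather today is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```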
 
 You can also apply 8-bit weight-only quantization when exporting your model : the model linear, embedding and convolution weights will be quantized to INT8, the activations will be kept in floating point precision.
 
 ```plain
-optimum-cli export openvino --model gpt2 --weight-format int8 ov_model
+optimum-cli export openvino --model meta-llama/Meta-Llama-3-8B --weight-format int8 ov_llama_int8/
 ```
 
 Quantization in hybrid mode can be applied to Stable Diffusion pipeline during model export. This involves applying hybrid post-training quantization to the UNet model and weight-only quantization for the rest of the pipeline components. In the hybrid mode, weights in MatMul and Embedding layers are quantized, as well as activations of other layers.
 
 ```plain
-optimum-cli export openvino --model stabilityai/stable-diffusion-2-1 --dataset conceptual_captions --weight-format int8 ov_model_sd/
+optimum-cli export openvino --model stabilityai/stable-diffusion-2-1 --dataset conceptual_captions --weight-format int8 ov_model_sd/
 ```
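As a hedged sketch of how the exported directory might then be used (assuming `OVStableDiffusionPipeline` is the matching pipeline class for this export), the hybrid-quantized pipeline can be loaded and run like a diffusers pipeline:

```python
from optimum.intel import OVStableDiffusionPipeline

# Load the hybrid-quantized Stable Diffusion pipeline exported above
pipeline = OVStableDiffusionPipeline.from_pretrained("ov_model_sd/")

# Text-to-image generation runs on the OpenVINO Runtime
image = pipeline("sailing ship in storm by Rembrandt").images[0]
image.save("ship.png")
```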
 
-To apply quantization on both weights and activations, you can find more information in the [documentation](https://huggingface.co/docs/optimum/main/en/intel/optimization_ov).
+To apply quantization on both weights and activations, you can find more information in the [documentation](https://huggingface.co/docs/optimum-intel/en/openvino/optimization).
 
 #### Inference:
 
 To load a model and run inference with OpenVINO Runtime, you can just replace your `AutoModelForXxx` class with the corresponding `OVModelForXxx` class.
 
-
 ```diff
 - from transformers import AutoModelForSeq2SeqLM
 + from optimum.intel import OVModelForSeq2SeqLM
@@ -112,50 +102,22 @@ To load a model and run inference with OpenVINO Runtime, you can just replace yo
 [{'translation_text': "Il n'est jamais sorti sans un livre sous son bras, et il est souvent revenu avec deux."}]
 ```
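Only the edges of that `diff` block fall inside this hunk, so here is a hedged sketch of the full replacement pattern; the `echarlaix/t5-small-openvino` checkpoint and the example sentence are assumptions:

```python
from optimum.intel import OVModelForSeq2SeqLM  # instead of transformers' AutoModelForSeq2SeqLM
from transformers import AutoTokenizer, pipeline

# Assumed OpenVINO translation checkpoint hosted on the hub
model_id = "echarlaix/t5-small-openvino"
model = OVModelForSeq2SeqLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# OVModel classes plug directly into transformers pipelines
translator = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
print(translator("He never went out without a book under his arm, and he often came back with two."))
```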
 
-If you want to load a PyTorch checkpoint, set `export=True` to convert your model to the OpenVINO IR.
+#### Quantization:
 
-```python
-from optimum.intel import OVModelForCausalLM
-
-model = OVModelForCausalLM.from_pretrained("gpt2", export=True)
-model.save_pretrained("./ov_model")
-```
+Post-training static quantization can also be applied. Here is an example on how to apply static quantization on a Whisper model using the [LibriSpeech](https://huggingface.co/datasets/openslr/librispeech_asr) dataset for the calibration step.
 
+```python
+from optimum.intel import OVModelForSpeechSeq2Seq, OVQuantizationConfig
 
-#### Post-training static quantization:
-
-Post-training static quantization introduces an additional calibration step where data is fed through the network in order to compute the activations quantization parameters. Here is an example on how to apply static quantization on a fine-tuned DistilBERT.
+model_id = "openai/whisper-tiny"
+q_config = OVQuantizationConfig(dtype="int8", dataset="librispeech", num_samples=50)
+q_model = OVModelForSpeechSeq2Seq.from_pretrained(model_id, quantization_config=q_config)
 
-```python
-from functools import partial
-from optimum.intel import OVQuantizer, OVModelForSequenceClassification, OVConfig, OVQuantizationConfig
-from transformers import AutoTokenizer, AutoModelForSequenceClassification
-
-model_id = "distilbert-base-uncased-finetuned-sst-2-english"
-model = OVModelForSequenceClassification.from_pretrained(model_id, export=True)
-tokenizer = AutoTokenizer.from_pretrained(model_id)
-def preprocess_fn(examples, tokenizer):
-    return tokenizer(
-        examples["sentence"], padding=True, truncation=True, max_length=128
-    )
-
-quantizer = OVQuantizer.from_pretrained(model)
-calibration_dataset = quantizer.get_calibration_dataset(
-    "glue",
-    dataset_config_name="sst2",
-    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
-    num_samples=100,
-    dataset_split="train",
-    preprocess_batch=True,
-)
 # The directory where the quantized model will be saved
 save_dir = "nncf_results"
-# Apply static quantization and save the resulting model in the OpenVINO IR format
-ov_config = OVConfig(quantization_config=OVQuantizationConfig())
-quantizer.quantize(ov_config=ov_config, calibration_dataset=calibration_dataset, save_directory=save_dir)
-# Load the quantized model
-optimized_model = OVModelForSequenceClassification.from_pretrained(save_dir)
+q_model.save_pretrained(save_dir)
 ```
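As a hedged usage sketch (the processor wiring and the pipeline call are assumptions), the quantized Whisper model produced above can be run through a `transformers` pipeline:

```python
from transformers import AutoProcessor, pipeline

# Processor (tokenizer + feature extractor) matching the Whisper checkpoint quantized above
processor = AutoProcessor.from_pretrained("openai/whisper-tiny")

asr = pipeline(
    "automatic-speech-recognition",
    model=q_model,  # the statically quantized OVModelForSpeechSeq2Seq from the snippet above
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
# transcription = asr("path/to/audio.wav")
```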
+You can find more information in the [documentation](https://huggingface.co/docs/optimum-intel/en/openvino/optimization).
 
 
 ## IPEX

docs/source/openvino/optimization.mdx

Lines changed: 0 additions & 1 deletion
@@ -1006,7 +1006,6 @@ ov_model = OVModelForSpeechSeq2Seq.from_pretrained(
         num_samples=10,
         dataset="librispeech",
         processor=model_id,
-        matmul_sq_alpha=0.95,
     )
 )
 ```

0 commit comments
