
Commit 54b40e1

Introduce OVPipelineQuantizationConfig (#1310)

* Initial commit
* Tweaks
* Fix test
* Fix config for phi4mm
* Update ignored scope
* Update docs
* Revert disabling tests
* Update optimization.mdx
* Update test_quantization.py
* Revert "Update test_quantization.py" (reverts commit 1ae7374)
* Fix test
* Add support for VLM ptq
* Update docs
* Update docs
* Rename pipeline_quantization_configs
* Also rename in docs
* Update phi4mm config
1 parent 08e3008 commit 54b40e1

File tree

10 files changed: +722 −216 lines


docs/source/openvino/optimization.mdx

Lines changed: 40 additions & 3 deletions
```diff
@@ -133,8 +133,18 @@ Click on a ✅ to copy the command/code for the corresponding optimization case.
 </td>
 <td style="text-align: center; vertical-align: middle;">–</td>
 <td style="text-align: center; vertical-align: middle;">–</td>
-<td style="text-align: center; vertical-align: middle;">–</td>
-<td style="text-align: center; vertical-align: middle;">–</td>
+<td style="text-align: center; vertical-align: middle;">
+  <button
+    onclick="navigator.clipboard.writeText('optimum-cli export openvino --task image-text-to-text -m OpenGVLab/InternVL2-1B --trust-remote-code --quant-mode int8 --dataset contextual ./save_dir')">
+    ✅
+  </button>
+</td>
+<td style="text-align: center; vertical-align: middle;">
+  <button
+    onclick="navigator.clipboard.writeText('OVModelForVisualCausalLM.from_pretrained(\'OpenGVLab/InternVL2-1B\', trust_remote_code=True, quantization_config=OVQuantizationConfig(bits=8, dataset=\'contextual\', trust_remote_code=True)).save_pretrained(\'save_dir\')')">
+    ✅
+  </button>
+</td>
 <td style="text-align: center; vertical-align: middle;">–</td>
 <td style="text-align: center; vertical-align: middle;">–</td>
 </tr>
```
```diff
@@ -636,4 +646,31 @@ To apply mixed quantization through CLI, the `--quant-mode` argument should be used
 optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 --quant-mode nf4_f8e4m3 --dataset wikitext2 ./save_dir
 ```
 
-Don't forget to provide a dataset since it is required for the calibration procedure during full quantization.
+Don't forget to provide a dataset since it is required for the calibration procedure during full quantization.
+
+
+## Pipeline Quantization
+
+Some pipelines consist of multiple components, such as Stable Diffusion or visual-language models. In these cases, you may need to apply different quantization methods to different components of the pipeline. For example, you may want to apply int4 data-aware weight-only quantization to the language model of a visual-language pipeline, while applying int8 weight-only quantization to the other components. In such cases you can use the `OVPipelineQuantizationConfig` class to specify the quantization configuration for each component of the pipeline.
+
+For example, the code below quantizes weights and activations of the language model inside InternVL2-1B, compresses the weights of the text embedding model, and skips quantization of the vision embedding model.
+
+```python
+from optimum.intel import OVModelForVisualCausalLM
+from optimum.intel import OVPipelineQuantizationConfig, OVQuantizationConfig, OVWeightQuantizationConfig
+
+model_id = "OpenGVLab/InternVL2-1B"
+model = OVModelForVisualCausalLM.from_pretrained(
+    model_id,
+    export=True,
+    trust_remote_code=True,
+    quantization_config=OVPipelineQuantizationConfig(
+        quantization_configs={
+            "lm_model": OVQuantizationConfig(bits=8),
+            "text_embeddings_model": OVWeightQuantizationConfig(bits=8),
+        },
+        dataset="contextual",
+        trust_remote_code=True,
+    )
+)
+```
```
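Conceptually, `OVPipelineQuantizationConfig` is a mapping from submodel names to per-component quantization settings, plus options shared across the pipeline such as the calibration dataset. The pure-Python sketch below illustrates that shape only; the dataclasses here are illustrative stand-ins, not the actual `optimum-intel` classes, and the submodel names simply mirror the documentation example above.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative stand-ins for OVQuantizationConfig / OVWeightQuantizationConfig.
@dataclass
class FullQuantConfig:
    bits: int = 8  # quantize weights *and* activations

@dataclass
class WeightOnlyConfig:
    bits: int = 8  # compress weights only

@dataclass
class PipelineQuantConfig:
    # Per-component configs keyed by submodel name.
    quantization_configs: dict
    # Shared options applied pipeline-wide.
    dataset: Optional[str] = None
    trust_remote_code: bool = False

    def config_for(self, submodel: str):
        # Components without an entry are left unquantized.
        return self.quantization_configs.get(submodel)

cfg = PipelineQuantConfig(
    quantization_configs={
        "lm_model": FullQuantConfig(bits=8),
        "text_embeddings_model": WeightOnlyConfig(bits=8),
    },
    dataset="contextual",
)

print(cfg.config_for("lm_model").bits)             # 8
print(cfg.config_for("vision_embeddings_model"))   # None -> skipped
```

The design point is that quantization decisions stay per-component while calibration data is declared once for the whole pipeline.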

optimum/commands/export/openvino.py

Lines changed: 1 addition & 2 deletions
```diff
@@ -410,8 +410,7 @@ def run(self):
             }
         else:
             quantization_config = prepare_q_config(self.args)
-            if quantization_config.get("dataset", None):
-                quantization_config["trust_remote_code"] = self.args.trust_remote_code
+            quantization_config["trust_remote_code"] = self.args.trust_remote_code
         ov_config = OVConfig(quantization_config=quantization_config)
 
         quantization_config = ov_config.quantization_config if ov_config else None
```
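The CLI change above makes `--trust-remote-code` propagate into the quantization config unconditionally, rather than only when a calibration dataset is set. A minimal sketch of the before/after behavior, using plain dicts and a simplified argument object in place of the real `prepare_q_config` machinery:

```python
from types import SimpleNamespace

def old_behavior(q_config: dict, args) -> dict:
    # Before: the flag was copied only when a calibration dataset was given.
    if q_config.get("dataset", None):
        q_config["trust_remote_code"] = args.trust_remote_code
    return q_config

def new_behavior(q_config: dict, args) -> dict:
    # After: the flag is always copied, so configs without a dataset
    # (e.g. weight-only quantization) also carry it.
    q_config["trust_remote_code"] = args.trust_remote_code
    return q_config

args = SimpleNamespace(trust_remote_code=True)
print(old_behavior({"bits": 8}, args))  # flag not added without a dataset
print(new_behavior({"bits": 8}, args))  # flag always added
```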

optimum/intel/__init__.py

Lines changed: 4 additions & 0 deletions
```diff
@@ -81,6 +81,7 @@
     [
         "OVQuantizer",
         "OVCalibrationDataset",
+        "OVPipelineQuantizationConfig",
         "OVQuantizationConfig",
         "OVWeightQuantizationConfig",
         "OVDynamicQuantizationConfig",
@@ -92,6 +93,7 @@
     [
         "OVQuantizer",
         "OVCalibrationDataset",
+        "OVPipelineQuantizationConfig",
         "OVQuantizationConfig",
         "OVWeightQuantizationConfig",
         "OVDynamicQuantizationConfig",
@@ -275,6 +277,7 @@
     OVCalibrationDataset,
     OVDynamicQuantizationConfig,
     OVMixedQuantizationConfig,
+    OVPipelineQuantizationConfig,
     OVQuantizationConfig,
     OVQuantizer,
     OVWeightQuantizationConfig,
@@ -284,6 +287,7 @@
     OVCalibrationDataset,
     OVDynamicQuantizationConfig,
     OVMixedQuantizationConfig,
+    OVPipelineQuantizationConfig,
     OVQuantizationConfig,
     OVQuantizer,
     OVWeightQuantizationConfig,
```

optimum/intel/openvino/__init__.py

Lines changed: 1 addition & 0 deletions
```diff
@@ -54,6 +54,7 @@
     OVConfig,
     OVDynamicQuantizationConfig,
     OVMixedQuantizationConfig,
+    OVPipelineQuantizationConfig,
     OVQuantizationConfig,
     OVWeightQuantizationConfig,
 )
```
