Don't forget to provide a dataset since it is required for the calibration procedure during full quantization.
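
For example, with full quantization the calibration dataset can be passed directly through the quantization config. The snippet below is a minimal sketch, assuming a causal LM exported on the fly; the model id, the `wikitext2` preset, and the sample count are illustrative rather than required values.

```python
from optimum.intel import OVModelForCausalLM, OVQuantizationConfig

# Full quantization (weights + activations) needs calibration data, so a
# dataset must be specified in the config. The model id, dataset preset and
# number of samples below are illustrative.
model = OVModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    export=True,
    quantization_config=OVQuantizationConfig(bits=8, dataset="wikitext2", num_samples=128),
)
```
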
## Pipeline Quantization
Some multimodal pipelines, such as Stable Diffusion or visual-language models, consist of multiple components, and you may need to apply a different quantization method to each of them. For example, you may want to apply int4 data-aware weight-only quantization to the language model of a visual-language pipeline while applying int8 weight-only quantization to the other components. In this case you can use the `OVPipelineQuantizationConfig` class to specify a quantization configuration for each component of the pipeline.

For example, the code below quantizes the weights and activations of the language model inside InternVL2-1B, compresses the weights of the text embedding model, and skips quantization of the vision embedding model altogether.

```python
from optimum.intel import OVModelForVisualCausalLM
from optimum.intel import OVPipelineQuantizationConfig, OVQuantizationConfig, OVWeightQuantizationConfig
```
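
Continuing from the imports above, the configuration could be assembled roughly as follows. This is a minimal sketch rather than the exact recipe: it assumes `OVPipelineQuantizationConfig` takes a `quantization_configs` dictionary keyed by submodel name, and the keys (`"lm_model"`, `"text_embeddings_model"`), the calibration dataset, and the bit widths are illustrative assumptions.

```python
# Per-component configuration (submodel names, dataset and bit widths are assumptions).
pipeline_config = OVPipelineQuantizationConfig(
    quantization_configs={
        # Full int8 quantization (weights + activations) of the language model,
        # calibrated on a dataset.
        "lm_model": OVQuantizationConfig(bits=8, dataset="contextual"),
        # Weight-only int8 compression of the text embedding model.
        "text_embeddings_model": OVWeightQuantizationConfig(bits=8),
        # The vision embedding model is not listed, so it is left unquantized.
    },
)

model = OVModelForVisualCausalLM.from_pretrained(
    "OpenGVLab/InternVL2-1B",
    quantization_config=pipeline_config,
)
```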