model = OVModelForCausalLM.from_pretrained(model_id, quantization_config={"bits": 4})
```

For some models, we provide preconfigured 4-bit weight-only quantization [configurations](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/configuration.py) that offer a good trade-off between quality and speed. This default 4-bit configuration is applied automatically when you specify `quantization_config={"bits": 4}`.
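As a sketch of what this shorthand expands to (assuming the `bits` parameter of `OVWeightQuantizationConfig` shown in the later example; see the linked configuration file for the model-specific values actually applied), the dict form can also be written as an explicit config object:

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Explicit counterpart of quantization_config={"bits": 4}. Passing the dict
# lets optimum-intel substitute a preconfigured 4-bit setup when one exists
# for the given model (assumption based on the paragraph above).
quantization_config = OVWeightQuantizationConfig(bits=4)

# model_id stands for any supported causal LM checkpoint, as elsewhere in these docs.
model = OVModelForCausalLM.from_pretrained(model_id, quantization_config=quantization_config)
```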

Or for vision-language pipelines:

```python
model = OVModelForVisualCausalLM.from_pretrained(
    "llava-hf/llava-v1.6-mistral-7b-hf",
    quantization_config={"bits": 4}
)
```

You can tune the quantization parameters to achieve a better performance-accuracy trade-off as follows:

```python
from optimum.intel import OVWeightQuantizationConfig