docs/source/openvino/optimization.mdx
The quality of a 4-bit weight-compressed model can be further improved by employing one of the following data-aware methods:
* **AWQ**, which stands for Activation-Aware Quantization, is an algorithm that tunes model weights for more accurate 4-bit compression. It slightly improves the generation quality of compressed LLMs, but requires significant additional time and memory to tune weights on a calibration dataset. Note that the model may contain no matching patterns to apply AWQ to, in which case it is skipped.
* **Scale Estimation** is a method that tunes quantization scales to minimize the `L2` error between the original and compressed layers. Running scale estimation requires a dataset. Using this method also incurs additional time and memory overhead.
* **GPTQ** optimizes compressed weights in a layer-wise fashion to minimize the difference between the activations of a compressed layer and those of the original one.
* **LoRA Correction** mitigates quantization noise introduced during weight compression by leveraging low-rank adaptation.
Data-aware algorithms can be applied together or separately. To do so, provide the corresponding arguments to the 4-bit `OVWeightQuantizationConfig` together with a dataset. For example:
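Below is a minimal sketch of such a configuration, combining AWQ and Scale Estimation on a calibration dataset. The argument names (`quant_method`, `scale_estimation`, `gptq`, `lora_correction`, `dataset`) and the model ID are assumptions for illustration; check the `OVWeightQuantizationConfig` reference of your installed version for the exact options it supports.

```python
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

# Assumed flags for the data-aware methods described above:
#   quant_method="awq"     -> AWQ
#   scale_estimation=True  -> Scale Estimation
#   gptq=True              -> GPTQ
#   lora_correction=True   -> LoRA Correction
# They can be enabled separately or combined.
quantization_config = OVWeightQuantizationConfig(
    bits=4,
    dataset="wikitext2",    # calibration dataset required by data-aware methods
    quant_method="awq",     # tune weights with Activation-Aware Quantization
    scale_estimation=True,  # additionally tune the quantization scales
)

# Hypothetical model ID; any causal LM is handled the same way.
model = OVModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B",
    export=True,  # convert the checkpoint to OpenVINO IR on the fly
    quantization_config=quantization_config,
)
model.save_pretrained("llama_int4_awq_ov")
```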