One-shot algorithm that uses calibration data to select the ideal bin for weight quantization.
This algorithm is applied on top of the basic quantization algorithm, and affects weights only.
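
To make "using calibration data to select the bin" concrete, here is a minimal NumPy sketch of a GPTQ-style pass over a single linear layer. It is an illustration under stated assumptions, not this repository's implementation: the function names (`make_grid`, `to_grid`, `round_to_nearest`, `gptq_like`), the per-row min/max grid standing in for the basic quantizer, and the `damp` value are all hypothetical choices made for the example. Weights are quantized one input column at a time, and the rounding error of each column is pushed onto the columns not yet quantized, weighted by the inverse Hessian estimated from the calibration inputs.

```python
import numpy as np


def make_grid(W, bits=4):
    """Per-row asymmetric min/max grid (stand-in for a basic quantizer)."""
    maxq = 2**bits - 1
    lo = np.minimum(W.min(axis=1), 0.0)
    hi = np.maximum(W.max(axis=1), 0.0)
    scale = np.maximum(hi - lo, 1e-8) / maxq  # guard against constant rows
    zero = np.round(-lo / scale)
    return scale, zero, maxq


def to_grid(w, scale, zero, maxq):
    """Round weights to the nearest bin, then map back to float values."""
    q = np.clip(np.round(w / scale) + zero, 0, maxq)
    return scale * (q - zero)


def round_to_nearest(W, bits=4):
    """Baseline: every weight goes to its nearest bin independently."""
    scale, zero, maxq = make_grid(W, bits)
    return to_grid(W, scale[:, None], zero[:, None], maxq)


def gptq_like(W, X, bits=4, damp=0.01):
    """W: (out, in) weights. X: (in, n_samples) calibration inputs."""
    W = W.astype(np.float64).copy()
    scale, zero, maxq = make_grid(W, bits)

    # Hessian of the layer-wise squared error, with damping for stability.
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(H.shape[0])

    # Upper Cholesky factor U of the inverse Hessian (inv(H) = U.T @ U).
    U = np.linalg.cholesky(np.linalg.inv(H)).T

    Q = np.zeros_like(W)
    for i in range(W.shape[1]):
        Q[:, i] = to_grid(W[:, i], scale, zero, maxq)
        err = (W[:, i] - Q[:, i]) / U[i, i]
        # Compensate: shift the remaining columns to absorb this error.
        W[:, i + 1:] -= np.outer(err, U[i, i + 1:])
    return Q


# Toy comparison on random data (assumed shapes, for illustration only).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
X = rng.normal(size=(16, 64))
Q = gptq_like(W, X, bits=4)
R = round_to_nearest(W, bits=4)
print("GPTQ-like output error:", np.linalg.norm(W @ X - Q @ X))
print("round-to-nearest error:", np.linalg.norm(W @ X - R @ X))
```

Because each column is rounded after it has absorbed the errors of earlier columns, a weight may land in a different bin than plain rounding would pick. Only the weights change; activations are untouched, which matches the "weights only" scope described above.
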
The implementation is based on [GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers](https://arxiv.org/pdf/2210.17323). The algorithm is very similar to SparseGPT: A small amount of calibration data is used