
Commit cc114a3

Update overview-quantization-transformers.md (#1480)
1 parent: 440773a

1 file changed: +2 −2 lines changed

overview-quantization-transformers.md

Lines changed: 2 additions & 2 deletions
@@ -67,12 +67,12 @@ In this section, we will go over the pros and cons of bitsandbytes and gptq quan
**AMD support**: The integration should work out of the box for AMD GPUs!

-### What are the known limiations of bitsandbytes?
+### What are the known limitations of bitsandbytes?

**slower than GPTQ for text generation**: bitsandbytes 4-bit models are slower than their GPTQ counterparts when generating text with [`generate`](https://huggingface.co/docs/transformers/main_classes/text_generation).
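To make the comparison concrete, here is a minimal sketch of the path being described, using the `load_in_4bit` flag of the transformers API; the checkpoint name is only a placeholder:

```python
# Minimal sketch: load a causal LM in 4-bit with bitsandbytes and decode with
# generate. "facebook/opt-350m" is a placeholder; any causal LM checkpoint works.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,  # on-the-fly 4-bit quantization via bitsandbytes
    device_map="auto",
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
# This decoding loop is where the 4-bit bitsandbytes kernels lag behind GPTQ.
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```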

**4-bit weights are not serializable**: Currently, 4-bit models cannot be serialized. This is a frequent community request, and we believe it should be addressed very soon by the bitsandbytes maintainers, as it is on their roadmap!
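As a small illustration of this limitation, and assuming the behavior of transformers at the time of this commit, calling `save_pretrained` on a 4-bit model is expected to fail (the exact exception type and message may vary between versions):

```python
# Sketch of the serialization limitation: saving 4-bit weights is not supported,
# so save_pretrained is expected to raise. The checkpoint name is a placeholder.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m", load_in_4bit=True, device_map="auto"
)

try:
    model.save_pretrained("opt-350m-4bit")  # not supported for 4-bit models yet
except (NotImplementedError, ValueError) as err:
    print(f"Serialization failed as expected: {err}")
```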

-### What are the known limiations of autoGPTQ?
+### What are the known limitations of autoGPTQ?

**calibration dataset**: The need for a calibration dataset might discourage some users from adopting GPTQ. Furthermore, quantizing a model can take several hours (e.g., 4 GPU hours for a 175B-scale model, [according to the paper](https://arxiv.org/pdf/2210.17323.pdf), section 2).
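For illustration, here is a minimal sketch of where the calibration dataset enters the picture with the transformers GPTQ integration; the checkpoint name is a placeholder and `"c4"` is one of the built-in dataset options:

```python
# Sketch: GPTQ needs a calibration dataset, passed through GPTQConfig.
# The quantization pass runs while the model loads and can take a long time
# for large models.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-350m"
tokenizer = AutoTokenizer.from_pretrained(model_id)

gptq_config = GPTQConfig(
    bits=4,
    dataset="c4",  # calibration data used to minimize quantization error
    tokenizer=tokenizer,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,  # triggers the (slow) quantization pass
)
```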

**works only for language models (for now)**: As of today, the API for quantizing a model with auto-GPTQ has been designed to support only language models. It should be possible to quantize non-text (or multimodal) models using the GPTQ algorithm, but the process has not been elaborated in the original paper or in the auto-gptq repository. If the community is excited about this topic, this might be considered in the future.
