overview-quantization-transformers.md (4 additions, 4 deletions)
@@ -32,7 +32,7 @@ Note also that the details shared below are only valid for `PyTorch` models, thi
 ## Table of contents
 
 -[Resources](#resources)
--[Comparing bitsandbyes and auto-gptq](#Comparing-bitsandbyes-and-auto-gptq)
+-[Comparing bitsandbytes and auto-gptq](#Comparing-bitsandbytes-and-auto-gptq)
 -[Diving into speed benchmarks](#Diving-into-speed-benchmarks)
 -[Conclusion and final words](#conclusion-and-final-words)
 -[Acknowledgements](#acknowledgements)
@@ -47,7 +47,7 @@ Note also that the details shared below are only valid for `PyTorch` models, thi
 -[Merve's blogpost on quantization](https://huggingface.co/blog/merve/quantization) - This blogpost provides a gentle introduction to quantization and the quantization methods supported natively in transformers.
 
-## Comparing bitsandbyes and auto-gptq
+## Comparing bitsandbytes and auto-gptq
 
 In this section, we will go over the pros and cons of bitsandbytes and gptq quantization. Note that these are based on feedback from the community and can evolve over time, as some of these features are on the roadmaps of the respective libraries.
 
 ### What are the benefits of bitsandbytes?
@@ -67,12 +67,12 @@ In this section, we will go over the pros and cons of bitsandbytes and gptq quan
 
 **AMD support**: The integration should work out of the box for AMD GPUs!
 
-### What are the known limitations of bitsandbytes?
+### What are the potential areas of improvement for bitsandbytes?
 
 **slower than GPTQ for text generation**: bitsandbytes 4-bit models are slow compared to GPTQ when using [`generate`](https://huggingface.co/docs/transformers/main_classes/text_generation).
 
 **4-bit weights are not serializable**: Currently, 4-bit models cannot be serialized. This is a frequent community request, and we believe it should be addressed very soon by the bitsandbytes maintainers as it's on their roadmap!
 
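To make the 4-bit path discussed in the hunk above concrete, here is a minimal sketch of loading a model in 4-bit with the transformers bitsandbytes integration. The model id is a placeholder, the generation settings are illustrative, and running it assumes a CUDA GPU with `bitsandbytes` installed:

```python
# Sketch: loading a causal LM in 4-bit via transformers' bitsandbytes integration.
# Assumes a CUDA GPU and the bitsandbytes package; the model id is a placeholder.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "facebook/opt-350m"  # placeholder; any causal LM on the Hub works the same way

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 on top of 4-bit weights
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

# generate() is where the speed gap versus GPTQ shows up
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

No calibration data or quantization step is needed here: the weights are quantized on the fly at load time, which is the zero-shot convenience the section credits bitsandbytes with.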
-### What are the known limitations of autoGPTQ?
+### What are the potential areas of improvement for autoGPTQ?
 
 **calibration dataset**: The need for a calibration dataset might discourage some users from adopting GPTQ. Furthermore, quantizing a model can take several hours (e.g. 4 GPU hours for a 175B-scale model, [according to the paper](https://arxiv.org/pdf/2210.17323.pdf), section 2).
 
 **works only for language models (for now)**: As of today, the API for quantizing a model with auto-GPTQ has been designed to support only language models. It should be possible to quantize non-text (or multimodal) models using the GPTQ algorithm, but the process has not been elaborated in the original paper or in the auto-gptq repository. If the community is excited about this topic, it might be considered in the future.