* improve accelerate reference in docs
* Apply suggestions from code review
Co-authored-by: Marc Sun <[email protected]>
* fix spelling
---------
Co-authored-by: Marc Sun <[email protected]>
docs/source/integrations.mdx (34 additions, 6 deletions)
@@ -1,8 +1,8 @@
# Transformers

-With Transformers it's very easy to load any model in 4 or 8-bit, quantizing them on the fly with bitsandbytes primitives.
+With Transformers it's very easy to load any model in 4 or 8-bit, quantizing them on the fly with `bitsandbytes` primitives.

-Please review the [bitsandbytes section in the Transformers docs](https://huggingface.co/docs/transformers/v4.37.2/en/quantization#bitsandbytes).
+Please review the [`bitsandbytes` section in the Transformers docs](https://huggingface.co/docs/transformers/main/en/quantization#bitsandbytes).

Details about the BitsAndBytesConfig can be found [here](https://huggingface.co/docs/transformers/v4.37.2/en/main_classes/quantization#transformers.BitsAndBytesConfig).
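A minimal sketch of what that Transformers path looks like in practice, assuming a causal LM and using `facebook/opt-350m` purely as a stand-in model id:

```py
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize to 4-bit NF4 on the fly while loading the checkpoint.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # stand-in model id; any causal LM works
    quantization_config=quantization_config,
    device_map="auto",
)
```

The same config also accepts `load_in_8bit=True` for 8-bit loading instead of 4-bit.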
@@ -25,9 +25,37 @@ Please review the [bitsandbytes section in the PEFT docs](https://huggingface.co

# Accelerate

-Bitsandbytes is also easily usable from within Accelerate.
+Bitsandbytes is also easily usable from within Accelerate, where you can quantize any PyTorch model simply by passing a quantization config, e.g.:

-Please review the [bitsandbytes section in the Accelerate docs](https://huggingface.co/docs/accelerate/en/usage_guides/quantization).
+```py
+import torch
+
+from accelerate import init_empty_weights
+from accelerate.utils import BnbQuantizationConfig, load_and_quantize_model
+from mingpt.model import GPT
+
+model_config = GPT.get_default_config()
+model_config.model_type = 'gpt2-xl'
+model_config.vocab_size = 50257
+model_config.block_size = 1024
+
+with init_empty_weights():
+    empty_model = GPT(model_config)
+
+bnb_quantization_config = BnbQuantizationConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.bfloat16,  # optional
+    bnb_4bit_use_double_quant=True,  # optional
+    bnb_4bit_quant_type="nf4"  # optional
+)
+
+quantized_model = load_and_quantize_model(
+    empty_model,
+    weights_location=weights_location,  # path to the saved model weights
+    bnb_quantization_config=bnb_quantization_config,
+    device_map="auto"
+)
+```
+
+For further details, e.g. model saving, CPU offloading and fine-tuning, please review the [`bitsandbytes` section in the Accelerate docs](https://huggingface.co/docs/accelerate/en/usage_guides/quantization).
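One thing the snippet above leaves open is where `weights_location` comes from; a minimal sketch of one way to fetch a checkpoint from the Hub, where the repo id and filename are purely hypothetical placeholders:

```py
from huggingface_hub import hf_hub_download

# Download a state dict compatible with the model skeleton built above.
weights_location = hf_hub_download(
    repo_id="your-username/gpt2-xl-checkpoint",  # placeholder repo id
    filename="pytorch_model.bin",  # placeholder filename
)
```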
@@ -59,5 +87,5 @@ e.g. for transformers state that you can load any model in 8-bit / 4-bit precisi

# Blog posts

-- [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)
-- [A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes](https://huggingface.co/blog/hf-bitsandbytes-integration)
+- [Making LLMs even more accessible with `bitsandbytes`, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)
+- [A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and `bitsandbytes`](https://huggingface.co/blog/hf-bitsandbytes-integration)