
Commit 433275e

improve accelerate reference in docs (#1086)
* improve accelerate reference in docs
* Apply suggestions from code review
* fix spelling

Co-authored-by: Marc Sun <[email protected]>
1 parent a03df43 commit 433275e

File tree

1 file changed: +34 -6 lines changed


docs/source/integrations.mdx

Lines changed: 34 additions & 6 deletions
@@ -1,8 +1,8 @@
# Transformers

-With Transformers it's very easy to load any model in 4 or 8-bit, quantizing them on the fly with bitsandbytes primitives.
+With Transformers it's very easy to load any model in 4 or 8-bit, quantizing them on the fly with `bitsandbytes` primitives.

-Please review the [bitsandbytes section in the Transformers docs](https://huggingface.co/docs/transformers/v4.37.2/en/quantization#bitsandbytes).
+Please review the [`bitsandbytes` section in the Transformers docs](https://huggingface.co/docs/transformers/main/en/quantization#bitsandbytes).

Details about the BitsAndBytesConfig can be found [here](https://huggingface.co/docs/transformers/v4.37.2/en/main_classes/quantization#transformers.BitsAndBytesConfig).
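For orientation, the Transformers usage the hunk above links to amounts to passing a `BitsAndBytesConfig` to `from_pretrained`. A minimal sketch, assuming any Hugging Face Hub checkpoint (`facebook/opt-350m` here is only an illustrative choice, not part of the diff):

```py
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization config, mirroring the options described in the linked docs
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

# weights are quantized on the fly while the checkpoint is loaded
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",
    quantization_config=bnb_config,
    device_map="auto",
)
```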

@@ -25,9 +25,37 @@ Please review the [bitsandbytes section in the PEFT docs](https://huggingface.co
# Accelerate

-Bitsandbytes is also easily usable from within Accelerate.
+Bitsandbytes is also easily usable from within Accelerate, where you can quantize any PyTorch model simply by passing a quantization config; e.g.:

-Please review the [bitsandbytes section in the Accelerate docs](https://huggingface.co/docs/accelerate/en/usage_guides/quantization).
+```py
+import torch
+
+from accelerate import init_empty_weights
+from accelerate.utils import BnbQuantizationConfig, load_and_quantize_model
+from mingpt.model import GPT
+
+model_config = GPT.get_default_config()
+model_config.model_type = 'gpt2-xl'
+model_config.vocab_size = 50257
+model_config.block_size = 1024
+
+# instantiate the model skeleton without allocating real weights
+with init_empty_weights():
+    empty_model = GPT(model_config)
+
+bnb_quantization_config = BnbQuantizationConfig(
+    load_in_4bit=True,
+    bnb_4bit_compute_dtype=torch.bfloat16,  # optional
+    bnb_4bit_use_double_quant=True,         # optional
+    bnb_4bit_quant_type="nf4",              # optional
+)
+
+quantized_model = load_and_quantize_model(
+    empty_model,
+    weights_location=weights_location,  # path to the saved checkpoint, defined elsewhere
+    bnb_quantization_config=bnb_quantization_config,
+    device_map="auto",
+)
+```
+
+For further details, e.g. model saving, CPU offloading and fine-tuning, please review the [`bitsandbytes` section in the Accelerate docs](https://huggingface.co/docs/accelerate/en/usage_guides/quantization).
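As a sketch of the CPU offloading mentioned in the added line above, an explicit `device_map` can be passed instead of `"auto"`. The split below assumes the minGPT model from the example, whose top-level submodules are `transformer` and `lm_head`; whether offloaded modules are supported depends on the quantization settings, so treat this as illustrative and see the linked guide for details:

```py
# Hypothetical placement: keep the transformer blocks on GPU 0 and
# offload the output head to CPU (module names depend on the model definition).
device_map = {
    "transformer": 0,
    "lm_head": "cpu",
}

quantized_model = load_and_quantize_model(
    empty_model,
    weights_location=weights_location,
    bnb_quantization_config=bnb_quantization_config,
    device_map=device_map,
)
```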

@@ -59,5 +87,5 @@ e.g. for transformers state that you can load any model in 8-bit / 4-bit precisi
# Blog posts

-- [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)
-- [A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and bitsandbytes](https://huggingface.co/blog/hf-bitsandbytes-integration)
+- [Making LLMs even more accessible with `bitsandbytes`, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes)
+- [A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using Hugging Face Transformers, Accelerate and `bitsandbytes`](https://huggingface.co/blog/hf-bitsandbytes-integration)
