
Commit 0a1a8e3

faaany and stevhliu authored
[docs] no hard coding cuda as bnb has multi-backend support (#35867)
* change cuda to DEVICE
* Update docs/source/en/llm_tutorial.md

Co-authored-by: Steven Liu <[email protected]>
1 parent 9dc1efa commit 0a1a8e3
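For quick reference, here is a minimal sketch of the device-agnostic pattern this commit applies (the checkpoint and `device_map="auto"` loading follow the existing tutorial; `get_backend` is the accelerate test utility used in the diff below):

```py
# Sketch of the pattern: detect the available accelerator instead of hard coding "cuda".
from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate.test_utils.testing import get_backend

# get_backend() returns the detected device type ("cuda", "xpu", "mps", "cpu", ...)
# plus device count and a memory helper, which are unused here.
DEVICE, _, _ = get_backend()

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1", device_map="auto")

# Inputs are moved to whichever device was detected, so the same snippet
# runs on CUDA, XPU, MPS, or CPU backends without edits.
model_inputs = tokenizer(["A list of colors: red, blue"], return_tensors="pt").to(DEVICE)
generated_ids = model.generate(**model_inputs)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0])
```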

File tree

1 file changed (+11 −8 lines)


docs/source/en/llm_tutorial.md

Lines changed: 11 additions & 8 deletions
@@ -40,6 +40,7 @@ Before you begin, make sure you have all the necessary libraries installed:
```bash
pip install transformers bitsandbytes>=0.39.0 -q
```
+Bitsandbytes supports multiple backends in addition to CUDA-based GPUs. Refer to the multi-backend installation [guide](https://huggingface.co/docs/bitsandbytes/main/en/installation#multi-backend) to learn more.


## Generate text
@@ -101,9 +102,11 @@ Next, you need to preprocess your text input with a [tokenizer](tokenizer_summar

```py
>>> from transformers import AutoTokenizer
+>>> from accelerate.test_utils.testing import get_backend

+>>> DEVICE, _, _ = get_backend()  # automatically detects the underlying device type (CUDA, CPU, XPU, MPS, etc.)
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1", padding_side="left")
->>> model_inputs = tokenizer(["A list of colors: red, blue"], return_tensors="pt").to("cuda")
+>>> model_inputs = tokenizer(["A list of colors: red, blue"], return_tensors="pt").to(DEVICE)
```

The `model_inputs` variable holds the tokenized text input, as well as the attention mask. While [`~generation.GenerationMixin.generate`] does its best effort to infer the attention mask when it is not passed, we recommend passing it whenever possible for optimal results.
@@ -122,7 +125,7 @@ Finally, you don't need to do it one sequence at a time! You can batch your inpu
>>> tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default
>>> model_inputs = tokenizer(
...     ["A list of colors: red, blue", "Portugal is"], return_tensors="pt", padding=True
-... ).to("cuda")
+... ).to(DEVICE)
>>> generated_ids = model.generate(**model_inputs)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)
['A list of colors: red, blue, green, yellow, orange, purple, pink,',
@@ -152,7 +155,7 @@ If not specified in the [`~generation.GenerationConfig`] file, `generate` return


```py
->>> model_inputs = tokenizer(["A sequence of numbers: 1, 2"], return_tensors="pt").to("cuda")
+>>> model_inputs = tokenizer(["A sequence of numbers: 1, 2"], return_tensors="pt").to(DEVICE)

>>> # By default, the output will contain up to 20 tokens
>>> generated_ids = model.generate(**model_inputs)
@@ -174,7 +177,7 @@ By default, and unless specified in the [`~generation.GenerationConfig`] file, `
>>> from transformers import set_seed
>>> set_seed(42)

->>> model_inputs = tokenizer(["I am a cat."], return_tensors="pt").to("cuda")
+>>> model_inputs = tokenizer(["I am a cat."], return_tensors="pt").to(DEVICE)

>>> # LLM + greedy decoding = repetitive, boring output
>>> generated_ids = model.generate(**model_inputs)
@@ -196,7 +199,7 @@ LLMs are [decoder-only](https://huggingface.co/learn/nlp-course/chapter1/6?fw=pt
>>> # which is shorter, has padding on the right side. Generation fails to capture the logic.
>>> model_inputs = tokenizer(
...     ["1, 2, 3", "A, B, C, D, E"], padding=True, return_tensors="pt"
-... ).to("cuda")
+... ).to(DEVICE)
>>> generated_ids = model.generate(**model_inputs)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'1, 2, 33333333333'
@@ -206,7 +209,7 @@ LLMs are [decoder-only](https://huggingface.co/learn/nlp-course/chapter1/6?fw=pt
>>> tokenizer.pad_token = tokenizer.eos_token  # Most LLMs don't have a pad token by default
>>> model_inputs = tokenizer(
...     ["1, 2, 3", "A, B, C, D, E"], padding=True, return_tensors="pt"
-... ).to("cuda")
+... ).to(DEVICE)
>>> generated_ids = model.generate(**model_inputs)
>>> tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
'1, 2, 3, 4, 5, 6,'
@@ -223,7 +226,7 @@ Some models and tasks expect a certain input prompt format to work properly. Whe
... )
>>> set_seed(0)
>>> prompt = """How many helicopters can a human eat in one sitting? Reply as a thug."""
->>> model_inputs = tokenizer([prompt], return_tensors="pt").to("cuda")
+>>> model_inputs = tokenizer([prompt], return_tensors="pt").to(DEVICE)
>>> input_length = model_inputs.input_ids.shape[1]
>>> generated_ids = model.generate(**model_inputs, max_new_tokens=20)
>>> print(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=True)[0])
@@ -239,7 +242,7 @@ Some models and tasks expect a certain input prompt format to work properly. Whe
...     },
...     {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
... ]
->>> model_inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to("cuda")
+>>> model_inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(DEVICE)
>>> input_length = model_inputs.shape[1]
>>> generated_ids = model.generate(model_inputs, do_sample=True, max_new_tokens=20)
>>> print(tokenizer.batch_decode(generated_ids[:, input_length:], skip_special_tokens=True)[0])
