Commit 91b349b

update examples 2
Signed-off-by: Kyle Sayers <[email protected]>
1 parent fbf2a6d commit 91b349b

File tree: 11 files changed, +16 -49 lines changed
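Nearly every diff below makes the same change: `device_map="auto"` is removed from `AutoModelForCausalLM.from_pretrained(...)`, so the example models now load onto the default device (CPU) instead of being eagerly sharded across GPUs at load time. A minimal before/after sketch of the pattern, using a model stub that appears in the examples below:

```python
from transformers import AutoModelForCausalLM

MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

# Before: device_map="auto" eagerly shards the model across available devices.
# model = AutoModelForCausalLM.from_pretrained(
#     MODEL_ID, device_map="auto", torch_dtype="auto"
# )

# After: no device_map, so the model loads onto CPU; examples that need GPU
# execution dispatch the model explicitly later (see the sparse example below).
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
```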

examples/awq/README.md

Lines changed: 1 addition & 5 deletions

@@ -18,11 +18,7 @@ recipe = [
 To use your own model, start with an existing example and change the `model_id` to match your own model stub.
 ```python
 model_id = "path/to/your/model"
-model = AutoModelForCausalLM.from_pretrained(
-    model_id,
-    device_map="auto",
-    torch_dtype="auto",
-)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
 ```

 ## Adding Mappings ##
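The snippet above relies on imports from the surrounding example. A self-contained version of the updated loading code, with the tokenizer line borrowed from the sibling example scripts (not part of this README), would look roughly like:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "path/to/your/model"  # replace with your own model stub
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```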

examples/awq/llama_example.py

Lines changed: 1 addition & 3 deletions

@@ -7,9 +7,7 @@
 # Select model and load it.
 MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

-model = AutoModelForCausalLM.from_pretrained(
-    MODEL_ID, device_map="auto", torch_dtype="auto"
-)
+model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
 tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

 # Select calibration dataset.

examples/awq/qwen3_moe_example.py

Lines changed: 1 addition & 3 deletions

@@ -8,9 +8,7 @@
 # Select model and load it.
 MODEL_ID = "Qwen/Qwen3-30B-A3B"

-model = AutoModelForCausalLM.from_pretrained(
-    MODEL_ID, device_map="auto", torch_dtype="auto"
-)
+model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
 tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

 # Select calibration dataset.

examples/multimodal_audio/README.md

Lines changed: 1 addition & 5 deletions

@@ -21,11 +21,7 @@ This directory contains example scripts for quantizing a variety of audio language models.
 To use your own multimodal model, start with an existing example and change the `model_id` to match your own model stub.
 ```python3
 model_id = "path/to/your/model"
-model = AutoModelForCausalLM.from_pretrained(
-    model_id,
-    device_map="auto",
-    torch_dtype="auto",
-)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
 ```

 ## Customizing GPTQModifier Parameters ##

examples/multimodal_vision/README.md

Lines changed: 1 addition & 5 deletions

@@ -25,11 +25,7 @@ This directory contains example scripts for quantizing a variety of vision-language models.
 To use your own multimodal model, start with an existing example and change the `model_id` to match your own model stub.
 ```python3
 model_id = "path/to/your/model"
-model = AutoModelForCausalLM.from_pretrained(
-    model_id,
-    device_map="auto",
-    torch_dtype="auto",
-)
+model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
 ```

 ## Customizing GPTQModifier Parameters ##

examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py

Lines changed: 4 additions & 3 deletions

@@ -3,12 +3,11 @@
 from transformers import AutoModelForCausalLM, AutoTokenizer

 from llmcompressor import oneshot, train
+from llmcompressor.utils.dev import dispatch_for_generation

 # load the model in as bfloat16 to save on memory and compute
 model_stub = "neuralmagic/Llama-2-7b-ultrachat200k"
-model = AutoModelForCausalLM.from_pretrained(
-    model_stub, torch_dtype=torch.bfloat16, device_map="auto"
-)
+model = AutoModelForCausalLM.from_pretrained(model_stub, torch_dtype=torch.bfloat16)
 tokenizer = AutoTokenizer.from_pretrained(model_stub)

 # uses LLM Compressor's built-in preprocessing for ultra chat

@@ -71,6 +70,7 @@
 )

 # Sparse finetune
+dispatch_for_generation(model)
 finetune_applied_model = train(
     model=oneshot_applied_model,
     **oneshot_kwargs,

@@ -79,6 +79,7 @@
 )

 # Oneshot quantization
+model.to("cpu")
 quantized_model = oneshot(
     model=finetune_applied_model,
     **oneshot_kwargs,
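Taken together, the three hunks give this script an explicit device lifecycle: load in bfloat16 on CPU, dispatch to GPUs only for the finetune stage, then return to CPU for the final quantization pass. A condensed outline of the resulting flow follows; the dataset preprocessing, recipe, and `oneshot_kwargs` are elided here just as the diff elides them, so this is a sketch rather than a runnable script:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot, train
from llmcompressor.utils.dev import dispatch_for_generation

# Load in bfloat16 on the default device; no device_map sharding.
model_stub = "neuralmagic/Llama-2-7b-ultrachat200k"
model = AutoModelForCausalLM.from_pretrained(model_stub, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_stub)

# ... dataset preprocessing, recipe, oneshot_kwargs, and the initial oneshot
# pass that produces oneshot_applied_model (all as in the full script) ...

# Move the model onto available GPUs just before sparse finetuning.
dispatch_for_generation(model)
finetune_applied_model = train(model=oneshot_applied_model, **oneshot_kwargs)

# Bring the model back to CPU before the final oneshot quantization pass.
model.to("cpu")
quantized_model = oneshot(model=finetune_applied_model, **oneshot_kwargs)
```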

examples/quantization_kv_cache/README.md

Lines changed: 1 addition & 5 deletions

@@ -39,11 +39,7 @@ Load the model using `AutoModelForCausalLM`:
 from transformers import AutoModelForCausalLM, AutoTokenizer

 MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
-model = AutoModelForCausalLM.from_pretrained(
-    MODEL_ID,
-    device_map="auto",
-    torch_dtype="auto",
-)
+model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
 tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
 ```

examples/quantization_w4a16/README.md

Lines changed: 1 addition & 3 deletions

@@ -40,9 +40,7 @@ Load the model using `AutoModelForCausalLM` for handling quantized saving and loading:
 from transformers import AutoTokenizer, AutoModelForCausalLM

 MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
-model = AutoModelForCausalLM.from_pretrained(
-    MODEL_ID, device_map="auto", torch_dtype="auto",
-)
+model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
 tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
 ```

examples/quantization_w8a8_fp8/README.md

Lines changed: 1 addition & 2 deletions

@@ -38,8 +38,7 @@ from transformers import AutoTokenizer, AutoModelForCausalLM

 MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"

-model = AutoModelForCausalLM.from_pretrained(
-    MODEL_ID, device_map="auto", torch_dtype="auto")
+model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
 tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
 ```

examples/quantization_w8a8_int8/README.md

Lines changed: 1 addition & 3 deletions

@@ -38,9 +38,7 @@ Load the model using `AutoModelForCausalLM` for handling quantized saving and loading:
 from transformers import AutoTokenizer, AutoModelForCausalLM

 MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
-model = AutoModelForCausalLM.from_pretrained(
-    MODEL_ID, device_map="auto", torch_dtype="auto",
-)
+model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
 tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
 ```
