
Commit 26330a4

fix: corrected errors in PR

Signed-off-by: omobayode.fagbohungbe <[email protected]>

1 parent a84f526 commit 26330a4

File tree: 2 files changed, +15 -12 lines changed


examples/GPTQ/README.md

Lines changed: 11 additions & 11 deletions

@@ -7,7 +7,7 @@ For generative LLMs, very often the bottleneck of inference is no longer the com
 
 - [FMS Model Optimizer requirements](../../README.md#requirements)
 - `gptqmodel` is needed for this example. Use `pip install gptqmodel` or [install from source](https://github.com/ModelCloud/GPTQModel/tree/main?tab=readme-ov-file)
-- It is advised to install from source if you plan to use GPTQv2
+- It is advised to install from source if you plan to use `GPTQv2`
 - Optionally for the evaluation section below, install [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness)
 ```
 pip install lm-eval
@@ -86,29 +86,29 @@ This end-to-end example utilizes the common set of interfaces provided by `fms_m
 ## Example Test Results
 
 - Unquantized Model
--
-|Model | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
-|------------|--------------|------:|------|-----:|----------|---|-----:|---|-----:|
-| LLAMA3-8B |lambada_openai| 1|none | 5|acc |↑ |0.7103|± |0.0063|
-| | | |none | 5|perplexity|↓ |3.7915|± |0.0727|
+
+|Model | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
+|------------|--------------|------:|------|-----:|----------|---|-----:|---|-----:|
+| LLAMA3-8B |lambada_openai| 1|none | 5|acc |↑ |0.7103|± |0.0063|
+| | | |none | 5|perplexity|↓ |3.7915|± |0.0727|
 
 - Quantized model with the settings showed above (`desc_act` default to False.)
-- GPTQv1
+- `GPTQv1`
 
 |Model | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
 |------------|--------------|------:|------|-----:|----------|---|------:|---|-----:|
 | LLAMA3-8B |lambada_openai| 1|none | 5|acc |↑ |0.6365 |± |0.0067|
 | | | |none | 5|perplexity|↓ |5.9307 |± |0.1830|
 
-- GPTQv2
+- `GPTQv2`
 
 |Model | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
 |------------|--------------|------:|------|-----:|----------|---|------:|---|-----:|
 | LLAMA3-8B |lambada_openai| 1|none | 5|acc |↑ |0.6817 |± |0.0065|
 | | | |none | 5|perplexity|↓ |4.3994 |± |0.0995|
 
 - Quantized model with `desc_act` set to `True` (could improve the model quality, but at the cost of inference speed.)
-- GPTQv1
+- `GPTQv1`
 |Model | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
 |------------|--------------|------:|------|-----:|----------|---|------:|---|-----:|
 | LLAMA3-8B |lambada_openai| 1|none | 5|acc |↑ |0.6193 |± |0.0068|
@@ -120,7 +120,7 @@ This end-to-end example utilizes the common set of interfaces provided by `fms_m
 
 ## Code Walk-through
 
-1. Command line arguments will be used to create a GPTQ quantization config. Information about the required arguments and their default values can be found [here](../../fms_mo/training_args.py). GPTQv1 is supported by default. To use GPTQv2, set the parameter `v2` to `True` and `v2_memory_device` to `cpu`.
+1. Command line arguments will be used to create a GPTQ quantization config. Information about the required arguments and their default values can be found [here](../../fms_mo/training_args.py). `GPTQv1` is supported by default. To use `GPTQv2`, set the parameter `v2` to `True` and `v2_memory_device` to `cpu`.
 
 ```python
 from gptqmodel import GPTQModel, QuantizeConfig
@@ -172,4 +172,4 @@ This end-to-end example utilizes the common set of interfaces provided by `fms_m
 tokenizer.save_pretrained(output_dir) # optional
 ```
 > [!NOTE]
-> 1. GPTQ of a 70B model usually takes ~4-10 hours on A100 with GPTQv1.
+> 1. GPTQ of a 70B model usually takes ~4-10 hours on A100 with `GPTQv1`.
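For context on the walk-through change above: the README's step 1 builds a GPTQ quantization config via `gptqmodel`'s `QuantizeConfig`, with `v2` / `v2_memory_device` toggling GPTQv2. A minimal sketch of that flow, assuming `gptqmodel`'s documented `QuantizeConfig` / `GPTQModel.load` / `quantize` / `save` entry points; the model id, calibration strings, output path, and settings below are illustrative placeholders, not the example's actual values:

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Illustrative values only; the real example derives these from the
# command-line arguments defined in fms_mo/training_args.py.
quant_config = QuantizeConfig(
    bits=4,                  # quantized weight precision
    group_size=128,          # per-group quantization granularity
    desc_act=False,          # True may improve quality at some inference-speed cost
    v2=True,                 # enable GPTQv2 (GPTQv1 is the default)
    v2_memory_device="cpu",  # keep GPTQv2's extra state in CPU memory
)

model = GPTQModel.load("meta-llama/Meta-Llama-3-8B", quant_config)  # placeholder model id

# A real run uses a few hundred calibration samples, not two.
calibration_dataset = [
    "FMS Model Optimizer supports GPTQ post-training quantization.",
    "The capital of France is Paris.",
]
model.quantize(calibration_dataset)
model.save("llama3-8b-gptq-w4")  # placeholder output directory
```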
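Similarly, the test-result tables in the diff report lm-eval numbers on `lambada_openai` at 5 shots. A sketch of reproducing one row with lm-eval's Python API, assuming the 0.4.x-style `simple_evaluate` entry point; the checkpoint path is a placeholder:

```python
import lm_eval

# 5-shot lambada_openai, matching the acc / perplexity columns above.
results = lm_eval.simple_evaluate(
    model="hf",                                 # HuggingFace-backed model loader
    model_args="pretrained=llama3-8b-gptq-w4",  # placeholder checkpoint path
    tasks=["lambada_openai"],
    num_fewshot=5,
)
# Metric keys take the form "<metric>,<filter>", e.g. "acc,none".
print(results["results"]["lambada_openai"])
```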

fms_mo/training_args.py

Lines changed: 4 additions & 1 deletion
@@ -207,7 +207,10 @@ class GPTQArguments(TypeChecker):
     autotune_warmup_after_quantized: bool = False
     cache_examples_on_gpu: bool = True
     use_version2: bool = False
-    v2_mem_device: Optional[str] = field(default="cpu", metadata={"choices": ["auto", "cpu", "cuda"]})
+    v2_mem_device: Optional[str] = field(
+        default="cpu", metadata={"choices": ["auto", "cpu", "cuda"]}
+    )
+
 
 
 @dataclass
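The training_args.py hunk is purely stylistic: the long `field(...)` call is re-wrapped across three lines, with the default and metadata unchanged. For reference, a minimal sketch of how a dataclass field's `metadata={"choices": ...}` can be enforced; `GPTQArgumentsSketch` and its `__post_init__` check are illustrative stand-ins, not the real `GPTQArguments` / `TypeChecker` behavior:

```python
from dataclasses import dataclass, field, fields
from typing import Optional

@dataclass
class GPTQArgumentsSketch:
    """Simplified stand-in for GPTQArguments, for illustration only."""
    use_version2: bool = False
    v2_mem_device: Optional[str] = field(
        default="cpu", metadata={"choices": ["auto", "cpu", "cuda"]}
    )

    def __post_init__(self):
        # Reject values outside the declared choices -- the kind of check a
        # TypeChecker-style base class might drive from the field metadata.
        for f in fields(self):
            choices = f.metadata.get("choices")
            value = getattr(self, f.name)
            if choices is not None and value not in choices:
                raise ValueError(f"{f.name}={value!r} not in {choices}")

GPTQArgumentsSketch(v2_mem_device="cuda")   # passes validation
# GPTQArgumentsSketch(v2_mem_device="tpu")  # would raise ValueError
```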
