examples/GPTQ/README.md
- [FMS Model Optimizer requirements](../../README.md#requirements)
- `gptqmodel` is needed for this example. Use `pip install gptqmodel` or [install from source](https://github.com/ModelCloud/GPTQModel/tree/main?tab=readme-ov-file)
  - It is advised to install from source if you plan to use `GPTQv2`
- Optionally, for the evaluation section below, install [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness)

  ```
  pip install lm-eval
  ```
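Before running the example, a quick stdlib check can confirm the optional dependencies above are importable. This is a hypothetical helper, not part of the example itself; note that `lm-eval` installs under the import name `lm_eval`.

```python
import importlib.util

# Check whether the optional dependencies named above are installed.
# find_spec returns None for a top-level package that is not installed.
for pkg in ("gptqmodel", "lm_eval"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'missing'}")
```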
## Code Walk-through
1. Command line arguments will be used to create a GPTQ quantization config. Information about the required arguments and their default values can be found [here](../../fms_mo/training_args.py). `GPTQv1` is supported by default. To use `GPTQv2`, set the parameter `v2` to `True` and `v2_memory_device` to `cpu`.
```python
from gptqmodel import GPTQModel, QuantizeConfig
tokenizer.save_pretrained(output_dir) # optional
```
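The command-line-to-config mapping described in step 1 can be sketched with stdlib `argparse`. This is a hypothetical illustration: only `v2` and `v2_memory_device` come from the text above; the other flag names and defaults are assumptions, not the example's actual interface.

```python
import argparse

# Hypothetical sketch of parsing GPTQ-related flags into a plain dict that
# mirrors the fields a quantization config would receive. Only `v2` and
# `v2_memory_device` are parameters named in the walkthrough; `bits` and
# `group_size` defaults here are illustrative assumptions.
parser = argparse.ArgumentParser()
parser.add_argument("--bits", type=int, default=4)
parser.add_argument("--group_size", type=int, default=128)
parser.add_argument("--v2", action="store_true")          # GPTQv1 unless --v2 is passed
parser.add_argument("--v2_memory_device", default="cpu")  # offload target for GPTQv2

# e.g. a user opting into GPTQv2 on the command line:
args = parser.parse_args(["--v2"])
gptq_config = vars(args)
print(gptq_config)
```

Passing `--v2` flips only that flag, so `gptq_config` here is `{'bits': 4, 'group_size': 128, 'v2': True, 'v2_memory_device': 'cpu'}`.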
> [!NOTE]
> 1. GPTQ of a 70B model usually takes ~4-10 hours on A100 with `GPTQv1`.