Commit 713a0b1

Fix typos (#354)
1 parent cc6795d commit 713a0b1

2 files changed: +2 −2 lines


README.md

Lines changed: 1 addition & 1 deletion
@@ -98,7 +98,7 @@ See `python generate.py --help` for more options.

You can also use GPTQ-style int4 quantization, but this requires converting the weights first:

```bash
-python quantize/gptq.py --output_path checkpoints/lit-llama/7B/llama-gptq.4bit.pt --dtype bfloat16 --quantize gptq.int4
+python quantize/gptq.py --output_path checkpoints/lit-llama/7B/llama-gptq.4bit.pth --dtype bfloat16 --quantize gptq.int4
```

GPTQ-style int4 quantization brings GPU memory usage down to about 5 GB. Since only the weights of the Linear layers are quantized, it is useful to also pass `--dtype bfloat16` even with quantization enabled.

howto/inference.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ See `python generate.py --help` for more options.

You can also use GPTQ-style int4 quantization, but this requires converting the weights first:

```bash
-python quantize/gptq.py --output_path checkpoints/lit-llama/7B/llama-gptq.4bit.pt --dtype bfloat16 --quantize gptq.int4
+python quantize/gptq.py --output_path checkpoints/lit-llama/7B/llama-gptq.4bit.pth --dtype bfloat16 --quantize gptq.int4
```

GPTQ-style int4 quantization brings GPU memory usage down to about 5 GB. Since only the weights of the Linear layers are quantized, it is useful to also pass `--dtype bfloat16` even with quantization enabled.
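For context, a minimal sketch of how the converted `.pth` checkpoint might then be used for inference. The `--quantize`, `--checkpoint_path`, and `--prompt` flags are assumptions inferred from the `python generate.py --help` reference in the context lines above; they are not part of this commit.

```bash
# Hypothetical follow-up: generate text from the GPTQ-converted checkpoint.
# Flag names are assumptions; verify them with `python generate.py --help`.
python generate.py \
    --quantize gptq.int4 \
    --checkpoint_path checkpoints/lit-llama/7B/llama-gptq.4bit.pth \
    --prompt "Hello, my name is"
```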
