Commit f2c6534
shape fix for gptq
Summary: aligns with previous shape fixes (#152)

Test Plan:

export MODEL_REPO=meta-llama/Llama-2-7b-chat-hf

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4-gptq --calibration_tasks wikitext --calibration_limit 10
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4-gptq.g32.cuda.pth --tasks wikitext

wikitext: {'word_perplexity,none': 12.4647656874071, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.6028703940149458, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6806577757911142, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}

python quantize.py --checkpoint_path checkpoints/$MODEL_REPO/model.pth --mode int4
python eval.py --checkpoint_path checkpoints/$MODEL_REPO/model_int4.g32.cuda.pth --tasks wikitext

wikitext: {'word_perplexity,none': 12.639992147818221, 'word_perplexity_stderr,none': 'N/A', 'byte_perplexity,none': 1.6070602521912754, 'byte_perplexity_stderr,none': 'N/A', 'bits_per_byte,none': 0.6844240198082908, 'bits_per_byte_stderr,none': 'N/A', 'alias': 'wikitext'}

Reviewers:

Subscribers:

Tasks:

Tags:
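Note on the numbers above (lower perplexity is better): after the fix the int4-gptq checkpoint evaluates cleanly and scores slightly better on wikitext than plain int4 (word perplexity 12.46 vs. 12.64), which suggests the shape fix does not regress GPTQ accuracy.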
1 parent 095b222 commit f2c6534

quantize.py

Lines changed: 4 additions & 1 deletion
@@ -458,7 +458,10 @@ def __init__(self, mod, groupsize=128, inner_k_tiles=8, padding=True):
         # we need to do the padding here, both for q and the qparams if necessary
         def make_names_and_values_dict_func(q, qparams):
             k = q.shape[1]
-            new_k = find_multiple(k, 1024)
+            if not _check_linear_int4_k(k, groupsize, inner_k_tiles):
+                new_k = find_multiple(k, 1024)
+            else:
+                new_k = k
             # how much we need to pad the weight
             delta_k = new_k - q.shape[1]
             final_q = torch.ops.aten._convert_weight_to_int4pack(F.pad(q, pad=(0, delta_k)), inner_k_tiles)
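For context, the new branch leans on two helpers defined elsewhere in quantize.py. A minimal sketch of their behavior, reconstructed here from the call sites (illustrative, not part of this diff), plus a worked example of why the unconditional padding caused a shape mismatch:

# Sketch of the helpers used in the diff above, assuming the definitions
# that the call sites imply (illustrative; not part of this commit).

def find_multiple(n: int, k: int) -> int:
    # Smallest multiple of k that is >= n.
    if n % k == 0:
        return n
    return n + k - (n % k)

def _check_linear_int4_k(k: int, groupsize: int = 1, inner_k_tiles: int = 1) -> bool:
    # The int4 kernel needs k divisible by both the quantization group size
    # and inner_k_tiles * 16.
    return k % groupsize == 0 and k % (inner_k_tiles * 16) == 0

# Worked example with Llama-2-7B's FFN intermediate size, k = 11008
# (groupsize=32 matches the .g32 checkpoints in the test plan):
assert _check_linear_int4_k(11008, groupsize=32, inner_k_tiles=8)  # no padding needed
assert find_multiple(11008, 1024) == 11264  # what the old code padded to anyway

Previously the GPTQ path always padded k up to a multiple of 1024, so a dimension like 11008 became 11264 and no longer matched the un-padded layers that the previous shape fixes (#152) produce; with this change, padding happens only when the kernel's divisibility check actually fails.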
