Commit 345e4be

meenchenhthadicherla authored and committed
AWQ support for vllm fake quant (dense model) (#409)
Signed-off-by: weimingc <[email protected]>
Signed-off-by: Hrishith Thadicherla <[email protected]>
1 parent bc54694 commit 345e4be

File tree

1 file changed: +7 −1 lines changed
  • modelopt/torch/quantization/plugins


modelopt/torch/quantization/plugins/vllm.py

Lines changed: 7 additions & 1 deletion

```diff
@@ -61,7 +61,13 @@ def apply(
         x = layer.input_quantizer(x)
         if layer.weight_quantizer.is_enabled:
             original_weight = layer.weight
-            layer.weight = layer.weight_quantizer(layer.weight)
+            quantized_tensor = layer.weight_quantizer(layer.weight)
+            # parameterize the quantized weight
+            if isinstance(original_weight, torch.nn.Parameter):
+                quantized_tensor = torch.nn.Parameter(
+                    quantized_tensor, requires_grad=original_weight.requires_grad
+                )
+            layer.weight = quantized_tensor
             output = self.quant_method.apply(layer, x, bias)
             layer.weight = original_weight
         else:
```
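The change above follows a swap-apply-restore pattern: the full-precision weight is replaced by its fake-quantized copy for one forward call, then restored. The new lines re-wrap the quantized tensor as a `torch.nn.Parameter` because `nn.Module.__setattr__` refuses to assign a plain `Tensor` over a registered `Parameter`. A minimal runnable sketch of that pattern, using a hypothetical `FakeQuantizer` and a plain `nn.Linear` as stand-ins for the vLLM layer and its `weight_quantizer`:

```python
import torch

class FakeQuantizer(torch.nn.Module):
    """Hypothetical symmetric int8 fake quantizer (quantize, then dequantize)."""
    def forward(self, w: torch.Tensor) -> torch.Tensor:
        scale = w.abs().max() / 127.0
        return (w / scale).round().clamp(-128, 127) * scale

layer = torch.nn.Linear(4, 4)       # stand-in for the vLLM linear layer
quantizer = FakeQuantizer()         # stand-in for layer.weight_quantizer

original_weight = layer.weight
with torch.no_grad():               # the quantized copy is a plain leaf Tensor
    quantized_tensor = quantizer(layer.weight)

# Assigning a plain Tensor where a Parameter is registered raises a TypeError,
# so re-wrap the quantized copy, preserving requires_grad.
if isinstance(original_weight, torch.nn.Parameter):
    quantized_tensor = torch.nn.Parameter(
        quantized_tensor, requires_grad=original_weight.requires_grad
    )

layer.weight = quantized_tensor     # temporarily swap in the fake-quantized weight
out = layer(torch.randn(2, 4))      # stand-in for self.quant_method.apply(layer, x, bias)
layer.weight = original_weight      # restore the full-precision weight
```

Restoring `original_weight` afterwards keeps the stored checkpoint weights untouched; only the single forward pass sees the quantized values.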

0 commit comments