Commit 5b434ed

Fix: Correct device placement for QuantizedLinear across all quantizers
Extends the earlier AWQ fix to the GPTQ and GGUF quantizers. It addresses an AttributeError raised when a QuantizedLinear (an nn.Module) was passed to `move_to_device`, a helper that expects a tensor. QuantizedLinear modules are now moved to their target device with the module's own `.to(device)` method in the AWQ, GPTQ, and GGUF quantizers, giving all three paths consistent, correct device handling for quantized layers.
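As a rough illustration of the failure mode (a minimal sketch only; the body of quantllm's real `move_to_device` helper is assumed here, not copied), a tensor-only helper trips over an nn.Module, while `Module.to()` handles it:

import torch
import torch.nn as nn

def move_to_device(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Hypothetical tensor-only helper: touching tensor attributes such as
    # .device raises AttributeError when handed an nn.Module instead.
    return t if t.device == device else t.to(device)

layer = nn.Linear(16, 16)               # stand-in for a QuantizedLinear module
target_device = torch.device("cpu")

# move_to_device(layer, target_device)  # AttributeError: 'Linear' object has no attribute 'device'
layer = layer.to(target_device)         # correct: nn.Module.to() moves all parameters and buffers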
1 parent bfb5167 commit 5b434ed

File tree

2 files changed: +2 −2 lines


quantllm/quant/gguf.py

Lines changed: 1 addition & 1 deletion
@@ -232,7 +232,7 @@ def _quantize_layer(
         chunk_size = 1024  # Adjust based on available memory


-        quantized = move_to_device(quantized, target_device)
+        quantized = quantized.to(target_device)

         # Copy bias if exists
         if layer.bias is not None:

quantllm/quant/gptq.py

Lines changed: 1 addition & 1 deletion
@@ -202,7 +202,7 @@ def _quantize_layer(self, layer: nn.Linear, H: torch.Tensor) -> QuantizedLinear:
                 calibration="gptq"
             )
         )
-        quantized = move_to_device(quantized, target_device)
+        quantized = quantized.to(target_device)

         if layer.bias is not None:
             # layer is already on target_device
