Commit 5b434ed

Fix: Correct device placement for QuantizedLinear across all quantizers
Extends the earlier AWQ fix to the GPTQ and GGUF quantizers. It addresses an AttributeError raised when a QuantizedLinear (an nn.Module) was passed to `move_to_device`, a helper that expects a tensor. QuantizedLinear modules are now moved to their target device with the module's own `.to(device)` method in the AWQ, GPTQ, and GGUF quantizers, giving all three paths consistent, correct device handling for quantized layers.
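As a rough illustration of the failure mode (a minimal sketch only; the body of quantllm's real `move_to_device` helper is assumed here, not copied), a tensor-only helper trips over an nn.Module, while `Module.to()` handles it:

import torch
import torch.nn as nn

def move_to_device(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    # Hypothetical tensor-only helper: touching tensor attributes such as
    # .device raises AttributeError when handed an nn.Module instead.
    return t if t.device == device else t.to(device)

layer = nn.Linear(16, 16)               # stand-in for a QuantizedLinear module
target_device = torch.device("cpu")

# move_to_device(layer, target_device)  # AttributeError: 'Linear' object has no attribute 'device'
layer = layer.to(target_device)         # correct: nn.Module.to() moves all parameters and buffers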
1 parent bfb5167 commit 5b434ed

File tree

2 files changed: +2 −2 lines


quantllm/quant/gguf.py

Lines changed: 1 addition & 1 deletion
@@ -232,7 +232,7 @@ def _quantize_layer(
         chunk_size = 1024  # Adjust based on available memory


-        quantized = move_to_device(quantized, target_device)
+        quantized = quantized.to(target_device)

         # Copy bias if exists
         if layer.bias is not None:

quantllm/quant/gptq.py

Lines changed: 1 addition & 1 deletion
@@ -202,7 +202,7 @@ def _quantize_layer(self, layer: nn.Linear, H: torch.Tensor) -> QuantizedLinear:
                 calibration="gptq"
             )
         )
-        quantized = move_to_device(quantized, target_device)
+        quantized = quantized.to(target_device)

         if layer.bias is not None:
             # layer is already on target_device
