Labels: bug
[Bug]: rotation parameter not saved and not used in inference?
In other words: how can I use `rotation` for correct inference?
Describe the bug
The rotation parameter (for SpinQuant/QuaRot preprocessing) is used during quantization but:
- It is NOT saved to `quantize_config.json` when saving the model
- It is NOT used in the quantized layer's `forward` method during inference
- This causes incorrect inference results (extremely high perplexity, e.g., PPL: 48564640.0)
GPU Info
NVIDIA GeForce RTX 5090
CUDA Version: 13.0
Driver Version: 580.76.05
Software Info
- OS: Linux 5.15.0-78-generic
- Python: 3.12.3
- gptqmodel: 5.6.12
- torch: 2.9.0
- transformers: 4.57.3
- accelerate: 1.12.0
- triton: 3.5.0
quantize_config.json

```json
{
  "bits": 2,
  "group_size": 128,
  "desc_act": false,
  "sym": true,
  "quant_method": "gptq",
  "checkpoint_format": "gptq",
  "meta": {
    "gptaq": true,
    "gptaq_alpha": 0.25,
    "act_group_aware": true
  }
}
```

Note: The `rotation` field is missing, even though `rotation='hadamard'` was used during quantization.
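For contrast, a config that did persist the setting might look like the following. This is a sketch, not actual gptqmodel output; it assumes the field would be serialized under the same name as the `QuantizeConfig` argument:

```json
{
  "bits": 2,
  "group_size": 128,
  "desc_act": false,
  "sym": true,
  "quant_method": "gptq",
  "checkpoint_format": "gptq",
  "rotation": "hadamard",
  "meta": {
    "gptaq": true,
    "gptaq_alpha": 0.25,
    "act_group_aware": true
  }
}
```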
To Reproduce
1. Quantize a model with rotation:

```python
quant_config = QuantizeConfig(..., rotation='hadamard')
model = GPTQModel.from_pretrained(model_path, quantize_config=quant_config)
model.quantize(calibration_data)
model.save(quant_path)
```

2. Load for inference:

```python
model = GPTQModel.from_quantized(quant_path)
# Even manually setting rotation doesn't help:
# model.quantize_config.rotation = 'hadamard'
# because the quantized layer's forward() doesn't check this parameter
```

3. Result: incorrect inference (PPL: 48564640.0 instead of normal values)
Expected behavior
- The `rotation` parameter should be saved to `quantize_config.json`
- The quantized layer's `forward` method should check and use the `rotation` parameter
- If rotation is set, applying the inverse rotation during inference to restore correct outputs would be reasonable
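The math behind the last point can be checked in a few lines: a normalized Hadamard matrix `Q` is orthogonal, so `(x @ Q) @ (Q.T @ W) == x @ W`; outputs are only correct when the matching activation-side rotation is applied at inference. A minimal NumPy sketch (illustrative, not GPTQModel code):

```python
import numpy as np

def hadamard(n):
    # Sylvester construction; n must be a power of two.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)  # normalized, so Q @ Q.T == I

rng = np.random.default_rng(0)
n = 8
Q = hadamard(n)
W = rng.standard_normal((n, n))   # original weights
x = rng.standard_normal((1, n))   # an activation

W_rot = Q.T @ W                   # weights quantized in the rotated space
y_ref = x @ W                     # reference output

y_ok = (x @ Q) @ W_rot            # activations rotated too: correct
y_bad = x @ W_rot                 # rotation dropped at inference: garbage

print(np.allclose(y_ok, y_ref))
print(np.allclose(y_bad, y_ref))
```

Dropping the activation rotation (`y_bad`) is exactly the failure mode the perplexity blow-up suggests.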
Additional context
- Quantization works correctly (rotation is applied via `rotate_model()` in `base.py:586-610`)
- The issue is in inference: quantized layers (e.g., `TritonV2QuantLinear.forward`) don't apply the inverse rotation
- Manually setting `rotation` after loading doesn't help because `forward()` doesn't check it
- This appears to be an incomplete implementation of the rotation feature
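At the layer level, a fix would mean a forward pass that consults the rotation setting. The following is a hypothetical sketch; the class and attribute names are invented for illustration and are not the actual `TritonV2QuantLinear` internals:

```python
import numpy as np

class RotationAwareLinear:
    # Hypothetical rotation-aware quantized layer (names are illustrative).
    def __init__(self, w_rot, rotation=None, Q=None):
        self.w_rot = w_rot        # weights quantized in rotated space
        self.rotation = rotation  # e.g. 'hadamard', read from the saved config
        self.Q = Q                # orthogonal rotation matrix used at quant time

    def forward(self, x):
        # If the model was quantized with a rotation, activations must be
        # rotated into the same space before the matmul.
        if self.rotation is not None:
            x = x @ self.Q
        return x @ self.w_rot
```

With `Q` orthogonal and `w_rot = Q.T @ W`, `forward(x)` reproduces `x @ W`; skipping the `if` branch reproduces the broken behavior reported here.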