
[BUG] Rotation parameter not saved and not used in inference? #2354

@12345txy

Description


The rotation parameter is not saved and not used at inference. In other words: how can I use rotation and still get correct inference results?

Describe the bug

The rotation parameter (for SpinQuant/QuaRot-style preprocessing) is applied during quantization, but:

  1. It is NOT saved to quantize_config.json when the model is saved
  2. It is NOT used in the quantized layers' forward method during inference
  3. This causes broken inference results (extremely high perplexity, e.g., PPL: 48564640.0)

GPU Info

NVIDIA GeForce RTX 5090
CUDA Version: 13.0
Driver Version: 580.76.05

Software Info

  • OS: Linux 5.15.0-78-generic
  • Python: 3.12.3
  • gptqmodel: 5.6.12
  • torch: 2.9.0
  • transformers: 4.57.3
  • accelerate: 1.12.0
  • triton: 3.5.0

quantize_config.json

{
  "bits": 2,
  "group_size": 128,
  "desc_act": false,
  "sym": true,
  "quant_method": "gptq",
  "checkpoint_format": "gptq",
  "meta": {
    "gptaq": true,
    "gptaq_alpha": 0.25,
    "act_group_aware": true
  }
}

Note: The rotation field is missing, even though rotation='hadamard' was used during quantization.
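
For completeness, a quick check of the saved config confirms this (quant_path stands in for the save directory used in the repro below; the key name "rotation" and its expected location are my assumption, based on the QuantizeConfig argument name):

import json, pathlib

quant_path = "path/to/quantized-model"     # placeholder: the directory passed to model.save()
cfg = json.loads((pathlib.Path(quant_path) / "quantize_config.json").read_text())
print(cfg.get("rotation"))                 # -> None, even though rotation='hadamard' was used
print("rotation" in cfg.get("meta", {}))   # -> False, not stashed under "meta" either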

To Reproduce

  1. Quantize a model with rotation:
quant_config = QuantizeConfig(..., rotation='hadamard')
model = GPTQModel.from_pretrained(model_path, quantize_config=quant_config)
model.quantize(calibration_data)
model.save(quant_path)
  2. Load for inference:
model = GPTQModel.from_quantized(quant_path)
# Even manually setting rotation doesn't help:
# model.quantize_config.rotation = 'hadamard'
# because the quantized layers' forward() never checks this parameter
  3. Result: incorrect inference (PPL: 48564640.0 instead of normal values)
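
A minimal perplexity check along these lines reproduces the symptom. The eval text, window size, and the assumption that the object returned by GPTQModel.from_quantized can be called like a regular transformers causal LM (i.e., it forwards input_ids/labels to the underlying HF model) are assumptions of this sketch, not part of the original repro:

import torch
from transformers import AutoTokenizer
from gptqmodel import GPTQModel

quant_path = "path/to/quantized-model"               # placeholder
tokenizer = AutoTokenizer.from_pretrained(quant_path)  # or the original model path
model = GPTQModel.from_quantized(quant_path)

text = open("eval.txt").read()                       # placeholder eval corpus
ids = tokenizer(text, return_tensors="pt").input_ids
window, nlls = 2048, []
with torch.no_grad():
    for i in range(0, ids.size(1) - window, window):
        chunk = ids[:, i:i + window].to("cuda")      # assuming the model was loaded onto cuda
        out = model(input_ids=chunk, labels=chunk)   # assumes the wrapper forwards to the HF model
        nlls.append(out.loss.float())
ppl = torch.exp(torch.stack(nlls).mean())
print(f"PPL: {ppl.item():.1f}")                      # astronomically high when the rotation is lost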

Expected behavior

  1. The rotation parameter should be saved to quantize_config.json
  2. The quantized layers' forward method should check and use the rotation parameter
  3. If rotation is set, it is reasonable to apply the matching inverse rotation during inference so that correct outputs are restored (see the sketch below)
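
To make point 3 concrete, here is a tiny PyTorch-only demo (no gptqmodel involved; the Hadamard size and shapes are arbitrary) of why weights quantized in a rotated basis need the matching rotation applied to the activations, and why skipping it produces garbage outputs:

import numpy as np
import torch
from scipy.linalg import hadamard

d_in, d_out, batch = 8, 4, 2
H = torch.tensor(hadamard(d_in) / np.sqrt(d_in), dtype=torch.float32)  # orthogonal: H @ H.T == I

W = torch.randn(d_out, d_in)      # original Linear weight (out_features, in_features)
x = torch.randn(batch, d_in)      # activations

y_ref  = x @ W.T                  # what the un-rotated model computes
W_rot  = W @ H                    # rotation fused into the weight; this is what gets quantized
y_good = (x @ H) @ W_rot.T        # activations rotated too -> rotation cancels, matches y_ref
y_bad  = x @ W_rot.T              # activations NOT rotated -> the mismatch described in this report

print(torch.allclose(y_good, y_ref, atol=1e-5))  # True
print(torch.allclose(y_bad,  y_ref, atol=1e-5))  # False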

Additional context

  • Quantization itself works correctly (the rotation is applied via rotate_model() in base.py:586-610)
  • The issue is in inference: quantized layers (e.g., TritonV2QuantLinear.forward) don't apply the inverse rotation
  • Manually setting rotation after loading doesn't help because forward() never checks it
  • This appears to be an incomplete implementation of the rotation feature; a rough sketch of where a fix could live is below
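
As a rough illustration only (not a tested workaround), one could pre-rotate the input of each affected quantized linear with a forward pre-hook. Which layers need it and which H matrix to use depend entirely on how rotate_model() fused the rotation, so needs_rotation() and H below are hypothetical:

import torch

def make_rotation_prehook(H: torch.Tensor):
    # Rotate incoming activations into the basis the (rotated, then quantized)
    # weights expect; see the demo under "Expected behavior".
    def prehook(module, args):
        x = args[0]
        return (x @ H.to(device=x.device, dtype=x.dtype), *args[1:])
    return prehook

# Hypothetical wiring: H is the same normalized Hadamard used at quantization time,
# and needs_rotation() knows which layers rotate_model() actually touched.
# for name, module in model.named_modules():
#     if module.__class__.__name__.endswith("QuantLinear") and needs_rotation(name):
#         module.register_forward_pre_hook(make_rotation_prehook(H))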
