Feature Request
I would like to request support for ParoQuant (Pairwise Rotation Quantization) in MLX-LM.
Background
ParoQuant is a new post-training quantization (PTQ) method introduced at ICLR 2026. It uses pairwise rotations to suppress outliers in weight distributions, which makes it especially effective for reasoning-heavy LLMs. Compared with standard PTQ methods, it retains more accuracy while still reducing memory and compute costs.
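To illustrate the core idea (not the paper's exact algorithm): rotating a pair of channels, one dominated by an outlier, spreads that magnitude across both channels and shrinks the max-abs range a quantizer must cover. Because the rotation is orthogonal, it can be folded out exactly elsewhere in the network. A minimal NumPy sketch with a fixed 45-degree Givens rotation standing in for ParoQuant's learned/searched angles:

```python
import numpy as np

# Toy weight pair: channel a carries an outlier, channel b does not.
rng = np.random.default_rng(0)
a = rng.normal(scale=20.0, size=64)  # outlier channel
b = rng.normal(scale=1.0, size=64)

# Givens rotation mixes the pair; being orthogonal, it preserves the
# layer's function when its inverse is folded into adjacent weights.
theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
a_rot = c * a - s * b
b_rot = s * a + c * b

before = max(np.abs(a).max(), np.abs(b).max())
after = max(np.abs(a_rot).max(), np.abs(b_rot).max())
print(before, after)  # the rotated pair spans a smaller max-abs range
```

A smaller max-abs range means a smaller quantization step, so fewer bits are wasted covering a single outlier.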
Why MLX-LM?
MLX-LM already supports uniform, mixed-bit, and affine quantization. Adding ParoQuant would:
- Improve robustness for reasoning-focused models.
- Enable developers to experiment with cutting-edge quantization methods on Apple Silicon.
- Keep MLX aligned with the latest research in efficient LLM deployment.
Suggested Implementation
- Integrate ParoQuant’s pairwise rotation preprocessing step into MLX’s quantization pipeline.
- Provide options for INT4/INT8 precision.
- Allow exporting/importing ParoQuant-quantized weights for Hugging Face compatibility.
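The steps above could be sketched roughly as follows. Note that `rotate_pairs` and `quantize` are hypothetical names, not MLX-LM API, and the fixed rotation angle stands in for ParoQuant's actual rotation search, purely for illustration:

```python
import numpy as np

def rotate_pairs(w, theta=np.pi / 4):
    """Hypothetical preprocessing hook: Givens-rotate adjacent column pairs
    to spread outlier magnitude across each pair before quantization."""
    w = w.copy()
    c, s = np.cos(theta), np.sin(theta)
    for i in range(0, w.shape[1] - 1, 2):
        a, b = w[:, i].copy(), w[:, i + 1].copy()
        w[:, i], w[:, i + 1] = c * a - s * b, s * a + c * b
    return w

def quantize(w, bits=4):
    """Symmetric per-tensor quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 8))
w[:, 0] *= 25.0  # inject an outlier channel

q4, s4 = quantize(rotate_pairs(w), bits=4)
q8, s8 = quantize(rotate_pairs(w), bits=8)
print(s4, s8)
```

For Hugging Face round-tripping, the rotation angles (or pair indices plus angles) would need to be stored alongside the quantized weights so the transform can be undone or fused at load time.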
References
- ParoQuant: Pairwise Rotation Quantization (ICLR 2026): Liang, Chen, Han, and Liu
- GitHub repo: z-lab/paroquant
Thanks for considering this request!