Feature Request
I would like to request support for ParoQuant (Pairwise Rotation Quantization) in MLX-LM.
Background
ParoQuant is a new post-training quantization (PTQ) method introduced at ICLR 2026. It uses pairwise rotations to suppress outliers in weight distributions, which makes it especially effective for reasoning-heavy LLMs. Compared with standard PTQ methods, it retains more accuracy while still reducing memory and compute costs.
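To illustrate the core idea (not the paper's exact algorithm): rotating a pair of channels, one dominated by an outlier, spreads that magnitude across both channels and shrinks the max-abs range a quantizer must cover. Because the rotation is orthogonal, it can be folded out exactly elsewhere in the network. A minimal NumPy sketch with a fixed 45-degree Givens rotation standing in for ParoQuant's learned/searched angles:

```python
import numpy as np

# Toy weight pair: channel a carries an outlier, channel b does not.
rng = np.random.default_rng(0)
a = rng.normal(scale=20.0, size=64)  # outlier channel
b = rng.normal(scale=1.0, size=64)

# Givens rotation mixes the pair; being orthogonal, it preserves the
# layer's function when its inverse is folded into adjacent weights.
theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
a_rot = c * a - s * b
b_rot = s * a + c * b

before = max(np.abs(a).max(), np.abs(b).max())
after = max(np.abs(a_rot).max(), np.abs(b_rot).max())
print(before, after)  # the rotated pair spans a smaller max-abs range
```

A smaller max-abs range means a smaller quantization step, so fewer bits are wasted covering a single outlier.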
Why MLX-LM?
MLX-LM already supports uniform, mixed-bit, and affine quantization. Adding ParoQuant would:
- Improve robustness for reasoning-focused models.
- Enable developers to experiment with cutting-edge quantization methods on Apple Silicon.
- Keep MLX aligned with the latest research in efficient LLM deployment.
Suggested Implementation
- Integrate ParoQuant’s pairwise rotation preprocessing step into MLX’s quantization pipeline.
- Provide options for INT4/INT8 precision.
- Allow exporting/importing ParoQuant-quantized weights for Hugging Face compatibility.
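The steps above could be sketched roughly as follows. Note that `rotate_pairs` and `quantize` are hypothetical names, not MLX-LM API, and the fixed rotation angle stands in for ParoQuant's actual rotation search, purely for illustration:

```python
import numpy as np

def rotate_pairs(w, theta=np.pi / 4):
    """Hypothetical preprocessing hook: Givens-rotate adjacent column pairs
    to spread outlier magnitude across each pair before quantization."""
    w = w.copy()
    c, s = np.cos(theta), np.sin(theta)
    for i in range(0, w.shape[1] - 1, 2):
        a, b = w[:, i].copy(), w[:, i + 1].copy()
        w[:, i], w[:, i + 1] = c * a - s * b, s * a + c * b
    return w

def quantize(w, bits=4):
    """Symmetric per-tensor quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4, 127 for INT8
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 8))
w[:, 0] *= 25.0  # inject an outlier channel

q4, s4 = quantize(rotate_pairs(w), bits=4)
q8, s8 = quantize(rotate_pairs(w), bits=8)
print(s4, s8)
```

For Hugging Face round-tripping, the rotation angles (or pair indices plus angles) would need to be stored alongside the quantized weights so the transform can be undone or fused at load time.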
References
- ParoQuant: Pairwise Rotation Quantization (ICLR 2026): Liang, Chen, Han, and Liu
- GitHub repo: z-lab/paroquant
Thanks for considering this request!