Official implementation of CGPO (ICLR 2026): Boosting Multi-Domain Reasoning of LLMs via Curvature-Guided Policy Optimization.
Note: This repository is under active development. The core codebase is available now; documentation, training scripts, and reproducibility instructions are still being prepared and will be added soon.
- 2026: CGPO accepted to ICLR 2026 (The Fourteenth International Conference on Learning Representations).
If you find this repository useful, please cite:
@inproceedings{liang2026boosting,
  title={Boosting Multi-Domain Reasoning of LLMs via Curvature-Guided Policy Optimization},
  author={Liang, Xize and Yang, Lin and Wang, Jie and Liu, Rui and Lu, Yang and Zeng, Jinliang and Chen, Hanzhu and Li, Dong and Hao, Jianye},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}