MIRALab-USTC/CGPO

CGPO

Official implementation of CGPO (ICLR 2026): Boosting Multi-Domain Reasoning of LLMs via Curvature-Guided Policy Optimization.

Note: This repository is under active development. The core codebase has been uploaded; documentation, training scripts, and reproducibility instructions are still being prepared and will be added soon.


News

  • 2026: CGPO was accepted to ICLR 2026 (The Fourteenth International Conference on Learning Representations).

Citation

If you find this repository useful, please cite:

@inproceedings{liang2026boosting,
  title={Boosting Multi-Domain Reasoning of LLMs via Curvature-Guided Policy Optimization},
  author={Liang, Xize and Yang, Lin and Wang, Jie and Liu, Rui and Lu, Yang and Zeng, Jinliang and Chen, Hanzhu and Li, Dong and Hao, Jianye},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026}
}
