We introduce INSPO, a novel Instruction-Policy Co-Evolution framework that integrates instruction optimization as a dynamic component of the reinforcement learning (RL) loop for training large language models.
If you find INSPO useful for your research or applications, please consider giving us a star. We will release the scripts in the near future.
