-|**[Reasoning](examples/countdown/)**| Countdown numbers game with custom rewards |[Training Curve](/examples/countdown/countdown_training_curve.png)|
-|**[Search Agent](examples/search-agent/)**| An agent with end-to-end reasoning, search, browsing, and summarization capabilities |[ASearcher Repo](https://github.com/inclusionAI/ASearcher)|
-|**[Tool-Integrated Reasoning](examples/tir/)**| An agent that can invoke tools during reasoning |[TIR Example](https://github.com/inclusionAI/AReaL/tree/main/examples/tir)|
-|**[RLHF](examples/alignment/)**| RLHF for LLM Alignment |[RLHF Example](https://github.com/inclusionAI/AReaL/tree/main/examples/alignment)|
+|**[Math](examples/math/)**| Mathematical problem solving (SFT, GRPO, or PPO) | TBA |
+|**[Multi-Turn Math](examples/multi-turn-math/)**| Iterative mathematical problem solving with self-correction |[Training Curve](examples/multi-turn-math/reward_curve.png)|
+|**[LoRA Math](examples/lora/)**| Math Agent Trained With LoRA | TBA |
+|**[Reasoning](examples/countdown/)**| Countdown numbers game with custom rewards |[Training Curve](/examples/countdown/countdown_training_curve.png)|
+|**[Search Agent](examples/search-agent/)**| An agent with end-to-end reasoning, search, browsing, and summarization capabilities |[ASearcher Repo](https://github.com/inclusionAI/ASearcher)|
+|**[Tool-Integrated Reasoning](examples/tir/)**| An agent that can invoke tools during reasoning |[TIR Example](https://github.com/inclusionAI/AReaL/tree/main/examples/tir)|
+|**[RLHF](examples/alignment/)**| RLHF for LLM Alignment |[RLHF Example](https://github.com/inclusionAI/AReaL/tree/main/examples/alignment)|