@@ -39,6 +39,30 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
 * 📊 **Data engineers:** Create RFT datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
 
 
+
+## 🚀 News
+
+* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
+* [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
+* [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
+* [2025-11] Introducing [BOTS](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)).
+
+<details><summary> more... </summary>
+<ul>
+<li>[2025-11] [<a href="https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.2">Release Notes</a>] Trinity-RFT v0.3.2 released: bug fixes and advanced task selection & scheduling.</li>
+<li>[2025-10] [<a href="https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.1">Release Notes</a>] Trinity-RFT v0.3.1 released: multi-stage training support, improved agentic RL examples, LoRA support, debug mode and new RL algorithms.</li>
+<li>[2025-09] [<a href="https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.0">Release Notes</a>] Trinity-RFT v0.3.0 released: enhanced Buffer, FSDP2 & Megatron support, multi-modal models, and new RL algorithms/examples.</li>
+<li>[2025-08] Introducing <a href="https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord">CHORD</a>: dynamic SFT + RL integration for advanced LLM fine-tuning (<a href="https://arxiv.org/pdf/2508.11408">paper</a>).</li>
+<li>[2025-08] [<a href="https://github.com/modelscope/Trinity-RFT/releases/tag/v0.2.1">Release Notes</a>] Trinity-RFT v0.2.1 released.</li>
+<li>[2025-07] [<a href="https://github.com/modelscope/Trinity-RFT/releases/tag/v0.2.0">Release Notes</a>] Trinity-RFT v0.2.0 released.</li>
+<li>[2025-07] Technical report (arXiv v2) updated with new features, examples, and experiments: <a href="https://arxiv.org/abs/2505.17826">link</a>.</li>
+<li>[2025-06] [<a href="https://github.com/modelscope/Trinity-RFT/releases/tag/v0.1.1">Release Notes</a>] Trinity-RFT v0.1.1 released.</li>
+<li>[2025-05] [<a href="https://github.com/modelscope/Trinity-RFT/releases/tag/v0.1.0">Release Notes</a>] Trinity-RFT v0.1.0 released, plus <a href="https://arxiv.org/abs/2505.17826">technical report</a>.</li>
+<li>[2025-04] Trinity-RFT open sourced.</li>
+</ul>
+</details>
+
+
 ## 🔨 Tutorials and Guidelines
 
 
@@ -84,39 +108,27 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
 
 <img src="https://img.alicdn.com/imgextra/i1/O1CN01Ti0o4320RywoAuyhN_!!6000000006847-2-tps-3840-2134.png" alt="System architecture" width="600" />
 
-* **Comprehensive Algorithm Support:**
 
-| Algorithm [Paper] | Documentation | Key Configurations | Example |
-|-----------|-----------|---------------|-----------|
-| PPO [Paper](https://arxiv.org/pdf/1707.06347) | [Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html) | `algorithm_type: ppo` | [Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown) |
-| GRPO [Paper](https://arxiv.org/pdf/2402.03300) | [Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html) | `advantage_fn: grpo` | [GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k) |
-| RLOO [Paper](https://arxiv.org/pdf/2402.14740) | - | `advantage_fn: rloo` | - |
-| REINFORCE++ [Paper](https://arxiv.org/pdf/2501.03262) | - | `advantage_fn: reinforce` | - |
-| CHORD [💡 Paper](https://arxiv.org/pdf/2508.11408) | [Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) | - | - | [ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml) |
-| REC Series [💡 Paper](https://arxiv.org/pdf/2509.24203) | - | `algorithm_type: rec` | [GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) |
-| GSPO [Paper](https://arxiv.org/pdf/2507.18071) | - | `algorithm_type: gspo` | - |
-| TOPR [Paper](https://arxiv.org/pdf/2503.14286) | - | `algorithm_type: topr` | [GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k) |
-| sPPO | - | `algorithm_type: sppo` | [GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k) |
-| CISPO [Paper](https://arxiv.org/pdf/2506.13585) | - | `policy_loss_fn: cispo` | - |
-| SAPO [Paper](https://arxiv.org/pdf/2511.20347) | - | `policy_loss_fn: sapo` | - |
 
+## 🔧 Supported Algorithms
 
-## 🚀 News
+The table below lists most of the algorithms supported by Trinity-RFT. Their concrete configurations are defined in the [Algorithm module](https://github.com/modelscope/Trinity-RFT/blob/main/trinity/algorithm/algorithm.py). You can also set up new algorithms by combining or customizing individual components, such as the policy loss function (`policy_loss_fn`) and the advantage function (`advantage_fn`).
+
+| Algorithm [Paper] | Doc/Example | Source Code | Key Configurations |
+|-----------|-----------|---------------|-----------|
+| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | <pre><code>algorithm_type: ppo</code></pre> |
+| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | <pre><code>algorithm_type: grpo</code></pre> |
+| RLOO [[Paper](https://arxiv.org/pdf/2402.14740)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | <pre><code>policy_loss_fn: ppo<br>advantage_fn: rloo</code></pre> |
+| REINFORCE++ [[Paper](https://arxiv.org/pdf/2501.03262)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | <pre><code>policy_loss_fn: ppo<br>advantage_fn: reinforce</code></pre> |
+| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | <pre><code>algorithm_type: mix_chord</code></pre> |
+| REC Series 💡 [[Paper](https://arxiv.org/pdf/2509.24203)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | <pre><code>algorithm_type: rec</code></pre> |
+| GSPO [[Paper](https://arxiv.org/pdf/2507.18071)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | <pre><code>policy_loss_fn: gspo<br>advantage_fn: grpo</code></pre> |
+| TOPR [[Paper](https://arxiv.org/pdf/2503.14286)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | <pre><code>algorithm_type: topr</code></pre> |
+| sPPO [[Paper](https://arxiv.org/pdf/2108.05828)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | <pre><code>algorithm_type: sppo</code></pre> |
+| ASYMRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | <pre><code>algorithm_type: asymre</code></pre> |
+| CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | <pre><code>algorithm_type: cispo</code></pre> |
+| SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | <pre><code>algorithm_type: sapo</code></pre> |
 
-* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
-* [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
-* [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
-* [2025-11] Introducing [BOTS](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)).
-* [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.2)] Trinity-RFT v0.3.2 released: bug fixes and advanced task selection & scheduling.
-* [2025-10] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.1)] Trinity-RFT v0.3.1 released: multi-stage training support, improved agentic RL examples, LoRA support, debug mode and new RL algorithms.
-* [2025-09] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.0)] Trinity-RFT v0.3.0 released: enhanced Buffer, FSDP2 & Megatron support, multi-modal models, and new RL algorithms/examples.
-* [2025-08] Introducing [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
-* [2025-08] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.2.1)] Trinity-RFT v0.2.1 released.
-* [2025-07] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.2.0)] Trinity-RFT v0.2.0 released.
-* [2025-07] Technical report (arXiv v2) updated with new features, examples, and experiments: [link](https://arxiv.org/abs/2505.17826).
-* [2025-06] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.1.1)] Trinity-RFT v0.1.1 released.
-* [2025-05] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.1.0)] Trinity-RFT v0.1.0 released, plus [technical report](https://arxiv.org/abs/2505.17826).
-* [2025-04] Trinity-RFT open sourced.
 
 
 ---
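In the new *Key Configurations* column, an algorithm is selected either by a single `algorithm_type` preset (e.g. `algorithm_type: grpo`) or by pairing a `policy_loss_fn` with an `advantage_fn`. As a rough sketch only, the RLOO row might look like this in a YAML config file. Nesting these keys under an `algorithm:` section is an assumption made here for illustration; the Algorithm module and the linked examples are the authority on the actual schema.

```yaml
# Hedged sketch: the `algorithm:` nesting is an assumption, not taken from this diff.
algorithm:
  # RLOO row: PPO-style policy loss combined with the RLOO advantage function
  policy_loss_fn: ppo
  advantage_fn: rloo
```

For rows that list only a preset, such as TOPR, the two component keys would presumably be replaced by the single line `algorithm_type: topr`.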