Skip to content

Commit 15fb505

Browse files
committed
fix table
1 parent 057c8e2 commit 15fb505

File tree

1 file changed

+41
-29
lines changed

1 file changed

+41
-29
lines changed

README.md

Lines changed: 41 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -39,6 +39,30 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
3939
* 📊 **Data engineers:** Create RFT datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
4040

4141

42+
43+
## 🚀 News
44+
45+
* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
46+
* [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
47+
* [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
48+
* [2025-11] Introducing [BOTS](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)).
49+
50+
<details><summary> more... </summary>
51+
<ul>
52+
<li> [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.2)] Trinity-RFT v0.3.2 released: bug fixes and advanced task selection & scheduling.</li>
53+
<li> [2025-10] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.1)] Trinity-RFT v0.3.1 released: multi-stage training support, improved agentic RL examples, LoRA support, debug mode and new RL algorithms.</li>
54+
<li> [2025-09] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.0)] Trinity-RFT v0.3.0 released: enhanced Buffer, FSDP2 & Megatron support, multi-modal models, and new RL algorithms/examples.</li>
55+
<li> [2025-08] Introducing [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).</li>
56+
<li> [2025-08] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.2.1)] Trinity-RFT v0.2.1 released.</li>
57+
<li> [2025-07] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.2.0)] Trinity-RFT v0.2.0 released.</li>
58+
<li> [2025-07] Technical report (arXiv v2) updated with new features, examples, and experiments: [link](https://arxiv.org/abs/2505.17826).</li>
59+
<li> [2025-06] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.1.1)] Trinity-RFT v0.1.1 released.</li>
60+
<li> [2025-05] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.1.0)] Trinity-RFT v0.1.0 released, plus [technical report](https://arxiv.org/abs/2505.17826).</li>
61+
<li> [2025-04] Trinity-RFT open sourced.</li>
62+
</ul>
63+
</details>
64+
65+
4266
## 🔨 Tutorials and Guidelines
4367

4468

@@ -84,39 +108,27 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
84108

85109
<img src="https://img.alicdn.com/imgextra/i1/O1CN01Ti0o4320RywoAuyhN_!!6000000006847-2-tps-3840-2134.png" alt="System architecture" width="600" />
86110

87-
* **Comprehensive Algorithm Support:**
88111

89-
| Algorithm [Paper] | Documentation | Key Configurations | Example |
90-
|-----------|-----------|---------------|-----------|
91-
| PPO [Paper](https://arxiv.org/pdf/1707.06347) | [Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html) | `algorithm_type: ppo` | [Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown) |
92-
| GRPO [Paper](https://arxiv.org/pdf/2402.03300) | [Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html) | `advantage_fn: grpo` | [GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k) |
93-
| RLOO [Paper](https://arxiv.org/pdf/2402.14740) | - | `advantage_fn: rloo` | - |
94-
| REINFORCE++ [Paper](https://arxiv.org/pdf/2501.03262) | - | `advantage_fn: reinforce` | - |
95-
| CHORD [💡 Paper](https://arxiv.org/pdf/2508.11408) | [Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) | - | - | [ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml) |
96-
| REC Series [💡 Paper](https://arxiv.org/pdf/2509.24203) | - | `algorithm_type: rec` | [GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) |
97-
| GSPO [Paper](https://arxiv.org/pdf/2507.18071) | - | `algorithm_type: gspo` | - |
98-
| TOPR [Paper](https://arxiv.org/pdf/2503.14286) | - | `algorithm_type: topr` | [GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k) |
99-
| sPPO | - | `algorithm_type: sppo` | [GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k) |
100-
| CISPO [Paper](https://arxiv.org/pdf/2506.13585) | - | `policy_loss_fn: cispo` | - |
101-
| SAPO [Paper](https://arxiv.org/pdf/2511.20347) | - | `policy_loss_fn: sapo` | - |
102112

113+
## 🔧 Supported Algorithms
103114

104-
## 🚀 News
115+
We list most algorithms supported by Trinity-RFT in the following table. For more details, the concrete configurations are shown in the [Algorithm module](https://github.com/modelscope/Trinity-RFT/blob/main/trinity/algorithm/algorithm.py). You can also set up new algorithms by customizing different components.
116+
117+
| Algorithm [Paper] | Doc/Example | Source Code | Key Configurations |
118+
|-----------|-----------|---------------|-----------|
119+
| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)] [[Countdown Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | <pre><code>algorithm_type: ppo</code></pre> |
120+
| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k) [Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)]| [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | <pre><code>algorithm_type: grpo</code></pre> |
121+
| RLOO [[Paper](https://arxiv.org/pdf/2402.14740)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | <pre><code>policy_loss_fn: ppo<br>advantage_fn: rloo</code></pre> |
122+
| REINFORCE++ [[Paper](https://arxiv.org/pdf/2501.03262)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | <pre><code>policy_loss_fn: ppo<br>advantage_fn: reinforce</code></pre> |
123+
| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Docs](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)] [[ToolACE Example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | <pre><code>algorithm_type: mix_chord</code></pre> |
124+
| REC Series 💡 [[Paper](https://arxiv.org/pdf/2509.24203)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | <pre><code>algorithm_type: rec</code></pre> |
125+
| GSPO [[Paper](https://arxiv.org/pdf/2507.18071)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/gspo_policy_loss.py)] | <pre><code>policy_loss_fn: gspo<br>advantage_fn: grpo</code></pre> |
126+
| TOPR [[Paper](https://arxiv.org/pdf/2503.14286)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/topr_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/topr_policy_loss.py)] | <pre><code>algorithm_type: topr</code></pre> |
127+
| sPPO [[Paper](https://arxiv.org/pdf/2108.05828)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/sppo_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sppo_loss_fn.py)] | <pre><code>algorithm_type: sppo</code></pre> |
128+
| ASYMRE [[Paper](https://arxiv.org/pdf/2506.20520)] | [[GSM8K Example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/asymre_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/asymre_advantage.py)] | <pre><code>algorithm_type: asymre</code></pre> |
129+
| CISPO [[Paper](https://arxiv.org/pdf/2506.13585)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/cispo_policy_loss.py)] | <pre><code>algorithm_type: cispo</code></pre> |
130+
| SAPO [[Paper](https://arxiv.org/pdf/2511.20347)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/sapo_policy_loss.py)] | <pre><code>algorithm_type: sapo</code></pre> |
105131

106-
* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
107-
* [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
108-
* [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
109-
* [2025-11] Introducing [BOTS](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)).
110-
* [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.2)] Trinity-RFT v0.3.2 released: bug fixes and advanced task selection & scheduling.
111-
* [2025-10] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.1)] Trinity-RFT v0.3.1 released: multi-stage training support, improved agentic RL examples, LoRA support, debug mode and new RL algorithms.
112-
* [2025-09] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.0)] Trinity-RFT v0.3.0 released: enhanced Buffer, FSDP2 & Megatron support, multi-modal models, and new RL algorithms/examples.
113-
* [2025-08] Introducing [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
114-
* [2025-08] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.2.1)] Trinity-RFT v0.2.1 released.
115-
* [2025-07] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.2.0)] Trinity-RFT v0.2.0 released.
116-
* [2025-07] Technical report (arXiv v2) updated with new features, examples, and experiments: [link](https://arxiv.org/abs/2505.17826).
117-
* [2025-06] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.1.1)] Trinity-RFT v0.1.1 released.
118-
* [2025-05] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.1.0)] Trinity-RFT v0.1.0 released, plus [technical report](https://arxiv.org/abs/2505.17826).
119-
* [2025-04] Trinity-RFT open sourced.
120132

121133

122134
---

0 commit comments

Comments
 (0)