README.md
@@ -22,9 +22,14 @@

 ## 🚀 News

-* [2025-08] ✨ Trinity-RFT v0.2.1 is released with enhanced features for Agentic RL and Async RL.
+<!-- TODO: v0.3.0 -->

 * [2025-08] 🎵 We introduce [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord), a dynamic integration of SFT and RL for enhanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
-* [2025-08] We now support training on general multi-step workflows! Please check out examples for [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md).
+* [2025-08] ✨ Trinity-RFT v0.2.1 is released! Enhanced features include:
+  * Agentic RL: support for training with general multi-step agentic workflows; check out the [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md) examples.
+  * Rollout-training scheduling: introduce the Scheduler, [Synchronizer](./docs/sphinx_doc/source/tutorial/synchronizer.html), and a priority-queue buffer, which facilitate more efficient and dynamic scheduling of the RFT process.
+  * [A benchmark tool](./benchmark) for quick verification and experimentation.
 * [2025-07] We update the [technical report](https://arxiv.org/abs/2505.17826) (arXiv v2) with new features, examples, and experiments.
 * [2025-06] Trinity-RFT v0.1.1 is released.
@@ -45,11 +50,11 @@ It is designed to support diverse application scenarios and serve as a unified p

 * **Unified RFT Core:**
-  Supports *synchronous/asynchronous*, *on-policy/off-policy*, and *online/offline* training. Rollout and training can run separately and scale independently on different devices.
+  Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently on different devices.
@@ -123,12 +133,13 @@ It is designed to support diverse application scenarios and serve as a unified p

 * **Adaptation to New Scenarios:**
-  Implement agent-environment interaction logic in a single `Workflow` or `MultiTurnWorkflow` class. ([Example](./docs/sphinx_doc/source/tutorial/example_multi_turn.md))
+  Implement agent-environment interaction logic in a single `Workflow`/`MultiTurnWorkflow`/`RewardPropagationWorkflow` class ([Example](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)),
+  or import existing workflows from agent frameworks like AgentScope ([Example](./docs/sphinx_doc/source/tutorial/example_react.md)).

 * **RL Algorithm Development:**
-  Develop custom RL algorithms (loss design, sampling, data processing) in compact, plug-and-play classes. ([Example](./docs/sphinx_doc/source/tutorial/example_mix_algo.md))
+  Develop custom RL algorithms (loss design, sampling strategy, data processing) in compact, plug-and-play classes ([Example](./docs/sphinx_doc/source/tutorial/example_mix_algo.md)).

 * **Low-Code Usage:**
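The "single workflow class" pattern described in the hunk above can be sketched roughly as follows. This is a minimal, self-contained illustration of the idea, not Trinity-RFT's actual API: the `Workflow`, `run`, and `Experience` names and signatures here are assumptions for demonstration only; consult the linked examples for the real interfaces.

```python
# Hypothetical sketch: all agent-environment interaction logic lives in one
# workflow class whose run() returns experiences for training.
# (Illustrative names only -- not Trinity-RFT's actual API.)
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Experience:
    prompt: str
    response: str
    reward: float


class Workflow(ABC):
    @abstractmethod
    def run(self) -> list[Experience]:
        """Roll out one episode and return experiences for training."""


class ToyMathWorkflow(Workflow):
    """Toy single-turn workflow: the 'model' answers an addition task."""

    def __init__(self, a: int, b: int):
        self.a, self.b = a, b

    def run(self) -> list[Experience]:
        prompt = f"What is {self.a} + {self.b}?"
        response = str(self.a + self.b)  # stand-in for a real model call
        # Reward the rollout by checking against the ground-truth answer.
        reward = 1.0 if response == str(self.a + self.b) else 0.0
        return [Experience(prompt, response, reward)]


exps = ToyMathWorkflow(2, 3).run()
print(exps[0].response, exps[0].reward)  # 5 1.0
```

A multi-turn variant would follow the same shape, looping over environment steps inside `run()` and returning one experience per turn.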
@@ -341,14 +352,11 @@ Tutorials for running different RFT modes:

+[Offline learning by DPO or SFT](./docs/sphinx_doc/source/tutorial/example_dpo.md)

 Tutorials for adapting Trinity-RFT to a new multi-turn agentic scenario: