Commit 2cc37b1

Update main readme for v0.2.1
1 parent de119dc

1 file changed: 27 additions, 17 deletions

README.md

Lines changed: 27 additions & 17 deletions
@@ -22,9 +22,14 @@

 ## 🚀 News

-* [2025-08] ✨ Trinity-RFT v0.2.1 is released with enhanced features for Agentic RL and Async RL.
+
+<!-- TODO: v0.3.0 -->
 * [2025-08] 🎵 We introduce [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord), a dynamic integration of SFT and RL for enhanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
-* [2025-08] We now support training on general multi-step workflows! Please check out examples for [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md).
+* [2025-08] ✨ Trinity-RFT v0.2.1 is released! Enhanced features include:
+* Agentic RL: support training with general multi-step agentic workflows; check out the [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md) examples.
+* Rollout-training scheduling: introduce Scheduler, [Synchronizer](./docs/sphinx_doc/source/tutorial/synchronizer.html) and priority queue buffer, which facilitates more efficient and dynamic scheduling of the RFT process.
+* [A benchmark tool](./benchmark) for quick verification and experimentation.
+* RL algorithms: implement [GSPO](https://github.com/modelscope/Trinity-RFT/pull/154), [AsymRE](https://github.com/modelscope/Trinity-RFT/pull/187), [TOPR, CISPO](https://github.com/modelscope/Trinity-RFT/pull/185), [RAFT](https://github.com/modelscope/Trinity-RFT/pull/174).
 * [2025-07] Trinity-RFT v0.2.0 is released.
 * [2025-07] We update the [technical report](https://arxiv.org/abs/2505.17826) (arXiv v2) with new features, examples, and experiments.
 * [2025-06] Trinity-RFT v0.1.1 is released.
@@ -45,11 +50,11 @@ It is designed to support diverse application scenarios and serve as a unified p

 * **Unified RFT Core:**

-Supports *synchronous/asynchronous*, *on-policy/off-policy*, and *online/offline* training. Rollout and training can run separately and scale independently on different devices.
+Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently on different devices.

 * **First-Class Agent-Environment Interaction:**

-Handles lagged feedback, long-tailed latencies, and agent/env failures gracefully. Supports multi-turn agent-env interaction.
+Handles lagged feedback, long-tailed latencies, and agent/env failures gracefully. Supports general multi-step agent-env interaction.

 * **Optimized Data Pipelines:**

@@ -71,7 +76,7 @@ It is designed to support diverse application scenarios and serve as a unified p


 <p align="center">
-<img src="https://img.alicdn.com/imgextra/i1/O1CN01BFCZRV1zS9T1PoH49_!!6000000006712-2-tps-922-544.png" alt="Trinity-RFT-core-architecture">
+<img src="https://img.alicdn.com/imgextra/i1/O1CN01Ti0o4320RywoAuyhN_!!6000000006847-2-tps-3840-2134.png" alt="Trinity-RFT-core-architecture">
 </p>

 </details>
@@ -104,6 +109,11 @@ It is designed to support diverse application scenarios and serve as a unified p
 <img src="https://img.alicdn.com/imgextra/i3/O1CN01hR1LCh25kpJMKmYR4_!!6000000007565-2-tps-1474-740.png" alt="Trinity-RFT-data-pipeline-buffer">
 </p>

+A more technical version:
+<p align="center">
+<img src="https://img.alicdn.com/imgextra/i2/O1CN011q7Chi1luB5tnGY6M_!!6000000004878-2-tps-1444-1002.png" alt="Trinity-RFT-data-pipeline-buffer">
+</p>
+
 </details>


@@ -123,12 +133,13 @@ It is designed to support diverse application scenarios and serve as a unified p

 * **Adaptation to New Scenarios:**

-Implement agent-environment interaction logic in a single `Workflow` or `MultiTurnWorkflow` class. ([Example](./docs/sphinx_doc/source/tutorial/example_multi_turn.md))
+Implement agent-environment interaction logic in a single `Workflow`/`MultiTurnWorkflow`/`RewardPropagationWorkflow` class ([Example](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)),
+or import existing workflows from agent frameworks like AgentScope ([Example](./docs/sphinx_doc/source/tutorial/example_react.md)).


 * **RL Algorithm Development:**

-Develop custom RL algorithms (loss design, sampling, data processing) in compact, plug-and-play classes. ([Example](./docs/sphinx_doc/source/tutorial/example_mix_algo.md))
+Develop custom RL algorithms (loss design, sampling strategy, data processing) in compact, plug-and-play classes ([Example](./docs/sphinx_doc/source/tutorial/example_mix_algo.md)).


 * **Low-Code Usage:**
@@ -341,14 +352,11 @@ Tutorials for running different RFT modes:
 + [Offline learning by DPO or SFT](./docs/sphinx_doc/source/tutorial/example_dpo.md)


-Tutorials for adapting Trinity-RFT to a new multi-turn agentic scenario:
-
-+ [Concatenated Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)
+Tutorials for adapting Trinity-RFT to multi-step agentic scenarios:

-Tutorials for adapting Trinity-RFT to a general multi-step agentic scenario:
-
-+ [General Multi-Step tasks](./docs/sphinx_doc/source/tutorial/example_step_wise.md)
-+ [ReAct agent tasks](./docs/sphinx_doc/source/tutorial/example_react.md)
++ [Concatenated multi-turn workflow](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)
++ [General multi-step workflow](./docs/sphinx_doc/source/tutorial/example_step_wise.md)
++ [ReAct workflow, imported directly from an agent framework](./docs/sphinx_doc/source/tutorial/example_react.md)


 Tutorials for data-related functionalities:
@@ -361,15 +369,17 @@ Tutorials for RL algorithm development/research with Trinity-RFT:
 + [RL algorithm development with Trinity-RFT](./docs/sphinx_doc/source/tutorial/example_mix_algo.md)


-Guidelines for full configurations: see [this document](./docs/sphinx_doc/source/tutorial/trinity_configs.md)
+Guidelines for full configurations:
+
++ See [this document](./docs/sphinx_doc/source/tutorial/trinity_configs.md)


 Guidelines for developers and researchers:

 + [Build new RL scenarios](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#workflows-for-rl-environment-developers)
 + [Implement new RL algorithms](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#algorithms-for-rl-algorithm-developers)
-
-
++ [Develop new data operators](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.html#operators-for-data-developers)
++ [Understand the coordination between explorer and trainer](./docs/sphinx_doc/source/tutorial/synchronizer.html)



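
Note on the "priority queue buffer" mentioned in the v0.2.1 news entry of this diff: the README does not describe how that buffer works, so the following is only a rough sketch of the general idea (a bounded buffer that hands out the highest-priority items first), written in plain Python with `heapq`. The class name `PriorityBuffer`, its `put`/`get_batch` methods, and the meaning of `priority` are all hypothetical and are not taken from Trinity-RFT's code.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(order=True)
class _Entry:
    sort_key: float                    # negated priority, so heapq pops the highest first
    seq: int                           # insertion counter, breaks ties in FIFO order
    item: Any = field(compare=False)   # the payload is never compared


class PriorityBuffer:
    """Hypothetical bounded buffer that returns the highest-priority items first."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._heap: List[_Entry] = []
        self._counter = itertools.count()

    def __len__(self) -> int:
        return len(self._heap)

    def put(self, item: Any, priority: float) -> None:
        heapq.heappush(self._heap, _Entry(-priority, next(self._counter), item))
        if len(self._heap) > self._capacity:
            # Evict the lowest-priority entry; this is O(n), acceptable for a sketch.
            worst = max(self._heap)
            self._heap.remove(worst)
            heapq.heapify(self._heap)

    def get_batch(self, batch_size: int) -> List[Any]:
        # Pop up to batch_size items, highest priority first.
        n = min(batch_size, len(self._heap))
        return [heapq.heappop(self._heap).item for _ in range(n)]


# Illustrative usage with made-up items and priorities.
buf = PriorityBuffer(capacity=3)
buf.put("stale rollout", priority=0.1)
buf.put("fresh rollout", priority=0.9)
buf.put("medium rollout", priority=0.5)
buf.put("very stale rollout", priority=0.05)  # exceeds capacity, so it is evicted
batch = buf.get_batch(2)
```

Here `batch` holds the two highest-priority items and one item remains in the buffer. How priorities are actually assigned and consumed in Trinity-RFT is covered by the Synchronizer and programming-guide documents linked in the diff, not by this sketch.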