README.md
@@ -22,9 +22,14 @@

 ## 🚀 News

-* [2025-08] ✨ Trinity-RFT v0.2.1 is released with enhanced features for Agentic RL and Async RL.
+<!-- TODO: v0.3.0 -->

 * [2025-08] 🎵 We introduce [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord), a dynamic integration of SFT and RL for enhanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
-* [2025-08] We now support training on general multi-step workflows! Please check out examples for [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md).
+* [2025-08] ✨ Trinity-RFT v0.2.1 is released! Enhanced features include:
+  * Agentic RL: support for training with general multi-step agentic workflows; check out the [ALFWorld](./docs/sphinx_doc/source/tutorial/example_step_wise.md) and [ReAct](./docs/sphinx_doc/source/tutorial/example_react.md) examples.
+  * Rollout-training scheduling: introduce the Scheduler, [Synchronizer](./docs/sphinx_doc/source/tutorial/synchronizer.html), and a priority-queue buffer, which facilitate more efficient and dynamic scheduling of the RFT process.
+  * [A benchmark tool](./benchmark) for quick verification and experimentation.
 * [2025-07] We update the [technical report](https://arxiv.org/abs/2505.17826) (arXiv v2) with new features, examples, and experiments.
 * [2025-06] Trinity-RFT v0.1.1 is released.
@@ -45,11 +50,11 @@ It is designed to support diverse application scenarios and serve as a unified p

 * **Unified RFT Core:**
-  Supports *synchronous/asynchronous*, *on-policy/off-policy*, and *online/offline* training. Rollout and training can run separately and scale independently on different devices.
+  Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently on different devices.
@@ -123,12 +133,13 @@ It is designed to support diverse application scenarios and serve as a unified p

 * **Adaptation to New Scenarios:**
-  Implement agent-environment interaction logic in a single `Workflow` or `MultiTurnWorkflow` class. ([Example](./docs/sphinx_doc/source/tutorial/example_multi_turn.md))
+  Implement agent-environment interaction logic in a single `Workflow`/`MultiTurnWorkflow`/`RewardPropagationWorkflow` class ([Example](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)),
+  or import existing workflows from agent frameworks like AgentScope ([Example](./docs/sphinx_doc/source/tutorial/example_react.md)).

 * **RL Algorithm Development:**
-  Develop custom RL algorithms (loss design, sampling, data processing) in compact, plug-and-play classes. ([Example](./docs/sphinx_doc/source/tutorial/example_mix_algo.md))
+  Develop custom RL algorithms (loss design, sampling strategy, data processing) in compact, plug-and-play classes ([Example](./docs/sphinx_doc/source/tutorial/example_mix_algo.md)).

 * **Low-Code Usage:**
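The "single workflow class" pattern described in the hunk above can be sketched roughly as follows. This is a minimal, self-contained illustration of the idea, not Trinity-RFT's actual API: the `Workflow`, `run`, and `Experience` names and signatures here are assumptions for demonstration only; consult the linked examples for the real interfaces.

```python
# Hypothetical sketch: all agent-environment interaction logic lives in one
# workflow class whose run() returns experiences for training.
# (Illustrative names only -- not Trinity-RFT's actual API.)
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Experience:
    prompt: str
    response: str
    reward: float


class Workflow(ABC):
    @abstractmethod
    def run(self) -> list[Experience]:
        """Roll out one episode and return experiences for training."""


class ToyMathWorkflow(Workflow):
    """Toy single-turn workflow: the 'model' answers an addition task."""

    def __init__(self, a: int, b: int):
        self.a, self.b = a, b

    def run(self) -> list[Experience]:
        prompt = f"What is {self.a} + {self.b}?"
        response = str(self.a + self.b)  # stand-in for a real model call
        # Reward the rollout by checking against the ground-truth answer.
        reward = 1.0 if response == str(self.a + self.b) else 0.0
        return [Experience(prompt, response, reward)]


exps = ToyMathWorkflow(2, 3).run()
print(exps[0].response, exps[0].reward)  # 5 1.0
```

A multi-turn variant would follow the same shape, looping over environment steps inside `run()` and returning one experience per turn.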
@@ -341,14 +352,11 @@ Tutorials for running different RFT modes:

+[Offline learning by DPO or SFT](./docs/sphinx_doc/source/tutorial/example_dpo.md)

 Tutorials for adapting Trinity-RFT to a new multi-turn agentic scenario: