
Commit c228abf (1 parent: 606062d)

apply reviews && rename modelscope.github.io to agentscope-ai.github.io

File tree

24 files changed: +88 −87 lines
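A host rename spanning 24 files is usually applied mechanically rather than by hand. The commit does not record how it was done; one common approach, sketched here with an illustrative temporary directory (not a path from the commit), is a `grep`/`sed` pipeline:

```shell
# Create an illustrative file containing the old documentation host
mkdir -p /tmp/trinity-demo
printf 'Docs: https://modelscope.github.io/Trinity-RFT/en/main/index.html\n' \
  > /tmp/trinity-demo/README.md

# List every file mentioning the old host, then rewrite it in place.
# Using '#' as the sed delimiter avoids escaping the slashes in URLs.
grep -rl 'modelscope.github.io' /tmp/trinity-demo \
  | xargs sed -i 's#modelscope\.github\.io#agentscope-ai.github.io#g'

cat /tmp/trinity-demo/README.md
# → Docs: https://agentscope-ai.github.io/Trinity-RFT/en/main/index.html
```

In a real checkout one would run the pipeline from the repository root (or use `git grep -l`) and review the result with `git diff` before committing. Note that the pattern targets the full Pages host: the diffs below leave `github.com/modelscope/...` repository URLs unchanged, so replacing bare `modelscope` would be too aggressive.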

CONTRIBUTING.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -4,7 +4,7 @@ Thank you for your interest in Trinity-RFT! Our framework is built on a decouple
 
 ## Where to Contribute
 
-Trinity-RFT provides modular interfaces for different technical interests. Please refer to our [Developer Guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_overview.html) for detailed implementation standards:
+Trinity-RFT provides modular interfaces for different technical interests. Please refer to our [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_overview.html) for detailed implementation standards:
 
 | Focus Area | Interface/Code Directory | Potential Tasks |
 | :--- | :--- | :--- |
@@ -45,6 +45,6 @@ To ensure a smooth review process, please complete the following:
 - **Major Changes**: For significant architectural changes or large features, please open an issue first to discuss the design with the maintainers.
 - **Documentation**: We highly value improvements to our tutorials, docstrings, and translations.
 
-*For a deep dive into the framework's architecture, please refer to the [Full Doc](https://modelscope.github.io/Trinity-RFT/en/main/index.html).*
+*For a deep dive into the framework's architecture, please refer to the [Full Doc](https://agentscope-ai.github.io/Trinity-RFT/en/main/index.html).*
 
 **Thank you for helping us build a better Reinforcement Fine-Tuning framework!**
```

README.md

Lines changed: 16 additions & 16 deletions
Large diffs are not rendered by default.

README_zh.md

Lines changed: 17 additions & 17 deletions

(Chinese file content translated to English below; all URLs and identifiers are unchanged.)

````diff
@@ -1,4 +1,4 @@
-[**English Homepage**](https://github.com/modelscope/Trinity-RFT/blob/main/README.md) | [**Chinese Documentation**](https://modelscope.github.io/Trinity-RFT/zh/) | [**FAQ**](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/faq.html)
+[**English Homepage**](https://github.com/modelscope/Trinity-RFT/blob/main/README.md) | [**Chinese Documentation**](https://agentscope-ai.github.io/Trinity-RFT/zh/) | [**FAQ**](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/faq.html)
 
 <div align="center">
 <img src="https://img.alicdn.com/imgextra/i1/O1CN01lvLpfw25Pl4ohGZnU_!!6000000007519-2-tps-1628-490.png" alt="Trinity-RFT" style="height: 120px;">
@@ -12,7 +12,7 @@
 <div align="center">
 
 [![paper](http://img.shields.io/badge/cs.LG-2505.17826-B31B1B?logo=arxiv&logoColor=red)](https://arxiv.org/abs/2505.17826)
-[![doc](https://img.shields.io/badge/Docs-blue?logo=markdown)](https://modelscope.github.io/Trinity-RFT/)
+[![doc](https://img.shields.io/badge/Docs-blue?logo=markdown)](https://agentscope-ai.github.io/Trinity-RFT/)
 [![pypi](https://img.shields.io/pypi/v/trinity-rft?logo=pypi&color=026cad)](https://pypi.org/project/trinity-rft/)
 ![license](https://img.shields.io/badge/license-Apache--2.0-000000.svg)
 
@@ -31,11 +31,11 @@ Trinity-RFT is a general-purpose, flexible, and user-friendly large language model (LLM)
 
 Trinity-RFT offers features for users with different backgrounds and goals:
 
-* 🤖 **Agent application developers:** Train agent applications to strengthen their ability to complete tasks in specific domains [[Tutorial]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html)
+* 🤖 **Agent application developers:** Train agent applications to strengthen their ability to complete tasks in specific domains [[Tutorial]](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html)
 
-* 🧠 **RL algorithm researchers:** Design, implement, and validate new reinforcement learning algorithms using concise, pluggable modules [[Tutorial]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
+* 🧠 **RL algorithm researchers:** Design, implement, and validate new reinforcement learning algorithms using concise, pluggable modules [[Tutorial]](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
 
-* 📊 **Data engineers:** Design task-specific datasets and build processing pipelines for data cleaning, augmentation, and human-in-the-loop scenarios [[Tutorial]](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_operator.html)
+* 📊 **Data engineers:** Design task-specific datasets and build processing pipelines for data cleaning, augmentation, and human-in-the-loop scenarios [[Tutorial]](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_operator.html)
 
 
 
@@ -73,16 +73,16 @@ Trinity-RFT offers features for users with different backgrounds and goals:
 
 | Category | Tutorials / Guides |
 | --- | ----|
-| *Running various RFT modes* | + [Quick start: running GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html)<br>+ [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_advanced.html)<br>+ [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_async_mode.html)<br>+ [Offline learning via DPO or SFT](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_dpo.html)<br>+ [Running RFT training without GPUs (Tinker backend)](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_tinker_backend.html) |
-| *Multi-turn agentic RL* | + [Concatenated multi-turn tasks](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html)<br>+ [General multi-turn tasks](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_step_wise.html)<br>+ [Invoking ReAct workflows from agent frameworks](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_react.html) <br>+ [Example: training a web search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) |
-| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_selector.html)<br>+ [Online task selection](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [Paper](https://arxiv.org/pdf/2510.26374))<br>+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [Paper](https://arxiv.org/pdf/2510.25441)) <br>+ [Experience replay mechanism](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html) |
-| *RL algorithm development* | + [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html) (📝 [Paper](https://arxiv.org/pdf/2508.11408))<br>+ [Research project: R3L (reflection-and-retry reinforcement learning)](https://github.com/shiweijiezero/R3L) (📝 [Paper](https://arxiv.org/abs/2601.03715))<br>+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [Paper](https://arxiv.org/abs/2509.24203)) <br>+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [Trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
+| *Running various RFT modes* | + [Quick start: running GRPO on GSM8k](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html)<br>+ [Off-policy RFT](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_advanced.html)<br>+ [Fully asynchronous RFT](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_async_mode.html)<br>+ [Offline learning via DPO or SFT](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_dpo.html)<br>+ [Running RFT training without GPUs (Tinker backend)](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_tinker_backend.html) |
+| *Multi-turn agentic RL* | + [Concatenated multi-turn tasks](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html)<br>+ [General multi-turn tasks](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_step_wise.html)<br>+ [Invoking ReAct workflows from agent frameworks](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_react.html) <br>+ [Example: training a web search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) |
+| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_selector.html)<br>+ [Online task selection](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [Paper](https://arxiv.org/pdf/2510.26374))<br>+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [Paper](https://arxiv.org/pdf/2510.25441)) <br>+ [Experience replay mechanism](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [Advanced data processing & human-in-the-loop](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html) |
+| *RL algorithm development* | + [RL algorithm development with Trinity-RFT](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html) (📝 [Paper](https://arxiv.org/pdf/2508.11408))<br>+ [Research project: R3L (reflection-and-retry reinforcement learning)](https://github.com/shiweijiezero/R3L) (📝 [Paper](https://arxiv.org/abs/2601.03715))<br>+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [Paper](https://arxiv.org/abs/2509.24203)) <br>+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [Trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
 | *Benchmarks* | + [Benchmark toolkit (quick verification and experiments)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)<br>+ [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)<br>+ [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)<br>+ [Alfworld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
-| *Getting to know Trinity-RFT in depth* | + [Full configuration guide](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)<br>+ [Guide to matching GPU resources with training configurations](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)<br>+ [Understanding the explorer-trainer synchronization logic](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)<br>+ [How to align configurations with verl](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) |
+| *Getting to know Trinity-RFT in depth* | + [Full configuration guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)<br>+ [Guide to matching GPU resources with training configurations](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)<br>+ [Understanding the explorer-trainer synchronization logic](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)<br>+ [How to align configurations with verl](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) |
 
 
 > [!NOTE]
-> For more tutorials, see the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/)
+> For more tutorials, see the [Trinity-RFT documentation](https://agentscope-ai.github.io/Trinity-RFT/)
 
 
@@ -117,13 +117,13 @@ Trinity-RFT offers features for users with different backgrounds and goals:
 
 ## 🔨 Algorithm Support
 
-The table below lists the algorithms supported by Trinity-RFT; for more algorithms, see the [algorithm module](https://github.com/modelscope/Trinity-RFT/blob/main/trinity/algorithm/algorithm.py). You can also build new algorithms by customizing different modules; see the [tutorial](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
+The table below lists the algorithms supported by Trinity-RFT; for more algorithms, see the [algorithm module](https://github.com/modelscope/Trinity-RFT/blob/main/trinity/algorithm/algorithm.py). You can also build new algorithms by customizing different modules; see the [tutorial](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html)
 
 | Algorithm | Docs/Examples | Core Code | Key Configuration |
 |:-----------|:-----------|:---------------|:-----------|
-| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Docs](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html)] [[Countdown example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` |
-| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[Docs](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html)] [[GSM8K example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)]| [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` |
-| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Docs](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)] [[ToolACE example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` |
+| PPO [[Paper](https://arxiv.org/pdf/1707.06347)] | [[Docs](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html)] [[Countdown example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/ppo_policy_loss.py)] | `algorithm_type: ppo` |
+| GRPO [[Paper](https://arxiv.org/pdf/2402.03300)] | [[Docs](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html)] [[GSM8K example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k)]| [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/grpo_advantage.py)] | `algorithm_type: grpo` |
+| CHORD 💡 [[Paper](https://arxiv.org/pdf/2508.11408)] | [[Docs](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)] [[ToolACE example](https://github.com/modelscope/Trinity-RFT/blob/main/examples/mix_chord/mix_chord_toolace.yaml)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/chord_policy_loss.py)] | `algorithm_type: mix_chord` |
 | REC Series 💡 [[Paper](https://arxiv.org/pdf/2509.24203)] | [[GSM8K example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k)] | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/policy_loss_fn/rec_policy_loss.py)] | `algorithm_type: rec` |
 | RLOO [[Paper](https://arxiv.org/pdf/2402.14740)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/rloo_advantage.py)] | `algorithm_type: rloo` |
 | REINFORCE++ [[Paper](https://arxiv.org/pdf/2501.03262)] | - | [[Code](https://github.com/modelscope/Trinity-RFT/tree/main/trinity/algorithm/advantage_fn/reinforce_advantage.py)] | `algorithm_type: reinforceplusplus` |
@@ -266,7 +266,7 @@ uv pip install trinity-rft
 uv pip install flash-attn==2.8.1
 ```
 
-> To train with **Megatron-LM**, see [Megatron-LM support](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_megatron.html)
+> To train with **Megatron-LM**, see [Megatron-LM support](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_megatron.html)
 
 
 ### Step 2: Prepare the dataset and model
@@ -350,7 +350,7 @@ ray start --address=<master_address>
 ```
 
 (Optional) You can use tools such as [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) to monitor the training process more conveniently.
-For the corresponding configuration, see [this document](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration)
+For the corresponding configuration, see [this document](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration)
 For example, when using Wandb, you need to log in first:
 
 ```shell
````

benchmark/reports/alfworld.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -10,7 +10,7 @@ The environment is configured as follows:
 * Reward Structure: +1 for successfully completing the task, -0.1 otherwise
 * Maximum Steps: 30 (configurable via `max_env_steps`)
 
-See the [documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html) for data preparation.
+See the [documentation](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html) for data preparation.
 
 ## 2. Experimental Settings
```
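The diffs above keep `github.com/modelscope/...` repository URLs unchanged while rewriting the `github.io` ones, so a follow-up check for stragglers has to match the full old host. A minimal sketch of such a check (the temporary directory and sample file are illustrative, not from the commit):

```shell
# Directory to scan; in practice this would be the repository root
repo=/tmp/trinity-renamed
mkdir -p "$repo"
printf 'Docs: https://agentscope-ai.github.io/Trinity-RFT/\n'  > "$repo/README.md"
printf 'Repo: https://github.com/modelscope/Trinity-RFT\n'    >> "$repo/README.md"

# Only the Pages host was renamed; github.com/modelscope URLs are intact,
# so the pattern must match the full old host, not just "modelscope".
if grep -rn 'modelscope\.github\.io' "$repo"; then
  echo 'stale documentation links found' >&2
else
  echo 'no stale links remain'
fi
# → no stale links remain
```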
