Commit 8a1c1c0: Update Readme and docs (#508)

1 parent commit: c3d356c
File tree: 7 files changed (+77, -33 lines)

README.md

Lines changed: 13 additions & 4 deletions
@@ -39,7 +39,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and objectives
  * [2026-01] Introducing [R3L](https://github.com/shiweijiezero/R3L): a systematic reflect-then-retry RL mechanism with efficient language-guided exploration and stable off-policy learning ([paper](https://arxiv.org/abs/2601.03715)).
  * [2025-12] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added the [Tinker](https://thinkingmachines.ai/tinker/) backend for users **without GPUs**, added more benchmarks, enhanced online RL, and more.
  * [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
- * [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
+ * [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
  * [2025-11] Introducing [BOTS](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)).
  * [2025-09] [Our paper](https://arxiv.org/pdf/2509.24203) reveals a novel off-policy interpretation for group-relative REINFORCE and its variants like GRPO and AsymRE ([implementation](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k)).
  * [2025-08] Introducing [CHORD](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
@@ -70,6 +70,15 @@ Trinity-RFT provides functionalities for users with different backgrounds and objectives
  | *Benchmarks* | • [Benchmark toolkit (quick verification & experimentation)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/README.md)<br>• [Guru-Math benchmark & comparison with veRL](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)<br>• [FrozenLake benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)<br>• [Alfworld benchmark & comparison with rLLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
  | *Going deeper into Trinity-RFT* | • [Full configurations](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)<br>• [GPU resource and training configuration guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)<br>• [Training VLM](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)<br>• [Understand the coordination between explorer and trainer](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)<br>• [How to align configuration with veRL](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |

+ > [!TIP]
+ > **Recommended Learning Paths**
+ >
+ > 🆕 **New users:** [Installation](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_installation.html) → [Quick Start (GSM8K)](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html) → [Configuration Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html) → [GPU Resource Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ >
+ > 🔬 **Algorithm researchers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_overview.html) → [Algorithm Development Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_algorithm.html) → [CHORD Algorithm Example](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
+ >
+ > 🤖 **Agent developers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_overview.html) → [Workflow Development](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/develop_workflow.html) → [General Multi-step Workflow Example](https://agentscope-ai.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)

  > [!NOTE]
  > For more tutorials, please refer to the [Trinity-RFT documentation](https://agentscope-ai.github.io/Trinity-RFT/).

@@ -366,12 +375,12 @@ For studio users, click "Run" in the web interface.

  ## Contribution Guide

- This project is currently under active development, and we welcome contributions from the community!
+ This project is currently under active development; star the repo and watch releases for the latest updates!

- We welcome contributions of all kinds, including:
+ We welcome all kinds of contributions from the community, including:

  * Documentation improvements
- * Example workflows
+ * Example workflows, algorithms, and data pipelines
  * Bug fixes and performance optimizations

  If you're new to the project, documentation and example updates are a great place to start.

README_zh.md

Lines changed: 55 additions & 20 deletions
@@ -47,10 +47,9 @@ Trinity-RFT provides corresponding functionality for users with different backgrounds and objectives:
  * [2026-01] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.1) Trinity-RFT v0.4.1 released: upgraded verl to v0.7.0, added OpenAI API support to the Tinker backend, and fixed several bugs.
  * [2026-01] Introducing [R3L](https://github.com/shiweijiezero/R3L): a reflect-then-retry RL mechanism with efficient exploration guided by natural-language feedback and stable off-policy learning ([paper](https://arxiv.org/abs/2601.03715)).
  * [2025-12] [[Release Notes]](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.4.0) Trinity-RFT v0.4.0 released: added the [Tinker](https://thinkingmachines.ai/tinker/) backend for training on devices **without GPUs**, added more benchmarks, enhanced online RL, and more.
- * [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, enabling model training on devices **without GPUs**.
  * [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
  * [2025-11] [[Release Notes](https://github.com/agentscope-ai/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: fixed several bugs.
- * [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
+ * [2025-11] Introducing [Learn-to-Ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask): training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
  * [2025-11] Introducing [BOTS](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots): online RL task selection for efficient LLM fine-tuning ([paper](https://arxiv.org/pdf/2510.26374)).
  * [2025-09] Our [paper](https://arxiv.org/pdf/2509.24203) reveals an off-policy interpretation for group-relative REINFORCE and its variants such as GRPO and AsymRE ([code](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k)).
  * [2025-08] Introducing [CHORD](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
@@ -84,6 +83,15 @@ Trinity-RFT provides corresponding functionality for users with different backgrounds and objectives:
  | *Going deeper into Trinity-RFT* | + [Full configuration guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)<br>+ [GPU resource and training configuration guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)<br>+ [Training multimodal models](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_vlm)<br>+ [Understanding the explorer-trainer synchronization logic](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)<br>+ [How to align configuration with verl](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) |

+ > [!TIP]
+ > **Recommended Reading Order**
+ >
+ > 🆕 **Getting started:** [Installation](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_installation.html) → [Quick Start (GSM8K)](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_reasoning_basic.html) → [Configuration Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html) → [GPU Resource Configuration Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+ >
+ > 🔬 **Algorithm researchers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_overview.html) → [Algorithm Development Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_algorithm.html) → [CHORD Algorithm Example](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html)
+ >
+ > 🤖 **Agent developers:** [Developer Guide](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_overview.html) → [Workflow Development](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/develop_workflow.html) → [General Multi-turn Workflow Example](https://agentscope-ai.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html)

  > [!NOTE]
  > For more tutorials, see the [Trinity-RFT documentation](https://agentscope-ai.github.io/Trinity-RFT/)
@@ -149,6 +157,7 @@ Trinity-RFT provides corresponding functionality for users with different backgrounds and objectives:

  - [Quick Start](#快速上手)
+ - [Quick Start on CPU](#使用-cpu-快速上手)
  - [Step 1: Installation](#第一步安装)
  - [Step 2: Prepare the Dataset and Model](#第二步准备数据集和模型)
  - [Step 3: Prepare the Configuration File](#第三步准备配置文件)
@@ -161,14 +170,31 @@ Trinity-RFT provides corresponding functionality for users with different backgrounds and objectives:

  ## Quick Start

  > [!NOTE]
  > This project is under active development. Comments and suggestions are welcome!
- >
- > **No GPU? No problem!** You can still try it out:
- > 1. Follow the installation steps (GPU-only packages such as `flash-attn` can be skipped)
- > 2. Run the **[Tinker training example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker)**, which is designed for CPU-only systems.

+ ### Quick Start on CPU
+
+ If you do not have a GPU, you can still try Trinity-RFT through the Tinker backend.
+
+ ```bash
+ # Create and activate an environment
+ python3.10 -m venv .venv
+ source .venv/bin/activate
+
+ # Install Trinity-RFT with the CPU-only backend
+ pip install -e ".[tinker]"
+ ```
+
+ Run a simple example:
+
+ ```bash
+ trinity run --config examples/tinker/tinker.yaml
+ ```
+
+ This example is designed for CPU-only devices. For more details, see the full [Tinker training example](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker).
+
+ To run Trinity-RFT on GPU devices, follow the steps below.

  ### Step 1: Installation

@@ -178,22 +204,26 @@ Trinity-RFT provides corresponding functionality for users with different backgrounds and objectives:
  - **CUDA**: version >= 12.8
  - **GPU**: at least one NVIDIA GPU with [compute capability](https://developer.nvidia.com/cuda/gpus) 8.0 or higher (e.g., RTX 30 series, A100, H100)

- ## Install from Source (Recommended)
+ **Recommended installation paths:**
+
+ * No GPU → use the Tinker backend
+ * Want a quick setup → use Docker
+ * Want to develop and contribute → use Conda / venv

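The choice between these paths can be sketched as a quick shell check; this is a minimal sketch that assumes only that `nvidia-smi` is on PATH whenever an NVIDIA driver is installed, and it merely prints a suggestion:

```shell
# Hedged sketch: suggest an install path based on whether a GPU driver is visible.
if command -v nvidia-smi >/dev/null 2>&1; then
  echo "GPU driver detected: consider the Docker or Conda/venv path"
else
  echo "no GPU driver detected: consider the Tinker backend"
fi
```

Both branches only print a suggestion; the actual install commands are in the sections that follow.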
+ #### Install from Source (Recommended)

  This method is recommended if you want to modify or extend Trinity-RFT.

- ### 1. Clone the repository
+ First, clone the repository:

  ```bash
  git clone https://github.com/agentscope-ai/Trinity-RFT
  cd Trinity-RFT
  ```

- ### 2. Build the environment
-
- Choose any one of the following methods:
+ Then, build the environment in any of the following ways:

- #### Using a pre-built Docker image (recommended for beginners)
+ **Using a pre-built Docker image (recommended for beginners)**


  ```bash
@@ -211,7 +241,7 @@ docker run -it \

  > This image has Trinity-RFT and all GPU-related dependencies installed via `uv`, and the virtual environment is activated automatically (it can also be activated manually with `source /opt/venv/bin/activate`). Extra packages can be added with `uv pip install` if needed.

- #### Using Conda
+ **Using Conda**

  ```bash
  conda create -n trinity python=3.12
@@ -228,7 +258,7 @@ pip install -e ".[vllm,flash_attn]"
  pip install -e ".[dev]"  # for debugging and development
  ```

- #### Using venv
+ **Using venv**

  ```bash
  python3.10 -m venv .venv
@@ -245,7 +275,7 @@ pip install -e ".[vllm,flash_attn]"
  pip install -e ".[dev]"  # for debugging and development
  ```

- #### Using `uv`
+ **Using uv**

  [`uv`](https://github.com/astral-sh/uv) is a modern Python package manager.

@@ -256,7 +286,7 @@ uv sync --extra vllm --extra dev --extra flash_attn
  # uv sync --extra tinker --extra dev
  ```

- ## Install via PyPI
+ #### Install via PyPI

  If you only want to use Trinity-RFT without modifying the code:

@@ -382,12 +412,17 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml

  ## Contribution Guide

+ This project is under active development; star the repo and watch releases for the latest updates!

- This project is under active development, and we welcome contributions from the community!
+ We welcome all kinds of contributions from the community, including:

+ * Documentation improvements
+ * Workflows, algorithms, and data processing pipelines
+ * Bug fixes and performance optimizations

- See the [Contribution Guide](./CONTRIBUTING.md) for details.
+ If you're new to the project, documentation and example updates are a great place to start.

+ For the detailed contribution guide, see [CONTRIBUTING.md](./CONTRIBUTING.md) and our [good-first-issue list](https://github.com/agentscope-ai/Trinity-RFT/issues/470).

  ## Acknowledgments

@@ -399,7 +434,7 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml
  + [Data-Juicer](https://github.com/datajuicer/data-juicer) for data processing pipelines;
  + [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflows;
  + [Ray](https://github.com/ray-project/ray) for distributed systems;
- + we have also drawn inspiration from frameworks such as [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), and [ChatLearn](https://github.com/alibaba/ChatLearn);
+ + we have also drawn inspiration from frameworks such as [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn), and [rLLM](https://github.com/rllm-org/rllm);
  + ......

  ## Citation

docs/sphinx_doc/source/main.md

Lines changed: 3 additions & 3 deletions
@@ -27,7 +27,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and objectives

  | Category | Tutorial / Guideline |
  | --- | --- |
- | *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)<br>+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)<br>+ [Fully asynchronous RFT](/tutorial/example_async_mode.md)<br>+ [Offline learning by DPO or SFT](/tutorial/example_dpo.md) |
+ | *Run diverse RFT modes* | + [Quick start: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)<br>+ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)<br>+ [Fully asynchronous RFT](/tutorial/example_async_mode.md)<br>+ [Offline learning by DPO or SFT](/tutorial/example_dpo.md)<br>+ [RFT without local GPU (Tinker Backend)](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/tinker) |
  | *Multi-step agentic RL* | + [Concatenated multi-turn workflow](/tutorial/example_multi_turn.md)<br>+ [General multi-step workflow](/tutorial/example_step_wise.md)<br>+ [ReAct workflow with an agent framework](/tutorial/example_react.md)<br>+ [Example: train a web-search agent](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/agentscope_websearch) |
  | *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](/tutorial/develop_selector.md)<br>+ [Online task curriculum](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))<br>+ [Research project: learn-to-ask](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))<br>+ [Experience replay with prioritization](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md) |
  | *Algorithm development* | + [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md) (📝 [paper](https://arxiv.org/pdf/2508.11408))<br>+ [Research project: R3L (reflect-then-retry RL)](https://github.com/shiweijiezero/R3L) (📝 [paper](https://arxiv.org/abs/2601.03715))<br>+ [Research project: group-relative REINFORCE](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))<br>+ Non-verifiable domains: [RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/agentscope-ai/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
@@ -98,12 +98,12 @@ We list some algorithms supported by Trinity-RFT in the following table. For more

  This project is built upon many excellent open-source projects, including:

- + [verl](https://github.com/volcengine/verl) and [PyTorch's FSDP](https://pytorch.org/docs/stable/fsdp.html) for LLM training;
+ + [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training;
  + [vLLM](https://github.com/vllm-project/vllm) for LLM inference;
  + [Data-Juicer](https://github.com/datajuicer/data-juicer) for data processing pipelines;
  + [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow;
  + [Ray](https://github.com/ray-project/ray) for distributed systems;
- + we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl) and [ChatLearn](https://github.com/alibaba/ChatLearn);
+ + we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm);
  + ......

docs/sphinx_doc/source/tutorial/faq.md

Lines changed: 1 addition & 1 deletion
@@ -80,7 +80,7 @@ ImportError: ...
  UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key]) ...
  ```

- **A:** Try to log in to WandB before starting Ray and running the experiment. One way to do this is run the command `export WANDB_API_KEY=[your_api_key]`. Yoy may also try using other monitors instead of WandB by setting `monitor.monitor_type=tensorboard/mlflow`.
+ **A:** Try to log in to WandB before starting Ray and running the experiment. One way to do this is to run the command `export WANDB_API_KEY=[your_api_key]`. You may also try using other monitors instead of WandB by setting `monitor.monitor_type=tensorboard/mlflow`.
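A concrete sketch of that first suggestion (the key value below is a placeholder, not a real credential):

```shell
# Export the WandB key before starting Ray; child processes (Ray workers) inherit it.
export WANDB_API_KEY="your_api_key"   # placeholder value
# Verify the variable is visible to a child process:
python3 -c 'import os; print("WANDB_API_KEY set:", "WANDB_API_KEY" in os.environ)'
# prints: WANDB_API_KEY set: True
```

Run this in the same shell session that will start Ray and launch the experiment, so the exported variable is inherited by the Ray workers.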

  ---
