5 changes: 3 additions & 2 deletions README.md
@@ -73,7 +73,8 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
| *Multi-step agentic RL* | + [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)<br>+ [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)<br>+ [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html) <br>+ [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) |
| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)<br>+ [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374)) <br>+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441)) <br>+ [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) |
| *Algorithm development* | + [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408))<br>+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203)) <br>+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)<br>+ [Benchmark toolkit for quick verification and experimentation](./benchmark/README.md)<br>+ [GPU Resource and Training Configuration Guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)<br>+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)<br>+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |
| *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)<br>+ [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)<br>+ [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)<br>+ [Alfworld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)<br>+ [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)<br>+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)<br>+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |


> [!NOTE]
@@ -375,7 +376,7 @@ This project is built upon many excellent open-source projects, including:
+ [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines;
+ [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow;
+ [Ray](https://github.com/ray-project/ray) for distributed systems;
+ we have also drawn inspiration from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl) and [ChatLearn](https://github.com/alibaba/ChatLearn);
+ we have also drawn inspiration from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm);
+ ......


3 changes: 2 additions & 1 deletion README_zh.md
@@ -73,7 +73,8 @@ Trinity-RFT provides corresponding functionalities for users with different backgrounds and objectives:
| *Multi-step agentic RL* | + [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html)<br>+ [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_step_wise.html)<br>+ [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_react.html) <br>+ [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) |
| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_selector.html)<br>+ [Online task selection](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))<br>+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441)) <br>+ [Experience replay mechanism](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html) |
| *RL algorithm development* | + [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408))<br>+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203)) <br>+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
| *Going deeper into Trinity-RFT* | + [Full configuration guide](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)<br>+ [Benchmark toolkit for quick verification and experimentation](./benchmark/README.md)<br>+ [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)<br>+ [Understand the explorer-trainer synchronization logic](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)<br>+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) |
| *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)<br>+ [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)<br>+ [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)<br>+ [Alfworld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
| *Going deeper into Trinity-RFT* | + [Full configuration guide](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)<br>+ [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)<br>+ [Understand the explorer-trainer synchronization logic](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)<br>+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) |


> [!NOTE]
44 changes: 44 additions & 0 deletions benchmark/README.md
@@ -52,6 +52,7 @@ python bench.py gsm8k --node_num 1 --gpu_per_node 8 --model_path /your/model/pat
## 📂 What Gets Saved

After running a benchmark, results are stored in `runs/<timestamp>/`:

- `config.yaml`: The exact settings used for your run.
- `checkpoints/`: Model snapshots saved during training.
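
As a rough illustration of the layout above, the following Python sketch locates the most recent run directory and reports what it contains. The helper names `latest_run_dir` and `summarize_run` are hypothetical (they are not part of the benchmark toolkit), and the sketch assumes `runs/` is relative to the directory you ran `bench.py` from. Because the exact `<timestamp>` format is not stated here, it picks the newest directory by modification time rather than by parsing the name.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(runs_root: str = "runs") -> Optional[Path]:
    """Return the most recently modified run directory under runs_root, or None."""
    root = Path(runs_root)
    if not root.is_dir():
        return None
    run_dirs = [p for p in root.iterdir() if p.is_dir()]
    # Assumption: modification time is a good enough proxy for "latest run".
    return max(run_dirs, key=lambda p: p.stat().st_mtime, default=None)


def summarize_run(run_dir: Path) -> dict:
    """Report whether config.yaml exists and list the entries in checkpoints/."""
    ckpt_dir = run_dir / "checkpoints"
    return {
        "has_config": (run_dir / "config.yaml").is_file(),
        "checkpoints": sorted(p.name for p in ckpt_dir.iterdir()) if ckpt_dir.is_dir() else [],
    }
```

Calling `summarize_run(latest_run_dir())` after a benchmark finishes gives a quick check that both the config and at least one checkpoint were written.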

@@ -60,33 +61,76 @@ After running a benchmark, results are stored in `runs/<timestamp>/`:
## 📊 Benchmark Examples

### 1. GSM8K

To reproduce this experiment:

```bash
python bench.py gsm8k --model_path /path/to/Qwen/Qwen2.5-1.5B-Instruct
```

#### GSM8K Results

The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/068da409d215bb2450d93b6b7a56740d4751669d).
![View Results](../docs/sphinx_doc/assets/gsm8k-bench.png)

### 2. Countdown

To reproduce this experiment:

```bash
python bench.py countdown --model_path /path/to/Qwen/Qwen2.5-1.5B-Instruct
```

#### Countdown Results

The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/068da409d215bb2450d93b6b7a56740d4751669d).
![View Results](../docs/sphinx_doc/assets/countdown-bench.png)

### 3. Guru-Math

To reproduce this experiment:

```bash
python bench.py guru_math --model_path /path/to/Qwen/Qwen2.5-7B
```

#### Guru-Math Results

The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/fbf6c967bcd637bfd9f81fb4d7dd4961d7d5a407).
![View Results](../docs/sphinx_doc/assets/guru-bench.png)

See [full report](./reports/guru_math.md) for details.

### 4. FrozenLake

To reproduce this experiment:

```bash
python bench.py frozen_lake --model_path /path/to/Qwen/Qwen2.5-3B
```

#### FrozenLake Results

The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/3861859cbd9c40de07429db2d9b19fd3d4d31703).
![View Results](../docs/sphinx_doc/assets/bench_frozenlake_step.png)

See [full report](./reports/frozenlake.md) for details.

### 5. Alfworld

To reproduce this experiment:

```bash
python bench.py alfworld --model_path /path/to/Qwen/Qwen2.5-3B
```

#### ALFWorld Results

The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/3861859cbd9c40de07429db2d9b19fd3d4d31703).
![View Results](../docs/sphinx_doc/assets/bench_alfworld_step.png)

See [full report](./reports/alfworld.md) for details.

*More benchmarks will be added soon!*

---
4 changes: 3 additions & 1 deletion docs/sphinx_doc/source/main.md
@@ -31,7 +31,9 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
| *Multi-step agentic RL* | + [Concatenated multi-turn workflow](/tutorial/example_multi_turn.md)<br>+ [General multi-step workflow](/tutorial/example_step_wise.md)<br>+ [ReAct workflow with an agent framework](/tutorial/example_react.md) <br>+ [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) |
| *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](/tutorial/develop_selector.md)<br>+ [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))<br>+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441)) <br>+ [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md) |
| *Algorithm development* | + [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md) (📝 [paper](https://arxiv.org/pdf/2508.11408))<br>+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203)) <br>+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) |
| *Going deeper into Trinity-RFT* | + [Full configurations](/tutorial/trinity_configs.md)<br>+ [Benchmark toolkit for quick verification and experimentation](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)<br>+ [GPU Resource and Training Configuration Guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)<br>+ [Understand the coordination between explorer and trainer](/tutorial/synchronizer.md)<br>+ [How to align configuration with veRL](/tutorial/align_with_verl.md) |
| *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)<br>+ [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)<br>+ [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)<br>+ [Alfworld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) |
| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)<br>+ [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)<br>+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)<br>+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) |




60 changes: 53 additions & 7 deletions docs/sphinx_doc/source/tutorial/faq.md
@@ -1,12 +1,20 @@
# FAQ

## Part 1: Configurations
**Q:** How do I configure the parameters?

**A:** You can use the config manager to configure the parameters by running `trinity studio --port 8080`. This approach provides a convenient way to configure the parameters.
**Q:** How to write Trinity-RFT configuration files?

Advanced users can also edit the config file directly, using the YAML files in `examples` as references.
Trinity-RFT uses [veRL](https://github.com/volcengine/verl) as the training backend, which exposes a large number of parameters; see the [veRL documentation](https://verl.readthedocs.io/en/latest/examples/config.html) for details. You may specify these parameters in the `trainer.trainer_config` dictionary.
**A:** The recommended way to write configurations is to use Trinity Studio. You can launch it with:

```bash
trinity studio --port 8080
```

This provides an intuitive and user-friendly way to create and modify configuration files.

For advanced users, we recommend referring to the [configuration documentation](./trinity_configs.md) for detailed explanations of all configuration options. You can also directly edit the YAML configuration files (see the `examples` directory for templates and examples).

If you are already familiar with veRL, please refer to the [Align with veRL](./align_with_verl.md) tutorial. This guide explains how to align Trinity-RFT configuration parameters with those used in veRL, making it easier to migrate or reuse your existing veRL setups.

---

@@ -97,19 +105,45 @@ ray start --head
- For trainer, adjust `trainer.max_token_len_per_gpu` when `trainer.use_dynamic_bsz=false`; adjust `trainer.ppo_max_token_len_per_gpu` and `trainer.ulysses_sequence_parallel_size` when `trainer.use_dynamic_bsz=true`. Setting `trainer.trainer_config.actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` may also help.
- For explorer, adjust `explorer.rollout_model.tensor_parallel_size`.

Trinity-RFT also provides a [GPU-related configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html) with suggestions on adjusting these settings.
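
The dotted parameter names above can be read as paths into the nested YAML config. The sketch below shows that mapping with a small helper, assuming each dot separates one level of nesting. The helper `set_by_dotted_key` is hypothetical (it is not a Trinity-RFT API), and the numeric values are placeholders rather than recommendations.

```python
from typing import Any, Dict


def set_by_dotted_key(config: Dict[str, Any], dotted_key: str, value: Any) -> None:
    """Set config["a"]["b"]["c"] = value given the dotted key "a.b.c"."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        # Create intermediate dictionaries on demand.
        node = node.setdefault(part, {})
    node[leaf] = value


config: Dict[str, Any] = {}
set_by_dotted_key(config, "trainer.use_dynamic_bsz", True)
set_by_dotted_key(config, "trainer.ppo_max_token_len_per_gpu", 16384)  # placeholder value
set_by_dotted_key(config, "explorer.rollout_model.tensor_parallel_size", 2)  # placeholder value
```

The resulting dictionary has the same shape as the corresponding YAML section, so it can be compared against an existing config file before editing it by hand.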

## Part 3: Debugging Methods
Trinity-RFT now supports actor-level logs, automatically saving the logs for each actor (such as the explorer and trainer) to `<checkpoint_job_dir>/log/<actor_name>`. To see more detailed logs, change the default log level (`info`) to `debug` by setting `log.level=debug` in the config file.

**Q:** How to find logs for debugging?

**A:** Trinity-RFT supports actor-level logs and automatically saves the logs for each actor (such as the explorer and trainer) to `<checkpoint_job_dir>/log/<actor_name>`.

Some important logs are:

- `<checkpoint_job_dir>/log/explorer.log`: Logs generated by the explorer process. It contains:
  - The start and end time of each explorer step.
  - The metrics generated at each explorer step and evaluation step.
  - Model weight synchronization status from the explorer side.
  - Workflow exceptions (if any). Workflow running logs are not included here; check `<checkpoint_job_dir>/log/explorer_runner_<n>.log` for those.

- `<checkpoint_job_dir>/log/trainer.log`: Logs generated by the trainer process. It contains:
  - The start and end time of each training iteration.
  - The metrics generated at each training iteration and evaluation iteration.
  - Model weight synchronization status from the trainer side.

- `<checkpoint_job_dir>/log/explorer_runner_<n>.log`: Logs generated by the workflow runner process. It contains:
  - The logs printed by each workflow. (Workflows must use `self.logger` to print logs.)
  - Exceptions that occurred while the workflow was running (if any).

To see more detailed logs, change the default log level (`info`) to `debug` by setting `log.level=debug` in the config file.

Alternatively, to view the full logs of all processes and save them to `debug.log`:

```bash
export RAY_DEDUP_LOGS=0
trinity run --config grpo_gsm8k/gsm8k.yaml 2>&1 | tee debug.log
```
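
When a run fails, it can help to scan the per-actor log files for Python tracebacks before reading them in full. The following is a minimal sketch of that idea. The helper `find_tracebacks` is hypothetical (not part of Trinity-RFT), and it assumes that the files under `<checkpoint_job_dir>/log` end in `.log` and that exceptions are written with Python's standard traceback header.

```python
from pathlib import Path
from typing import Dict, List


def find_tracebacks(
    log_dir: str, marker: str = "Traceback (most recent call last)"
) -> Dict[str, List[int]]:
    """Map each *.log file name in log_dir to the 1-based line numbers containing marker."""
    hits: Dict[str, List[int]] = {}
    for log_file in sorted(Path(log_dir).glob("*.log")):
        with log_file.open(errors="replace") as fh:
            lines = [i for i, line in enumerate(fh, start=1) if marker in line]
        if lines:
            # Only report files that actually contain the marker.
            hits[log_file.name] = lines
    return hits
```

For example, `find_tracebacks("<checkpoint_job_dir>/log")` (with the placeholder replaced by your actual job directory) returns only the files that contain a traceback, together with the line numbers to jump to.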

### Debugging the Workflow
---

**Q:** How to debug a workflow without running a full experiment?

To debug a new workflow, use Trinity-RFT's debug mode with the following steps:
**A:** To debug a workflow, use Trinity-RFT's debug mode with the following steps:

1. Launch the inference model via `trinity debug --config <config_file_path> --module inference_model`

@@ -171,3 +205,15 @@ model = AutoModelForCausalLM.from_pretrained(model_path)
ckp_path = os.path.join(checkpoint_root_dir, project, name, "global_step_780", "actor")
model.load_state_dict(load_fsdp_state_dict_from_verl_checkpoint(ckp_path))
```

---

**Q:** What's the difference between Trinity-RFT and veRL?

**A:** Trinity-RFT uses veRL as the trainer backend, and extends it with a more modular and flexible architecture. The main differences include:

- **Modular Algorithm Module**: Trinity-RFT extracts algorithm-related components (e.g., advantage function, loss function) from veRL's `core_algos.py` into independent modules, allowing users to easily implement and register new algorithms without modifying the core codebase.
- **Separation of Explorer and Trainer**: Trinity-RFT replaces the rollout model in veRL with a separate Explorer module, which handles agent-environment interactions. This separation allows for more flexible workflow designs and supports flexible RFT modes such as one-step off-policy.
- **Full-lifecycle Data Pipeline**: Trinity-RFT adds a Buffer module between Explorer and Trainer, providing a complete data pipeline for experience storage, processing, and sampling. This design enables advanced data handling strategies, such as experience replay and prioritized sampling.

We also provide benchmarks comparing Trinity-RFT with veRL and systems built on veRL (e.g., [rLLM](https://github.com/rllm-org/rllm)), which show comparable or better performance and efficiency. Please refer to [Benchmark](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark) for more details.