diff --git a/README.md b/README.md index ba687f1dcb..2b8449df1e 100644 --- a/README.md +++ b/README.md @@ -73,7 +73,8 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob | *Multi-step agentic RL* | + [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)
+ [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)
+ [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)
+ [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | | *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)
+ [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
+ [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) | | *Algorithm development* | + [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408))
+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | -| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
+ [Benchmark toolkit for quick verification and experimentation](./benchmark/README.md)
+ [GPU Resource and Training Configuration Guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) | +| *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
+ [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
+ [ALFWorld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | +| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
+ [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) | > [!NOTE] @@ -375,7 +376,7 @@ This project is built upon many excellent open-source projects, including: + [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines; + [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow; + [Ray](https://github.com/ray-project/ray) for distributed systems; -+ we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl) and [ChatLearn](https://github.com/alibaba/ChatLearn); ++ we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm); + ...... diff --git a/README_zh.md b/README_zh.md index ace996d51e..63839442e1 100644 --- a/README_zh.md +++ b/README_zh.md @@ -73,7 +73,8 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能: | *多轮智能体强化学习* | + [拼接多轮任务](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html)
+ [通用多轮任务](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_step_wise.html)
+ [调用智能体框架中的 ReAct 工作流](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_react.html)
+ [例子:训练一个网络搜索智能体](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | | *全生命周期的数据流水线* | + [Rollout 任务混合与选取](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_selector.html)
+ [在线任务选择](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [论文](https://arxiv.org/pdf/2510.26374))
+ [研究项目:learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [论文](https://arxiv.org/pdf/2510.25441))
+ [经验回放机制](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [高级数据处理能力 & Human-in-the-loop](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html) | | *强化学习算法开发* | + [使用 Trinity-RFT 进行 RL 算法开发](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html) (📝 [论文](https://arxiv.org/pdf/2508.11408))
+ [研究项目: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [论文](https://arxiv.org/abs/2509.24203))
+ 不可验证的领域: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [可训练 RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | -| *深入认识 Trinity-RFT* | + [完整配置指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)
+ [用于快速验证和实验的 Benchmark 工具](./benchmark/README.md)
+ [GPU 资源与训练配置对应指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+ [理解 explorer-trainer 同步逻辑](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)
+ [如何与 verl 对齐配置](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) | +| *基准测试* | + [基准测试工具 (快速验证与实验)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [Guru-Math 测试 & 对比 veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
+ [FrozenLake 测试 & 对比 rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
+ [ALFWorld 测试 & 对比 rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | +| *深入认识 Trinity-RFT* | + [完整配置指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)
+ [GPU 资源与训练配置对应指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+ [理解 explorer-trainer 同步逻辑](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)
+ [如何与 verl 对齐配置](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) | > [!NOTE] diff --git a/benchmark/README.md b/benchmark/README.md index 4fb67d06e5..15fcc5ec1e 100644 --- a/benchmark/README.md +++ b/benchmark/README.md @@ -52,6 +52,7 @@ python bench.py gsm8k --node_num 1 --gpu_per_node 8 --model_path /your/model/pat ## 📂 What Gets Saved After running a benchmark, results are stored in `runs//`: + - `config.yaml`: The exact settings used for your run. - `checkpoints/`: Model snapshots saved during training. @@ -60,33 +61,71 @@ After running a benchmark, results are stored in `runs//`: ## 📊 Benchmark Examples ### 1. GSM8K + To reproduce this experiment: + ```bash python bench.py gsm8k --model_path /path/to/Qwen/Qwen2.5-1.5B-Instruct ``` + #### GSM8K Results + The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/068da409d215bb2450d93b6b7a56740d4751669d). ![View Results](../docs/sphinx_doc/assets/gsm8k-bench.png) ### 2. Countdown + To reproduce this experiment: + ```bash python bench.py countdown --model_path /path/to/Qwen/Qwen2.5-1.5B-Instruct ``` + #### Countdown Results + The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/068da409d215bb2450d93b6b7a56740d4751669d). ![View Results](../docs/sphinx_doc/assets/countdown-bench.png) ### 3. Guru-Math + To reproduce this experiment: + ```bash python bench.py guru_math --model_path /path/to/Qwen/Qwen2.5-7B ``` #### Guru Results + The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/fbf6c967bcd637bfd9f81fb4d7dd4961d7d5a407). ![View Results](../docs/sphinx_doc/assets/guru-bench.png) +See [full report](./reports/guru_math.md) for details. + +### 4. FrozenLake + +To reproduce this experiment: + +```bash +python bench.py frozen_lake --model_path /path/to/Qwen/Qwen2.5-3B +``` + +#### Frozen Lake Results + +The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/3861859cbd9c40de07429db2d9b19fd3d4d31703). +![View Results](../docs/sphinx_doc/assets/bench_frozenlake_step.png) + +See [full report](./reports/frozenlake.md) for details. + +### 5. Alfworld + +Please follow the instructions in [Alfworld report](./reports/alfworld.md) to run the benchmark. + +#### ALFWorld Results + +The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/3861859cbd9c40de07429db2d9b19fd3d4d31703). +![View Results](../docs/sphinx_doc/assets/bench_alfworld_step.png) + + *More benchmarks will be added soon!* --- diff --git a/docs/sphinx_doc/source/conf.py b/docs/sphinx_doc/source/conf.py index 4fec0d53af..146588ff34 100644 --- a/docs/sphinx_doc/source/conf.py +++ b/docs/sphinx_doc/source/conf.py @@ -90,6 +90,8 @@ def get_recent_tags(n: int) -> list: "article_header_end": "article_header_customized.html", "use_download_button": True, "use_fullscreen_button": True, + "repository_url": "https://github.com/modelscope/Trinity-RFT", + "use_repository_button": True, } html_sidebars = { diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md index 773a5666fc..919eca2908 100644 --- a/docs/sphinx_doc/source/main.md +++ b/docs/sphinx_doc/source/main.md @@ -31,7 +31,9 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob | *Multi-step agentic RL* | + [Concatenated multi-turn workflow](/tutorial/example_multi_turn.md)
+ [General multi-step workflow](/tutorial/example_step_wise.md)
+ [ReAct workflow with an agent framework](/tutorial/example_react.md)
+ [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | | *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](/tutorial/develop_selector.md)
+ [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
+ [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md) | | *Algorithm development* | + [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md) (📝 [paper](https://arxiv.org/pdf/2508.11408))
+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | -| *Going deeper into Trinity-RFT* | + [Full configurations](/tutorial/trinity_configs.md)
+ [Benchmark toolkit for quick verification and experimentation](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [GPU Resource and Training Configuration Guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ [Understand the coordination between explorer and trainer](/tutorial/synchronizer.md)
+ [How to align configuration with veRL](/tutorial/align_with_verl.md) | +| *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
+ [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
+ [ALFWorld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | +| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
+ [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) | + diff --git a/docs/sphinx_doc/source/tutorial/faq.md b/docs/sphinx_doc/source/tutorial/faq.md index ce960acffa..6caec69368 100644 --- a/docs/sphinx_doc/source/tutorial/faq.md +++ b/docs/sphinx_doc/source/tutorial/faq.md @@ -1,12 +1,20 @@ # FAQ ## Part 1: Configurations -**Q:** How do I configure the parameters? -**A:** You can use the config manager to configure the parameters by running `trinity studio --port 8080`. This approach provides a convenient way to configure the parameters. +**Q:** How to write Trinity-RFT configuration files? -Advanced users can also edit the config file directly, referred to the YAML files in `examples`. -Trinity-RFT uses [veRL](https://github.com/volcengine/verl) as the training backend, which can have massive parameters, referred to [veRL documentation](https://verl.readthedocs.io/en/latest/examples/config.html). You may specify these parameters in the `trainer.trainer_config` dictionary. +**A:** The recommended way to write configurations is to use Trinity Studio. You can launch it with: + +```bash +trinity studio --port 8080 +``` + +This provides an intuitive and user-friendly way to create and modify configuration files. + +For advanced users, we recommend referring to the [configuration documentation](./trinity_configs.md) for detailed explanations of all configuration options. You can also directly edit the YAML configuration files (see the `examples` directory for templates and examples). + +If you are already familiar with veRL, please refer to the [Align with veRL](./align_with_verl.md) tutorial. This guide explains how to align Trinity-RFT configuration parameters with those used in veRL, making it easier to migrate or reuse your existing veRL setups. --- @@ -97,19 +105,45 @@ ray start --head - For trainer, adjust `trainer.max_token_len_per_gpu` when `trainer.use_dynamic_bsz=false`; adjust `trainer.ppo_max_token_len_per_gpu` and `trainer.ulysses_sequence_parallel_size` when `trainer.use_dynamic_bsz=true`. Setting `trainer.trainer_config.actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` may also help. - For explorer, adjust `explorer.rollout_model.tensor_parallel_size`. +In addition, Trinity-RFT provides a [GPU-related configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html), which you can refer to for suggestions on adjusting these settings. ## Part 3: Debugging Methods -Trinity-RFT now supports the actor-level logs, which automatically saves the logs for each actor (such as explorer and trainer) to `/log/`. To see the more detailed logs, change the default log level (`info`) to `debug`, by setting `log.level=debug` in config file. + +**Q:** How to find logs for debugging? + +**A:** Trinity-RFT supports actor-level logging and automatically saves a separate log for each actor (such as the explorer and trainer) to `/log/`. + +Some important logs are: + +- `/log/explorer.log`: Logs generated by the explorer process. It contains: + - The begin and end time of each explorer step. + - The metrics generated at each explorer step and evaluation step. + - Model weight synchronization status from the explorer side. + - Workflow exceptions (if any). The workflow running logs are not included here; please check `/log/explorer_runner_.log` for details. + +- `/log/trainer.log`: Logs generated by the trainer process. It contains: + - The begin and end time of each training iteration. 
+ - The metrics generated at each training iteration and evaluation iteration. + - Model weight synchronization status from the trainer side. + +- `/log/explorer_runner_.log`: Logs generated by a workflow runner process. It contains: + - The logs printed by each workflow. (The workflow must use `self.logger` to print logs.) + - Exceptions that occurred while the workflow was running (if any). + +To see more detailed logs, change the default log level (`info`) to `debug` by setting `log.level=debug` in the config file. Alternatively, if you want to look at the full logs of all processes and save it to `debug.log`: + ```bash export RAY_DEDUP_LOGS=0 trinity run --config grpo_gsm8k/gsm8k.yaml 2>&1 | tee debug.log ``` -### Debugging the Workflow +--- + +**Q:** How to debug a workflow without running a full experiment? -To debug a new workflow, use Trinity-RFT's debug mode with the following steps: +**A:** To debug a workflow, use Trinity-RFT's debug mode with the following steps: 1. Launch the inference model via `trinity debug --config --module inference_model` @@ -171,3 +205,15 @@ model = AutoModelForCausalLM.from_pretrained(model_path) ckp_path = os.path.join(checkpoint_root_dir, project, name, "global_step_780", "actor") model.load_state_dict(load_fsdp_state_dict_from_verl_checkpoint(ckp_path)) ``` + +--- + +**Q:** What's the difference between Trinity-RFT and veRL? + +**A:** Trinity-RFT uses veRL as the trainer backend and extends it with a more modular and flexible architecture. The main differences include: + +- **Modular Algorithm Module**: Trinity-RFT extracts algorithm-related components (e.g., advantage function, loss function) from veRL's `core_algos.py` into independent modules, allowing users to easily implement and register new algorithms without modifying the core codebase. +- **Separation of Explorer and Trainer**: Trinity-RFT replaces the rollout model in veRL with a separate Explorer module, which handles agent-environment interactions. This separation allows for more flexible workflow designs and rollout-training scheduling. +- **Full-lifecycle Data Pipeline**: Trinity-RFT adds a Buffer module between Explorer and Trainer, providing a complete data pipeline for experience storage, processing, and sampling. This design enables advanced data handling strategies, such as experience replay and prioritized sampling. + +We also provide benchmarks comparing Trinity-RFT with veRL and systems built on veRL (e.g., [rLLM](https://github.com/rllm-org/rllm)), which show comparable or better performance and efficiency. Please refer to [Benchmark](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark) for more details. diff --git a/docs/sphinx_doc/source_zh/conf.py b/docs/sphinx_doc/source_zh/conf.py index 22f966eb67..68392da766 100644 --- a/docs/sphinx_doc/source_zh/conf.py +++ b/docs/sphinx_doc/source_zh/conf.py @@ -77,6 +77,8 @@ "article_header_end": "article_header_customized.html", "use_download_button": True, "use_fullscreen_button": True, + "repository_url": "https://github.com/modelscope/Trinity-RFT", + "use_repository_button": True, } html_sidebars = { diff --git a/docs/sphinx_doc/source_zh/main.md b/docs/sphinx_doc/source_zh/main.md index ff084e01ae..3d1c673e25 100644 --- a/docs/sphinx_doc/source_zh/main.md +++ b/docs/sphinx_doc/source_zh/main.md @@ -30,8 +30,8 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能: | *多轮智能体强化学习* | + [拼接多轮任务](/tutorial/example_multi_turn.md)
+ [通用多轮任务](/tutorial/example_step_wise.md)
+ [调用智能体框架中的 ReAct 工作流](/tutorial/example_react.md)
+ [例子:训练一个网络搜索智能体](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | | *全生命周期的数据流水线* | + [Rollout 任务混合与选取](/tutorial/develop_selector.md)
+ [在线任务选择](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [论文](https://arxiv.org/pdf/2510.26374))
+ [研究项目:learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [论文](https://arxiv.org/pdf/2510.25441))
+ [经验回放机制](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [高级数据处理能力 & Human-in-the-loop](/tutorial/example_data_functionalities.md) | | *强化学习算法开发* | + [使用 Trinity-RFT 进行 RL 算法开发](/tutorial/example_mix_algo.md) (📝 [论文](https://arxiv.org/pdf/2508.11408))
+ [研究项目: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [论文](https://arxiv.org/abs/2509.24203))
+ 不可验证的领域: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [可训练 RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | -| *深入认识 Trinity-RFT* | + [完整配置指南](/tutorial/trinity_configs.md)
+ [用于快速验证和实验的 Benchmark 工具](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [GPU 资源与训练配置对应指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+ [理解 explorer-trainer 同步逻辑](/tutorial/synchronizer.md)
+ [如何和 veRL 对齐配置](/tutorial/align_with_verl.md) | - +| *基准测试* | + [基准测试工具 (快速验证与实验)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [Guru-Math 测试 & 对比 veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
+ [FrozenLake 测试 & 对比 rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
+ [ALFWorld 测试 & 对比 rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | +| *深入认识 Trinity-RFT* | + [完整配置指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)
+ [GPU 资源与训练配置对应指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+ [理解 explorer-trainer 同步逻辑](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)
+ [如何与 verl 对齐配置](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) | ## 🌟 核心特性 diff --git a/docs/sphinx_doc/source_zh/tutorial/faq.md b/docs/sphinx_doc/source_zh/tutorial/faq.md index facd99539c..a04e6b5c66 100644 --- a/docs/sphinx_doc/source_zh/tutorial/faq.md +++ b/docs/sphinx_doc/source_zh/tutorial/faq.md @@ -1,16 +1,24 @@ -# 常见问题 +# 常见问题(FAQ) ## 第一部分:参数配置 -**Q:** 在哪里配置参数? -**A:** 你可以通过运行 `trinity studio --port 8080` 使用配置管理器来配置参数。这种方式提供了便捷的参数配置途径。 +**Q:** 如何编写 Trinity-RFT 的配置文件? -高级用户也可以直接编辑配置文件,参见各例子(`examples`)中的 YAML 文件。 -Trinity-RFT 使用 [veRL](https://github.com/volcengine/verl) 作为训练后端,其参数数量较多,详见 [veRL 文档](https://verl.readthedocs.io/en/latest/examples/config.html)。你可以在 `trainer.trainer_config` 字典中指定这些参数。 +**A:** 推荐使用 Trinity Studio 进行配置文件编写。你可以通过以下命令启动: + +```bash +trinity studio --port 8080 +``` + +该指令提供一个直观易用的图形化界面来创建和修改配置文件。 + +对于进阶用户,建议参考[配置文档](./trinity_configs.md)以了解所有配置项的详细说明。你也可以直接编辑 YAML 配置文件(可参考 `examples` 目录下的模板和示例)。 + +如果你已经熟悉 veRL,请参考[与 veRL 对齐配置](./align_with_verl.md)。该教程介绍了如何将 Trinity-RFT 的配置参数与 veRL 对齐,方便迁移或复用已有的 veRL 配置。 --- -**Q:** `buffer.batch_size`、`buffer.train_batch_size`、`actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu` 以及其他 batch size 参数之间有什么关系? +**Q:** `buffer.batch_size`、`buffer.train_batch_size`、`actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu` 及其他 batch size 参数之间的关系? **A:** 这些参数的关系如下: @@ -19,7 +27,7 @@ Trinity-RFT 使用 [veRL](https://github.com/volcengine/verl) 作为训练后端 - `actor_rollout_ref.actor.ppo_mini_batch_size`:一个 mini-batch 中的 experience 数量,会被 `buffer.train_batch_size` 覆盖;但在 `update_policy` 函数中,其值表示每个 GPU 上的 mini-batch experience 数量,即 `buffer.train_batch_size (/ ngpus_trainer)`。除以 `ngpus_trainer` 是由于数据隐式分配到多个 GPU 上所致,但在梯度累积后不影响最终结果。 - `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu`:每个 GPU 上 micro-batch 中的 experience 数量。 -以下例子简要地展示了它们的用法: +一个简要示例: ```python def update_policy(batch_exps): @@ -62,7 +70,7 @@ File ".../flash_attn/flash_attn_interface.py", line 15, in ‹module> ImportError: ... ``` -**A:** `flash-attn` 模块未正确安装。尝试运行 `pip install flash-attn==2.8.1` 或 `pip install flash-attn==2.8.1 -v --no-build-isolation` 来修复。 +**A:** `flash-attn` 模块未正确安装。请尝试运行 `pip install flash-attn==2.8.1` 或 `pip install flash-attn==2.8.1 -v --no-build-isolation` 进行修复。 --- @@ -71,7 +79,7 @@ ImportError: ... UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key]) ... ``` -**A:** 如果你使用 WandB 来观察实验,在启动 Ray 和运行实验之前,请先登录 WandB。一种方法是执行命令 `export WANDB_API_KEY=[your_api_key]`。你也可以选择其他方式来观察实验,比如设置 `monitor.monitor_type=tensorboard/mlflow`。 +**A:** 请在启动 Ray 和运行实验前先登录 WandB。可以通过 `export WANDB_API_KEY=[your_api_key]` 设置环境变量。你也可以通过设置 `monitor.monitor_type=tensorboard/mlflow` 使用其他监控方式。 --- @@ -80,7 +88,7 @@ UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key] ValueError: Failed to look up actor with name 'explorer' ... 
``` -**A:** 确保在运行实验前已启动 Ray。如果 Ray 已在运行,可通过以下命令重启: +**A:** 请确保在运行实验前已启动 Ray。如果 Ray 已在运行,可通过以下命令重启: ```bash ray stop @@ -89,41 +97,60 @@ ray start --head --- -**报错:** 内存不足 (OOM) 错误 +**报错:** 内存不足(OOM)错误 -**A:** 以下参数可能有所帮助: +**A:** 可尝试调整以下参数: -- 对于 trainer:当 `trainer.use_dynamic_bsz=false` 时,调整 `trainer.max_token_len_per_gpu`;当 `trainer.use_dynamic_bsz=true` 时,调整 `trainer.ppo_max_token_len_per_gpu` 和 `trainer.ulysses_sequence_parallel_size`。设置 `trainer.trainer_config.actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` 也可能有帮助。 -- 对于 explorer:调整 `explorer.rollout_model.tensor_parallel_size`。 +- 对于 trainer,当 `trainer.use_dynamic_bsz=false` 时,调整 `trainer.max_token_len_per_gpu`;当 `trainer.use_dynamic_bsz=true` 时,调整 `trainer.ppo_max_token_len_per_gpu` 和 `trainer.ulysses_sequence_parallel_size`。设置 `trainer.trainer_config.actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` 也可能有帮助。 +- 对于 explorer,调整 `explorer.rollout_model.tensor_parallel_size`。 + +此外,Trinity-RFT 提供了[GPU 相关配置指南](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html),可参考其中建议。 ## 第三部分:调试方法 -Trinity-RFT 现在支持 actor 级别的日志功能,可自动将每个 actor(例如 explorer 和 trainer)的日志保存到 `/log/` 目录下。如需查看更详细的日志信息,可通过在配置文件中设置 `log.level=debug`,将默认日志级别(`info`)更改为 `debug`。 -你也可以查看所有进程的完整日志并保存到 `debug.log`: +**Q:** Trinity-RFT 运行日志在哪查看? + +**A:** Trinity-RFT 支持 actor 级别日志,会自动将每个 actor(如 explorer 和 trainer)的日志保存到 `/log/`。 + +常见日志包括: + +- `/log/explorer.log`:explorer 进程日志,包括每步起止时间、评测指标、模型权重同步、异常等。 +- `/log/trainer.log`:trainer 进程日志,包括每次训练迭代的起止时间、评测指标、模型权重同步等。 +- `/log/explorer_runner_.log`:workflow runner 进程日志,包括 workflow 打印的日志和运行异常(需在 Workflow 中用 `self.logger` 打印日志)。 + +如需更详细日志,可在配置文件中设置 `log.level=debug`。 + +如需查看所有进程的完整日志并保存到 `debug.log`: ```bash export RAY_DEDUP_LOGS=0 trinity run --config grpo_gsm8k/gsm8k.yaml 2>&1 | tee debug.log ``` -### 调试工作流(Workflow) - - -实现新工作流后,可使用 Trinity-RFT 的调试模式进行调试,步骤如下: +--- -1. 启动推理模型: `trinity debug --config --module inference_model` +**Q:** 如何在不运行完整实验的情况下调试 workflow? -2. 在另一个终端中进行工作流的调试:`trinity debug --config --module workflow --output-file --plugin-dir ` +**A:** 可用 Trinity-RFT 的 debug 模式,步骤如下: -更多详细信息,请参阅{ref}`工作流开发指南 `章节。 +1. 启动推理模型: + ```bash + trinity debug --config --module inference_model + ``` +2. 在另一个终端调试 workflow: + ```bash + trinity debug --config --module workflow --output-file --plugin-dir + ``` +详细说明见 {ref}`工作流开发指南 `。 ## 第四部分:其他问题 -**Q:** `buffer.trainer_input.experience_buffer.path` 的作用是什么? -**A:** 该路径指定了用于持久化存储生成的 experience 的 SQLite 数据库路径。如果你不想使用 SQLite 数据库,可以注释掉这一行。 +**Q:** `buffer.trainer_input.experience_buffer.path` 有什么作用? + +**A:** 该路径指定用于存储生成 experience 的 SQLite 数据库路径。如果不需要,可注释掉该行。 -要查看数据库中的 experience,可以使用以下 Python 脚本: +如需查看数据库中的 experience,可用如下 Python 脚本: ```python from sqlalchemy import create_engine @@ -147,14 +174,14 @@ exp_list = [] for exp in experiences: exp_list.append(ExperienceModel.to_experience(exp)) -# 打印 experience 信息 +# 打印 experience for exp in exp_list: print(f"{exp.prompt_text=}", f"{exp.response_text=}") ``` --- -**Q:** 如何在 Trinity-RFT 框架之外加载检查点(checkpoints)? +**Q:** 如何在 Trinity-RFT 框架外加载 checkpoints? **A:** 你需要指定模型路径和检查点路径。以下代码片段展示了如何使用 transformers 库进行加载。 @@ -171,3 +198,15 @@ model = AutoModelForCausalLM.from_pretrained(model_path) ckp_path = os.path.join(checkpoint_root_dir, project, name, "global_step_780", "actor") model.load_state_dict(load_fsdp_state_dict_from_verl_checkpoint(ckp_path)) ``` + +--- + +**Q:** Trinity-RFT 和 veRL 有什么区别? 
+ +**A:** Trinity-RFT 以 veRL 作为 trainer 后端,并在其基础上扩展了更模块化和灵活的架构。主要区别包括: + +- **模块化算法**:Trinity-RFT 将 veRL `core_algos.py` 中的算法相关组件(如优势函数、损失函数)提取为独立模块,便于用户实现和注册新算法,无需修改核心代码。 +- **Explorer 与 Trainer 分离**:Trinity-RFT 用独立 Explorer 模块替代 veRL 的 rollout model,专门负责 agent 与环境交互,支持更灵活的 workflow 设计和 rollout-training 调度。 +- **全生命周期数据通路**:Trinity-RFT 在 Explorer 和 Trainer 之间增加 Buffer 模块,提供完整的数据存储、处理和采样通路,支持经验回放、优先采样等高级数据处理策略。 + +我们还提供了 Trinity-RFT 与 veRL 及其衍生系统(如 [rLLM](https://github.com/rllm-org/rllm))的基准对比,详见 [Benchmark](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark)。