diff --git a/README.md b/README.md index ba687f1dcb..2b8449df1e 100644 --- a/README.md +++ b/README.md @@ -73,7 +73,8 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob | *Multi-step agentic RL* | + [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)
+ [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)
+ [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)
+ [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | | *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)
+ [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
+ [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) | | *Algorithm development* | + [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) (📝 [paper](https://arxiv.org/pdf/2508.11408))
+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | -| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
+ [Benchmark toolkit for quick verification and experimentation](./benchmark/README.md)
+ [GPU Resource and Training Configuration Guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) | +| *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
+ [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
+ [ALFWorld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | +| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
+ [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) | > [!NOTE] @@ -375,7 +376,7 @@ This project is built upon many excellent open-source projects, including: + [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines; + [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow; + [Ray](https://github.com/ray-project/ray) for distributed systems; -+ we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl) and [ChatLearn](https://github.com/alibaba/ChatLearn); ++ we have also drawn inspirations from RL frameworks like [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), [ChatLearn](https://github.com/alibaba/ChatLearn) and [rLLM](https://github.com/rllm-org/rllm); + ...... diff --git a/README_zh.md b/README_zh.md index ace996d51e..63839442e1 100644 --- a/README_zh.md +++ b/README_zh.md @@ -73,7 +73,8 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能: | *多轮智能体强化学习* | + [拼接多轮任务](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_multi_turn.html)
+ [通用多轮任务](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_step_wise.html)
+ [调用智能体框架中的 ReAct 工作流](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_react.html)
+ [例子:训练一个网络搜索智能体](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | | *全生命周期的数据流水线* | + [Rollout 任务混合与选取](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/develop_selector.html)
+ [在线任务选择](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [论文](https://arxiv.org/pdf/2510.26374))
+ [研究项目:learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [论文](https://arxiv.org/pdf/2510.25441))
+ [经验回放机制](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [高级数据处理能力 & Human-in-the-loop](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_data_functionalities.html) | | *强化学习算法开发* | + [使用 Trinity-RFT 进行 RL 算法开发](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/example_mix_algo.html) (📝 [论文](https://arxiv.org/pdf/2508.11408))
+ [研究项目: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [论文](https://arxiv.org/abs/2509.24203))
+ 不可验证的领域: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [可训练 RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | -| *深入认识 Trinity-RFT* | + [完整配置指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)
+ [用于快速验证和实验的 Benchmark 工具](./benchmark/README.md)
+ [GPU 资源与训练配置对应指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+ [理解 explorer-trainer 同步逻辑](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)
+ [如何与 verl 对齐配置](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) | +| *基准测试* | + [基准测试工具 (快速验证与实验)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [Guru-Math 测试 & 对比 veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
+ [FrozenLake 测试 & 对比 rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
+ [ALFWorld 测试 & 对比 rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | +| *深入认识 Trinity-RFT* | + [完整配置指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)
+ [GPU 资源与训练配置对应指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+ [理解 explorer-trainer 同步逻辑](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)
+ [如何与 verl 对齐配置](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) | > [!NOTE] diff --git a/benchmark/README.md b/benchmark/README.md index 4fb67d06e5..15fcc5ec1e 100644 --- a/benchmark/README.md +++ b/benchmark/README.md @@ -52,6 +52,7 @@ python bench.py gsm8k --node_num 1 --gpu_per_node 8 --model_path /your/model/pat ## 📂 What Gets Saved After running a benchmark, results are stored in `runs//`: + - `config.yaml`: The exact settings used for your run. - `checkpoints/`: Model snapshots saved during training. @@ -60,33 +61,71 @@ After running a benchmark, results are stored in `runs//`: ## 📊 Benchmark Examples ### 1. GSM8K + To reproduce this experiment: + ```bash python bench.py gsm8k --model_path /path/to/Qwen/Qwen2.5-1.5B-Instruct ``` + #### GSM8K Results + The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/068da409d215bb2450d93b6b7a56740d4751669d). ![View Results](../docs/sphinx_doc/assets/gsm8k-bench.png) ### 2. Countdown + To reproduce this experiment: + ```bash python bench.py countdown --model_path /path/to/Qwen/Qwen2.5-1.5B-Instruct ``` + #### Countdown Results + The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/068da409d215bb2450d93b6b7a56740d4751669d). ![View Results](../docs/sphinx_doc/assets/countdown-bench.png) ### 3. Guru-Math + To reproduce this experiment: + ```bash python bench.py guru_math --model_path /path/to/Qwen/Qwen2.5-7B ``` #### Guru Results + The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/fbf6c967bcd637bfd9f81fb4d7dd4961d7d5a407). ![View Results](../docs/sphinx_doc/assets/guru-bench.png) +See [full report](./reports/guru_math.md) for details. + +### 4. FrozenLake + +To reproduce this experiment: + +```bash +python bench.py frozen_lake --model_path /path/to/Qwen/Qwen2.5-3B +``` + +#### Frozen Lake Results + +The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/3861859cbd9c40de07429db2d9b19fd3d4d31703). +![View Results](../docs/sphinx_doc/assets/bench_frozenlake_step.png) + +See [full report](./reports/frozenlake.md) for details. + +### 5. Alfworld + +Please follow the instructions in [Alfworld report](./reports/alfworld.md) to run the benchmark. + +#### ALFWorld Results + +The chart below shows performance based on this [commit](https://github.com/modelscope/Trinity-RFT/tree/3861859cbd9c40de07429db2d9b19fd3d4d31703). +![View Results](../docs/sphinx_doc/assets/bench_alfworld_step.png) + + *More benchmarks will be added soon!* --- diff --git a/docs/sphinx_doc/source/conf.py b/docs/sphinx_doc/source/conf.py index 4fec0d53af..146588ff34 100644 --- a/docs/sphinx_doc/source/conf.py +++ b/docs/sphinx_doc/source/conf.py @@ -90,6 +90,8 @@ def get_recent_tags(n: int) -> list: "article_header_end": "article_header_customized.html", "use_download_button": True, "use_fullscreen_button": True, + "repository_url": "https://github.com/modelscope/Trinity-RFT", + "use_repository_button": True, } html_sidebars = { diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md index 773a5666fc..919eca2908 100644 --- a/docs/sphinx_doc/source/main.md +++ b/docs/sphinx_doc/source/main.md @@ -31,7 +31,9 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob | *Multi-step agentic RL* | + [Concatenated multi-turn workflow](/tutorial/example_multi_turn.md)
+ [General multi-step workflow](/tutorial/example_step_wise.md)
+ [ReAct workflow with an agent framework](/tutorial/example_react.md)
+ [Example: train a web-search agent](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | | *Full-lifecycle data pipelines* | + [Rollout task mixing and selection](/tutorial/develop_selector.md)
+ [Online task curriculum](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [paper](https://arxiv.org/pdf/2510.26374))
+ [Research project: learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [paper](https://arxiv.org/pdf/2510.25441))
+ [Experience replay with prioritization](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md) | | *Algorithm development* | + [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md) (📝 [paper](https://arxiv.org/pdf/2508.11408))
+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [paper](https://arxiv.org/abs/2509.24203))
+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | -| *Going deeper into Trinity-RFT* | + [Full configurations](/tutorial/trinity_configs.md)
+ [Benchmark toolkit for quick verification and experimentation](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [GPU Resource and Training Configuration Guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ [Understand the coordination between explorer and trainer](/tutorial/synchronizer.md)
+ [How to align configuration with veRL](/tutorial/align_with_verl.md) | +| *Benchmarks* | + [Benchmark toolkit (quick verification & experimentation)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [Guru-Math benchmark & comparison with veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
+ [FrozenLake benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
+ [ALFWorld benchmark & comparison with rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | +| *Going deeper into Trinity-RFT* | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
+ [GPU resource and training configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html)
+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
+ [How to align configuration with veRL](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/align_with_verl.html) | + diff --git a/docs/sphinx_doc/source/tutorial/faq.md b/docs/sphinx_doc/source/tutorial/faq.md index ce960acffa..6caec69368 100644 --- a/docs/sphinx_doc/source/tutorial/faq.md +++ b/docs/sphinx_doc/source/tutorial/faq.md @@ -1,12 +1,20 @@ # FAQ ## Part 1: Configurations -**Q:** How do I configure the parameters? -**A:** You can use the config manager to configure the parameters by running `trinity studio --port 8080`. This approach provides a convenient way to configure the parameters. +**Q:** How to write Trinity-RFT configuration files? -Advanced users can also edit the config file directly, referred to the YAML files in `examples`. -Trinity-RFT uses [veRL](https://github.com/volcengine/verl) as the training backend, which can have massive parameters, referred to [veRL documentation](https://verl.readthedocs.io/en/latest/examples/config.html). You may specify these parameters in the `trainer.trainer_config` dictionary. +**A:** The recommended way to write configurations is to use Trinity Studio. You can launch it with: + +```bash +trinity studio --port 8080 +``` + +This provides an intuitive and user-friendly way to create and modify configuration files. + +For advanced users, we recommend referring to the [configuration documentation](./trinity_configs.md) for detailed explanations of all configuration options. You can also directly edit the YAML configuration files (see the `examples` directory for templates and examples). + +If you are already familiar with veRL, please refer to the [Align with veRL](./align_with_verl.md) tutorial. This guide explains how to align Trinity-RFT configuration parameters with those used in veRL, making it easier to migrate or reuse your existing veRL setups. --- @@ -97,19 +105,45 @@ ray start --head - For trainer, adjust `trainer.max_token_len_per_gpu` when `trainer.use_dynamic_bsz=false`; adjust `trainer.ppo_max_token_len_per_gpu` and `trainer.ulysses_sequence_parallel_size` when `trainer.use_dynamic_bsz=true`. Setting `trainer.trainer_config.actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` may also help. - For explorer, adjust `explorer.rollout_model.tensor_parallel_size`. +In addition, Trinity-RFT provides a [GPU-related configuration guide](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html), which you can refer to for suggestions on adjusting these settings. ## Part 3: Debugging Methods -Trinity-RFT now supports the actor-level logs, which automatically saves the logs for each actor (such as explorer and trainer) to `/log/`. To see the more detailed logs, change the default log level (`info`) to `debug`, by setting `log.level=debug` in config file. + +**Q:** How to find logs for debugging? + +**A:** Trinity-RFT supports actor-level logging and automatically saves a separate log for each actor (such as the explorer and trainer) to `/log/`. + +Some important logs are: + +- `/log/explorer.log`: Logs generated by the explorer process. It contains: + - The begin and end time of each explorer step. + - The metrics generated at each explorer step and evaluation step. + - Model weight synchronization status from the explorer side. + - Workflow exceptions (if any). The workflow running logs are not included here; please check `/log/explorer_runner_.log` for details. + +- `/log/trainer.log`: Logs generated by the trainer process. It contains: + - The begin and end time of each training iteration. 
+ - The metrics generated at each training iteration and evaluation iteration. + - Model weight synchronization status from the trainer side. + +- `/log/explorer_runner_.log`: Logs generated by a workflow runner process. It contains: + - The logs printed by each workflow. (The workflow must use `self.logger` to print logs.) + - Exceptions that occurred while the workflow was running (if any). + +To see more detailed logs, change the default log level (`info`) to `debug` by setting `log.level=debug` in the config file. Alternatively, if you want to look at the full logs of all processes and save it to `debug.log`: + ```bash export RAY_DEDUP_LOGS=0 trinity run --config grpo_gsm8k/gsm8k.yaml 2>&1 | tee debug.log ``` -### Debugging the Workflow +--- + +**Q:** How to debug a workflow without running a full experiment? -To debug a new workflow, use Trinity-RFT's debug mode with the following steps: +**A:** To debug a workflow, use Trinity-RFT's debug mode with the following steps: 1. Launch the inference model via `trinity debug --config --module inference_model` @@ -171,3 +205,15 @@ model = AutoModelForCausalLM.from_pretrained(model_path) ckp_path = os.path.join(checkpoint_root_dir, project, name, "global_step_780", "actor") model.load_state_dict(load_fsdp_state_dict_from_verl_checkpoint(ckp_path)) ``` + +--- + +**Q:** What's the difference between Trinity-RFT and veRL? + +**A:** Trinity-RFT uses veRL as the trainer backend and extends it with a more modular and flexible architecture. The main differences include: + +- **Modular Algorithm Module**: Trinity-RFT extracts algorithm-related components (e.g., advantage function, loss function) from veRL's `core_algos.py` into independent modules, allowing users to easily implement and register new algorithms without modifying the core codebase. +- **Separation of Explorer and Trainer**: Trinity-RFT replaces the rollout model in veRL with a separate Explorer module, which handles agent-environment interactions. This separation allows for more flexible workflow designs and rollout-training scheduling. +- **Full-lifecycle Data Pipeline**: Trinity-RFT adds a Buffer module between Explorer and Trainer, providing a complete data pipeline for experience storage, processing, and sampling. This design enables advanced data handling strategies, such as experience replay and prioritized sampling. + +We also provide benchmarks comparing Trinity-RFT with veRL and systems built on veRL (e.g., [rLLM](https://github.com/rllm-org/rllm)), which show comparable or better performance and efficiency. Please refer to [Benchmark](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark) for more details. diff --git a/docs/sphinx_doc/source_zh/conf.py b/docs/sphinx_doc/source_zh/conf.py index 22f966eb67..68392da766 100644 --- a/docs/sphinx_doc/source_zh/conf.py +++ b/docs/sphinx_doc/source_zh/conf.py @@ -77,6 +77,8 @@ "article_header_end": "article_header_customized.html", "use_download_button": True, "use_fullscreen_button": True, + "repository_url": "https://github.com/modelscope/Trinity-RFT", + "use_repository_button": True, } html_sidebars = { diff --git a/docs/sphinx_doc/source_zh/main.md b/docs/sphinx_doc/source_zh/main.md index ff084e01ae..3d1c673e25 100644 --- a/docs/sphinx_doc/source_zh/main.md +++ b/docs/sphinx_doc/source_zh/main.md @@ -30,8 +30,8 @@ Trinity-RFT 面向不同背景和目标的用户提供相应功能: | *多轮智能体强化学习* | + [拼接多轮任务](/tutorial/example_multi_turn.md)
+ [通用多轮任务](/tutorial/example_step_wise.md)
+ [调用智能体框架中的 ReAct 工作流](/tutorial/example_react.md)
+ [例子:训练一个网络搜索智能体](https://github.com/modelscope/Trinity-RFT/tree/main/examples/agentscope_websearch) | | *全生命周期的数据流水线* | + [Rollout 任务混合与选取](/tutorial/develop_selector.md)
+ [在线任务选择](https://github.com/modelscope/Trinity-RFT/tree/main/examples/bots) (📝 [论文](https://arxiv.org/pdf/2510.26374))
+ [研究项目:learn-to-ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask) (📝 [论文](https://arxiv.org/pdf/2510.25441))
+ [经验回放机制](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)
+ [高级数据处理能力 & Human-in-the-loop](/tutorial/example_data_functionalities.md) | | *强化学习算法开发* | + [使用 Trinity-RFT 进行 RL 算法开发](/tutorial/example_mix_algo.md) (📝 [论文](https://arxiv.org/pdf/2508.11408))
+ [研究项目: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) (📝 [论文](https://arxiv.org/abs/2509.24203))
+ 不可验证的领域: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [可训练 RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) | -| *深入认识 Trinity-RFT* | + [完整配置指南](/tutorial/trinity_configs.md)
+ [用于快速验证和实验的 Benchmark 工具](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [GPU 资源与训练配置对应指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+ [理解 explorer-trainer 同步逻辑](/tutorial/synchronizer.md)
+ [如何和 veRL 对齐配置](/tutorial/align_with_verl.md) | - +| *基准测试* | + [基准测试工具 (快速验证与实验)](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/README.md)
+ [Guru-Math 测试 & 对比 veRL](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/guru_math.md)
+ [FrozenLake 测试 & 对比 rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/frozenlake.md)
+ [ALFWorld 测试 & 对比 rLLM](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark/reports/alfworld.md) | +| *深入认识 Trinity-RFT* | + [完整配置指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_configs.html)
+ [GPU 资源与训练配置对应指南](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/trinity_gpu_configs.html)
+ [理解 explorer-trainer 同步逻辑](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/synchronizer.html)
+ [如何与 verl 对齐配置](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/align_with_verl.html) | ## 🌟 核心特性 diff --git a/docs/sphinx_doc/source_zh/tutorial/faq.md b/docs/sphinx_doc/source_zh/tutorial/faq.md index facd99539c..a04e6b5c66 100644 --- a/docs/sphinx_doc/source_zh/tutorial/faq.md +++ b/docs/sphinx_doc/source_zh/tutorial/faq.md @@ -1,16 +1,24 @@ -# 常见问题 +# 常见问题(FAQ) ## 第一部分:参数配置 -**Q:** 在哪里配置参数? -**A:** 你可以通过运行 `trinity studio --port 8080` 使用配置管理器来配置参数。这种方式提供了便捷的参数配置途径。 +**Q:** 如何编写 Trinity-RFT 的配置文件? -高级用户也可以直接编辑配置文件,参见各例子(`examples`)中的 YAML 文件。 -Trinity-RFT 使用 [veRL](https://github.com/volcengine/verl) 作为训练后端,其参数数量较多,详见 [veRL 文档](https://verl.readthedocs.io/en/latest/examples/config.html)。你可以在 `trainer.trainer_config` 字典中指定这些参数。 +**A:** 推荐使用 Trinity Studio 进行配置文件编写。你可以通过以下命令启动: + +```bash +trinity studio --port 8080 +``` + +该指令提供一个直观易用的图形化界面来创建和修改配置文件。 + +对于进阶用户,建议参考[配置文档](./trinity_configs.md)以了解所有配置项的详细说明。你也可以直接编辑 YAML 配置文件(可参考 `examples` 目录下的模板和示例)。 + +如果你已经熟悉 veRL,请参考[与 veRL 对齐配置](./align_with_verl.md)。该教程介绍了如何将 Trinity-RFT 的配置参数与 veRL 对齐,方便迁移或复用已有的 veRL 配置。 --- -**Q:** `buffer.batch_size`、`buffer.train_batch_size`、`actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu` 以及其他 batch size 参数之间有什么关系? +**Q:** `buffer.batch_size`、`buffer.train_batch_size`、`actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu` 及其他 batch size 参数之间的关系? **A:** 这些参数的关系如下: @@ -19,7 +27,7 @@ Trinity-RFT 使用 [veRL](https://github.com/volcengine/verl) 作为训练后端 - `actor_rollout_ref.actor.ppo_mini_batch_size`:一个 mini-batch 中的 experience 数量,会被 `buffer.train_batch_size` 覆盖;但在 `update_policy` 函数中,其值表示每个 GPU 上的 mini-batch experience 数量,即 `buffer.train_batch_size (/ ngpus_trainer)`。除以 `ngpus_trainer` 是由于数据隐式分配到多个 GPU 上所致,但在梯度累积后不影响最终结果。 - `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu`:每个 GPU 上 micro-batch 中的 experience 数量。 -以下例子简要地展示了它们的用法: +一个简要示例: ```python def update_policy(batch_exps): @@ -62,7 +70,7 @@ File ".../flash_attn/flash_attn_interface.py", line 15, in ‹module> ImportError: ... ``` -**A:** `flash-attn` 模块未正确安装。尝试运行 `pip install flash-attn==2.8.1` 或 `pip install flash-attn==2.8.1 -v --no-build-isolation` 来修复。 +**A:** `flash-attn` 模块未正确安装。请尝试运行 `pip install flash-attn==2.8.1` 或 `pip install flash-attn==2.8.1 -v --no-build-isolation` 进行修复。 --- @@ -71,7 +79,7 @@ ImportError: ... UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key]) ... ``` -**A:** 如果你使用 WandB 来观察实验,在启动 Ray 和运行实验之前,请先登录 WandB。一种方法是执行命令 `export WANDB_API_KEY=[your_api_key]`。你也可以选择其他方式来观察实验,比如设置 `monitor.monitor_type=tensorboard/mlflow`。 +**A:** 请在启动 Ray 和运行实验前先登录 WandB。可以通过 `export WANDB_API_KEY=[your_api_key]` 设置环境变量。你也可以通过设置 `monitor.monitor_type=tensorboard/mlflow` 使用其他监控方式。 --- @@ -80,7 +88,7 @@ UsageError: api_key not configured (no-tty). call wandb.login(key=[your_api_key] ValueError: Failed to look up actor with name 'explorer' ... 
``` -**A:** 确保在运行实验前已启动 Ray。如果 Ray 已在运行,可通过以下命令重启: +**A:** 请确保在运行实验前已启动 Ray。如果 Ray 已在运行,可通过以下命令重启: ```bash ray stop @@ -89,41 +97,60 @@ ray start --head --- -**报错:** 内存不足 (OOM) 错误 +**报错:** 内存不足(OOM)错误 -**A:** 以下参数可能有所帮助: +**A:** 可尝试调整以下参数: -- 对于 trainer:当 `trainer.use_dynamic_bsz=false` 时,调整 `trainer.max_token_len_per_gpu`;当 `trainer.use_dynamic_bsz=true` 时,调整 `trainer.ppo_max_token_len_per_gpu` 和 `trainer.ulysses_sequence_parallel_size`。设置 `trainer.trainer_config.actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` 也可能有帮助。 -- 对于 explorer:调整 `explorer.rollout_model.tensor_parallel_size`。 +- 对于 trainer,当 `trainer.use_dynamic_bsz=false` 时,调整 `trainer.max_token_len_per_gpu`;当 `trainer.use_dynamic_bsz=true` 时,调整 `trainer.ppo_max_token_len_per_gpu` 和 `trainer.ulysses_sequence_parallel_size`。设置 `trainer.trainer_config.actor_rollout_ref.actor.entropy_from_logits_with_chunking=true` 也可能有帮助。 +- 对于 explorer,调整 `explorer.rollout_model.tensor_parallel_size`。 + +此外,Trinity-RFT 提供了[GPU 相关配置指南](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_gpu_configs.html),可参考其中建议。 ## 第三部分:调试方法 -Trinity-RFT 现在支持 actor 级别的日志功能,可自动将每个 actor(例如 explorer 和 trainer)的日志保存到 `/log/` 目录下。如需查看更详细的日志信息,可通过在配置文件中设置 `log.level=debug`,将默认日志级别(`info`)更改为 `debug`。 -你也可以查看所有进程的完整日志并保存到 `debug.log`: +**Q:** Trinity-RFT 运行日志在哪查看? + +**A:** Trinity-RFT 支持 actor 级别日志,会自动将每个 actor(如 explorer 和 trainer)的日志保存到 `/log/`。 + +常见日志包括: + +- `/log/explorer.log`:explorer 进程日志,包括每步起止时间、评测指标、模型权重同步、异常等。 +- `/log/trainer.log`:trainer 进程日志,包括每次训练迭代的起止时间、评测指标、模型权重同步等。 +- `/log/explorer_runner_.log`:workflow runner 进程日志,包括 workflow 打印的日志和运行异常(需在 Workflow 中用 `self.logger` 打印日志)。 + +如需更详细日志,可在配置文件中设置 `log.level=debug`。 + +如需查看所有进程的完整日志并保存到 `debug.log`: ```bash export RAY_DEDUP_LOGS=0 trinity run --config grpo_gsm8k/gsm8k.yaml 2>&1 | tee debug.log ``` -### 调试工作流(Workflow) - - -实现新工作流后,可使用 Trinity-RFT 的调试模式进行调试,步骤如下: +--- -1. 启动推理模型: `trinity debug --config --module inference_model` +**Q:** 如何在不运行完整实验的情况下调试 workflow? -2. 在另一个终端中进行工作流的调试:`trinity debug --config --module workflow --output-file --plugin-dir ` +**A:** 可用 Trinity-RFT 的 debug 模式,步骤如下: -更多详细信息,请参阅{ref}`工作流开发指南 `章节。 +1. 启动推理模型: + ```bash + trinity debug --config --module inference_model + ``` +2. 在另一个终端调试 workflow: + ```bash + trinity debug --config --module workflow --output-file --plugin-dir + ``` +详细说明见 {ref}`工作流开发指南 `。 ## 第四部分:其他问题 -**Q:** `buffer.trainer_input.experience_buffer.path` 的作用是什么? -**A:** 该路径指定了用于持久化存储生成的 experience 的 SQLite 数据库路径。如果你不想使用 SQLite 数据库,可以注释掉这一行。 +**Q:** `buffer.trainer_input.experience_buffer.path` 有什么作用? + +**A:** 该路径指定用于存储生成 experience 的 SQLite 数据库路径。如果不需要,可注释掉该行。 -要查看数据库中的 experience,可以使用以下 Python 脚本: +如需查看数据库中的 experience,可用如下 Python 脚本: ```python from sqlalchemy import create_engine @@ -147,14 +174,14 @@ exp_list = [] for exp in experiences: exp_list.append(ExperienceModel.to_experience(exp)) -# 打印 experience 信息 +# 打印 experience for exp in exp_list: print(f"{exp.prompt_text=}", f"{exp.response_text=}") ``` --- -**Q:** 如何在 Trinity-RFT 框架之外加载检查点(checkpoints)? +**Q:** 如何在 Trinity-RFT 框架外加载 checkpoints? **A:** 你需要指定模型路径和检查点路径。以下代码片段展示了如何使用 transformers 库进行加载。 @@ -171,3 +198,15 @@ model = AutoModelForCausalLM.from_pretrained(model_path) ckp_path = os.path.join(checkpoint_root_dir, project, name, "global_step_780", "actor") model.load_state_dict(load_fsdp_state_dict_from_verl_checkpoint(ckp_path)) ``` + +--- + +**Q:** Trinity-RFT 和 veRL 有什么区别? 
+ +**A:** Trinity-RFT 以 veRL 作为 trainer 后端,并在其基础上扩展了更模块化和灵活的架构。主要区别包括: + +- **模块化算法**:Trinity-RFT 将 veRL `core_algos.py` 中的算法相关组件(如优势函数、损失函数)提取为独立模块,便于用户实现和注册新算法,无需修改核心代码。 +- **Explorer 与 Trainer 分离**:Trinity-RFT 用独立 Explorer 模块替代 veRL 的 rollout model,专门负责 agent 与环境交互,支持更灵活的 workflow 设计和 rollout-training 调度。 +- **全生命周期数据通路**:Trinity-RFT 在 Explorer 和 Trainer 之间增加 Buffer 模块,提供完整的数据存储、处理和采样通路,支持经验回放、优先采样等高级数据处理策略。 + +我们还提供了 Trinity-RFT 与 veRL 及其衍生系统(如 [rLLM](https://github.com/rllm-org/rllm))的基准对比,详见 [Benchmark](https://github.com/modelscope/Trinity-RFT/tree/main/benchmark)。