diff --git a/docs/sphinx_doc/source/api_reference.rst b/docs/sphinx_doc/source/api_reference.rst
new file mode 100644
index 0000000000..2fa2c8f692
--- /dev/null
+++ b/docs/sphinx_doc/source/api_reference.rst
@@ -0,0 +1,18 @@
+.. _api-reference:
+
+API Reference
+=============
+
+This page lists the main API modules of Trinity-RFT. Click a module name to see its detailed documentation.
+
+.. toctree::
+ :maxdepth: 1
+ :glob:
+
+ build_api/trinity.buffer
+ build_api/trinity.explorer
+ build_api/trinity.trainer
+ build_api/trinity.algorithm
+ build_api/trinity.manager
+ build_api/trinity.common
+ build_api/trinity.utils
diff --git a/docs/sphinx_doc/source/index.rst b/docs/sphinx_doc/source/index.rst
index fc085215b0..062e9e9e7f 100644
--- a/docs/sphinx_doc/source/index.rst
+++ b/docs/sphinx_doc/source/index.rst
@@ -35,19 +35,14 @@ Welcome to Trinity-RFT's documentation!
.. toctree::
:maxdepth: 2
+ :hidden:
:caption: FAQ
tutorial/faq.md
.. toctree::
- :maxdepth: 1
- :glob:
+ :maxdepth: 2
+ :hidden:
:caption: API Reference
- build_api/trinity.buffer
- build_api/trinity.explorer
- build_api/trinity.trainer
- build_api/trinity.algorithm
- build_api/trinity.manager
- build_api/trinity.common
- build_api/trinity.utils
+ api_reference
diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md
index bdbf75eae8..665c125b4f 100644
--- a/docs/sphinx_doc/source/main.md
+++ b/docs/sphinx_doc/source/main.md
@@ -6,7 +6,6 @@
# Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
-
## 🚀 News
* [2025-07] Trinity-RFT v0.2.0 is released.
@@ -82,6 +81,7 @@ It is designed to support diverse application scenarios and serve as a unified p

+
@@ -90,12 +90,12 @@ It is designed to support diverse application scenarios and serve as a unified p
* **Adaptation to New Scenarios:**
- Implement agent-environment interaction logic in a single `Workflow` or `MultiTurnWorkflow` class. ([Example](./docs/sphinx_doc/source/tutorial/example_multi_turn.md))
+ Implement agent-environment interaction logic in a single `Workflow` or `MultiTurnWorkflow` class. ([Example](/tutorial/example_multi_turn.md))
* **RL Algorithm Development:**
- Develop custom RL algorithms (loss design, sampling, data processing) in compact, plug-and-play classes. ([Example](./docs/sphinx_doc/source/tutorial/example_mix_algo.md))
+ Develop custom RL algorithms (loss design, sampling, data processing) in compact, plug-and-play classes. ([Example](/tutorial/example_mix_algo.md))
* **Low-Code Usage:**
@@ -301,39 +301,39 @@ For studio users, click "Run" in the web interface.
Tutorials for running different RFT modes:
-+ [Quick example: GRPO on GSM8k](./docs/sphinx_doc/source/tutorial/example_reasoning_basic.md)
-+ [Off-policy RFT](./docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md)
-+ [Fully asynchronous RFT](./docs/sphinx_doc/source/tutorial/example_async_mode.md)
-+ [Offline learning by DPO or SFT](./docs/sphinx_doc/source/tutorial/example_dpo.md)
++ [Quick example: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)
++ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)
++ [Fully asynchronous RFT](/tutorial/example_async_mode.md)
++ [Offline learning by DPO or SFT](/tutorial/example_dpo.md)
Tutorials for adapting Trinity-RFT to a new multi-turn agentic scenario:
-+ [Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)
++ [Multi-turn tasks](/tutorial/example_multi_turn.md)
Tutorials for data-related functionalities:
-+ [Advanced data processing & human-in-the-loop](./docs/sphinx_doc/source/tutorial/example_data_functionalities.md)
++ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md)
Tutorials for RL algorithm development/research with Trinity-RFT:
-+ [RL algorithm development with Trinity-RFT](./docs/sphinx_doc/source/tutorial/example_mix_algo.md)
++ [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md)
-Guidelines for full configurations: see [this document](./docs/sphinx_doc/source/tutorial/trinity_configs.md)
+Guidelines for full configurations: see [this document](/tutorial/trinity_configs.md)
Guidelines for developers and researchers:
-+ [Build new RL scenarios](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#workflows-for-rl-environment-developers)
-+ [Implement new RL algorithms](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#algorithms-for-rl-algorithm-developers)
++ [Build new RL scenarios](/tutorial/trinity_programming_guide.md#workflows-for-rl-environment-developers)
++ [Implement new RL algorithms](/tutorial/trinity_programming_guide.md#algorithms-for-rl-algorithm-developers)
-For some frequently asked questions, see [FAQ](./docs/sphinx_doc/source/tutorial/faq.md).
+For some frequently asked questions, see [FAQ](/tutorial/faq.md).
diff --git a/docs/sphinx_doc/source/tutorial/example_data_functionalities.md b/docs/sphinx_doc/source/tutorial/example_data_functionalities.md
index bb73298ae3..2ddd135694 100644
--- a/docs/sphinx_doc/source/tutorial/example_data_functionalities.md
+++ b/docs/sphinx_doc/source/tutorial/example_data_functionalities.md
@@ -8,7 +8,7 @@ In this example, you will learn how to apply the data processor of Trinity-RFT t
2. how to configure the data processor
3. what the data processor can do
-Before getting started, you need to prepare the main environment of Trinity-RFT according to the [installation section of the README file](../main.md),
+Before getting started, you need to prepare the main environment of Trinity-RFT according to the [installation section of Quickstart](example_reasoning_basic.md),
and store the base url and api key in the environment variables `OPENAI_BASE_URL` and `OPENAI_API_KEY` for some agentic or API-model usages if necessary.
### Data Preparation
diff --git a/docs/sphinx_doc/source/tutorial/example_multi_turn.md b/docs/sphinx_doc/source/tutorial/example_multi_turn.md
index 7169731b9c..1212b9dcf4 100644
--- a/docs/sphinx_doc/source/tutorial/example_multi_turn.md
+++ b/docs/sphinx_doc/source/tutorial/example_multi_turn.md
@@ -14,7 +14,37 @@ To run the ALFworld and WebShop env, you need to setup the corresponding environ
- ALFworld is a text-based interactive environment that simulates household scenarios. Agents need to understand natural language instructions and complete various domestic tasks like finding objects, moving items, and operating devices in a virtual home environment.
- WebShop is a simulated online shopping environment where AI agents learn to shop based on user requirements. The platform allows agents to browse products, compare options, and make purchase decisions, mimicking real-world e-commerce interactions.
-You may refer to their original environment to complete the setup.
+
+
+Guidelines for preparing the ALFWorld environment
+
+1. Install with pip: `pip install alfworld[full]`
+
+2. Set the data path: `export ALFWORLD_DATA=/path/to/alfworld/data`
+
+3. Download the environment: `alfworld-download`
+
+The environment data is now available in `$ALFWORLD_DATA`, and you can continue with the following steps.
+
+
+
+Guidelines for preparing the WebShop environment
+
+1. Install Python 3.8.13
+
+2. Install Java
+
+3. Download the source code: `git clone https://github.com/princeton-nlp/webshop.git webshop`
+
+4. Create and activate a virtual environment: `conda create -n webshop python=3.8.13`, then `conda activate webshop`
+
+5. From the cloned `webshop` directory, install the requirements into the `webshop` virtual environment with the `setup.sh` script: `./setup.sh [-d small|all]`
+
+Now you can continue with the following steps.
+
+
+
+Refer to the original repositories for more details:
- For ALFWorld, refer to the [ALFWorld](https://github.com/alfworld/alfworld) repository.
- For WebShop, refer to the [WebShop](https://github.com/princeton-nlp/WebShop) repository.