diff --git a/docs/sphinx_doc/source/api_reference.rst b/docs/sphinx_doc/source/api_reference.rst
new file mode 100644
index 0000000000..2fa2c8f692
--- /dev/null
+++ b/docs/sphinx_doc/source/api_reference.rst
@@ -0,0 +1,18 @@
+.. _api-reference:
+
+API Reference
+=============
+
+This page shows some useful APIs of Trinity-RFT. Click the API name to see the detailed documentation.
+
+.. toctree::
+ :maxdepth: 1
+ :glob:
+
+ build_api/trinity.buffer
+ build_api/trinity.explorer
+ build_api/trinity.trainer
+ build_api/trinity.algorithm
+ build_api/trinity.manager
+ build_api/trinity.common
+ build_api/trinity.utils
diff --git a/docs/sphinx_doc/source/index.rst b/docs/sphinx_doc/source/index.rst
index fc085215b0..062e9e9e7f 100644
--- a/docs/sphinx_doc/source/index.rst
+++ b/docs/sphinx_doc/source/index.rst
@@ -35,19 +35,14 @@ Welcome to Trinity-RFT's documentation!
 .. toctree::
  :maxdepth: 2
+ :hidden:
  :caption: FAQ
  tutorial/faq.md
 .. toctree::
- :maxdepth: 1
- :glob:
+ :maxdepth: 2
+ :hidden:
  :caption: API Reference
- build_api/trinity.buffer
- build_api/trinity.explorer
- build_api/trinity.trainer
- build_api/trinity.algorithm
- build_api/trinity.manager
- build_api/trinity.common
- build_api/trinity.utils
+ api_reference
diff --git a/docs/sphinx_doc/source/main.md b/docs/sphinx_doc/source/main.md
index bdbf75eae8..665c125b4f 100644
--- a/docs/sphinx_doc/source/main.md
+++ b/docs/sphinx_doc/source/main.md
@@ -6,7 +6,6 @@
 # Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
-
 ## 🚀 News
 * [2025-07] Trinity-RFT v0.2.0 is released.
@@ -82,6 +81,7 @@ It is designed to support diverse application scenarios and serve as a unified p
 ![Trinity-RFT-data-pipelines](../assets/trinity-data-pipelines.png)
+
@@ -90,12 +90,12 @@ It is designed to support diverse application scenarios and serve as a unified p
 * **Adaptation to New Scenarios:**
- Implement agent-environment interaction logic in a single `Workflow` or `MultiTurnWorkflow` class. ([Example](./docs/sphinx_doc/source/tutorial/example_multi_turn.md))
+ Implement agent-environment interaction logic in a single `Workflow` or `MultiTurnWorkflow` class. ([Example](/tutorial/example_multi_turn.md))
 * **RL Algorithm Development:**
- Develop custom RL algorithms (loss design, sampling, data processing) in compact, plug-and-play classes. ([Example](./docs/sphinx_doc/source/tutorial/example_mix_algo.md))
+ Develop custom RL algorithms (loss design, sampling, data processing) in compact, plug-and-play classes. ([Example](/tutorial/example_mix_algo.md))
 * **Low-Code Usage:**
@@ -301,39 +301,39 @@ For studio users, click "Run" in the web interface.
 Tutorials for running different RFT modes:
-+ [Quick example: GRPO on GSM8k](./docs/sphinx_doc/source/tutorial/example_reasoning_basic.md)
-+ [Off-policy RFT](./docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md)
-+ [Fully asynchronous RFT](./docs/sphinx_doc/source/tutorial/example_async_mode.md)
-+ [Offline learning by DPO or SFT](./docs/sphinx_doc/source/tutorial/example_dpo.md)
++ [Quick example: GRPO on GSM8k](/tutorial/example_reasoning_basic.md)
++ [Off-policy RFT](/tutorial/example_reasoning_advanced.md)
++ [Fully asynchronous RFT](/tutorial/example_async_mode.md)
++ [Offline learning by DPO or SFT](/tutorial/example_dpo.md)
 Tutorials for adapting Trinity-RFT to a new multi-turn agentic scenario:
-+ [Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)
++ [Multi-turn tasks](/tutorial/example_multi_turn.md)
 Tutorials for data-related functionalities:
-+ [Advanced data processing & human-in-the-loop](./docs/sphinx_doc/source/tutorial/example_data_functionalities.md)
++ [Advanced data processing & human-in-the-loop](/tutorial/example_data_functionalities.md)
 Tutorials for RL algorithm development/research with Trinity-RFT:
-+ [RL algorithm development with Trinity-RFT](./docs/sphinx_doc/source/tutorial/example_mix_algo.md)
++ [RL algorithm development with Trinity-RFT](/tutorial/example_mix_algo.md)
-Guidelines for full configurations: see [this document](./docs/sphinx_doc/source/tutorial/trinity_configs.md)
+Guidelines for full configurations: see [this document](/tutorial/trinity_configs.md)
 Guidelines for developers and researchers:
-+ [Build new RL scenarios](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#workflows-for-rl-environment-developers)
-+ [Implement new RL algorithms](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md#algorithms-for-rl-algorithm-developers)
++ [Build new RL scenarios](/tutorial/trinity_programming_guide.md#workflows-for-rl-environment-developers)
++ [Implement new RL algorithms](/tutorial/trinity_programming_guide.md#algorithms-for-rl-algorithm-developers)
-For some frequently asked questions, see [FAQ](./docs/sphinx_doc/source/tutorial/faq.md).
+For some frequently asked questions, see [FAQ](/tutorial/faq.md).
diff --git a/docs/sphinx_doc/source/tutorial/example_data_functionalities.md b/docs/sphinx_doc/source/tutorial/example_data_functionalities.md
index bb73298ae3..2ddd135694 100644
--- a/docs/sphinx_doc/source/tutorial/example_data_functionalities.md
+++ b/docs/sphinx_doc/source/tutorial/example_data_functionalities.md
@@ -8,7 +8,7 @@ In this example, you will learn how to apply the data processor of Trinity-RFT t
 2. how to configure the data processor
 3. what the data processor can do
-Before getting started, you need to prepare the main environment of Trinity-RFT according to the [installation section of the README file](../main.md),
+Before getting started, you need to prepare the main environment of Trinity-RFT according to the [installation section of Quickstart](example_reasoning_basic.md),
 and store the base url and api key in the environment variables `OPENAI_BASE_URL` and `OPENAI_API_KEY` for some agentic or API-model usages if necessary.
 ### Data Preparation
diff --git a/docs/sphinx_doc/source/tutorial/example_multi_turn.md b/docs/sphinx_doc/source/tutorial/example_multi_turn.md
index 7169731b9c..1212b9dcf4 100644
--- a/docs/sphinx_doc/source/tutorial/example_multi_turn.md
+++ b/docs/sphinx_doc/source/tutorial/example_multi_turn.md
@@ -14,7 +14,37 @@ To run the ALFworld and WebShop env, you need to setup the corresponding environ
 - ALFworld is a text-based interactive environment that simulates household scenarios. Agents need to understand natural language instructions and complete various domestic tasks like finding objects, moving items, and operating devices in a virtual home environment.
 - WebShop is a simulated online shopping environment where AI agents learn to shop based on user requirements. The platform allows agents to browse products, compare options, and make purchase decisions, mimicking real-world e-commerce interactions.
-You may refer to their original environment to complete the setup.
+
+
+Guidelines for preparing ALFWorld environment
+
+1. Pip install: `pip install alfworld[full]`
+
+2. Export the path: `export ALFWORLD_DATA=/path/to/alfworld/data`
+
+3. Download the environment: `alfworld-download`
+
+Now you can find the environment in `$ALFWORLD_DATA` and continue with the following steps.
+
+
+
+Guidelines for preparing WebShop environment
+
+1. Install Python 3.8.13
+
+2. Install Java
+
+3. Download the source code: `git clone https://github.com/princeton-nlp/webshop.git webshop`
+
+4. Create a virtual environment: `conda create -n webshop python=3.8.13` and `conda activate webshop`
+
+5. Install requirements into the `webshop` virtual environment via the `setup.sh` script: `./setup.sh [-d small|all]`
+
+Now you can continue with the following steps.
+
+
+
+You may refer to their original environment for more details.
 - For ALFWorld, refer to the [ALFWorld](https://github.com/alfworld/alfworld) repository.
 - For WebShop, refer to the [WebShop](https://github.com/princeton-nlp/WebShop) repository.
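
The ALFWorld steps in this patch depend on `ALFWORLD_DATA` being exported and pointing at a real directory once `alfworld-download` has run. A minimal sketch of a sanity check for that is shown here. The helper name `resolve_alfworld_data` is hypothetical and is not part of Trinity-RFT or ALFWorld; it only inspects the environment variable named in the guidelines and makes no assumption about the files inside the directory.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_alfworld_data(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the directory named by ALFWORLD_DATA, or raise if it is unusable.

    Hypothetical helper for illustration only. `env` defaults to os.environ;
    passing a plain dict makes the check easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    raw = env.get("ALFWORLD_DATA", "").strip()
    if not raw:
        # Step 2 of the guidelines (the export) was skipped or lost in a new shell.
        raise RuntimeError("ALFWORLD_DATA is not set; export it before continuing")
    path = Path(raw).expanduser()
    if not path.is_dir():
        # The variable is set, but nothing exists at that location yet.
        raise RuntimeError(f"ALFWORLD_DATA points to {path}, which is not a directory")
    return path
```

Calling `resolve_alfworld_data()` with no arguments reads the current process environment, so running it in the same shell session that performed the export and download should either return the data path or fail with a message naming the step that was missed.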