
Commit fde0a4a: "Synchronize with the internal version" (1 parent: dc191ed)

150 files changed, +17646 −0 lines


README.md

Lines changed: 319 additions & 0 deletions

<!-- ![trinity-rft](./docs/sphinx_doc/assets/trinity-title.png) -->

<div align="center">
<img src="./docs/sphinx_doc/assets/trinity-title.png" alt="Trinity-RFT">
</div>

Trinity-RFT is a general-purpose, flexible, and scalable framework designed for reinforcement fine-tuning (RFT) of large language models (LLMs).
Built with a decoupled architecture, seamless integration for agentic workflows, and systematic data processing pipelines, Trinity-RFT can be easily adapted to diverse application scenarios and serve as a platform for exploring advanced reinforcement learning (RL) paradigms.

**Vision of this project:**

Current RFT approaches, such as RLHF (Reinforcement Learning from Human Feedback) with proxy reward models or training long-CoT reasoning LLMs with rule-based rewards, are limited in their ability to handle dynamic, real-world learning.
Trinity-RFT envisions a future where AI agents learn by interacting directly with environments, collecting delayed or complex reward signals, and continuously refining their behavior through advanced RL paradigms.
For example, imagine an AI scientist that designs an experiment, executes it by interacting with the environment, waits for feedback (while working on other tasks concurrently), and iteratively updates itself based on true environmental rewards once the experiment finally finishes.
Trinity-RFT offers a path toward this future by addressing critical gaps in existing solutions.

**Key features of Trinity-RFT:**

+ **Unified RFT modes & algorithm support.**
  Trinity-RFT unifies and generalizes existing RFT methodologies into a flexible and configurable framework, supporting synchronous/asynchronous and on-policy/off-policy/offline training, as well as hybrid modes that seamlessly combine the above into a single learning process (e.g., incorporating expert trajectories or high-quality SFT data to accelerate an online RL process).

+ **Agent-environment interaction as a first-class citizen.**
  Trinity-RFT natively models the challenges of RFT with real-world agent-environment interactions. It allows delayed rewards in multi-step and/or time-lagged feedback loops, handles long-tailed latencies and environment/agent failures gracefully, and supports distributed deployment where explorers (i.e., the rollout agents) and trainers (i.e., the policy models trained by RL) can operate across separate clusters or devices (e.g., explorers on edge devices, trainers in cloud clusters) and scale up independently.

+ **Data processing pipelines optimized for RFT with diverse/messy data.**
  These include converting raw datasets into prompt/task sets for RL, cleaning/filtering/prioritizing experiences stored in the replay buffer, synthesizing data for tasks and experiences, offering user interfaces for RFT with a human in the loop, and managing the task and experience buffers (e.g., supporting collection of lagged reward signals), among others.

## The design of Trinity-RFT

<!-- ![design](./docs/sphinx_doc/assets/trinity-design.png) -->

<div align="center">
<img src="./docs/sphinx_doc/assets/trinity-design.png" alt="Trinity-RFT">
</div>

The overall design of Trinity-RFT exhibits a trinity:
+ RFT-core;
+ agent-environment interaction;
+ data processing pipelines tailored to RFT.

In particular, the design of RFT-core also exhibits a trinity:
+ explorer;
+ trainer;
+ manager & buffer.

The explorer, powered by the rollout model, interacts with the environment and generates rollout trajectories to be stored in the experience buffer.
The trainer, powered by the policy model, samples batches of experiences from the buffer and updates the policy via RL algorithms.
The two can be fully decoupled and act asynchronously: they share only the experience buffer, and their model weights are synchronized periodically, according to a schedule specified in the user configuration.

Such a decoupled design is crucial for enabling the aforementioned features of Trinity-RFT,
e.g., flexible and configurable RFT modes (on-policy/off-policy, synchronous/asynchronous, immediate/lagged rewards),
fault tolerance against failures of the explorer (agent/environment) or trainer,
high efficiency in the presence of long-tailed rollout latencies,
and data processing pipelines with a human in the loop of RFT (e.g., acting on the experience buffer, which is implemented as a persistent database),
among others.

Meanwhile, Trinity-RFT does the heavy lifting needed to ensure high efficiency in every component of the framework,
e.g., utilizing NCCL (when feasible) for model weight synchronization, sequence concatenation with proper masking for multi-turn conversations and ReAct workflows, and pipeline parallelism for the synchronous RFT mode, among many others.

## Getting started

*Note: this project is currently under active development; comments and suggestions are welcome!*

### Step 1: preparations

Installation from source (recommended):

```shell
# Pull the source code from GitHub
git clone https://github.com/modelscope/Trinity-RFT
cd Trinity-RFT

# Create a new environment using Conda or venv
# Option 1: Conda
conda create -n trinity python=3.10
conda activate trinity

# Option 2: venv
python3.10 -m venv .venv
source .venv/bin/activate

# Install the package in editable mode
# for bash
pip install -e .[dev]
# for zsh
pip install -e .\[dev\]
```

Installation with pip:
(coming soon)

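
After installation, a quick sanity check is to invoke the `trinity` command-line entry point (assuming it follows the usual `--help` convention; the exact output is not shown here):

```shell
# Verify that the trinity CLI is available in the active environment
trinity --help
```
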
### Step 2: prepare dataset and model

Trinity-RFT supports most datasets and models from Huggingface and ModelScope.

**Prepare the model** in the local directory `$MODEL_PATH/{model_name}`:

```bash
# Using Huggingface
huggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name}

# Using ModelScope
modelscope download {model_name} --local_dir $MODEL_PATH/{model_name}
```

For more details about model downloading, please refer to [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download).

**Prepare the dataset** in the local directory `$DATASET_PATH/{dataset_name}`:

```bash
# Using Huggingface
huggingface-cli download {dataset_name} --repo-type dataset --local-dir $DATASET_PATH/{dataset_name}

# Using ModelScope
modelscope download --dataset {dataset_name} --local_dir $DATASET_PATH/{dataset_name}
```

For more details about dataset downloading, please refer to [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) or [ModelScope](https://modelscope.cn/docs/datasets/download).

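
As a concrete illustration (the repository IDs below are examples chosen for this sketch, not prescribed by Trinity-RFT), downloading a model and the GSM8k dataset might look like:

```bash
# Example: download a Qwen2.5 instruct model and the GSM8k dataset from Huggingface
huggingface-cli download Qwen/Qwen2.5-1.5B-Instruct --local-dir $MODEL_PATH/Qwen2.5-1.5B-Instruct
huggingface-cli download openai/gsm8k --repo-type dataset --local-dir $DATASET_PATH/gsm8k
```
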
### Step 3: configurations

You may customize the configurations in `scripts/config/{config_name}.yaml` and `scripts/config/{train_config_name}.yaml`. For example, the model and dataset are specified as:

```yaml
model:
  model_path: $MODEL_PATH/{model_name}

data:
  dataset_path: $DATASET_PATH/{dataset_name}

trainer:
  trainer_config_path: scripts/config/{train_config_name}.yaml
```

You may also use the default configurations located in the directory `scripts/config`. Please refer to `examples` for more details.

### Step 4: run the RFT process

First, start a Ray cluster with the following commands:

```shell
# On the master node
ray start --head

# On worker nodes
ray start --address=<master_address>
```

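
Before launching training, you can confirm that all worker nodes have joined the cluster:

```shell
# Print the nodes and resources currently visible to Ray
ray status
```
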
Optionally, you can log in to wandb to monitor the RFT process. More details about wandb can be found in its [docs](https://docs.wandb.ai/quickstart/).

```shell
export WANDB_API_KEY=<your_api_key>
wandb login
```

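
If you prefer not to sync runs to the wandb server, wandb also supports offline logging via an environment variable:

```shell
# Record runs locally only; they can be synced later with `wandb sync`
export WANDB_MODE=offline
```
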
Then, run the RFT process with the following command:

```shell
trinity run --config <config_path>
```

For example, below is the command for fine-tuning Qwen-2.5-1B-Instruct on the GSM8k dataset with the GRPO algorithm:

```shell
trinity run --config scripts/config/gsm8k.yaml
```

More example config files can be found in `scripts/config`.

For more detailed examples of how to use Trinity-RFT, please refer to the following documents:

+ [A quick example with GSM8k](./docs/sphinx_doc/source/tutorial/example_reasoning_basic.md);
+ [Off-policy / asynchronous modes of RFT](./docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md);
+ [Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md);
+ [Data processing pipelines](./docs/sphinx_doc/source/tutorial/example_data_functionalities.md);
+ [Offline learning by DPO](./docs/sphinx_doc/source/tutorial/example_dpo.md).

## Advanced usage and full configurations

Please refer to [this document](./docs/sphinx_doc/source/tutorial/trinity_configs.md).

## Programming guide for developers

Please refer to [this document](./docs/sphinx_doc/source/tutorial/trinity_programming_guide.md).

## Contribution guide

This project is currently under active development, and we welcome contributions from the community!

Installation for development:

```shell
# for bash
pip install -e .[dev]
# for zsh
pip install -e .\[dev\]
```

Code style check:

```shell
pre-commit run --all-files
```

Unit tests:

```shell
python -m pytest tests
```

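
To run the style checks automatically on every commit, you can additionally install the git hook (standard `pre-commit` usage):

```shell
# Install the pre-commit hook into .git/hooks
pre-commit install
```
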
## Acknowledgements

This project is built upon many excellent open-source projects, including:

+ [verl](https://github.com/volcengine/verl) and [PyTorch's FSDP](https://pytorch.org/docs/stable/fsdp.html) for LLM training;
+ [vLLM](https://github.com/vllm-project/vllm) for LLM inference;
+ [Data-Juicer](https://github.com/modelscope/data-juicer) for data processing pipelines;
+ [AgentScope](https://github.com/modelscope/agentscope) for agentic workflows;
+ [Ray](https://github.com/ray-project/ray) for distributed systems;
+ we have also drawn inspiration from RL frameworks such as [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF), [TRL](https://github.com/huggingface/trl), and [ChatLearn](https://github.com/alibaba/ChatLearn);
+ and many more.

## Citation

```bibtex
@misc{Trinity-RFT,
  title={Trinity-RFT},
  author={{Trinity-RFT Team}},
  url={https://github.com/modelscope/trinity-rft},
  year={2025},
}
```

docs/README.md

Lines changed: 19 additions & 0 deletions
# Trinity-RFT Documentation

Please use the following commands to build the Sphinx documentation of Trinity-RFT.

```shell
# Step 1: install dependencies

# for bash
pip install -e .[doc]
# for zsh
pip install -e .\[doc\]

# Step 2: build the Sphinx doc
cd docs/sphinx_doc
./build_doc.sh
```

The built documentation can be found at `docs/sphinx_doc/build/html/index.html`.
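
To browse the result, open that file directly, or serve it locally with a plain Python one-liner (nothing Trinity-specific):

```shell
# Serve the built HTML docs at http://localhost:8000
python -m http.server --directory docs/sphinx_doc/build/html 8000
```
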

docs/sphinx_doc/Makefile

Lines changed: 20 additions & 0 deletions
```makefile
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
```
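
Thanks to the catch-all rule, any standard Sphinx builder name works as a make target; for example:

```shell
cd docs/sphinx_doc
# Routed through the catch-all target to `sphinx-build -M html source build`
make html
```
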
Lines changed: 18 additions & 0 deletions
```jinja
{%- macro automodule(modname, options) -%}
.. automodule:: {{ modname }}
{%- for option in options %}
   :{{ option }}:
{%- endfor %}
{%- endmacro %}

{{- pkgname | heading }}

{%- macro toctree(docnames) -%}
.. toctree::
   :maxdepth: {{ maxdepth }}
{% for docname in docnames %}
   {{ docname }}
{%- endfor %}
{%- endmacro %}

{{ automodule(pkgname, automodule_options) }}
```
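
Templates like this one are consumed by `sphinx-apidoc` when generating API pages, by passing the directory that contains them via its `--templatedir` flag. A sketch of such an invocation (the output and package paths below are illustrative, not taken from this repository):

```shell
# Regenerate API .rst files using the custom templates
sphinx-apidoc -o docs/sphinx_doc/source/api trinity --templatedir docs/sphinx_doc/source/_templates
```
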
(Six binary image assets, ranging from 147 KB to 969 KB, were also added; previews are omitted here.)
