- Supports synchronous/asynchronous, on-policy/off-policy, and online/offline RL.
- Rollout and training can run separately and scale independently across devices.
- Boosts sample and time efficiency via experience replay.

<img src="https://img.alicdn.com/imgextra/i3/O1CN01E7NskS1FFoTI9jlaQ_!!6000000000458-2-tps-1458-682.png" alt="RFT modes supported by Trinity-RFT" width="600" />

* **Agentic RL Support:**
- Supports both concatenated and general multi-step agentic workflows.
- Able to directly train agent applications developed using agent frameworks like AgentScope.

* **Full-Lifecycle Data Pipelines:**
- Enables pipeline processing of rollout tasks and experience samples.
- Supports active data management (e.g., prioritization, cleaning, augmentation) throughout the RFT lifecycle.

| Run diverse RFT modes | + [Quick example: GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)<br>+ [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html)<br>+ [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html)<br>+ [Offline learning by DPO or SFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html)|
| Multi-step agentic scenarios | + [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)<br>+ [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)<br>+ [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)|
| Advanced data pipelines | + [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)<br>+ [Experience replay](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html)|
| Algorithm development / research | + [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) ([paper](https://arxiv.org/pdf/2508.11408))<br>+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) <br>+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) ([paper](https://arxiv.org/abs/2509.24203))|
| Going deeper into Trinity-RFT | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)<br>+ [Benchmark toolkit for quick verification and experimentation](./benchmark/README.md)<br>+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)|
> [!NOTE]
> For more tutorials, please refer to the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/).
## 🚀 News
* [2025-10] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.1)] Trinity-RFT v0.3.1 released: multi-stage training support, improved agentic RL examples, LoRA support, debug mode, and new RL algorithms.
* [2025-09] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.0)] Trinity-RFT v0.3.0 released: enhanced Buffer, FSDP2 & Megatron support, multi-modal models, and new RL algorithms/examples.
---
## Table of Contents
- [Quick Start](#quick-start)
  - [Step 1: installation](#step-1-installation)
  - [Step 2: prepare dataset and model](#step-2-prepare-dataset-and-model)
  - [Step 3: configurations](#step-3-configurations)
  - [Step 4: run the RFT process](#step-4-run-the-rft-process)
- [Contribution Guide](#contribution-guide)
- [Acknowledgements](#acknowledgements)
- [Citation](#citation)
Before installing, make sure your system meets the following requirements:
- **Python**: version 3.10 to 3.12 (inclusive)
- **CUDA**: version >= 12.6
- **GPUs**: at least 2 GPUs
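
To sanity-check the Python requirement before installing, a small script like the following can help (this helper is purely illustrative and not part of Trinity-RFT; GPU count and CUDA version are easier to check with `nvidia-smi`):

```python
import sys

def python_version_supported(version=None):
    """Return True if the interpreter is within the supported 3.10-3.12 range."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 10) <= (major, minor) <= (3, 12)

if __name__ == "__main__":
    # CUDA and GPU checks are left to external tools such as `nvidia-smi`.
    print("Python OK" if python_version_supported() else "Unsupported Python version")
```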
For multi-node setups, start a Ray cluster first:

```shell
# On the master node
ray start --head

# On each worker node
ray start --address=<master_address>
```
(Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLflow](https://mlflow.org) for better monitoring.
Please refer to [this documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration) for the corresponding configurations.
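
As a purely hypothetical sketch of where such a setting lives in an experiment's YAML config (the key names below are guesses, not the real schema; the linked monitor-configuration page is authoritative):

```yaml
# Hypothetical sketch -- verify key names against the monitor-configuration docs
monitor:
  monitor_type: wandb   # alternatives might include: tensorboard, mlflow
```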
For example, to log in to Wandb:
```shell
export WANDB_API_KEY=<your_api_key>
```

Then run the RFT process:

```shell
trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```
For studio users, click "Run" in the web interface.
## Contribution Guide
This project is currently under active development, and we welcome contributions from the community!

See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines.

## Acknowledgements

This project is built upon many excellent open-source projects, including:
+ [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training;
+ [vLLM](https://github.com/vllm-project/vllm) for LLM inference;
+ [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines;
+ [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflows;