Commit a914078

Merge branch 'main' into dev/fix_microbatch_loss_scale
2 parents 42b9999 + 6ff5195 commit a914078

23 files changed (+210, -194 lines changed)

.github/workflows/docker/docker-compose.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 services:
 trinity-node-1:
-image: trinity-rft-unittest:20250924
+image: trinity-rft-unittest:20251030
 pull_policy: never
 command: sh -c "pip install -e .[dev] && ray start --head --dashboard-host 0.0.0.0 --include-dashboard true --block"
 environment:
@@ -29,7 +29,7 @@ services:
 capabilities: [gpu]
 
 trinity-node-2:
-image: trinity-rft-unittest:20250924
+image: trinity-rft-unittest:20251030
 pull_policy: never
 command: sh -c "pip install -e .[dev] && ray start --address=trinity-node-1:6379 --block"
 environment:

.github/workflows/unittest.yaml

Lines changed: 9 additions & 0 deletions
@@ -97,6 +97,15 @@ jobs:
 fi
 fi
 
+- name: Convert report.json time to ms
+working-directory: trinity-${{ github.run_id }}
+if: env.tests_run == 'true' || failure()
+run: |
+REPORT=report.json
+if [ -f "$REPORT" ]; then
+jq '(.results.tests[] | .duration, .start, .stop) |= (. * 1000) | (.results.summary.start, .results.summary.stop) |= (. * 1000)' "$REPORT" > "$REPORT.tmp" && mv "$REPORT.tmp" "$REPORT"
+fi
+
 - name: Clean checkpoint dir
 working-directory: trinity-${{ github.run_id }}/.github/workflows/docker
 if: always()
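
The jq filter in the added `Convert report.json time to ms` step can be mirrored in Python for local inspection. The following is a minimal sketch, not part of the commit: the function name `convert_report_times_to_ms` is hypothetical, and it assumes that every listed field is present and numeric and that the original values are in seconds, as the step name suggests.

```python
import json


def convert_report_times_to_ms(report: dict) -> dict:
    """Multiply the timing fields touched by the jq filter by 1000, in place.

    Assumption: the fields exist and hold numbers (seconds before conversion).
    """
    # Per-test fields: .results.tests[] | .duration, .start, .stop
    for test in report["results"]["tests"]:
        for key in ("duration", "start", "stop"):
            test[key] = test[key] * 1000

    # Summary fields: .results.summary.start, .results.summary.stop
    summary = report["results"]["summary"]
    for key in ("start", "stop"):
        summary[key] = summary[key] * 1000

    return report


if __name__ == "__main__":
    # Hypothetical sample data shaped like the paths the jq filter addresses.
    sample = {
        "results": {
            "tests": [{"name": "t1", "duration": 1.5, "start": 10, "stop": 11.5}],
            "summary": {"start": 10, "stop": 12},
        }
    }
    print(json.dumps(convert_report_times_to_ms(sample), indent=2))
```

Like the workflow step, which writes to `report.json.tmp` and then moves it over `report.json`, a script built on this sketch would best write the converted report to a temporary file before replacing the original.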

README.md

Lines changed: 38 additions & 63 deletions
@@ -31,36 +31,58 @@ Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuni
 - Example: [Mixture of SFT and GRPO](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
 
 * 📊 For data engineers. [[tutorial]](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_operator.html)
-- Create task-specific datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios.
+- Create datasets and build data pipelines for cleaning, augmentation, and human-in-the-loop scenarios.
 - Example: [Data Processing](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html)
 
 
 ## 🌟 Key Features
 
 * **Flexible RFT Modes:**
-- Supports synchronous/asynchronous, on-policy/off-policy, and online/offline training. Rollout and training can run separately and scale independently across devices.
+- Supports synchronous/asynchronous, on-policy/off-policy, and online/offline RL.
+- Rollout and training can run separately and scale independently across devices.
+- Boost sample and time efficiency by experience replay.
 
 <img src="https://img.alicdn.com/imgextra/i3/O1CN01E7NskS1FFoTI9jlaQ_!!6000000000458-2-tps-1458-682.png" alt="RFT modes supported by Trinity-RFT" width="600" />
 
-* **General Agentic-RL Support:**
-- Supports both concatenated and general multi-turn agentic workflows. Able to directly train agent applications developed using agent frameworks like AgentScope.
+* **Agentic RL Support:**
+- Supports both concatenated and general multi-step agentic workflows.
+- Able to directly train agent applications developed using agent frameworks like AgentScope.
 
 <img src="https://img.alicdn.com/imgextra/i1/O1CN01z1i7kk1jlMEVa8ZHV_!!6000000004588-2-tps-1262-695.png" alt="Agentic workflows" width="600" />
 
-* **Full Lifecycle Data Pipelines:**
-- Enables pipeline processing of rollout and experience data, supporting active management (prioritization, cleaning, augmentation) throughout the RFT lifecycle.
+* **Full-Lifecycle Data Pipelines:**
+- Enables pipeline processing of rollout tasks and experience samples.
+- Active data management (e.g., prioritization, cleaning, augmentation) throughout the RFT lifecycle.
+- Native support for multi-task joint learning.
 
-<img src="https://img.alicdn.com/imgextra/i2/O1CN01BfeHp61sXSlGjH7zQ_!!6000000005776-2-tps-1734-473.png" alt="Data pipeline design" width="600" />
+<img src="https://img.alicdn.com/imgextra/i2/O1CN01Gk9CRw28NsL09nbOj_!!6000000007921-2-tps-2530-660.png" alt="Data pipeline design" width="720" />
 
 * **User-Friendly Design:**
-- Modular, decoupled architecture for easy adoption and development. Rich graphical user interfaces enable low-code usage.
+- Plug-and-play modules and decoupled architecture, facilitating easy adoption and development.
+- Rich graphical user interfaces enable low-code usage.
 
 <img src="https://img.alicdn.com/imgextra/i1/O1CN01Ti0o4320RywoAuyhN_!!6000000006847-2-tps-3840-2134.png" alt="System architecture" width="600" />
 
 
+## 🔨 Tutorials and Guidelines
+
+
+| Category | Tutorial / Guideline |
+| --- | --- |
+| Run diverse RFT modes | + [Quick example: GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)<br>+ [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html)<br>+ [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html)<br>+ [Offline learning by DPO or SFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html) |
+| Multi-step agentic scenarios | + [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)<br>+ [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)<br>+ [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html) |
+| Advanced data pipelines | + [Rollout task mixing and selection](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/develop_selector.html)<br>+ [Experience replay](https://github.com/modelscope/Trinity-RFT/tree/main/examples/ppo_countdown_exp_replay)<br>+ [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html) |
+| Algorithm development / research | + [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html) ([paper](https://arxiv.org/pdf/2508.11408))<br>+ Non-verifiable domains: [RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_ruler), [trainable RULER](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k_trainable_ruler), [rubric-as-reward](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_rubric_as_reward) <br>+ [Research project: group-relative REINFORCE](https://github.com/modelscope/Trinity-RFT/tree/main/examples/rec_gsm8k) ([paper](https://arxiv.org/abs/2509.24203))|
+| Going deeper into Trinity-RFT | + [Full configurations](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)<br>+ [Benchmark toolkit for quick verification and experimentation](./benchmark/README.md)<br>+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html) |
+
+
+> [!NOTE]
+> For more tutorials, please refer to the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/).
+
+
 ## 🚀 News
 
-* [2025-10] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.1)] Trinity-RFT v0.3.1 released: multi-stage training support, improved agentic RL examples, LoRA support, debug mode and new RL algorithms.
+* [2025-10] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.1)] Trinity-RFT v0.3.1 released: multi-stage training support, improved agentic RL examples, LoRA support, debug mode and new RL algorithms.
 * [2025-09] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.0)] Trinity-RFT v0.3.0 released: enhanced Buffer, FSDP2 & Megatron support, multi-modal models, and new RL algorithms/examples.
 * [2025-08] Introducing [CHORD](https://github.com/modelscope/Trinity-RFT/tree/main/examples/mix_chord): dynamic SFT + RL integration for advanced LLM fine-tuning ([paper](https://arxiv.org/pdf/2508.11408)).
 * [2025-08] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.2.1)] Trinity-RFT v0.2.1 released.
@@ -73,16 +95,13 @@ Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuni
 
 ---
 
-## Table of contents
-
+## Table of Contents
 
 - [Quick Start](#quick-start)
 - [Step 1: installation](#step-1-installation)
 - [Step 2: prepare dataset and model](#step-2-prepare-dataset-and-model)
 - [Step 3: configurations](#step-3-configurations)
 - [Step 4: run the RFT process](#step-4-run-the-rft-process)
-- [Further tutorials](#further-tutorials)
-- [Upcoming features](#upcoming-features)
 - [Contribution guide](#contribution-guide)
 - [Acknowledgements](#acknowledgements)
 - [Citation](#citation)
@@ -101,7 +120,7 @@ Trinity-RFT is a flexible, general-purpose framework for reinforcement fine-tuni
 Before installing, make sure your system meets the following requirements:
 
 - **Python**: version 3.10 to 3.12 (inclusive)
-- **CUDA**: version 12.4 to 12.8 (inclusive)
+- **CUDA**: version >= 12.6
 - **GPUs**: at least 2 GPUs
 
 
@@ -276,7 +295,9 @@ ray start --head
 ray start --address=<master_address>
 ```
 
-(Optional) Log in to [wandb](https://docs.wandb.ai/quickstart/) for better monitoring:
+(Optional) You may use [Wandb](https://docs.wandb.ai/quickstart/) / [TensorBoard](https://www.tensorflow.org/tensorboard) / [MLFlow](https://mlflow.org) for better monitoring.
+Please refer to [this documentation](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html#monitor-configuration) for the corresponding configurations.
+For example, to log in to Wandb:
 
 ```shell
 export WANDB_API_KEY=<your_api_key>
@@ -298,54 +319,8 @@ trinity run --config examples/grpo_gsm8k/gsm8k.yaml
 For studio users, click "Run" in the web interface.
 
 
-## Further tutorials
-
-> [!NOTE]
-> For more tutorials, please refer to the [Trinity-RFT Documentation](https://modelscope.github.io/Trinity-RFT/).
-
-
-Tutorials for running different RFT modes:
-
-+ [Quick example: GRPO on GSM8k](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_basic.html)
-+ [Off-policy RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_reasoning_advanced.html)
-+ [Fully asynchronous RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_async_mode.html)
-+ [Offline learning by DPO or SFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_dpo.html)
-
-
-Tutorials for adapting Trinity-RFT to multi-step agentic scenarios:
-
-+ [Concatenated multi-turn workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_multi_turn.html)
-+ [General multi-step workflow](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_step_wise.html)
-+ [ReAct workflow with an agent framework](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_react.html)
-
-
-Tutorials for data-related functionalities:
-
-+ [Advanced data processing & human-in-the-loop](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_data_functionalities.html)
-
-
-Tutorials for RL algorithm development/research with Trinity-RFT:
-
-+ [RL algorithm development with Trinity-RFT](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/example_mix_algo.html)
-
-
-Guidelines for full configurations:
-
-+ See [this document](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/trinity_configs.html)
-
-
-Guidelines for developers and researchers:
-
-+ [Benchmark Toolkit for quick verification and experimentation](./benchmark/README.md)
-+ [Understand the coordination between explorer and trainer](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/synchronizer.html)
-
-
-## Upcoming features
-
-A tentative roadmap: [#51](https://github.com/modelscope/Trinity-RFT/issues/51)
-
 
-## Contribution guide
+## Contribution Guide
 
 This project is currently under active development, and we welcome contributions from the community!
 
@@ -356,7 +331,7 @@ See [CONTRIBUTING.md](./CONTRIBUTING.md) for detailed contribution guidelines.
 
 This project is built upon many excellent open-source projects, including:
 
-+ [verl](https://github.com/volcengine/verl) and [PyTorch's FSDP](https://pytorch.org/docs/stable/fsdp.html) for LLM training;
++ [verl](https://github.com/volcengine/verl), [FSDP](https://pytorch.org/docs/stable/fsdp.html) and [Megatron-LM](https://github.com/NVIDIA/Megatron-LM) for LLM training;
 + [vLLM](https://github.com/vllm-project/vllm) for LLM inference;
 + [Data-Juicer](https://github.com/modelscope/data-juicer?tab=readme-ov-file) for data processing pipelines;
 + [AgentScope](https://github.com/agentscope-ai/agentscope) for agentic workflow;
