Merged (31 commits)
- d7c43fe Init Algorithm Module (#58) (pan-x-c, May 28, 2025)
- 5cd6cb6 Add Policy Loss Functions (#62) (pan-x-c, May 28, 2025)
- fe217aa Refactor advantage computation, and delete RayPPOTrainer.fit (#61) (yanxi-chen, Jun 3, 2025)
- 9d582e8 Add unittest && bug fix (#65) (chenyushuo, Jun 4, 2025)
- 732d801 Add KL/Entorpy Fn (#64) (pan-x-c, Jun 5, 2025)
- 2d8f0c1 Refactor advantage computation (cont.) (#68) (yanxi-chen, Jun 5, 2025)
- fec7f3c Refactor train step (#69) (chenyushuo, Jun 10, 2025)
- 48f596a Fix EntropyLossFn (#77) (shiweijiezero, Jun 11, 2025)
- 9e72996 Fix Conflicts with main (#75) (pan-x-c, Jun 11, 2025)
- 21ddf5e merge main (pan-x-c, Jun 11, 2025)
- 3c759d9 fix entropy lss (pan-x-c, Jun 11, 2025)
- dc8cb0c Add Sample Strategy (#78) (pan-x-c, Jun 12, 2025)
- 0e56607 Add doc for SFT (#81) (hiyuchang, Jun 13, 2025)
- aeabfe5 merge verl 0.4.0 (#79) (chenyushuo, Jun 17, 2025)
- b8d1faa [Feature] Add MIX algorithm (#83) (hiyuchang, Jun 17, 2025)
- 69ddbd0 Refactor on `select_keys` (#84) (chenyushuo, Jun 18, 2025)
- a592af7 Add guideline for adding new algorithm (#85) (pan-x-c, Jun 18, 2025)
- ab28d0c merge main (pan-x-c, Jun 18, 2025)
- b7ea08f fix file_reader (pan-x-c, Jun 18, 2025)
- eea4d85 update pyproject (pan-x-c, Jun 18, 2025)
- aedfa53 update pyproject.toml (pan-x-c, Jun 18, 2025)
- 2a36e0e clean code (pan-x-c, Jun 18, 2025)
- 5cb9ebe fix checkpoint sync mode (pan-x-c, Jun 19, 2025)
- f24db44 Update config manager (#86) (chenyushuo, Jun 19, 2025)
- c85d853 Update docs (#89) (hiyuchang, Jun 19, 2025)
- 7a1c526 Refactor `state_dict_meta` init (#90) (chenyushuo, Jun 19, 2025)
- 99a772a Unify async/sync RL (#91) (pan-x-c, Jun 20, 2025)
- 6f2d7c7 Support one-step ahead async RL (#93) (pan-x-c, Jun 20, 2025)
- eddf4e4 Refactor data module and support task pipeline in data processor (#92) (HYLcool, Jun 20, 2025)
- 7bc465c Merge branch 'main' into algorithm_dev (pan-x-c, Jun 20, 2025)
- acf7788 bumping repository code version to v0.2.0.dev0 (pan-x-c, Jun 20, 2025)
1 change: 1 addition & 0 deletions .gitignore
@@ -84,6 +84,7 @@ ENV/
logs/

# data-juicer
tmp/
outputs/
# agentscope
runs/
11 changes: 7 additions & 4 deletions README.md
@@ -148,8 +148,11 @@ pip install -e .\[dev\]

# Install flash-attn after all dependencies are installed
# Note: flash-attn will take a long time to compile, please be patient.
pip install flash-attn -v
# Try the following command if you encounter errors during installation
# for bash
pip install -e .[flash_attn]
# for zsh
pip install -e .\[flash_attn\]
# Try the following command if you encounter errors during flash-attn installation
# pip install flash-attn -v --no-build-isolation
```

@@ -263,7 +266,7 @@ Then, for command-line users, run the RFT process with the following command:
trinity run --config <config_path>
```

> For example, below is the command for fine-tuning Qwen-2.5-1.5B-Instruct on GSM8k dataset using GRPO algorithm:
> For example, below is the command for fine-tuning Qwen2.5-1.5B-Instruct on GSM8k dataset using GRPO algorithm:
> ```shell
> trinity run --config examples/grpo_gsm8k/gsm8k.yaml
> ```
@@ -276,7 +279,7 @@ For more detailed examples about how to use Trinity-RFT, please refer to the fol
+ [Off-policy mode of RFT](./docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md)
+ [Asynchronous mode of RFT](./docs/sphinx_doc/source/tutorial/example_async_mode.md)
+ [Multi-turn tasks](./docs/sphinx_doc/source/tutorial/example_multi_turn.md)
+ [Offline learning by DPO](./docs/sphinx_doc/source/tutorial/example_dpo.md)
+ [Offline learning by DPO or SFT](./docs/sphinx_doc/source/tutorial/example_dpo.md)
+ [Advanced data processing / human-in-the-loop](./docs/sphinx_doc/source/tutorial/example_data_functionalities.md)


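A side note on the bash/zsh split in the README hunk above: zsh expands unquoted square brackets as a filename glob and aborts with "no matches found" when nothing matches, while bash passes the literal string through to pip. A single-quoted form (a sketch, not part of this PR) sidesteps the difference in both shells:

```shell
# zsh treats unquoted [...] as a glob pattern and fails when it matches no
# files; bash hands the literal text to pip unchanged. Single quotes make
# the extras spec literal in both shells, so no backslash escaping is needed:
extras='.[flash_attn]'
echo "pip install -e $extras"  # the command to run, shown rather than executed here
```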
3 changes: 2 additions & 1 deletion docs/sphinx_doc/source/conf.py
@@ -22,12 +22,13 @@
"sphinx.ext.napoleon",
"sphinx.ext.autosectionlabel",
"myst_parser",
"sphinx.ext.mathjax",
]
source_suffix = {
".rst": "restructuredtext",
".md": "markdown",
}
myst_enable_extensions = ["colon_fence"]
myst_enable_extensions = ["colon_fence", "amsmath", "dollarmath"]

# Prefix document path to section labels, otherwise autogenerated labels would
# look like 'heading' rather than 'path/to/file:heading'
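For context on the conf.py hunk above: `sphinx.ext.mathjax` renders math in the HTML output, and MyST's `amsmath` and `dollarmath` extensions let the project's markdown tutorials write TeX math directly. An illustrative snippet of what the docs can now contain (not taken from this PR; the objective shown is the standard clipped policy-gradient loss):

```markdown
The importance ratio can be written inline as
$r_t(\theta) = \pi_\theta(a_t \mid s_t) / \pi_{\theta_\text{old}}(a_t \mid s_t)$,
or used in a display block:

$$
\mathcal{L}^{\text{CLIP}}(\theta) =
\mathbb{E}_t\left[\min\bigl(r_t(\theta)\,\hat{A}_t,\;
\operatorname{clip}(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon)\,\hat{A}_t\bigr)\right]
$$
```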
13 changes: 11 additions & 2 deletions docs/sphinx_doc/source/index.rst
@@ -14,16 +14,24 @@ Welcome to Trinity-RFT's documentation!
:maxdepth: 1
:glob:
:hidden:
:caption: Tutorial
:caption: Examples

tutorial/example_reasoning_basic.md
tutorial/example_reasoning_advanced.md
tutorial/example_async_mode.md
tutorial/example_multi_turn.md
tutorial/example_dpo.md
tutorial/example_data_functionalities.md
tutorial/trinity_configs.md

.. toctree::
:maxdepth: 2
:glob:
:hidden:
:caption: Guidelines

tutorial/trinity_programming_guide.md
tutorial/trinity_configs.md
tutorial/example_mix_algo.md

.. toctree::
:maxdepth: 1
@@ -33,6 +41,7 @@ Welcome to Trinity-RFT's documentation!
build_api/trinity.buffer
build_api/trinity.explorer
build_api/trinity.trainer
build_api/trinity.algorithm
build_api/trinity.manager
build_api/trinity.common
build_api/trinity.utils
22 changes: 10 additions & 12 deletions docs/sphinx_doc/source/main.md
@@ -84,15 +84,18 @@ e.g., utilizing NCCL (when feasible) for model weight synchronization, sequence

## Getting started


*Note: this project is currently under active development; comments and suggestions are welcome!*

```{note}
Note: This project is currently under active development; comments and suggestions are welcome!
```



### Step 1: preparations


Trinity-RFT requires
Python version >= 3.10,
CUDA version >= 12.4,
and at least 2 GPUs.


Installation from source (recommended):
@@ -146,11 +149,6 @@ docker build -f scripts/docker/Dockerfile -t trinity-rft:latest .
docker run -it --gpus all --shm-size="64g" --rm -v $PWD:/workspace -v <root_path_of_data_and_checkpoints>:/data trinity-rft:latest
```

Trinity-RFT requires
Python version >= 3.10,
CUDA version >= 12.4,
and at least 2 GPUs.


### Step 2: prepare dataset and model

@@ -243,15 +241,15 @@ trinity run --config <config_path>



For example, below is the command for fine-tuning Qwen-2.5-1.5B-Instruct on GSM8k dataset using GRPO algorithm:
For example, below is the command for fine-tuning Qwen2.5-1.5B-Instruct on GSM8k dataset using GRPO algorithm:

```shell
trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```



More example config files can be found in `examples`.
More example config files can be found in [`examples`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/).



@@ -260,7 +258,7 @@ For more detailed examples about how to use Trinity-RFT, please refer to the fol
+ [Off-policy mode of RFT](tutorial/example_reasoning_advanced.md)
+ [Asynchronous mode of RFT](tutorial/example_async_mode.md)
+ [Multi-turn tasks](tutorial/example_multi_turn.md)
+ [Offline learning by DPO](tutorial/example_dpo.md)
+ [Offline learning by DPO or SFT](tutorial/example_dpo.md)
+ [Advanced data processing / human-in-the-loop](tutorial/example_data_functionalities.md)


2 changes: 1 addition & 1 deletion docs/sphinx_doc/source/tutorial/example_async_mode.md
@@ -1,6 +1,6 @@
# Asynchronous RFT

This example shows how to run RFT in a fully asynchronous mode with the GRPO algorithm, Qwen-2.5-1.5B-Instruct model and GSM8K dataset.
This example shows how to run RFT in a fully asynchronous mode with the GRPO algorithm, Qwen2.5-1.5B-Instruct model and GSM8K dataset.

Trinity-RFT supports an asynchronous mode by running the trainer and explorer in separate processes.
