README.md: 4 additions & 7 deletions
@@ -200,20 +200,17 @@ For more details about dataset downloading, please refer to [Huggingface](https:
 ### Step 3: configurations

-You may customize the configurations in `scripts/config/{config_name}.yaml`and `scripts/config/{train_config_name}.yaml`. For example, the model and dataset are specified as:
+You may customize the configurations in [`examples`](examples/). For example, the model and dataset are specified as:
docs/sphinx_doc/source/main.md: 4 additions & 7 deletions
@@ -180,20 +180,17 @@ For more details about dataset downloading, please refer to [Huggingface](https:
 ### Step 3: configurations

-You may customize the configurations in `scripts/config/{config_name}.yaml`and `scripts/config/{train_config_name}.yaml`. For example, the model and dataset are specified as:
+You may customize the configurations in [`examples`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/). For example, the model and dataset are specified as:
docs/sphinx_doc/source/tutorial/example_data_functionalities.md: 1 addition & 1 deletion
@@ -133,7 +133,7 @@ And you can set the `clean_strategy` to 'iterative' to get a better dataset.
-All config items in the `data` section can be found [here](trinity_configs.md). A prepared config file for this example of GSM-8K can be found in [the config file of gsm8k](../../../../scripts/config/gsm8k.yaml).
+All config items in the `data` section can be found [here](trinity_configs.md). A prepared config file for this example of GSM-8K can be found in [the config file of gsm8k](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k/gsm8k.yaml).
docs/sphinx_doc/source/tutorial/example_dpo.md: 4 additions & 4 deletions
@@ -38,12 +38,12 @@ Note that the dataset has the keys `prompt`, `chosen` and `rejected`. If not, pa
 ### Configuration

-We use the configurations in `scripts/config/dpo.yaml` and `scripts/config/train_dpo.yaml` for this experiment. Some important settings are listed below:
+We use the configurations in [`dpo.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_humanlike/dpo.yaml) and [`train_dpo.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_humanlike/train_dpo.yaml) for this experiment. Some important settings are listed below:

 We run the experiment in train mode, as there is no Explorer. To enable this mode, we set `mode` to `train` and `sync_method` to `offline`. The value of `sync_iteration_interval` can be set to the same value as `save_freq`.

 ```yaml
-#scripts/config/dpo.yaml
+#In dpo.yaml
 mode: train
 synchronizer:
   sync_method: 'offline'
@@ -60,7 +60,7 @@ buffer:
 trainer:
   algorithm_type: dpo

-#scripts/config/train_dpo.yaml
+#In train_dpo.yaml
 actor_rollout_ref:
   actor:
     alg_type: dpo
@@ -73,5 +73,5 @@ actor_rollout_ref:
 Run the RFT process with the following command:

 ```shell
-trinity run --config scripts/config/dpo.yaml
+trinity run --config examples/dpo_humanlike/dpo.yaml
docs/sphinx_doc/source/tutorial/example_multi_turn.md: 3 additions & 3 deletions
@@ -36,15 +36,15 @@ The task is described as an environment instead of a single prompt.
 ## Step 2: Config preparation and run the experiment

-You can refer to `example_reasoning_basic` to set up the config and other settings. The default config files are `scripts/config/alfworld.yaml` and `scripts/config/webshop.yaml`, respectively.
+You can refer to `example_reasoning_basic` to set up the config and other settings. The default config files are [`alfworld.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_alfworld/alfworld.yaml) and [`webshop.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_webshop/webshop.yaml), respectively.
 You may revise the configurations as needed and run the experiment!

 ```bash
 # For ALFworld env
-trinity run --config scripts/config/alfworld.yaml
+trinity run --config examples/grpo_alfworld/alfworld.yaml

 # For WebShop env
-trinity run --config scripts/config/webshop.yaml
+trinity run --config examples/grpo_webshop/webshop.yaml
docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md: 2 additions & 2 deletions
@@ -17,11 +17,11 @@ The algorithm design and analysis can be found in this [technical report](../../
 To try out the OPMD algorithm:
 ```shell
-trinity run --config scripts/config/gsm8k_opmd.yaml
+trinity run --config examples/opmd_gsm8k/opmd_gsm8k.yaml
 ```

 Note that in this config file, `sync_iteration_interval` is set to 10, i.e., the model weights of explorer and trainer are synchronized only once every 10 training steps, which leads to a challenging off-policy scenario (potentially with abrupt distribution shift during the RFT process).
-Other configurations of particular interest are explained at the beginning of `scripts/config/train_gsm8k_opmd.yaml`.
+Other configurations of particular interest are explained at the beginning of [`train_opmd_gsm8k.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/opmd_gsm8k/train_opmd_gsm8k.yaml).
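
To confirm the synchronization interval before launching a run, the value can be read straight from the config file. The following is a minimal sketch, assuming the key sits on its own `key: value` line; `yaml_value` is a hypothetical helper written for this illustration, not part of the `trinity` CLI, and a real YAML parser would be more robust.

```shell
# Hypothetical helper (not part of Trinity-RFT): print the value of the first
# "key: value" line for the given key in a YAML file.
# Assumes a simple scalar value on the same line as the key.
yaml_value() {
  key="$1"
  file="$2"
  sed -n "s/^[[:space:]]*${key}:[[:space:]]*//p" "$file" | head -n 1
}

# Config path taken from the updated command in this diff.
config="examples/opmd_gsm8k/opmd_gsm8k.yaml"
if [ -f "$config" ]; then
  echo "sync_iteration_interval = $(yaml_value sync_iteration_interval "$config")"
else
  echo "config not found: $config" >&2
fi
```

The helper returns the rest of the line as written, so quotes or trailing comments in the file appear in the output unchanged.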
docs/sphinx_doc/source/tutorial/example_reasoning_basic.md: 7 additions & 7 deletions
@@ -48,15 +48,15 @@ synchronizer:
 ### Use GRPO or PPO Algorithm

-We use the configurations in `scripts/config/gsm8k.yaml`and `scripts/config/train_gsm8k.yaml` for this experiment. Some important settings are listed below:
+We use the configurations in [`gsm8k.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k/gsm8k.yaml) and [`train_gsm8k.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k/train_gsm8k.yaml) for this experiment. Some important settings are listed below:

 ```yaml
-# scripts/config/gsm8k.yaml
+# In gsm8k.yaml
 explorer:
   repeat_times: {number of rollouts for each task}

-# scripts/config/train_gsm8k.yaml
+# In train_gsm8k.yaml
 actor_rollout_ref:
   actor:
     use_kl_loss: True (for GRPO) / False (for PPO)
@@ -69,7 +69,7 @@ algorithm:
 Run the RFT process with the following command:
 ```bash
-trinity run --config scripts/config/gsm8k.yaml
+trinity run --config examples/grpo_gsm8k/gsm8k.yaml
 ```

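
Every path change in this diff follows the same pattern: a file under `scripts/config/` moves into a per-example directory under `examples/`. As a rough sketch for updating older invocations, the renames can be collected in one lookup. The function name `new_config_path` is hypothetical, and the table covers only the renames visible in this diff.

```shell
# Map a pre-change config path to its new location, using only the renames
# visible in this diff. Unknown paths are echoed back unchanged with a warning.
new_config_path() {
  case "$1" in
    scripts/config/gsm8k.yaml)            echo "examples/grpo_gsm8k/gsm8k.yaml" ;;
    scripts/config/train_gsm8k.yaml)      echo "examples/grpo_gsm8k/train_gsm8k.yaml" ;;
    scripts/config/gsm8k_opmd.yaml)       echo "examples/opmd_gsm8k/opmd_gsm8k.yaml" ;;
    scripts/config/train_gsm8k_opmd.yaml) echo "examples/opmd_gsm8k/train_opmd_gsm8k.yaml" ;;
    scripts/config/dpo.yaml)              echo "examples/dpo_humanlike/dpo.yaml" ;;
    scripts/config/train_dpo.yaml)        echo "examples/dpo_humanlike/train_dpo.yaml" ;;
    scripts/config/alfworld.yaml)         echo "examples/grpo_alfworld/alfworld.yaml" ;;
    scripts/config/webshop.yaml)          echo "examples/grpo_webshop/webshop.yaml" ;;
    *) echo "no known mapping for $1" >&2; echo "$1" ;;
  esac
}

# Example: rewrite an old invocation with the new path.
echo "trinity run --config $(new_config_path scripts/config/gsm8k.yaml)"
```

Note that the OPMD files are renamed as well as moved (`gsm8k_opmd.yaml` becomes `opmd_gsm8k.yaml`), so a plain directory substitution would not be enough.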
@@ -79,14 +79,14 @@ trinity run --config scripts/config/gsm8k.yaml
 Before RFT, we may use SFT as a warmup step. We need to set `trainer.sft_warmup_iteration > 0` and point the buffer at the SFT data with `buffer.train_dataset.path=$DATASET_PATH/{sft_data}`.

 ```yaml
-# Properly set the following configs in scripts/config/gsm8k.yaml
+# Properly set the following configs in gsm8k.yaml