Commit e4a356b

Add examples directory (#14)
1 parent 127a6d4 commit e4a356b

35 files changed: +340 −44 lines

README.md

Lines changed: 4 additions & 7 deletions

@@ -200,20 +200,17 @@ For more details about dataset downloading, please refer to [Huggingface](https:
 ### Step 3: configurations
 
 
-You may customize the configurations in `scripts/config/{config_name}.yaml`and `scripts/config/{train_config_name}.yaml`. For example, the model and dataset are specified as:
+You may customize the configurations in [`examples`](examples/). For example, the model and dataset are specified as:
 
 ```yaml
 model:
   model_path: $MODEL_PATH/{model_name}
 
 data:
   dataset_path: $DATASET_PATH/{dataset_name}
-
-trainer:
-  trainer_config_path: scripts/config/{train_config_name}.yaml
 ```
 
-You may use the default configurations located in the directory `scripts/config`. Please refer to `examples` for more details.
+Please refer to [`examples`](examples/) for more details.
 
 
 

@@ -252,12 +249,12 @@ trinity run --config <config_path>
 For example, below is the command for fine-tuning Qwen-2.5-1B-Instruct on GSM8k dataset using GRPO algorithm:
 
 ```shell
-trinity run --config scripts/config/gsm8k.yaml
+trinity run --config examples/grpo_gsm8k/gsm8k.yaml
 ```
 
 
 
-More example config files can be found in `scripts/config`.
+More example config files can be found in `examples`.
 
 
 
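The config fragment above points `model_path` and `dataset_path` at `$MODEL_PATH`/`$DATASET_PATH` environment variables. As a minimal sketch of how such placeholders are typically resolved (the model and dataset names below are made up for illustration, not taken from the repository):

```python
import os

# Hypothetical locations; substitute your own.
os.environ["MODEL_PATH"] = "/models"
os.environ["DATASET_PATH"] = "/data"

# os.path.expandvars resolves $VAR references the way a shell would.
model_path = os.path.expandvars("$MODEL_PATH/Qwen2.5-1.5B-Instruct")
dataset_path = os.path.expandvars("$DATASET_PATH/gsm8k")
print(model_path)    # /models/Qwen2.5-1.5B-Instruct
print(dataset_path)  # /data/gsm8k
```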
docs/sphinx_doc/source/main.md

Lines changed: 4 additions & 7 deletions

@@ -180,20 +180,17 @@ For more details about dataset downloading, please refer to [Huggingface](https:
 ### Step 3: configurations
 
 
-You may customize the configurations in `scripts/config/{config_name}.yaml`and `scripts/config/{train_config_name}.yaml`. For example, the model and dataset are specified as:
+You may customize the configurations in [`examples`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/). For example, the model and dataset are specified as:
 
 ```yaml
 model:
   model_path: $MODEL_PATH/{model_name}
 
 data:
   dataset_path: $DATASET_PATH/{dataset_name}
-
-trainer:
-  trainer_config_path: scripts/config/{train_config_name}.yaml
 ```
 
-You may use the default configurations located in the directory `scripts/config`. Please refer to `examples` for more details.
+Please refer to [`examples`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/) for more details.
 
 
 

@@ -232,12 +229,12 @@ trinity run --config <config_path>
 For example, below is the command for fine-tuning Qwen-2.5-1B-Instruct on GSM8k dataset using GRPO algorithm:
 
 ```shell
-trinity run --config scripts/config/gsm8k.yaml
+trinity run --config examples/grpo_gsm8k/gsm8k.yaml
 ```
 
 
 
-More example config files can be found in `scripts/config`.
+More example config files can be found in `examples`.
 
 
 
docs/sphinx_doc/source/tutorial/example_data_functionalities.md

Lines changed: 1 addition & 1 deletion

@@ -133,7 +133,7 @@ And you can set the `clean_strategy` to 'iterative' to get a better dataset.
 
 
 
-All config items in the `data` section can be found [here](trinity_configs.md). A prepared config file for this example of GSM-8K can be found in [the config file of gsm8k](../../../../scripts/config/gsm8k.yaml).
+All config items in the `data` section can be found [here](trinity_configs.md). A prepared config file for this example of GSM-8K can be found in [the config file of gsm8k](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k/gsm8k.yaml).
 
 
 
docs/sphinx_doc/source/tutorial/example_dpo.md

Lines changed: 4 additions & 4 deletions

@@ -38,12 +38,12 @@ Note that the dataset has the keys `prompt`, `chosen` and `rejected`. If not, pa
 
 ### Configuration
 
-We use the configurations in `scripts/config/dpo.yaml`and `scripts/config/train_dpo.yaml` for this experiment. Some important setups are listed in the following:
+We use the configurations in [`dpo.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_humanlike/dpo.yaml) and [`train_dpo.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_humanlike/train_dpo.yaml) for this experiment. Some important setups are listed in the following:
 
 We run the experiment in a train mode, as there is no Explorer. To enable this mode, we config `mode` to `train` and set `sync_method` to `offline`. The value of `sync_iteration_interval` can be set as same of the value of `save_freq`.
 
 ```yaml
-# scripts/config/dpo.yaml
+# In dpo.yaml
 mode: train
 synchronizer:
   sync_method: 'offline'

@@ -60,7 +60,7 @@ buffer:
 trainer:
   algorithm_type: dpo
 
-# scripts/config/train_dpo.yaml
+# In train_dpo.yaml
 actor_rollout_ref:
   actor:
     alg_type: dpo

@@ -73,5 +73,5 @@ actor_rollout_ref:
 Run RFT process with the following command:
 
 ```shell
-trinity run --config scripts/config/dpo.yaml
+trinity run --config examples/dpo_humanlike/dpo.yaml
 ```

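The train-mode coupling this tutorial describes (offline `sync_method`, with `sync_iteration_interval` matching `save_freq`) can be sanity-checked mechanically. The dict layout and helper below are an assumption pieced together from the yaml fragments in the diff, not the framework's actual schema:

```python
# Hypothetical config shape, loosely following the yaml fragments above.
config = {
    "mode": "train",
    "synchronizer": {"sync_method": "offline", "sync_iteration_interval": 100},
    "trainer": {"algorithm_type": "dpo", "save_freq": 100},
}

def check_train_mode(cfg):
    """Check the constraints the tutorial states for explorer-less train runs."""
    if cfg["mode"] != "train":
        return True  # constraints only apply to train mode
    sync = cfg["synchronizer"]
    return (sync["sync_method"] == "offline"
            and sync["sync_iteration_interval"] == cfg["trainer"]["save_freq"])

print(check_train_mode(config))  # True
```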
docs/sphinx_doc/source/tutorial/example_multi_turn.md

Lines changed: 3 additions & 3 deletions

@@ -36,15 +36,15 @@ The task is described as an environment instead of a single prompt.
 
 ## Step 2: Config preparation and run the experiment
 
-You can refer to `example_reasoning_basic` to setup the config and others. The default config files are `scripts/config/alfworld.yaml` and `scripts/config/webshop.yaml`, respectively.
+You can refer to `example_reasoning_basic` to setup the config and others. The default config files are [`alfworld.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_alfworld/alfworld.yaml) and [`webshop.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_webshop/webshop.yaml), respectively.
 You may revise the configurations properly and run the experiment!
 
 ```bash
 # For ALFworld env
-trinity run --config scripts/config/alfworld.yaml
+trinity run --config examples/grpo_alfworld/alfworld.yaml
 
 # For WebShop env
-trinity run --config scripts/config/webshop.yaml
+trinity run --config examples/grpo_webshop/webshop.yaml
 ```
 
 ## Advance: How to build your own environment

docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md

Lines changed: 2 additions & 2 deletions

@@ -17,11 +17,11 @@ The algorithm design and analysis can be found in this [technical report](../../
 
 To try out the OPMD algorithm:
 ```shell
-trinity run --config scripts/config/gsm8k_opmd.yaml
+trinity run --config examples/opmd_gsm8k/opmd_gsm8k.yaml
 ```
 
 Note that in this config file, `sync_iteration_interval` is set to 10, i.e., the model weights of explorer and trainer are synchronized only once every 10 training steps, which leads to a challenging off-policy scenario (potentially with abrupt distribution shift during the RFT process).
-Other configurations of particular interest are explained at the beginning of `scripts/config/train_gsm8k_opmd.yaml`.
+Other configurations of particular interest are explained at the beginning of [`train_opmd_gsm8k.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/opmd_gsm8k/train_opmd_gsm8k.yaml).
 
 
 
docs/sphinx_doc/source/tutorial/example_reasoning_basic.md

Lines changed: 7 additions & 7 deletions

@@ -48,15 +48,15 @@ synchronizer:
 
 ### Use GRPO or PPO Algorithm
 
-We use the configurations in `scripts/config/gsm8k.yaml`and `scripts/config/train_gsm8k.yaml` for this experiment. Some important setups are listed in the following:
+We use the configurations in [`gsm8k.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k/gsm8k.yaml) and [`train_gsm8k.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/grpo_gsm8k/train_gsm8k.yaml) for this experiment. Some important setups are listed in the following:
 
 
 ```yaml
-# scripts/config/gsm8k.yaml
+# In gsm8k.yaml
 explorer:
   repeat_times: {number of rollouts for each task}
 
-# scripts/config/train_gsm8k.yaml
+# In train_gsm8k.yaml
 actor_rollout_ref:
   actor:
     use_kl_loss: True (fro GRPO) / False (for PPO)

@@ -69,7 +69,7 @@ algorithm:
 
 Run the RFT process with the following command:
 ```bash
-trinity run --config scripts/config/gsm8k.yaml
+trinity run --config examples/grpo_gsm8k/gsm8k.yaml
 ```
 
 

@@ -79,14 +79,14 @@ trinity run --config scripts/config/gsm8k.yaml
 Before RFT, we may use SFT as a warmup step. We need to set `trainer.sft_warmup_iteration > 0` and prepare the SFT data to `buffer.train_dataset.path=$DATASET_PATH/{sft_data}`.
 
 ```yaml
-# Properly set the following configs in scripts/config/gsm8k.yaml
+# Properly set the following configs in gsm8k.yaml
 buffer:
   sft_warmup_dataset:
     storage_type: file
     algorithm_type: sft
     path: <$DATASET_PATH/{sft_data}>
     kwargs:
-      prompt_type: <prompt_type> # messages/plaintext
+      prompt_type: <prompt_type> # messages/plaintext/chatpair
       prompt_key: <prompt_key>
       response_key: <response_key>
 trainer:

@@ -95,5 +95,5 @@ trainer:
 
 The following command runs SFT and RFT in sequence:
 ```bash
-trinity run --config scripts/config/gsm8k.yaml
+trinity run --config examples/grpo_gsm8k/gsm8k.yaml
 ```

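The SFT-warmup switch described in this tutorial (warmup runs only when `sft_warmup_iteration > 0` and a warmup dataset is configured) can be sketched as a small predicate. Field names are taken from the diff; the dict layout, helper, and path are hypothetical:

```python
# Hypothetical config dict mirroring the sft_warmup fragments above.
config = {
    "buffer": {
        "sft_warmup_dataset": {
            "storage_type": "file",
            "algorithm_type": "sft",
            "path": "/data/gsm8k_sft",  # made-up path
            "kwargs": {"prompt_type": "messages",
                       "prompt_key": "prompt",
                       "response_key": "response"},
        },
    },
    "trainer": {"sft_warmup_iteration": 10},
}

def wants_sft_warmup(cfg):
    """SFT warmup needs both a positive iteration count and a warmup dataset."""
    return (cfg.get("trainer", {}).get("sft_warmup_iteration", 0) > 0
            and "sft_warmup_dataset" in cfg.get("buffer", {}))

print(wants_sft_warmup(config))  # True
```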
docs/sphinx_doc/source/tutorial/trinity_configs.md

Lines changed: 2 additions & 2 deletions

@@ -1,6 +1,6 @@
 # Trinity-RFT Configuration
 
-The following is the main config file for Trinity-RFT. Take `scripts/config/countdown.yaml` as an example.
+The following is the main config file for Trinity-RFT. Take `countdown.yaml` as an example.
 
 
 ## Monitor

@@ -165,7 +165,7 @@ synchronizer:
 trainer:
   trainer_type: 'verl'
   algorithm_type: ppo
-  trainer_config_path: 'scripts/config/train_countdown.yaml'
+  trainer_config_path: 'examples/ppo_countdown/train_countdown.yaml'
   sft_warmup_iteration: 0
   eval_interval: 1000
 ```

examples/dpo_humanlike/README.md

Lines changed: 7 additions & 0 deletions

@@ -0,0 +1,7 @@
+# DPO on HumanLike Dataset
+
+This example shows the usage of DPO on the HumanLike dataset.
+
+For more detailed information, please refer to the [documentation](../../docs/sphinx_doc/source/tutorial/example_dpo.md).
+
+The config files are located in [`dpo.yaml`](dpo.yaml) and [`train_dpo.yaml`](train_dpo.yaml).

Lines changed: 1 addition & 1 deletion

@@ -53,7 +53,7 @@ synchronizer:
 trainer:
   trainer_type: 'verl'
   algorithm_type: dpo
-  trainer_config_path: 'scripts/config/train_dpo.yaml'
+  trainer_config_path: 'examples/dpo_humanlike/train_dpo.yaml'
 monitor:
   cache_root_dir: ""
   project: "dpo_example"