Commit a0beb72

rename online to nccl

rename `offline` to `checkpoint`
add `sync_timeout`
add `save_interval` in trainer config
delete `steps_per_epoch` and `reset_consumed`
1 parent ea5f899 commit a0beb72

25 files changed: 167 additions, 95 deletions

docs/sphinx_doc/source/tutorial/example_dpo.md

Lines changed: 2 additions & 2 deletions
@@ -40,13 +40,13 @@ Note that the dataset has the keys `prompt`, `chosen` and `rejected`. If not, pa
 
 We use the configurations in [`dpo.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_humanlike/dpo.yaml) and [`train_dpo.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_humanlike/train_dpo.yaml) for this experiment. Some important setups are listed in the following:
 
-We run the experiment in a train mode, as there is no Explorer. To enable this mode, we config `mode` to `train` and set `sync_method` to `offline`. The value of `sync_iteration_interval` can be set as same of the value of `save_freq`.
+We run the experiment in a train mode, as there is no Explorer. To enable this mode, we config `mode` to `train` and set `sync_method` to `checkpoint`. The value of `sync_iteration_interval` can be set as same of the value of `save_freq`.
 
 ```yaml
 # In dpo.yaml
 mode: train
 synchronizer:
-  sync_method: 'offline'
+  sync_method: 'checkpoint'
 buffer:
   train_dataset:
     storage_type: file

docs/sphinx_doc/source/tutorial/example_reasoning_basic.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ We run the experiment in a synchronous mode where the Explorer and Trainer opera
 ```yaml
 mode: both
 synchronizer:
-  sync_method: 'online'
+  sync_method: 'nccl'
   sync_iteration_interval: 2
 ```
 

docs/sphinx_doc/source/tutorial/trinity_configs.md

Lines changed: 6 additions & 2 deletions
@@ -164,12 +164,16 @@ explorer:
 
 ```yaml
 synchronizer:
-  sync_method: 'online'
+  sync_method: 'nccl'
   sync_iteration_interval: 10
+  sync_timeout: 1200
 ```
 
-- `synchronizer.sync_method`: The synchronization method, Support `online` and `offline`. Default is `online`.
+- `synchronizer.sync_method`: The synchronization method between `trainer` and `explorer`.
+Support `nccl` and `checkpoint`, `nccl` represents that model weights in `explorer` will be synchronized from `trainer` through `nccl`,
+`checkpoint` represents that `explorer` will load the newest checkpoints saved by `trainer` then update its model weights. Default is `nccl`.
 - `synchronizer.sync_iteration_interval`: The interval between two synchronizations. Default is `10`. It should be set manually.
+- `synchronizer.sync_timeout`: The timeout of the synchronization. Default is `1200`.
 
 ## Trainer
 
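
The documented values and defaults of the `synchronizer` section can be summarized in a small validation sketch. The class name `SynchronizerConfig`, the constant `VALID_SYNC_METHODS`, and the positivity checks are illustrative assumptions and not Trinity-RFT's actual implementation; only the field names, the allowed values, and the defaults are taken from the documentation change above.

```python
from dataclasses import dataclass

# Allowed values after this commit, per the updated docs.
VALID_SYNC_METHODS = ("nccl", "checkpoint")


@dataclass
class SynchronizerConfig:
    """Hypothetical mirror of the documented `synchronizer` fields."""

    sync_method: str = "nccl"          # documented default
    sync_iteration_interval: int = 10  # documented default
    sync_timeout: int = 1200           # documented default; the docs do not state the unit

    def __post_init__(self) -> None:
        if self.sync_method not in VALID_SYNC_METHODS:
            raise ValueError(
                f"sync_method must be one of {VALID_SYNC_METHODS}, "
                f"got {self.sync_method!r}"
            )
        # Assumption: both numeric fields are expected to be positive.
        if self.sync_iteration_interval <= 0:
            raise ValueError("sync_iteration_interval must be positive")
        if self.sync_timeout <= 0:
            raise ValueError("sync_timeout must be positive")
```

Under this sketch, a config that still uses the old value `online` fails fast with a `ValueError` instead of silently falling back to a default.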

examples/dpo_humanlike/dpo.yaml

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ explorer:
   max_pending_requests: 32
   max_waiting_steps: 4
 synchronizer:
-  sync_method: 'offline'
+  sync_method: 'checkpoint'
   sync_iteration_interval: 30
 trainer:
   trainer_type: 'verl'

examples/grpo_alfworld/alfworld.yaml

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ explorer:
   gpu_memory_utilization: 0.7
   enable_chunked_prefil: true
 synchronizer:
-  sync_method: 'online'
+  sync_method: 'nccl'
   sync_iteration_interval: 8
 trainer:
   trainer_type: 'verl'

examples/grpo_gsm8k/gsm8k.yaml

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ explorer:
   max_pending_requests: 32
   max_waiting_steps: 4
 synchronizer:
-  sync_method: 'online'
+  sync_method: 'nccl'
   sync_iteration_interval: 2
 trainer:
   trainer_type: 'verl'

examples/grpo_math/math.yaml

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ explorer:
   max_pending_requests: 32
   max_waiting_steps: 4
 synchronizer:
-  sync_method: 'online'
+  sync_method: 'nccl'
   sync_iteration_interval: 2
 trainer:
   trainer_type: 'verl'

examples/grpo_sciworld/sciworld.yaml

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ explorer:
   gpu_memory_utilization: 0.7
   enable_chunked_prefil: true
 synchronizer:
-  sync_method: 'online'
+  sync_method: 'nccl'
   sync_iteration_interval: 8
 trainer:
   trainer_type: 'verl'

examples/grpo_webshop/webshop.yaml

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@ explorer:
   gpu_memory_utilization: 0.7
   enable_chunked_prefil: true
 synchronizer:
-  sync_method: 'online'
+  sync_method: 'nccl'
   sync_iteration_interval: 8
 trainer:
   trainer_type: 'verl'

examples/opmd_gsm8k/opmd_gsm8k.yaml

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ explorer:
   max_pending_requests: 32
   max_waiting_steps: 4
 synchronizer:
-  sync_method: 'online'
+  sync_method: 'nccl'
   sync_iteration_interval: 10
 trainer:
   trainer_type: 'verl'
