docs/sphinx_doc/source/tutorial/example_dpo.md
Lines changed: 2 additions & 3 deletions
@@ -40,13 +40,13 @@ Note that the dataset has the keys `prompt`, `chosen` and `rejected`. If not, pa
 We use the configurations in [`dpo.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_humanlike/dpo.yaml) and [`train_dpo.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_humanlike/train_dpo.yaml) for this experiment. Some important setups are listed in the following:

-We run the experiment in a train mode, as there is no Explorer. To enable this mode, we config `mode` to `train` and set `sync_method` to `offline`. The value of `sync_iteration_interval` can be set as same of the value of `save_freq`.
+We run the experiment in a train mode, as there is no Explorer. To enable this mode, we config `mode` to `train` and set `sync_method` to `checkpoint`. The value of `sync_iteration_interval` can be set as same of the value of `save_interval`.
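
For orientation, the settings named in the added line could be combined roughly as in the fragment below. This is a sketch and not the contents of `dpo.yaml`: the top-level placement of `mode`, the nesting of the two keys under `synchronizer` (borrowed from `trinity_configs.md`), and the value `100` are assumptions made for illustration.

```yaml
# Hypothetical fragment; key placement and values are assumptions, not copied from dpo.yaml.
mode: train                      # train-only run, since there is no Explorer
synchronizer:
  sync_method: 'checkpoint'      # replaces the former 'offline' value
  sync_iteration_interval: 100   # illustrative; set to the same value as save_interval
```

Keeping `sync_iteration_interval` equal to `save_interval` follows the recommendation in the changed sentence; where `save_interval` itself is defined is not shown in this diff.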
docs/sphinx_doc/source/tutorial/trinity_configs.md
Lines changed: 8 additions & 20 deletions
@@ -15,17 +15,6 @@ monitor:
 - `monitor.name`: The name of the experiment. It must be set manually.

-## Monitor
-
-```yaml
-monitor:
-  project: "Trinity-RFT-countdown"
-  name: "qwen2.5-1.5B-countdown"
-```
-
-- `monitor.project`: The project name. It must be set manually.
-- `monitor.name`: The name of the experiment. It must be set manually.
-
 ## Data

 <!-- The `data` configuration specifies the data used for training. It includes the total number of epochs, the batch size, the path to the dataset, the default workflow type, the default reward function type, and the format configuration. -->
@@ -131,8 +120,6 @@ explorer:
   enforce_eager: true
   dtype: bfloat16
   temperature: 1.0
-  top_p: 1.0
-  top_k: -1
   seed: 42
   logprobs: 0
   repeat_times: 5
@@ -150,8 +137,6 @@ explorer:
 - `explorer.enforce_eager`: Whether to enforce eager mode. Default is `True`.
 - `explorer.dtype`: The data type used in vLLM. Default is `bfloat16`.
 - `explorer.temperature`: The temperature used in vLLM. Default is `1.0`.
-- `explorer.top_p`: The top-p used in vLLM. Default is `1.0`.
-- `explorer.top_k`: The top-k used in vLLM. Default is `-1`.
 - `explorer.seed`: The seed used in vLLM. Default is `42`.
 - `explorer.logprobs`: The logprobs used in vLLM. Default is `0`.
 - `explorer.repeat_times`: The number of times to repeat each task, used for GRPO-like algorithms. Default is `5`.
@@ -164,12 +149,16 @@ explorer:

 ```yaml
 synchronizer:
-  sync_method: 'online'
+  sync_method: 'nccl'
   sync_iteration_interval: 10
+  sync_timeout: 1200
 ```

-- `synchronizer.sync_method`: The synchronization method, Support `online` and `offline`. Default is `online`.
+- `synchronizer.sync_method`: The synchronization method between `trainer` and `explorer`.
+Support `nccl` and `checkpoint`, `nccl` represents that model weights in `explorer` will be synchronized from `trainer` through `nccl`,
+`checkpoint`represents that `explorer` will load the newest checkpoints saved by `trainer` then update its model weights. Default is `nccl`.
 - `synchronizer.sync_iteration_interval`: The interval between two synchronizations. Default is `10`. It should be set manually.
+- `synchronizer.sync_timeout`: The timeout of the synchronization. Default is `1200`.
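
The updated example in this file only shows the `nccl` default. A `checkpoint` variant might look like the sketch below, assuming the same three keys apply unchanged under that method; the numeric values are simply the defaults listed in the diff and are illustrative.

```yaml
# Sketch of the alternative sync method; values reuse the documented defaults.
synchronizer:
  sync_method: 'checkpoint'     # explorer loads the newest checkpoint saved by trainer
  sync_iteration_interval: 10   # interval between two synchronizations
  sync_timeout: 1200            # assumed to apply to 'checkpoint' as well; unit is not stated in the diff
```

With `nccl`, weights are pushed from `trainer` to `explorer` directly, so the choice between the two methods mainly depends on whether both components run at the same time.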