docs/sphinx_doc/source/tutorial/trinity_configs.md
5 additions & 5 deletions
@@ -5,16 +5,16 @@ The following is the main config file for Trinity-RFT. Take `countdown.yaml` as
 ## Global Config

 ```yaml
-mode: both
 project: Trinity-RFT
 name: example
-checkpoint_root_dir: /PATH/TO/CHECKPOINT_DIR
+mode: both
+checkpoint_root_dir: /PATH/TO/CHECKPOINT
 ```

-- `mode`: The mode of the experiment, chosen from `both`, `train`, `explore` or `bench`. `both` means both trainer and explorer are launched; `train` means only trainer is launched; `explore` means only explorer is launched; `bench` conducts benchmark evaluation. Default is `both`.
 - `project`: The name of the project.
 - `name`: The name of the experiment.
-- `checkpoint_root_dir`: The root directory of the checkpoint.
+- `mode`: The mode of the experiment, chosen from `both`, `train`, `explore` or `bench`. `both` means both trainer and explorer are launched; `train` means only trainer is launched; `explore` means only explorer is launched; `bench` conducts benchmark evaluation. Default is `both`.
+- `checkpoint_root_dir`: The root directory to save the checkpoints. Specifically, the generated checkpoints will be saved in `<checkpoint_root_dir>/<project>/<name>/`.

 ## Algorithm
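Note on this hunk: the added `checkpoint_root_dir` description says checkpoints end up under `<checkpoint_root_dir>/<project>/<name>/`, and `mode` is limited to four values with `both` as the default. A minimal sketch of that rule in Python follows. The helper `resolve_checkpoint_dir` and the constant `VALID_MODES` are hypothetical names used only for illustration; they are not taken from the Trinity-RFT code base, and the real loader may behave differently.

```python
from pathlib import PurePosixPath

# Hypothetical constant: the four modes listed in the docs above.
VALID_MODES = ("both", "train", "explore", "bench")


def resolve_checkpoint_dir(config: dict) -> PurePosixPath:
    """Return <checkpoint_root_dir>/<project>/<name> for a global config dict.

    Illustrative only: assumes the config has already been parsed from YAML
    into a plain dict with the keys shown in the docs.
    """
    mode = config.get("mode", "both")  # documented default is `both`
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return (
        PurePosixPath(config["checkpoint_root_dir"])
        / config["project"]
        / config["name"]
    )


example = {
    "project": "Trinity-RFT",
    "name": "example",
    "mode": "both",
    "checkpoint_root_dir": "/PATH/TO/CHECKPOINT",
}
print(resolve_checkpoint_dir(example))  # /PATH/TO/CHECKPOINT/Trinity-RFT/example
```

`PurePosixPath` is used here so the printed path looks the same on every platform; code that actually touches the filesystem would more likely use `pathlib.Path`.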
@@ -24,7 +24,7 @@ algorithm:
   repeat_times: 1
 ```

-- `algorithm.algorithm_type`: The type of the algorithm, Support `ppo`, `grpo`, `opmd` and `dpo`.
+- `algorithm.algorithm_type`: The type of the algorithm. Supports `ppo`, `grpo`, `opmd` and `dpo`.
 - `algorithm.repeat_times`: The number of times to repeat each task. Used for GRPO-like algorithm. Default is `1`.
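Note on this hunk: the two `algorithm` fields can be read as a small validation rule plus a task-expansion step. The sketch below is one possible interpretation in Python, not the project's implementation. `SUPPORTED_ALGORITHMS`, `normalize_algorithm` and `repeat_tasks` are hypothetical names, and treating `algorithm_type` as a required key is an assumption, since the docs state no default for it.

```python
# Hypothetical constant: the algorithm types listed in the docs above.
SUPPORTED_ALGORITHMS = ("ppo", "grpo", "opmd", "dpo")


def normalize_algorithm(section: dict) -> dict:
    """Validate an `algorithm` config section and fill in documented defaults."""
    algorithm_type = section["algorithm_type"]  # assumed required in this sketch
    if algorithm_type not in SUPPORTED_ALGORITHMS:
        raise ValueError(
            f"algorithm_type must be one of {SUPPORTED_ALGORITHMS}, "
            f"got {algorithm_type!r}"
        )
    repeat_times = int(section.get("repeat_times", 1))  # documented default is 1
    if repeat_times < 1:
        raise ValueError("repeat_times must be at least 1")
    return {"algorithm_type": algorithm_type, "repeat_times": repeat_times}


def repeat_tasks(tasks: list, repeat_times: int) -> list:
    """Repeat each task `repeat_times` times, keeping copies of a task adjacent."""
    return [task for task in tasks for _ in range(repeat_times)]


algo = normalize_algorithm({"algorithm_type": "grpo", "repeat_times": 2})
print(repeat_tasks(["task_a", "task_b"], algo["repeat_times"]))
# ['task_a', 'task_a', 'task_b', 'task_b']
```

Keeping the copies of one task next to each other is a choice made for this sketch; it makes it easy to group the repeated rollouts of a task, which is the kind of grouping GRPO-like algorithms rely on.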