**`docs/sphinx_doc/source/tutorial/example_data_functionalities.md`** (+19 −38 lines)
Trinity-RFT uses a unified config file to manage all config items. In this example, assume that you need to rank all math questions and their corresponding answers by difficulty. You can set these config items as follows:

```yaml
data_processor:
  # basic info
  source_data_path: '/path/to/gsm8k'
  load_kwargs:
    split: 'train'  # only need the train split
  format:  # set the field mappings
    prompt_key: 'question'
    response_key: 'answer'
  # database related. The result dataset will be stored in the database.
```
Here you can set the basic information for the GSM-8K dataset, the database information used to store the result dataset, and some other items about downstream dataset loading for exploring and training:

+ `source_data_path`: the path to the raw dataset.
+ `load_kwargs`: extra arguments for loading the raw dataset, mainly passed through to the `load_dataset` method of the HuggingFace `datasets` library.
+ `format`: dataset format config items, used to map original data field names to unified ones.
+ `db_url`: the URL of the PostgreSQL database used to store the result dataset.
+ `total_epochs`: the total number of epochs to train on this dataset.
+ `batch_size`: the training batch size.
+ `default_workflow_type`: the default exploring workflow type. Please refer to the [programming guide](trinity_programming_guide.md) for more details.
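The field mapping described by `load_kwargs` and `format` can be sketched as follows. This is a minimal illustration only: the `unify_fields` helper and the inline sample data are hypothetical and not part of Trinity-RFT's API.

```python
# Minimal sketch of how `load_kwargs` and `format` could be applied when
# loading a dataset. `unify_fields` and the inline sample are hypothetical
# illustrations, not Trinity-RFT code.

def unify_fields(samples, fmt):
    """Map original field names to unified 'prompt'/'response' names."""
    return [
        {"prompt": s[fmt["prompt_key"]], "response": s[fmt["response_key"]]}
        for s in samples
    ]

# Stand-in for the 'train' split selected via load_kwargs: {'split': 'train'}.
train_split = [
    {"question": "What is 2 + 3?", "answer": "5"},
]

fmt = {"prompt_key": "question", "response_key": "answer"}
unified = unify_fields(train_split, fmt)
print(unified[0])  # {'prompt': 'What is 2 + 3?', 'response': '5'}
```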
In addition, there are several config items related to the data active iterator, which is used to prepare a better dataset. The core of the data active iterator, Data-Juicer, provides dozens of operators that help clean the dataset or compute key information for each sample. You can configure this part depending on how familiar you are with Data-Juicer.

#### Not familiar with Data-Juicer

If you are not familiar with Data-Juicer, the data module provides a natural-language-based way to configure the data processing recipe. All you need to do is describe how you want the raw dataset to be prepared, and an agent will be invoked to arrange the data processing recipe for you. Here is an example:
```yaml
data_processor:
  # basic info
  source_data_path: '/path/to/gsm8k'
  load_kwargs:
    split: 'train'  # only need the train split
  format:  # set the field mappings
    prompt_key: 'question'
    response_key: 'answer'
  # database related. The result dataset will be stored in the database.
  dj_process_desc: 'Please compute difficulty scores for these math questions.'
```
After preparing the Data-Juicer data processing recipe, you can set the `dj_config_path` item in the Trinity-RFT config file to the path of this recipe. For example:

```yaml
data_processor:
  # basic info
  source_data_path: '/path/to/gsm8k'
  load_kwargs:
    split: 'train'  # only need the train split
  format:  # set the field mappings
    prompt_key: 'question'
    response_key: 'answer'
  # database related. The result dataset will be stored in the database.
```
Here you can set the basic information for the example dataset, the database information used to store the result dataset, and some other items about downstream dataset loading for exploring and training, similar to the example above.
**`docs/sphinx_doc/source/tutorial/trinity_configs.md`** (+66 −42 lines)
The following is the main config file for Trinity-RFT. Take `countdown.yaml` as an example.

## Global Config

```yaml
mode: both
global_config:
  total_epochs: 1
  batch_size: 96
  eval_interval: 1000
```

- `mode`: The mode of the experiment, chosen from `both`, `train`, `explore` or `bench`. `both` launches both the trainer and the explorer; `train` launches only the trainer; `explore` launches only the explorer; `bench` conducts benchmark evaluation. Default is `both`.
- `global_config.total_epochs`: The total number of epochs. It should be set manually.
- `global_config.batch_size`: The batch size used for training. It should be set manually.
- `global_config.eval_interval`: The number of steps between two evaluations. Default is `1000`.
## Monitor
- `monitor.name`: The name of the experiment. It must be set manually.

## Data Processing

<!-- The `data` configuration specifies the data used for training. It includes the total number of epochs, the batch size, the path to the dataset, the default workflow type, the default reward function type, and the format configuration. -->
- `data.source_data_path`: The path to the source dataset.
- `data.load_kwargs`: The kwargs passed to `datasets.load_dataset`.
- `data.format`: The format of the source dataset. It includes `prompt_key` and `response_key`.
- `data.dj_config_path`: The path to the Data-Juicer configuration.
- `data.clean_strategy`: The cleaning strategy used by `DataCleaner`, which iteratively cleans the dataset until the targets are met.
- `data.db_url`: The URL of the database.
## Model
```yaml
buffer:
  max_retry_times: 3
  max_retry_interval: 1
  explorer_input:
    taskset:
      name: countdown
      path: 'countdown_dataset/oneshot-split'
      split: train
      format:
        prompt_key: 'question'
        response_key: 'answer'
    eval_tasksets: []
    default_workflow_type: 'math_workflow'
    default_reward_fn_type: 'countdown_reward'
  trainer_input:
    experience_buffer:
      name: countdown_buffer
      storage_type: queue
      path: 'sqlite:///countdown.db'
    sft_warmup_dataset: null
```

- `buffer.max_retry_times`: The maximum number of retries when loading data from the database.
- `buffer.max_retry_interval`: The maximum interval between retries when loading data from the database.
- `buffer.explorer_input.taskset`: The configuration of the taskset.
- `buffer.explorer_input.taskset.name`: The name of the taskset.
- `buffer.explorer_input.taskset.path`: The path to the taskset.
- `buffer.explorer_input.taskset.split`: The split name of the taskset used for training. Default is `train`.
- `buffer.explorer_input.taskset.format`: The format of the taskset. It includes `prompt_key`, `response_key`, `workflow_key` and `reward_fn_key`.
- `buffer.explorer_input.eval_tasksets`: The list of tasksets used for evaluation. Empty by default.
- `buffer.explorer_input.default_workflow_type`: The default workflow type for `taskset` and `eval_tasksets`.
- `buffer.explorer_input.default_reward_fn_type`: The default reward function type for `taskset` and `eval_tasksets`.
- `buffer.trainer_input.experience_buffer`: The configuration of the experience buffer.
- `buffer.trainer_input.experience_buffer.name`: The name of the experience buffer.
- `buffer.trainer_input.experience_buffer.storage_type`: The storage type of the experience buffer. Default is `queue`.
- `buffer.trainer_input.experience_buffer.path`: The SQL path used to store the experience buffer. It can be empty to indicate not saving to the database.
- `buffer.trainer_input.sft_warmup_dataset`: The configuration of the SFT warmup dataset. Its structure is similar to `buffer.explorer_input.taskset`.
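The `path` value `'sqlite:///countdown.db'` is a database URL; how such a URL maps to a database file can be sketched with the standard library. This is an illustration of the URL format only, not Trinity-RFT's own parsing code.

```python
# Sketch: how a sqlite URL like the `path` above maps to a database file.
# Standard-library illustration, not Trinity-RFT code.
from urllib.parse import urlsplit

url = "sqlite:///countdown.db"
parts = urlsplit(url)
db_file = parts.path.lstrip("/")  # 'sqlite:///x.db' -> relative file 'x.db'
print(parts.scheme, db_file)  # sqlite countdown.db
```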
109
133
## Explorer
110
134
@@ -157,7 +181,7 @@ synchronizer:
157
181
- `synchronizer.sync_method`: The synchronization method between `trainer` and `explorer`.
158
182
Support `nccl` and `checkpoint`, `nccl` represents that model weights in `explorer` will be synchronized from `trainer` through `nccl`,
159
183
`checkpoint`represents that `explorer` will load the newest checkpoints saved by `trainer` then update its model weights. Default is `nccl`.
160
-
- `synchronizer.sync_interval`: The interval between two synchronizations. Default is `10`. It should be set manually.
184
+
- `synchronizer.sync_interval`: The interval steps between two synchronizations. Default is `10`. It should be set manually.
161
185
- `synchronizer.sync_timeout`: The timeout of the synchronization. Default is `1200`.
162
186
163
187
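The interval semantics of `sync_interval` can be sketched in a couple of lines: with the default of `10`, weights are refreshed every 10 steps. The loop below is a hypothetical illustration, not Trinity-RFT code.

```python
# Sketch of sync_interval semantics: steps at which a synchronization
# would occur over 30 steps. Hypothetical illustration only.
sync_interval = 10

sync_steps = [step for step in range(1, 31) if step % sync_interval == 0]
print(sync_steps)  # [10, 20, 30]
```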
## Trainer

- `trainer.algorithm_type`: The type of the algorithm. Supports `ppo`, `grpo`, `opmd` and `dpo`.
- `trainer.trainer_config_path`: The path to the trainer configuration file. It must be set manually.
- `trainer.sft_warmup_steps`: The number of steps used to warm up the model. Default is `0`.
- `trainer.eval_interval`: The number of steps between two evaluations. Default is `1000`.
- `trainer.save_interval`: The number of steps between two checkpoints. Default is `100`.
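One way to read `sft_warmup_steps` is as a partition of training: SFT for the first N steps, RFT afterwards. The helper below is a hypothetical sketch under that assumption, not Trinity-RFT's actual scheduling code.

```python
# Sketch: a possible reading of `sft_warmup_steps`, partitioning training
# into an SFT warmup phase followed by RFT. Hypothetical, not Trinity-RFT code.
def stage_for_step(step, sft_warmup_steps=0):
    return "sft" if step < sft_warmup_steps else "rft"

stages = [stage_for_step(s, sft_warmup_steps=2) for s in range(4)]
print(stages)  # ['sft', 'sft', 'rft', 'rft']
```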
**`docs/sphinx_doc/source/tutorial/trinity_programming_guide.md`** (+9 −4 lines)
### Step 3: Modify Configuration File

After completing the development of the `Workflow`, modify the configuration file so that `default_workflow_type` in the `buffer.explorer_input` domain is set to the newly registered `Workflow` name.