12 changes: 10 additions & 2 deletions docs/sphinx_doc/source/main.md
@@ -186,8 +186,16 @@ You may customize the configurations in [`examples`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/)
```yaml
model:
  model_path: $MODEL_PATH/{model_name}

buffer:
  explorer_input:
    taskset:
      name: $TASKSET_NAME
      path: $DATASET_PATH/{dataset_name}
      format:
        prompt_key: 'question'
        response_key: 'answer'
    default_workflow_type: $WORKFLOW_NAME
    default_reward_fn_type: $REWARD_FN_NAME
```
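For instance, a filled-in sketch of this template; the model path, dataset path, taskset name, and reward name below are illustrative placeholders, not shipped defaults:

```yaml
model:
  model_path: /PATH/TO/MODELS/Qwen2.5-1.5B-Instruct  # hypothetical model directory

buffer:
  explorer_input:
    taskset:
      name: gsm8k                       # hypothetical taskset name
      path: /PATH/TO/DATASETS/gsm8k     # hypothetical dataset directory
      format:
        prompt_key: 'question'
        response_key: 'answer'
    default_workflow_type: 'math_workflow'
    default_reward_fn_type: 'math_reward'  # hypothetical name; use a reward registered in your setup
```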

Please refer to [`examples`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/) for more details.
11 changes: 6 additions & 5 deletions docs/sphinx_doc/source/tutorial/example_async_mode.md
@@ -10,18 +10,19 @@ In addition, we need to configure the following parameters in both files.
The model weights of the explorer and trainer are synchronized once every `sync_iteration_interval * batch_size` tasks.

```yaml
global_config:
  batch_size: <batch_size>

# The same checkpoint path
model:
  checkpoint_path: /PATH/TO/CHECKPOINT

# The same database path
buffer:
  trainer_input:
    experience_buffer:
      name: gsm8k_buffer
      storage_type: queue
      path: 'sqlite:///gsm8k.db'

synchronizer:
  sync_method: 'checkpoint'
```
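To make the synchronization cadence concrete, here is a minimal sketch with illustrative values, assuming the interval is configured via `synchronizer.sync_interval` as documented in the main config reference: with a batch size of 96 and an interval of 10, weights are synchronized once every 10 * 96 = 960 tasks.

```yaml
global_config:
  batch_size: 96            # 96 tasks per batch
synchronizer:
  sync_method: 'checkpoint'
  sync_interval: 10         # sync weights once every 10 * 96 = 960 tasks
```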
57 changes: 19 additions & 38 deletions docs/sphinx_doc/source/tutorial/example_data_functionalities.md
@@ -31,52 +31,41 @@ Trinity-RFT uses a unified config file to manage all config items. For the data
In this example, assume that you need to rank all math questions and their corresponding answers by difficulty. You can set the config items as in the following example:

```yaml
data_processor:
  # basic info
  source_data_path: '/path/to/gsm8k'
  load_kwargs:
    split: 'train' # only need the train split
  format: # set the field mappings
    prompt_key: 'question'
    response_key: 'answer'
  # database related. The result dataset will be stored in the database.
  db_url: 'postgresql://{user_name}@localhost:5432/{db_name}'
  # downstream loading related
  total_epochs: 1
  batch_size: 96
  default_workflow_type: 'math_workflow'
```

Here you can set the basic information for the GSM-8K dataset, the database used to store the result dataset, and some other items about downstream dataset loading for exploring and training:

+ `source_data_path`: the path to the raw dataset.
+ `load_kwargs`: extra arguments for loading the raw dataset, passed mainly to the `load_dataset` method of the HuggingFace `datasets` library.
+ `format`: dataset format config items, which map the original data field names to unified ones.
+ `db_url`: the URL of the PostgreSQL database used to store the result dataset.
+ `total_epochs`: the total number of epochs to train on this dataset.
+ `batch_size`: the training batch size.
+ `default_workflow_type`: the default exploring workflow type. Please refer to the [programming guide](trinity_programming_guide.md) for more details.

In addition, there are several config items related to the data active iterator, which is used to prepare a better dataset. The core part of the data active iterator, Data-Juicer, provides dozens of operators that help clean the dataset or compute key information for each sample. You can configure this part depending on how familiar you are with Data-Juicer.

#### Not familiar with Data-Juicer
If you are not familiar with Data-Juicer, the data module provides a natural-language-based way to configure the data processing recipe. All you need to do is describe how you want the raw dataset to be prepared, and an agent will be invoked to arrange the data processing recipe for you. Here is an example:

```yaml
data_processor:
  # basic info
  source_data_path: '/path/to/gsm8k'
  load_kwargs:
    split: 'train' # only need the train split
  format: # set the field mappings
    prompt_key: 'question'
    response_key: 'answer'
  # database related. The result dataset will be stored in the database.
  db_url: 'postgresql://{user_name}@localhost:5432/{db_name}'
  # downstream loading related
  total_epochs: 1
  batch_size: 96
  default_workflow_type: 'math_workflow'

  #### new part about data active iterator
  dj_process_desc: 'Please compute difficulty scores for these math questions.'
```
@@ -109,20 +98,16 @@ process:
After preparing the Data-Juicer data processing recipe, you can set the `dj_config_path` item in the Trinity-RFT config file to the path of this recipe. For example:

```yaml
data_processor:
  # basic info
  source_data_path: '/path/to/gsm8k'
  load_kwargs:
    split: 'train' # only need the train split
  format: # set the field mappings
    prompt_key: 'question'
    response_key: 'answer'
  # database related. The result dataset will be stored in the database.
  db_url: 'postgresql://{user_name}@localhost:5432/{db_name}'
  # downstream loading related
  total_epochs: 1
  batch_size: 96
  default_workflow_type: 'math_workflow'

  #### new part about data active iterator
  dj_config_path: '/path/to/the/Data-Juicer/data/processing/recipe/above.yaml'
```
@@ -185,23 +170,19 @@ Trinity-RFT uses a unified config file to manage all config items. For the data
In this example, assume that you need human annotators to pick the preferred response from a pair of candidates for each prompt. You can set the config items as in the following example:

```yaml
data_processor:
  # basic info
  source_data_path: 'tests/test_data/test_human_annotator'
  load_kwargs:
    split: 'train' # only need the train split
  format: # set the field mappings
    prompt_key: 'prompt'
    chosen_key: 'chosen'
    rejected_key: 'rejected'
  #### new part about data active iterator
  dj_config_path: 'tests/test_configs/human_annotator_test_dj_cfg.yaml'
  # database related. The result dataset will be stored in the database.
  db_url: 'postgresql://{user_name}@localhost:5432/{db_name}'
  # downstream loading related
  total_epochs: 20
  batch_size: 32
  default_workflow_type: 'math_workflow'
```

Here you can set the basic information for the example dataset, the database used to store the result dataset, and some other items about downstream dataset loading for exploring and training, similar to the example above.
2 changes: 1 addition & 1 deletion docs/sphinx_doc/source/tutorial/example_dpo.md
@@ -51,7 +51,7 @@ buffer:
```yaml
  train_dataset:
    storage_type: file
    path: <$DATASET_PATH/human_like_dpo_dataset>
    format:
      prompt_type: <prompt_type> # messages/plaintext
      prompt_key: <prompt_key>
      chosen_key: <chosen_key>
```
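To illustrate the difference between the `prompt_type` options, here are two hypothetical samples (the content is made up; the field names follow the keys configured above):

```yaml
# prompt_type: messages — the prompt field holds a chat-style message list
messages_sample:
  prompt:
    - role: user
      content: 'How are you today?'
  chosen: 'I am doing well, thank you for asking!'
  rejected: 'None of your business.'

# prompt_type: plaintext — the prompt field holds a raw string
plaintext_sample:
  prompt: 'How are you today?'
  chosen: 'I am doing well, thank you for asking!'
  rejected: 'None of your business.'
```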
2 changes: 1 addition & 1 deletion docs/sphinx_doc/source/tutorial/example_reasoning_basic.md
@@ -84,7 +84,7 @@ buffer:
```yaml
  sft_warmup_dataset:
    storage_type: file
    path: <$DATASET_PATH/{sft_data}>
    format:
      prompt_type: <prompt_type> # messages/plaintext/chatpair
      prompt_key: <prompt_key>
      response_key: <response_key>
```
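Analogously to the DPO example, two hypothetical SFT samples (made-up content; the `chatpair` layout is not shown here, so check the dataset docs for its exact schema):

```yaml
# prompt_type: messages — the prompt field holds a chat-style message list
messages_sample:
  prompt:
    - role: user
      content: 'What is 2 + 2?'
  response: '4'

# prompt_type: plaintext — the prompt field holds a raw string
plaintext_sample:
  prompt: 'What is 2 + 2?'
  response: '4'
```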
108 changes: 66 additions & 42 deletions docs/sphinx_doc/source/tutorial/trinity_configs.md
@@ -2,6 +2,21 @@

The following is the main config file for Trinity-RFT. Take `countdown.yaml` as an example.

## Global Config

```yaml
mode: both
global_config:
  total_epochs: 1
  batch_size: 96
  eval_interval: 1000
```

- `mode`: The mode of the experiment, one of `both`, `train`, `explore`, or `bench`. `both` launches both the trainer and the explorer; `train` launches only the trainer; `explore` launches only the explorer; `bench` conducts benchmark evaluation. Default is `both`.
- `global_config.total_epochs`: The total number of epochs. It should be set manually.
- `global_config.batch_size`: The batch size used for training. It should be set manually.
- `global_config.eval_interval`: The interval steps between two evaluations. Default is `1000`.


## Monitor

@@ -15,45 +30,32 @@ monitor:
- `monitor.name`: The name of the experiment. It must be set manually.


## Data Processing

<!-- The `data` configuration specifies the data used for training. It includes the total number of epochs, the batch size, the path to the dataset, the default workflow type, the default reward function type, and the format configuration. -->

```yaml
data_processor:
  source_data_path: '/PATH/TO/DATASET'
  load_kwargs:
    split: 'train' # only need the train split
  format:
    prompt_key: 'question'
    response_key: 'answer'
  # cleaner related
  dj_config_path: 'tests/test_configs/active_iterator_test_dj_cfg.yaml'
  clean_strategy: 'iterative'
  # db related
  db_url: 'postgresql://{username}@localhost:5432/{db_name}'
```

- `data_processor.source_data_path`: The path to the source dataset.
- `data_processor.load_kwargs`: The kwargs passed to `datasets.load_dataset` when loading the source dataset.
- `data_processor.format`: The format of the source dataset. It includes `prompt_key` and `response_key`.
- `data_processor.dj_config_path`: The path to the Data-Juicer configuration.
- `data_processor.clean_strategy`: The cleaning strategy used by `DataCleaner`, which iteratively cleans the dataset until the targets are met.
- `data_processor.db_url`: The URL of the database used to store the result dataset.

Note that `total_epochs` and `batch_size` are configured under `global_config` (see above), while `default_workflow_type` and `default_reward_fn_type` are configured under `buffer.explorer_input` (see the Buffer section below).

## Model

@@ -93,18 +95,40 @@ cluster:
```yaml
buffer:
  max_retry_times: 3
  max_retry_interval: 1
  explorer_input:
    taskset:
      name: countdown
      path: 'countdown_dataset/oneshot-split'
      split: train
      format:
        prompt_key: 'question'
        response_key: 'answer'
    eval_tasksets: []
    default_workflow_type: 'math_workflow'
    default_reward_fn_type: 'countdown_reward'
  trainer_input:
    experience_buffer:
      name: countdown_buffer
      storage_type: queue
      path: 'sqlite:///countdown.db'
    sft_warmup_dataset: null
```

- `buffer.max_retry_times`: The maximum number of retries when loading data from the database.
- `buffer.max_retry_interval`: The maximum interval between retries when loading data from the database.
- `buffer.explorer_input.taskset`: The configuration of the taskset.
- `buffer.explorer_input.taskset.name`: The name of the taskset.
- `buffer.explorer_input.taskset.path`: The path to the taskset.
- `buffer.explorer_input.taskset.split`: The split name of the taskset used for training. Default is `train`.
- `buffer.explorer_input.taskset.format`: The format of the taskset. It includes `prompt_key`, `response_key`, `workflow_key` and `reward_fn_key`.
- `buffer.explorer_input.eval_tasksets`: The configuration of the eval tasksets, a list of tasksets used for evaluation. Empty by default; see the sketch after this list.
- `buffer.explorer_input.default_workflow_type`: The default workflow type for `taskset` and `eval_tasksets`.
- `buffer.explorer_input.default_reward_fn_type`: The default reward function type for `taskset` and `eval_tasksets`.
- `buffer.trainer_input.experience_buffer`: The configuration of the experience buffer.
- `buffer.trainer_input.experience_buffer.name`: The name of the experience buffer.
- `buffer.trainer_input.experience_buffer.storage_type`: The storage type of the experience buffer. Default is `queue`.
- `buffer.trainer_input.experience_buffer.path`: The SQL database path for storing the experience buffer. It can be empty to skip saving to the database.
- `buffer.trainer_input.sft_warmup_dataset`: The configuration of the SFT warmup dataset. Its structure is similar to `buffer.explorer_input.taskset`.
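Since each entry of `eval_tasksets` follows the same per-taskset schema as `taskset`, a populated evaluation list might look like the following sketch (the name, path, and split are illustrative):

```yaml
buffer:
  explorer_input:
    eval_tasksets:
      - name: countdown_eval            # hypothetical evaluation taskset
        path: 'countdown_dataset/oneshot-split'
        split: test                     # evaluate on a held-out split
        format:
          prompt_key: 'question'
          response_key: 'answer'
```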

## Explorer

@@ -157,7 +181,7 @@ synchronizer:
- `synchronizer.sync_method`: The synchronization method between `trainer` and `explorer`.
Supports `nccl` and `checkpoint`: `nccl` means that model weights in `explorer` are synchronized from `trainer` through `nccl`;
`checkpoint` means that `explorer` loads the newest checkpoint saved by `trainer` and then updates its model weights. Default is `nccl`.
- `synchronizer.sync_interval`: The interval steps between two synchronizations. Default is `10`. It should be set manually.
- `synchronizer.sync_timeout`: The timeout of the synchronization. Default is `1200`.

## Trainer
@@ -176,8 +200,8 @@ trainer:
- `trainer.algorithm_type`: The type of the algorithm. Supports `ppo`, `grpo`, `opmd`, and `dpo`.
- `trainer.trainer_config_path`: The path to the trainer configuration file. It must be set manually.
- `trainer.sft_warmup_steps`: The number of steps to warm up the model. Default is `0`.
- `trainer.eval_interval`: The interval steps between two evaluations. Default is `1000`.
- `trainer.save_interval`: The interval steps between two checkpoints. Default is `100`.

### veRL Trainer Configuration

13 changes: 9 additions & 4 deletions docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
@@ -102,13 +102,18 @@ class ExampleWorkflow(Workflow):

### Step 3: Modify Configuration File

After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `buffer.explorer_input` domain to the newly registered `Workflow` name.

```yaml
buffer:
  # Other fields
  explorer_input:
    taskset:
      name: taskset_name
      path: 'path/to/taskset'
      # Other fields
    eval_tasksets: []
    default_workflow_type: example_workflow
# Other fields
```
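Note that the taskset `format` also accepts `workflow_key` and `reward_fn_key` (see the Buffer section of the configuration reference), which map per-sample fields to workflow and reward names; presumably `default_workflow_type` then acts as the fallback for samples without such a field. A sketch under that assumption:

```yaml
buffer:
  explorer_input:
    taskset:
      name: taskset_name
      path: 'path/to/taskset'
      format:
        prompt_key: 'question'
        response_key: 'answer'
        workflow_key: 'workflow'   # hypothetical per-sample field holding a workflow name
    default_workflow_type: example_workflow  # assumed fallback when a sample has no workflow field
```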
