
Commit 2086bec

Merge TaskSet into Buffer (#34)

1 parent 9a29a96


55 files changed: +942 −878 lines changed

docs/sphinx_doc/source/main.md

Lines changed: 10 additions & 2 deletions
````diff
@@ -186,8 +186,16 @@ You may customize the configurations in [`examples`](https://github.com/modelsco
 model:
   model_path: $MODEL_PATH/{model_name}

-data:
-  dataset_path: $DATASET_PATH/{dataset_name}
+buffer:
+  explorer_input:
+    taskset:
+      name: $TASKSET_NAME
+      path: $DATASET_PATH/{dataset_name}
+      format:
+        prompt_key: 'question'
+        response_key: 'answer'
+    default_workflow_type: $WORKFLOW_NAME
+    default_reward_fn_type: $REWARD_FN_NAME
 ```

 Please refer to [`examples`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/) for more details.
````
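The `format` block above maps dataset-specific field names onto unified prompt/response fields. A minimal sketch of what that mapping amounts to — the `apply_format` helper and the unified key names are illustrative assumptions, not Trinity-RFT's actual API:

```python
# Sketch of the field mapping expressed by the `format` config.
# `apply_format` and the unified "prompt"/"response" keys are assumptions
# for illustration, not Trinity-RFT's real implementation.

def apply_format(sample: dict, prompt_key: str, response_key: str) -> dict:
    """Map raw dataset fields onto unified prompt/response fields."""
    return {
        "prompt": sample[prompt_key],
        "response": sample[response_key],
    }

raw = {"question": "What is 2 + 2?", "answer": "4"}
unified = apply_format(raw, prompt_key="question", response_key="answer")
print(unified)  # {'prompt': 'What is 2 + 2?', 'response': '4'}
```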

docs/sphinx_doc/source/tutorial/example_async_mode.md

Lines changed: 6 additions & 5 deletions
````diff
@@ -10,18 +10,19 @@ In addition, we need to configure the following parameters in both files.
 The model weights of the explorer and trainer are synchronized once every `sync_iteration_interval * batch_size` tasks.

 ```yaml
-data:
+global_config:
   batch_size: <batch_size>
 # The same checkpoint path
 model:
   checkpoint_path: /PATH/TO/CHECKPOINT

 # The same data_base path
 buffer:
-  train_dataset:
-    name: gsm8k_buffer
-    storage_type: queue
-    path: 'sqlite:///gsm8k.db'
+  trainer_input:
+    experience_buffer:
+      name: gsm8k_buffer
+      storage_type: queue
+      path: 'sqlite:///gsm8k.db'

 synchronizer:
   sync_method: 'checkpoint'
````
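As a concrete check of the synchronization cadence described in this hunk, the number of tasks between two weight syncs is simply the product of the two config values (the numbers below are illustrative, not defaults from the source):

```python
# Illustrative values; the real ones come from your YAML config.
sync_iteration_interval = 10
batch_size = 96

# Weights are synchronized once every this many tasks.
tasks_between_syncs = sync_iteration_interval * batch_size
print(tasks_between_syncs)  # 960
```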

docs/sphinx_doc/source/tutorial/example_data_functionalities.md

Lines changed: 19 additions & 38 deletions
````diff
@@ -31,52 +31,41 @@ Trinity-RFT uses a unified config file to manage all config items. For the data
 In this example, assume that you need to rank all math questions and corresponding answers by their difficulties. So you can set these config items like the following example:

 ```yaml
-data:
+data_processor:
   # basic info
-  dataset_path: '/path/to/gsm8k'
-  dataset_config:
+  source_data_path: '/path/to/gsm8k'
+  load_kwargs:
     split: 'train' # only need the train split
-  format_config: # set the field mappings
+  format: # set the field mappings
     prompt_key: 'question'
     response_key: 'answer'
   # database related. The result dataset will be stored in the database.
   db_url: 'postgresql://{user_name}@localhost:5432/{db_name}'
-  # downstream loading related
-  total_epochs: 1
-  batch_size: 96
-  default_workflow_type: 'math_workflow'
 ```

 Here you can set the basic information for the GSM-8K dataset, database information that is used to store the result dataset, and some other items about downstream dataset loading for exploring and training:

-+ `dataset_path`: the path to the raw dataset.
-+ `dataset_config`: extra config arguments for loading the raw dataset. Mainly for the `load_dataset` method in HuggingFace `datasets` library.
-+ `format_config`: some dataset format config items, which are used to map original data field names to unified ones.
++ `source_data_path`: the path to the raw dataset.
++ `load_kwargs`: extra config arguments for loading the raw dataset. Mainly for the `load_dataset` method in HuggingFace `datasets` library.
++ `format`: some dataset format config items, which are used to map original data field names to unified ones.
 + `db_url`: the URL of the postgresql database to store the result dataset.
-+ `total_epochs`: the total number of epochs to train on this dataset.
-+ `batch_size`: the training batch size.
-+ `default_workflow_type`: the default exploring workflow type. Please refer to [programming guide](trinity_programming_guide.md) for more details.

 In addition, there are several config items related to the data active iterator, which is used to prepare a better dataset. The core part of the data active iterator, Data-Juicer, provides tens of operators to help clean or calculate key information for each sample in the dataset. You can configure this part depending on how familiar you are with Data-Juicer.

 #### Not familiar with Data-Juicer
 If you are not familiar with Data-Juicer, the data module provides a natural-language-based method to config the data processing recipe. What you need to do is only describe the demands of how you want to prepare for the raw dataset, and an agent will be invoked to arrange the data processing recipe for you. Here is an example:

 ```yaml
-data:
+data_processor:
   # basic info
-  dataset_path: '/path/to/gsm8k'
-  dataset_config:
+  source_data_path: '/path/to/gsm8k'
+  load_kwargs:
     split: 'train' # only need the train split
-  format_config: # set the field mappings
+  format: # set the field mappings
     prompt_key: 'question'
     response_key: 'answer'
   # database related. The result dataset will be stored in the database.
   db_url: 'postgresql://{user_name}@localhost:5432/{db_name}'
-  # downstream loading related
-  total_epochs: 1
-  batch_size: 96
-  default_workflow_type: 'math_workflow'

   #### new part about data active iterator
   dj_process_desc: 'Please compute difficulty scores for these math questions.'

@@ -109,20 +98,16 @@
 After preparing the Data-Juicer data processing recipe, you can set the `dj_config_path` item in the Trinity-RFT config file to the path to this recipe. For example:

 ```yaml
-data:
+data_processor:
   # basic info
-  dataset_path: '/path/to/gsm8k'
-  dataset_config:
+  source_data_path: '/path/to/gsm8k'
+  load_kwargs:
     split: 'train' # only need the train split
-  format_config: # set the field mappings
+  format: # set the field mappings
     prompt_key: 'question'
     response_key: 'answer'
   # database related. The result dataset will be stored in the database.
   db_url: 'postgresql://{user_name}@localhost:5432/{db_name}'
-  # downstream loading related
-  total_epochs: 1
-  batch_size: 96
-  default_workflow_type: 'math_workflow'

   #### new part about data active iterator
   dj_config_path: '/path/to/the/Data-Juicer/data/processing/recipe/above.yaml'

@@ -185,23 +170,19 @@ Trinity-RFT uses a unified config file to manage all config items. For the data
 In this example, assume that you need to rank all math questions and corresponding answers by their difficulties. So you can set these config items like the following example:

 ```yaml
-data:
+data_processor:
   # basic info
-  dataset_path: 'tests/test_data/test_human_annotator'
-  dataset_config:
+  source_data_path: 'tests/test_data/test_human_annotator'
+  load_kwargs:
     split: 'train' # only need the train split
-  format_config: # set the field mappings
+  format: # set the field mappings
     prompt_key: 'prompt'
     chosen_key: 'chosen'
     rejected_key: 'rejected'
   #### new part about data active iterator
   dj_config_path: 'tests/test_configs/human_annotator_test_dj_cfg.yaml'
   # database related. The result dataset will be stored in the database.
   db_url: 'postgresql://{user_name}@localhost:5432/{db_name}'
-  # downstream loading related
-  total_epochs: 20
-  batch_size: 32
-  default_workflow_type: 'math_workflow'
 ```

 Here you can set the basic information for the example dataset, database information that is used to store the result dataset, and some other items about downstream dataset loading for exploring and training, which is similar to the example above.
````
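The `db_url` values in these examples are written with `{user_name}`/`{db_name}` placeholders. Assuming they are meant to be filled in with your own PostgreSQL user and database name (a reading based on the URL shape, not documented behavior), the substitution is plain string formatting:

```python
# Fill the placeholders in the db_url template.
# 'alice' and 'trinity' are example values, not defaults from the docs.
db_url_template = 'postgresql://{user_name}@localhost:5432/{db_name}'
db_url = db_url_template.format(user_name='alice', db_name='trinity')
print(db_url)  # postgresql://alice@localhost:5432/trinity
```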

docs/sphinx_doc/source/tutorial/example_dpo.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -51,7 +51,7 @@ buffer:
   train_dataset:
     storage_type: file
     path: <$DATASET_PATH/human_like_dpo_dataset>
-    kwargs:
+    format:
       prompt_type: <prompt_type> # messages/plaintext
       prompt_key: <prompt_key>
       chosen_key: <chosen_key>
```

docs/sphinx_doc/source/tutorial/example_reasoning_basic.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -84,7 +84,7 @@ buffer:
   sft_warmup_dataset:
     storage_type: file
     path: <$DATASET_PATH/{sft_data}>
-    kwargs:
+    format:
       prompt_type: <prompt_type> # messages/plaintext/chatpair
       prompt_key: <prompt_key>
       response_key: <response_key>
```

docs/sphinx_doc/source/tutorial/trinity_configs.md

Lines changed: 66 additions & 42 deletions
````diff
@@ -2,6 +2,21 @@


 The following is the main config file for Trinity-RFT. Take `countdown.yaml` as an example.

+## Global Config
+
+```yaml
+mode: both
+global_config:
+  total_epochs: 1
+  batch_size: 96
+  eval_interval: 1000
+```
+
+- `mode`: The mode of the experiment, chosen from `both`, `train`, `explore` or `bench`. `both` means both trainer and explorer are launched; `train` means only the trainer is launched; `explore` means only the explorer is launched; `bench` conducts benchmark evaluation. Default is `both`.
+- `global_config.total_epochs`: The total number of epochs. It should be set manually.
+- `global_config.batch_size`: The batch size used for training. It should be set manually.
+- `global_config.eval_interval`: The interval steps between two evaluations. Default is `1000`.
+

 ## Monitor

@@ -15,45 +30,32 @@ monitor:
 - `monitor.name`: The name of the experiment. It must be set manually.


-## Data
+## Data Processing

 <!-- The `data` configuration specifies the data used for training. It includes the total number of epochs, the batch size, the path to the dataset, the default workflow type, the default reward function type, and the format configuration. -->

 ```yaml
-data:
-  dataset_path: '/PATH/TO/DATASET'
-  train_split: 'train'
-  eval_split: ''
-  dataset_config:
-    split: 'train'
-  format_config:
+data_processor:
+  source_data_path: '/PATH/TO/DATASET'
+  load_kwargs:
+    split: 'train' # only need the train split
+  format:
     prompt_key: 'question'
     response_key: 'answer'

-  db_url: ''
-  max_retry_times: 3
-  max_retry_interval: 1
-
-  total_epochs: 20
-  batch_size: 96
-  default_workflow_type: 'math_workflow'
-  default_reward_fn_type: 'countdown_reward'
+  # cleaner related
+  dj_config_path: 'tests/test_configs/active_iterator_test_dj_cfg.yaml'
+  clean_strategy: 'iterative'
+  # db related
+  db_url: 'postgresql://{username}@localhost:5432/{db_name}'
 ```

-- `data.dataset_path`: The path to the dataset.
-- `data.train_split`: The split name of the dataset used for training. Default is `train`.
-- `data.eval_split`: The split name of the dataset used for eval.
-- `data.dataset_config`: The configuration for the dataset. <!-- TODO: may only used in Data-Juicer -->
-- `data.format_config`: The configuration for the format of the dataset.
+- `data.source_data_path`: The path to the source dataset.
+- `data.load_kwargs`: The kwargs used in `datasets.load_dataset`.
+- `data.format`: The format of the source dataset. It includes `prompt_key` and `response_key`.
+- `data.dj_config_path`: The path to the Data-Juicer configuration.
+- `data.clean_strategy`: The cleaning strategy used for `DataCleaner`, which iteratively cleans the dataset until targets are met.
 - `data.db_url`: The URL of the database.
-- `data.max_retry_times`: The maximum number of retries when loading the dataset from database.
-- `data.max_retry_interval`: The maximum interval between retries when loading the dataset from database.
-- `data.total_epochs`: The total number of epochs to explore the dataset. Default is `1`. It should be set manually.
-- `data.batch_size`: The number of `Task` in one training batch. The real batch size used in training is `data.batch_size` * `explorer.repeat_times`. It should be set manually.
-- `data.default_workflow_type`: The default workflow type used for training.
-- `data.default_reward_fn_type`: The default reward function type used for training.
-
-<!-- TODO explain the dataset_config and format_config -->

 ## Model

@@ -93,18 +95,40 @@
 buffer:
   max_retry_times: 3
   max_retry_interval: 1
-  train_dataset:
-    name: countdown_buffer
-    storage_type: queue
-    algorithm_type: ppo
-    path: 'sqlite:///countdown.db'
-  sft_warmup_dataset: null
+  explorer_input:
+    taskset:
+      name: countdown
+      path: 'countdown_dataset/oneshot-split'
+      split: train
+      format:
+        prompt_key: 'question'
+        response_key: 'answer'
+    eval_tasksets: []
+    default_workflow_type: 'math_workflow'
+    default_reward_fn_type: 'countdown_reward'
+  trainer_input:
+    experience_buffer:
+      name: countdown_buffer
+      storage_type: queue
+      path: 'sqlite:///countdown.db'
+    sft_warmup_dataset: null
 ```

-- `buffer.max_retry_times`: The maximum number of retries when loading the dataset from database.
-- `buffer.max_retry_interval`: The maximum interval between retries when loading the dataset from database.
-- `buffer.train_dataset`: The configuration of the training dataset.
-- `buffer.sft_warmup_dataset`: The configuration of the SFT warmup dataset.
+- `buffer.max_retry_times`: The maximum number of retries when loading data from the database.
+- `buffer.max_retry_interval`: The maximum interval between retries when loading data from the database.
+- `buffer.explorer_input.taskset`: The configuration of the taskset.
+- `buffer.explorer_input.taskset.name`: The name of the taskset.
+- `buffer.explorer_input.taskset.path`: The path to the taskset.
+- `buffer.explorer_input.taskset.split`: The split name of the taskset used for training. Default is `train`.
+- `buffer.explorer_input.taskset.format`: The format of the taskset. It includes `prompt_key`, `response_key`, `workflow_key` and `reward_fn_key`.
+- `buffer.explorer_input.eval_tasksets`: The configuration of the eval tasksets, a list of tasksets used for evaluation. It is empty by default.
+- `buffer.explorer_input.default_workflow_type`: The default workflow type for `taskset` and `eval_tasksets`.
+- `buffer.explorer_input.default_reward_fn_type`: The default reward function type for `taskset` and `eval_tasksets`.
+- `buffer.trainer_input.experience_buffer`: The configuration of the experience buffer.
+- `buffer.trainer_input.experience_buffer.name`: The name of the experience buffer.
+- `buffer.trainer_input.experience_buffer.storage_type`: The storage type of the experience buffer. Default is `queue`.
+- `buffer.trainer_input.experience_buffer.path`: The SQL path where the experience buffer is stored. It can be empty to indicate not saving to the database.
+- `buffer.trainer_input.sft_warmup_dataset`: The configuration of the SFT warmup dataset. Its structure is similar to `buffer.explorer_input.taskset`.

 ## Explorer

@@ -157,7 +181,7 @@ synchronizer:
 - `synchronizer.sync_method`: The synchronization method between `trainer` and `explorer`.
 Support `nccl` and `checkpoint`, `nccl` represents that model weights in `explorer` will be synchronized from `trainer` through `nccl`,
 `checkpoint` represents that `explorer` will load the newest checkpoints saved by `trainer` then update its model weights. Default is `nccl`.
-- `synchronizer.sync_interval`: The interval between two synchronizations. Default is `10`. It should be set manually.
+- `synchronizer.sync_interval`: The interval steps between two synchronizations. Default is `10`. It should be set manually.
 - `synchronizer.sync_timeout`: The timeout of the synchronization. Default is `1200`.

 ## Trainer

@@ -176,8 +200,8 @@ trainer:
 - `trainer.algorithm_type`: The type of the algorithm, Support `ppo`, `grpo`, `opmd` and `dpo`.
 - `trainer.trainer_config_path`: The path to the trainer configuration file. It must be set manually.
 - `trainer.sft_warmup_steps`: The number of steps to warm up the model. Default is `0`.
-- `trainer.eval_interval`: The interval between two evaluations. Default is `1000`.
-- `trainer.save_interval`: The interval between two checkpoints. Default is `100`.
+- `trainer.eval_interval`: The interval steps between two evaluations. Default is `1000`.
+- `trainer.save_interval`: The interval steps between two checkpoints. Default is `100`.

 ### veRL Trainer Configuration
````
docs/sphinx_doc/source/tutorial/trinity_programming_guide.md

Lines changed: 9 additions & 4 deletions
````diff
@@ -102,13 +102,18 @@ class ExampleWorkflow(Workflow):

 ### Step 3: Modify Configuration File

-After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `data` domain to the newly registered `Workflow` name.
+After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `buffer.explorer_input` domain to the newly registered `Workflow` name.

 ```yaml
-data:
-  # Other fields
-  default_workflow_type: example_workflow
+buffer:
   # Other fields
+  explorer_input:
+    taskset:
+      name: taskset_name
+      path: 'path/to/taskset'
+      # Other fields
+    eval_tasksets: []
+    default_workflow_type: example_workflow
   # Other fields
 ```

````
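The `default_workflow_type: example_workflow` string has to resolve to the registered `Workflow` class at runtime. A minimal name-to-class registry sketch, purely illustrative (Trinity-RFT's actual registration mechanism is not shown in this diff):

```python
# Hypothetical name -> class registry; Trinity-RFT's real mechanism may differ.
WORKFLOWS: dict[str, type] = {}

def register_workflow(name: str):
    """Decorator that records a workflow class under a config-friendly name."""
    def decorator(cls: type) -> type:
        WORKFLOWS[name] = cls
        return cls
    return decorator

@register_workflow("example_workflow")
class ExampleWorkflow:
    def run(self, task: str) -> str:
        return f"explored: {task}"

# Lookup mirrors what resolving `default_workflow_type` would do.
workflow_cls = WORKFLOWS["example_workflow"]
print(workflow_cls().run("countdown"))  # explored: countdown
```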
