docs/sphinx_doc/source/tutorial/develop_operator.md (4 additions, 4 deletions)
@@ -6,9 +6,9 @@
 In Trinity-RFT, the operator module is responsible for processing experience data in the buffer module. It naturally supports the existing data processing capabilities of [Data-Juicer](https://github.com/modelscope/data-juicer), and also allows developers to implement their own operators.
 By customizing operators, developers can implement various data processing functionalities, such as data augmentation, filtering, and transformation. You can even implement advantage/return calculation as operators, as shown in the {ref}`Algorithms <Algorithms>` section.
 
-- **DataJuicerOperator** ({class}`trinity.data.operators.DataJuicerOperator`): The operator that wraps the data processing operators from Data-Juicer. It provides a simple interface for developers to list the Data-Juicer operators they want to use. The full list of Data-Juicer operators can be found [here](https://modelscope.github.io/data-juicer/en/main/docs/Operators.html).
-- **ExperienceOperator** ({class}`trinity.data.operators.ExperienceOperator`): The base class for all operators used in experience data processing. It defines the interface and common functionalities that all operators should have. Each operator processes a batch of experience data and returns the processed data with metrics for logging.
-- **ExperiencePipeline** ({class}`trinity.data.pipelines.ExperiencePipeline`): The experience data processing pipeline that manages a sequence of operators. It takes raw experiences from the `Explorer`, passes them through each operator in the pipeline, and writes the final processed experiences into the input buffer of the `Trainer`.
+- **DataJuicerOperator** ({class}`trinity.buffer.operators.DataJuicerOperator`): The operator that wraps the data processing operators from Data-Juicer. It provides a simple interface for developers to list the Data-Juicer operators they want to use. The full list of Data-Juicer operators can be found [here](https://modelscope.github.io/data-juicer/en/main/docs/Operators.html).
+- **ExperienceOperator** ({class}`trinity.buffer.operators.ExperienceOperator`): The base class for all operators used in experience data processing. It defines the interface and common functionalities that all operators should have. Each operator processes a batch of experience data and returns the processed data with metrics for logging.
+- **ExperiencePipeline** ({class}`trinity.buffer.pipelines.ExperiencePipeline`): The experience data processing pipeline that manages a sequence of operators. It takes raw experiences from the `Explorer`, passes them through each operator in the pipeline, and writes the final processed experiences into the input buffer of the `Trainer`.
 
 ```{note}
 In addition to `ExperiencePipeline`, Trinity-RFT also provides `TaskPipeline` for task data processing.
@@ -56,7 +56,7 @@ class RewardFilter(ExperienceOperator):
         return filtered_exps, metrics
 ```
 
-After implementation, you need to register this module through {class}`trinity.data.operators.EXPERIENCE_OPERATORS`. Once registered, the module can be configured in the configuration file using the registered name.
+After implementation, you need to register this module through {class}`trinity.buffer.operators.EXPERIENCE_OPERATORS`. Once registered, the module can be configured in the configuration file using the registered name.
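
To make the registration flow concrete, here is a minimal sketch of a custom operator consistent with the `RewardFilter` hunk above. The `register_module` decorator, the `process` method signature, and the `reward` attribute are assumptions inferred from the diff, not confirmed API:

```python
# A minimal sketch, assuming EXPERIENCE_OPERATORS exposes a `register_module`
# decorator and operators implement `process(exps) -> (exps, metrics)`.
# The `reward` attribute and all names below are illustrative.
from typing import Any, Dict, List, Tuple

from trinity.buffer.operators import EXPERIENCE_OPERATORS, ExperienceOperator


@EXPERIENCE_OPERATORS.register_module("reward_filter")
class RewardFilter(ExperienceOperator):
    """Keep only experiences whose reward exceeds a threshold."""

    def __init__(self, threshold: float = 0.0) -> None:
        self.threshold = threshold

    def process(self, exps: List[Any]) -> Tuple[List[Any], Dict[str, float]]:
        # Filter the batch and report simple metrics for logging.
        filtered_exps = [exp for exp in exps if exp.reward > self.threshold]
        metrics = {
            "reward_filter/kept": len(filtered_exps),
            "reward_filter/dropped": len(exps) - len(filtered_exps),
        }
        return filtered_exps, metrics
```

Once registered under `reward_filter`, the operator can be referenced by that name in the configuration file.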
docs/sphinx_doc/source/tutorial/example_step_wise.md (6 additions, 2 deletions)
@@ -81,7 +81,7 @@ In general multi-step scenarios, each run may generate a varying number of experiences
 
 - `buffer.train_batch_size`: The number of experiences to be sampled from the buffer for training, which can differ from the number of experiences generated in each explore step.
 
-- `buffer.trainer_input.use_priority_queue = true`: Using `PriorityQueue` allows the model to use the experiences with higher priority, which prefers newly-generated experiences by default.
+- `buffer.trainer_input.experience_buffer.replay_buffer`: Using the replay buffer (a `PriorityQueue`) lets training consume experiences with higher priority first; by default, newly generated experiences are preferred (see the sketch after this hunk).
 
 - `synchronizer.sync_style = dynamic_by_explorer`: The explorer determines when to synchronize the model weights with the trainer.
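
For reference, a minimal YAML sketch tying these options together. The nesting follows the option paths above; all literal values are illustrative:

```yaml
buffer:
  train_batch_size: 96            # illustrative; may differ from per-step generation count
  trainer_input:
    experience_buffer:
      replay_buffer:
        enable: true              # replaces the old `use_priority_queue = true`
synchronizer:
  sync_style: dynamic_by_explorer # the explorer decides when to sync weights
```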
docs/sphinx_doc/source/tutorial/trinity_configs.md (8 additions, 7 deletions)
@@ -273,14 +273,12 @@ The configuration for each task dataset is defined as follows:
 - `name`: Name of the dataset. This name will be used as the Ray actor's name, so it must be unique.
 - `storage_type`: How the dataset is stored. Options: `file`, `queue`, `sql`.
   - `file`: The dataset is stored in `jsonl`/`parquet` files. The data file organization must meet the Hugging Face standard. *We recommend this storage type for most cases.*
-  - `queue`: The dataset is stored in a queue. The queue is a simple FIFO queue that stores the task dataset. *Do not use this storage type for task datasets unless you know what you are doing.*
   - `sql`: The dataset is stored in a SQL database. *This type is unstable and will be optimized in future versions.*
 - `path`: The path to the task dataset.
   - For `file` storage type, the path points to the directory that contains the task dataset files.
-  - For `queue` storage type, the path is optional. You can back up the data in the queue by specifying a SQLite database path here.
   - For `sql` storage type, the path points to the SQLite database file.
-- `subset_name`: The subset name of the task dataset. Default is `None`.
-- `split`: The split of the task dataset. Default is `train`.
+- `subset_name`: The subset name of the task dataset, corresponding to the `name` parameter of the Hugging Face datasets `load_dataset` function. Default is `None`.
+- `split`: The split of the task dataset, corresponding to the `split` parameter of the Hugging Face datasets `load_dataset` function. Default is `train`.
 - `repeat_times`: The number of rollouts generated for a task. If not set, it defaults to `algorithm.repeat_times` for `taskset` and `1` for `eval_tasksets`.
 - `rollout_args`: The parameters for rollout.
   - `temperature`: The temperature for sampling.
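
To make the field interplay concrete, a sketch of a task dataset entry. The `explorer_input.taskset` nesting and all values are illustrative assumptions, not taken from this hunk:

```yaml
buffer:
  explorer_input:
    taskset:                  # assumed nesting; not shown in this hunk
      name: my_taskset        # must be unique (used as the Ray actor name)
      storage_type: file      # recommended for task datasets
      path: /path/to/dataset  # directory with jsonl/parquet files
      subset_name: main       # -> `name` arg of datasets.load_dataset
      split: train            # -> `split` arg of datasets.load_dataset
      repeat_times: 8         # falls back to algorithm.repeat_times if unset
      rollout_args:
        temperature: 1.0
```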
@@ -324,7 +322,7 @@ buffer:
   - For `queue` storage type, this field is optional. You can specify a SQLite database or JSON file path here to back up the queue data.
   - For `file` storage type, the path points to the directory containing the dataset files.
   - For `sql` storage type, the path points to the SQLite database file.
-- `format`: Defines keys for prompts and responses in the dataset.
+- `format`: Mainly for SFT and DPO algorithm datasets; used to format the extracted data (see the sketch after this hunk).
   - `prompt_type`: Specifies the type of prompts in the dataset. We support `plaintext` and `messages` for now.
     - `plaintext`: The prompt is in string format.
     - `messages`: The prompt is organized as a message list.
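
A sketch of a `format` block for a plaintext SFT dataset. The `prompt_key`/`response_key` field names are assumptions not shown in this hunk:

```yaml
trainer_input:
  experience_buffer:
    storage_type: file
    path: /path/to/sft_data
    format:
      prompt_type: plaintext
      prompt_key: question   # assumed field name
      response_key: answer   # assumed field name
```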
@@ -339,8 +337,11 @@ buffer:
   - `enable_concatenated_multi_turn`: Enable concatenated multi-turn SFT data preprocessing. Only for `messages`, and only takes effect with the SFT algorithm.
   - `chat_template`: Specifies the chat template in string format. If not provided, `model.custom_chat_template` is used.
 - `max_read_timeout`: The maximum waiting time (in seconds) to read new experience data. If exceeded, an incomplete batch will be returned directly. Only takes effect when `storage_type` is `queue`. Default is 1800 seconds (30 minutes).
-- `use_priority_queue`: Only take effect when `storage_type` is `queue`. If set to `True`, the queue will be a priority queue, which allows for prioritizing certain experiences over others. Default is `False`.
-- `reuse_cooldown_time`: Only take effect when `storage_type` is `queue` and `use_priority_queue` is `True`. If set, it specifies the cooldown time (in seconds) for reusing experiences. If not specified, the default value is `None`, meaning experiences can not be reused.
+- `replay_buffer`: Only takes effect when `storage_type` is `queue`. Configures the replay buffer for experience reuse (see the sketch after this hunk).
+  - `enable`: Whether to enable the replay buffer. Default is `false`.
+  - `reuse_cooldown_time`: Cooldown time (in seconds) for reusing experiences. If not specified, the default is `None`, meaning experiences cannot be reused.
+  - `priority_fn`: Experience priority function used to determine the order of experience reuse. Currently supports `linear_decay` and `linear_decay_use_count_control_randomization`.
+  - `priority_fn_args`: A dictionary of arguments passed to the priority function; the specific parameters depend on the selected priority function.
 - `auxiliary_buffers`: Optional buffers used by the trainer. It is a dictionary where each key is a buffer name and each value is a buffer configuration similar to `experience_buffer`.
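
A sketch of the new `replay_buffer` block. The literal values and the `decay` argument name are illustrative; consult the priority function implementations for the real parameters:

```yaml
experience_buffer:
  name: experience_buffer
  storage_type: queue          # replay_buffer only takes effect for queues
  max_read_timeout: 1800
  replay_buffer:
    enable: true
    reuse_cooldown_time: 60    # illustrative; omit to disable reuse
    priority_fn: linear_decay
    priority_fn_args:
      decay: 0.1               # hypothetical argument name
```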