You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This guide will introduce how to add new task types to Trinity-RFT and provide relevant development guidelines.
3
+
This guide introduces how to add new workflows to Trinity-RFT and provides relevant development guidelines.
4
4
5
5
```{note}
6
6
Trinity-RFT is still under development, and the following interfaces may change. Please read this section in conjunction with the latest code.
7
7
```
8
8
9
9
---
10
10
11
-
## Creating New Task Types
11
+
## Creating New Workflows
12
12
13
-
Trinity-RFT supports developers in registering new task types (e.g., multi-round interaction scenarios). Below are the steps for creating a new task type.
13
+
Trinity-RFT allows developers to register new workflows (e.g., for multi-turn interactions or agentic scenarios). Below are the steps to create a new workflow:
14
14
15
15
---
16
16
17
17
### Step 0: Basic Concepts
18
18
19
19
Before starting development, it's important to understand several core concepts:
20
20
21
-
-**Task**: Represents a data structure that can be converted into a `Workflow`. The `Task` data format may vary significantly depending on the type of task:
22
-
-**Math problems**: `Task` contains the problem description and the standard answer.
23
-
-**Programming scenarios**: `Task` includes the problem description, test cases, runtime environment, and other complex information.
24
21
25
-
-**Workflow**: Can be understood as the running state of a `Task`, defining the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates `Experience`. Trinity-RFT has several built-in `Workflows`:
26
-
-`MathWorkflow`: For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards).
22
+
-**Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The content of the `Task` varies depending on the task type:
23
+
-**Math problems**: A `Task` contains the problem description and the golden answer.
24
+
-**Programming scenarios**: A `Task` includes the problem description, test cases, runtime environment, and other complex information.
25
+
26
+
27
+
-**Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows:
28
+
-`MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses LLM responses, and calculates scores (rewards).
29
+
-`WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with environment.
27
30
-`CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results.
28
31
- ...
29
32
30
-
-**Experience**: The output of running a `Workflow`, where the internal data format depends on the algorithm used for training. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token_ids, action_mask (identifying which tokens were generated by the LLM), logprobs, rewards, etc.
33
+
34
+
-**Experience** ({class}`trinity.common.experience.Experience`): The output of running a `Workflow`. The internal data format depends on the training algorithm used. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token IDs, action masks (identifying which tokens were generated by the LLM), log probabilities, rewards, etc.
31
35
32
36
---
33
37
34
38
### Step 1: Prepare Task Dataset
35
39
36
-
Each `Task` contains various parameters needed to initialize the `Workflow`. Due to significant differences in initialization parameters across different `Workflows`, the following example uses a math problem scenario.
40
+
The task dataset is loaded via the `buffer.explorer_input.taskset` configuration entry in your YAML config file.
41
+
To handle differences in `Task` contents, Trinity-RFT provides a unified `Task` interface containing the following fields.
37
42
38
-
In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line’s JSON contains `question` and `answer` fields representing the problem description and standard answer, respectively.
43
+
-**`workflow`** (`str`): The registered name of your workflow class. You can specify it in `buffer.explorer_input.taskset.default_workflow_type` of your YAML config file.
44
+
-**`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some workflows already include built-in reward calculation; in such cases, you can omit this field.
45
+
-**`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without relying on the following fields.
46
+
-**`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. These settings come from the YAML configuration file and can be set in `buffer.explorer_input.task_set.format`.
47
+
-**`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters that control the rollout process, such as `temperature`. This field also comes from the YAML configuration file and can be set in `buffer.explorer_input.task_set.rollout_args`.
48
+
49
+
In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line contains JSON with `question` and `answer` fields representing the problem description and standard answer, respectively. For example:
39
50
40
51
```
41
52
{"question": "1+1=", "answer": "2"}
42
53
{"question": "2+2=", "answer": "4"}
43
54
...
44
55
```
45
56
57
+
Example configuration snippet:
58
+
59
+
```yaml
60
+
# some config
61
+
buffer:
62
+
explorer_input:
63
+
taskset:
64
+
default_workflow: "math_workflow"
65
+
path: "/PATH/TO/FILE/DIR"
66
+
format:
67
+
prompt_key: "question"
68
+
response_key: "answer"
69
+
rollout_args:
70
+
temperature: 1.0
71
+
# some other configs
72
+
```
73
+
74
+
In this example, each task object's `raw_task` is a `Dict` with two keys (`question` and `answer`). The `MathWorkflow` uses the `prompt_key` and `response_key` to extract the question and answer from the `raw_task` and use the `rollout_args` to generate the response.
75
+
76
+
46
77
---
47
78
48
-
### Step 2: Write Workflow
79
+
### Step 2: Implement a New Workflow
49
80
50
-
The core of creating a new task type is writing a new `Workflow`, whose base class interface is as follows:
81
+
The `Workflow` base class interface is as follows:
51
82
52
83
```python
53
-
# import some packages
54
-
55
84
classWorkflow(ABC):
56
85
57
86
def__init__(
@@ -68,39 +97,48 @@ class Workflow(ABC):
68
97
"""Run the workflow and return a list of Experiences."""
69
98
```
70
99
71
-
Developers can register their own `Workflow` through the `WORKFLOWS.register_module` method, but need to ensure that the name does not conflict with existing `Workflow` classes.
72
100
73
-
```python
74
-
# import some packages
75
-
from trinity.common.workflows.workflow importWORKFLOWS
101
+
#### Initializing Your Workflow
76
102
77
-
@WORKFLOWS.register_module("my_workflow")
78
-
classMyWorkflow(Workflow):
79
-
pass
80
-
```
103
+
During initialization, `Workflow` receives the following parameters:
104
+
105
+
-`model`({class}`trinity.common.models.model.ModelWrapper`): The model being trained, which provides an interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`).
106
+
-`task`({class}`trinity.common.workflows.Task`): A single data item from the task dataset.
107
+
-`auxiliary_models`(`List[openai.OpenAI]`):A list of auxiliary models not involved in training. All are provided via OpenAI-compatible APIs.
81
108
82
-
#### Initialization Parameters
83
-
When initializing, `Workflow` receives the following parameters:
84
-
-`model`: The model being trained, which provides an interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`).
85
-
-`task`: An instance of `Task`, which is generated by one line of data from the `Task` dataset. The `raw_task` field contains the `Dict` format source data, which can be used to construct the `Workflow` instance.
86
-
The `rollout_args` field contains the parameters for the rollout process, such as `n`, `temperature`, `top_k` and `top_p`.
87
-
-`auxiliary_models`: A list of auxiliary models, which will not be trained. All of them provide OpenAI compatible API.
88
109
89
110
```{tip}
90
-
The `model` also provided an OpenAI compatible API, you can switch to it by setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and use `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow.
111
+
You can switch to using the OpenAI APIby setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and calling `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow.
91
112
```
92
113
93
-
#### Example Code
94
-
Below is a simple example demonstrating how to implement a math problem `Workflow`:
114
+
Here’s an example of initializing a simple workflow using only `raw_task` and `rollout_args`. In more complex cases, you can use the `format_args` for further customization.
After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `buffer.explorer_input` domain to the newly registered `Workflow` name.
269
+
After implementing and registering your workflow, you need to update the configuration file to set the `default_workflow_type` in the `buffer.explorer_input.taskset` domain to the newly registered `Workflow` name.
155
270
156
271
```yaml
157
272
buffer:
158
273
# Other fields
159
274
explorer_input:
160
275
taskset:
161
-
name: example_task
162
-
storage_type: file
163
276
path: /path/to/taskset
164
-
# Other fields
165
-
default_workflow_type: example_workflow
166
-
# Other fields
277
+
default_workflow_type: example_workflow
278
+
# Other fields
279
+
```
280
+
281
+
Now you can run your workflow in Trinity-RFT using the command:
0 commit comments