You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
+35-31Lines changed: 35 additions & 31 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -1,6 +1,6 @@
1
1
# Developer Guide
2
2
3
-
This guide will introduce how to add new workflows to Trinity-RFT and provide relevant development guidelines.
3
+
This guide introduces how to add new workflows to Trinity-RFT and provides relevant development guidelines.
4
4
5
5
```{note}
6
6
Trinity-RFT is still under development, and the following interfaces may change. Please read this section in conjunction with the latest code.
@@ -10,7 +10,7 @@ Trinity-RFT is still under development, and the following interfaces may change.
10
10
11
11
## Creating New Workflows
12
12
13
-
Trinity-RFT supports developers in registering new workflows (e.g., multi-round interaction scenarios). Below are the steps to create a new workflow:
13
+
Trinity-RFT allows developers to register new workflows (e.g., for multi-turn interactions or agentic scenarios). Below are the steps to create a new workflow:
14
14
15
15
---
16
16
@@ -19,41 +19,43 @@ Trinity-RFT supports developers in registering new workflows (e.g., multi-round
19
19
Before starting development, it's important to understand several core concepts:
20
20
21
21
22
-
-**Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The `Task`data format may vary significantly depending on the type of task:
23
-
-**Math problems**: `Task` contains the problem description and the standard answer.
24
-
-**Programming scenarios**: `Task` includes the problem description, test cases, runtime environment, and other complex information.
22
+
-**Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The content of the `Task`varies depending on the task type:
23
+
-**Math problems**: A `Task` contains the problem description and the standard answer.
24
+
-**Programming scenarios**: A `Task` includes the problem description, test cases, runtime environment, and other complex information.
25
25
26
26
27
-
-**Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`, defining the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates `Experience`. Trinity-RFT has several built-in workflows:
27
+
-**Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. . It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows:
28
28
-`MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards).
29
29
-`WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with environment.
30
30
-`CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results.
31
31
- ...
32
32
33
33
34
-
-**Experience** ({class}`trinity.common.experience.Experience`): The output of running a `Workflow`, where the internal data format depends on the algorithm used for training. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token id, action_mask (identifying which tokens were generated by the LLM), logprobs, rewards, etc.
34
+
-**Experience** ({class}`trinity.common.experience.Experience`): The output of running a `Workflow`. The internal data format depends on the training algorithm used. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token IDs, action masks (identifying which tokens were generated by the LLM), log probabilities, rewards, etc.
35
35
36
36
---
37
37
38
38
### Step 1: Prepare Task Dataset
39
39
40
-
The explorer load the task dataset through the `buffer.explorer_input.taskset` in configuration file.
41
-
To deal with the differences in `Task`data format, Trinity-RFT provides a unified `Task` interface, which containes the following fields.
40
+
The task dataset is loaded via the `buffer.explorer_input.taskset`configuration entry in your YAML config file.
41
+
To handle differences in `Task`contents, Trinity-RFT provides a unified `Task` interface containing the following fields.
42
42
43
-
-**`workflow`** (`str`): The registered name of your workflow class. You can specify it in `buffer.explorer_input.taskset.default_workflow_type` of your yaml config file.
44
-
-**`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some some workflows have already integrated the reward calculation, you can ignore this field in such cases.
45
-
-**`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without the following fields.
46
-
-**`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. The `format_args` comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.format` of the yaml file.
47
-
-**`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters to facilitate the rollout process, e.g., the`temperature`. This field also comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.rollout_args` of the yaml file.
43
+
-**`workflow`** (`str`): The registered name of your workflow class. You can specify it in `buffer.explorer_input.taskset.default_workflow_type` of your YAML config file.
44
+
-**`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some workflows already include built-in reward calculation; in such cases, you can omit this field.
45
+
-**`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without relying on the following fields.
46
+
-**`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. These settings come from the YAML configuration file and can be set in `buffer.explorer_input.task_set.format`.
47
+
-**`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters that control the rollout process, such as`temperature`. his field also comes from the YAML configuration file and can be set in `buffer.explorer_input.task_set.rollout_args`.
48
48
49
-
In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line’s JSON contains`question` and `answer` fields representing the problem description and standard answer, respectively. For example:
49
+
In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line contains JSON with`question` and `answer` fields representing the problem description and standard answer, respectively. For example:
50
50
51
51
```
52
52
{"question": "1+1=", "answer": "2"}
53
53
{"question": "2+2=", "answer": "4"}
54
54
...
55
55
```
56
56
57
+
Example configuration snippet:
58
+
57
59
```yaml
58
60
# some config
59
61
buffer:
@@ -69,7 +71,7 @@ buffer:
69
71
# some other configs
70
72
```
71
73
72
-
In this example, each task object's `raw_task` is a `Dict` with two keys (`question` and `answer`), and the `MathWorkflow`will use the `prompt_key` and `response_key` to extract the question and answer from the `raw_task` and use the `rollout_args` to generate the response.
74
+
In this example, each task object's `raw_task` is a `Dict` with two keys (`question` and `answer`). The `MathWorkflow`uses the `prompt_key` and `response_key` to extract the question and answer from the `raw_task` and use the `rollout_args` to generate the response.
73
75
74
76
75
77
---
@@ -96,19 +98,20 @@ class Workflow(ABC):
96
98
```
97
99
98
100
99
-
#### Initialization Your Workflow
101
+
#### Initializing Your Workflow
102
+
103
+
During initialization, `Workflow` receives the following parameters:
100
104
101
-
When initializing, `Workflow` receives the following parameters:
102
105
-`model`({class}`trinity.common.models.model.ModelWrapper`): The model being trained, which provides an interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`).
103
-
-`task`({class}`trinity.common.workflows.Task`): An data item generated by one line of data from the task dataset.
104
-
-`auxiliary_models`(`List` of `openai.OpenAI`):A list of auxiliary models, which will not be trained. All of them are provide as OpenAIcompatible API.
106
+
-`task`({class}`trinity.common.workflows.Task`): A single data item from the task dataset.
107
+
-`auxiliary_models`(`List[openai.OpenAI]`):A list of auxiliary modelsnot involved in training. All are provided via OpenAI-compatible APIs.
105
108
106
109
107
110
```{tip}
108
-
The `model` also provided an OpenAI compatible API, you can switch to it by setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and use `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow.
111
+
You can switch to using the OpenAI APIby setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and calling `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow.
109
112
```
110
113
111
-
In the example below, we only use the `raw_task` and `rollout_args`. In more complex cases, you can use the `format_args`in `Task` to further the initialization.
114
+
Here’s an example of initializing a simple workflow using only `raw_task` and `rollout_args`. In more complex cases, you can use the `format_args`for further customization.
112
115
113
116
```python
114
117
classExampleWorkflow(Workflow):
@@ -122,13 +125,13 @@ class ExampleWorkflow(Workflow):
The `run` method is the core of your workflow. It returns a list of `Experience`.
128
-
Below is a simple example demonstrating how to implement the `run` method for a math workflow.
131
+
Below is a simple implementation for a math workflow.
129
132
130
133
We first call the model to generate multiple response using the provided question and rollout arguments.
131
-
And then we use the `calculate_reward` function to calculate the reward for each response.
134
+
Then we calculate the reward for each response using the `calculate_reward` function.
132
135
Finally, we construct a list of `Experience` with the responses and rewards and return it.
133
136
134
137
@@ -171,9 +174,10 @@ class ExampleWorkflow(Workflow):
171
174
return experiences
172
175
```
173
176
174
-
#### Register Your Workflow
177
+
#### Registering Your Workflow
175
178
176
-
Developers can register `Workflow` through the `WORKFLOWS.register_module` method, but need to ensure that the name does not conflict with existing `Workflow` classes.
179
+
Register your workflow using the `WORKFLOWS.register_module` decorator.
180
+
Ensure the name does not conflict with existing workflows.
177
181
178
182
```python
179
183
# import some packages
@@ -186,7 +190,7 @@ class ExampleWorkflow(Workflow):
186
190
187
191
#### Avoid Re-initialization
188
192
189
-
For some heavy workflows, the initialization process may be time-consuming.
193
+
For heavy workflows, avoid re-initializing resources every time.
190
194
In this case, you can implement the `resettable` and `reset` methods to avoid re-initialization.
191
195
192
196
```python
@@ -204,7 +208,7 @@ class ExampleWorkflow(Workflow):
204
208
```
205
209
206
210
207
-
#### Full Code
211
+
#### Full Code Example
208
212
209
213
```python
210
214
@WORKFLOWS.register_module("example_workflow")
@@ -262,7 +266,7 @@ class ExampleWorkflow(Workflow):
262
266
263
267
### Step 3: Use Your Workflow
264
268
265
-
After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `buffer.explorer_input.taskset` domain to the newly registered `Workflow` name.
269
+
After implementing and registering your workflow, you need to update the configuration file to set the `default_workflow_type` in the `buffer.explorer_input.taskset` domain to the newly registered `Workflow` name.
266
270
267
271
```yaml
268
272
buffer:
@@ -274,7 +278,7 @@ buffer:
274
278
# Other fields
275
279
```
276
280
277
-
Then you can run your workflow in the RFT procesing, through the following command.
281
+
Now you can run your workflow in Trinity-RFT using the command:
0 commit comments