Commit eb581f7

polish developer guide
1 parent 3597445 commit eb581f7

1 file changed

+35
-31
lines changed

docs/sphinx_doc/source/tutorial/trinity_programming_guide.md

Lines changed: 35 additions & 31 deletions
@@ -1,6 +1,6 @@
11
# Developer Guide
22

3-
This guide will introduce how to add new workflows to Trinity-RFT and provide relevant development guidelines.
3+
This guide introduces how to add new workflows to Trinity-RFT and provides relevant development guidelines.
44

55
```{note}
66
Trinity-RFT is still under development, and the following interfaces may change. Please read this section in conjunction with the latest code.
@@ -10,7 +10,7 @@ Trinity-RFT is still under development, and the following interfaces may change.
1010

1111
## Creating New Workflows
1212

13-
Trinity-RFT supports developers in registering new workflows (e.g., multi-round interaction scenarios). Below are the steps to create a new workflow:
13+
Trinity-RFT allows developers to register new workflows (e.g., for multi-turn interactions or agentic scenarios). Below are the steps to create a new workflow:
1414

1515
---
1616

@@ -19,41 +19,43 @@ Trinity-RFT supports developers in registering new workflows (e.g., multi-round
1919
Before starting development, it's important to understand several core concepts:
2020

2121

22-
- **Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The `Task` data format may vary significantly depending on the type of task:
23-
- **Math problems**: `Task` contains the problem description and the standard answer.
24-
- **Programming scenarios**: `Task` includes the problem description, test cases, runtime environment, and other complex information.
22+
- **Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The content of the `Task` varies depending on the task type:
23+
- **Math problems**: A `Task` contains the problem description and the standard answer.
24+
- **Programming scenarios**: A `Task` includes the problem description, test cases, runtime environment, and other complex information.
2525

2626

27-
- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`, defining the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates `Experience`. Trinity-RFT has several built-in workflows:
27+
- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows:
2828
- `MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards).
2929
- `WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with the environment.
3030
- `CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results.
3131
- ...
3232

3333

34-
- **Experience** ({class}`trinity.common.experience.Experience`): The output of running a `Workflow`, where the internal data format depends on the algorithm used for training. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token id, action_mask (identifying which tokens were generated by the LLM), logprobs, rewards, etc.
34+
- **Experience** ({class}`trinity.common.experience.Experience`): The output of running a `Workflow`. The internal data format depends on the training algorithm used. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token IDs, action masks (identifying which tokens were generated by the LLM), log probabilities, rewards, etc.
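
To make the PPO/GRPO fields above concrete, here is a minimal sketch of how token IDs, an action mask, log probabilities, and a reward could fit together. `ExperienceSketch` and `build_experience` are hypothetical names invented for illustration; they are not the actual `trinity.common.experience.Experience` class, whose fields may differ.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ExperienceSketch:
    """Hypothetical stand-in for an experience record (not the Trinity-RFT class)."""

    tokens: List[int]       # prompt token IDs followed by response token IDs
    action_mask: List[int]  # 1 where the LLM generated the token, 0 for prompt tokens
    logprobs: List[float]   # one log probability per generated token
    reward: float           # scalar score assigned to the whole response


def build_experience(
    prompt_ids: Sequence[int],
    response_ids: Sequence[int],
    response_logprobs: Sequence[float],
    reward: float,
) -> ExperienceSketch:
    """Concatenate prompt and response and mark which tokens the model produced."""
    if len(response_ids) != len(response_logprobs):
        raise ValueError("need exactly one logprob per response token")
    return ExperienceSketch(
        tokens=list(prompt_ids) + list(response_ids),
        action_mask=[0] * len(prompt_ids) + [1] * len(response_ids),
        logprobs=list(response_logprobs),
        reward=float(reward),
    )


# Three prompt tokens followed by two generated tokens (made-up IDs).
exp = build_experience([101, 7, 8], [42, 102], [-0.1, -0.5], reward=1.0)
```

In this sketch the mask lets a training loss skip prompt tokens and only consider positions the model actually generated.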
3535

3636
---
3737

3838
### Step 1: Prepare Task Dataset
3939

40-
The explorer load the task dataset through the `buffer.explorer_input.taskset` in configuration file.
41-
To deal with the differences in `Task` data format, Trinity-RFT provides a unified `Task` interface, which containes the following fields.
40+
The task dataset is loaded via the `buffer.explorer_input.taskset` configuration entry in your YAML config file.
41+
To handle differences in `Task` contents, Trinity-RFT provides a unified `Task` interface containing the following fields.
4242

43-
- **`workflow`** (`str`): The registered name of your workflow class. You can specify it in `buffer.explorer_input.taskset.default_workflow_type` of your yaml config file.
44-
- **`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some some workflows have already integrated the reward calculation, you can ignore this field in such cases.
45-
- **`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without the following fields.
46-
- **`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. The `format_args` comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.format` of the yaml file.
47-
- **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters to facilitate the rollout process, e.g., the `temperature`. This field also comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.rollout_args` of the yaml file.
43+
- **`workflow`** (`str`): The registered name of your workflow class. You can specify it in `buffer.explorer_input.taskset.default_workflow_type` of your YAML config file.
44+
- **`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some workflows already include built-in reward calculation; in such cases, you can omit this field.
45+
- **`raw_task`** (`Dict`): A record of raw data in `Dict` format. For highly customized workflows, you can directly use `raw_task` to initialize your `Workflow` instance without relying on the following fields.
46+
- **`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. These settings come from the YAML configuration file and can be set in `buffer.explorer_input.taskset.format`.
47+
- **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters that control the rollout process, such as `temperature`. This field also comes from the YAML configuration file and can be set in `buffer.explorer_input.taskset.rollout_args`.
4848

49-
In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line’s JSON contains `question` and `answer` fields representing the problem description and standard answer, respectively. For example:
49+
In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line contains JSON with `question` and `answer` fields representing the problem description and standard answer, respectively. For example:
5050

5151
```
5252
{"question": "1+1=", "answer": "2"}
5353
{"question": "2+2=", "answer": "4"}
5454
...
5555
```
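
As a quick sanity check before pointing the config at such a file, the records can be parsed with the standard library. The sketch below is illustrative only: `load_jsonl_records` is a hypothetical helper, not part of Trinity-RFT, and it assumes every record must carry the `question` and `answer` fields shown above.

```python
import json
from typing import Dict, Iterable, List

REQUIRED_FIELDS = {"question", "answer"}


def load_jsonl_records(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse JSON Lines text, skipping blank lines and checking required fields."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate trailing or empty lines
        record = json.loads(line)
        missing = REQUIRED_FIELDS - set(record)
        if missing:
            raise ValueError(f"line {line_no} is missing fields: {sorted(missing)}")
        records.append(record)
    return records


# The same two records as in the example above, passed as in-memory lines.
sample_lines = [
    '{"question": "1+1=", "answer": "2"}',
    '{"question": "2+2=", "answer": "4"}',
    "",
]
records = load_jsonl_records(sample_lines)
```

For a real file, the same function can be called with an open file handle, since iterating over a text file yields its lines.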
5656

57+
Example configuration snippet:
58+
5759
```yaml
5860
# some config
5961
buffer:
@@ -69,7 +71,7 @@ buffer:
6971
# some other configs
7072
```
7173

72-
In this example, each task object's `raw_task` is a `Dict` with two keys (`question` and `answer`), and the `MathWorkflow` will use the `prompt_key` and `response_key` to extract the question and answer from the `raw_task` and use the `rollout_args` to generate the response.
74+
In this example, each task object's `raw_task` is a `Dict` with two keys (`question` and `answer`). The `MathWorkflow` uses the `prompt_key` and `response_key` to extract the question and answer from the `raw_task` and uses the `rollout_args` to generate the response.
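
The key-based lookup described here can be sketched as follows. `FormatArgsSketch` and `extract_prompt_and_answer` are hypothetical names used for illustration, and the default key values are assumptions chosen to match the example dataset; the real `FormatConfig` and `MathWorkflow` may be structured differently.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class FormatArgsSketch:
    """Hypothetical stand-in holding only the two keys discussed in the text."""

    prompt_key: str = "question"   # assumed to match the dataset's question field
    response_key: str = "answer"   # assumed to match the dataset's answer field


def extract_prompt_and_answer(
    raw_task: Dict[str, Any], format_args: FormatArgsSketch
) -> Tuple[str, str]:
    """Look up the prompt and reference answer using the configured key names."""
    prompt = raw_task[format_args.prompt_key]
    answer = raw_task[format_args.response_key]
    return str(prompt), str(answer)


question, answer = extract_prompt_and_answer(
    {"question": "1+1=", "answer": "2"}, FormatArgsSketch()
)
```

Keeping the key names in configuration rather than in code means a dataset with different field names only needs a config change, not a new workflow.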
7375

7476

7577
---
@@ -96,19 +98,20 @@ class Workflow(ABC):
9698
```
9799

98100

99-
#### Initialization Your Workflow
101+
#### Initializing Your Workflow
102+
103+
During initialization, `Workflow` receives the following parameters:
100104

101-
When initializing, `Workflow` receives the following parameters:
102105
- `model`({class}`trinity.common.models.model.ModelWrapper`): The model being trained, which provides an interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`).
103-
- `task`({class}`trinity.common.workflows.Task`): An data item generated by one line of data from the task dataset.
104-
- `auxiliary_models`(`List` of `openai.OpenAI`): A list of auxiliary models, which will not be trained. All of them are provide as OpenAI compatible API.
106+
- `task`({class}`trinity.common.workflows.Task`): A single data item from the task dataset.
107+
- `auxiliary_models`(`List[openai.OpenAI]`): A list of auxiliary models not involved in training. All are provided via OpenAI-compatible APIs.
105108

106109

107110
```{tip}
108-
The `model` also provided an OpenAI compatible API, you can switch to it by setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and use `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow.
111+
You can switch to using the OpenAI API by setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and calling `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow.
109112
```
110113

111-
In the example below, we only use the `raw_task` and `rollout_args`. In more complex cases, you can use the `format_args` in `Task` to further the initialization.
114+
Here’s an example of initializing a simple workflow using only `raw_task` and `rollout_args`. In more complex cases, you can use the `format_args` for further customization.
112115

113116
```python
114117
class ExampleWorkflow(Workflow):
@@ -122,13 +125,13 @@ class ExampleWorkflow(Workflow):
122125
# self.openai_client = self.model.get_openai_client()
123126
```
124127

125-
#### Implement the `run` method
128+
#### Implementing the `run` method
126129

127130
The `run` method is the core of your workflow. It returns a list of `Experience`.
128-
Below is a simple example demonstrating how to implement the `run` method for a math workflow.
131+
Below is a simple implementation for a math workflow.
129132

130133
We first call the model to generate multiple responses using the provided question and rollout arguments.
131-
And then we use the `calculate_reward` function to calculate the reward for each response.
134+
Then we calculate the reward for each response using the `calculate_reward` function.
132135
Finally, we construct a list of `Experience` with the responses and rewards and return it.
133136

134137

@@ -171,9 +174,10 @@ class ExampleWorkflow(Workflow):
171174
return experiences
172175
```
173176

174-
#### Register Your Workflow
177+
#### Registering Your Workflow
175178

176-
Developers can register `Workflow` through the `WORKFLOWS.register_module` method, but need to ensure that the name does not conflict with existing `Workflow` classes.
179+
Register your workflow using the `WORKFLOWS.register_module` decorator.
180+
Ensure the name does not conflict with existing workflows.
177181

178182
```python
179183
# import some packages
@@ -186,7 +190,7 @@ class ExampleWorkflow(Workflow):
186190

187191
#### Avoid Re-initialization
188192

189-
For some heavy workflows, the initialization process may be time-consuming.
193+
For heavy workflows, avoid re-initializing resources every time.
190194
In this case, you can implement the `resettable` and `reset` methods to avoid re-initialization.
191195

192196
```python
@@ -204,7 +208,7 @@ class ExampleWorkflow(Workflow):
204208
```
205209

206210

207-
#### Full Code
211+
#### Full Code Example
208212

209213
```python
210214
@WORKFLOWS.register_module("example_workflow")
@@ -262,7 +266,7 @@ class ExampleWorkflow(Workflow):
262266

263267
### Step 3: Use Your Workflow
264268

265-
After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `buffer.explorer_input.taskset` domain to the newly registered `Workflow` name.
269+
After implementing and registering your workflow, you need to update the configuration file to set the `default_workflow_type` in the `buffer.explorer_input.taskset` domain to the newly registered `Workflow` name.
266270

267271
```yaml
268272
buffer:
@@ -274,7 +278,7 @@ buffer:
274278
# Other fields
275279
```
276280

277-
Then you can run your workflow in the RFT procesing, through the following command.
281+
Now you can run your workflow in Trinity-RFT using the command:
278282

279283
```
280284
trinity run --config <your_yaml_file>
