docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
Lines changed: 53 additions & 21 deletions
@@ -2,7 +2,9 @@
This guide will introduce how to add new task types to Trinity-RFT and provide relevant development guidelines.
-
> **Note**: Trinity-RFT is still under development, and the following interfaces may change. Please read this section in conjunction with the latest code.
+
```{note}
+
Trinity-RFT is still under development, and the following interfaces may change. Please read this section in conjunction with the latest code.
+
```
---
@@ -31,11 +33,11 @@ Before starting development, it's important to understand several core concepts:
### Step 1: Prepare Task Dataset
-
Each `Task` is a Python dictionary (`Dict[str, Any]`), containing various parameters needed to initialize the `Workflow`. Due to significant differences in initialization parameters across different `Workflows`, the following example uses a math problem scenario.
+
Each `Task` contains various parameters needed to initialize the `Workflow`. Due to significant differences in initialization parameters across different `Workflows`, the following example uses a math problem scenario.
In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line’s JSON contains `question` and `answer` fields representing the problem description and standard answer, respectively.
-
```json
+
```
{"question": "1+1=", "answer": "2"}
{"question": "2+2=", "answer": "4"}
...
@@ -48,25 +50,45 @@ In the math problem scenario, the `Task` dataset can be a `jsonl` file, where ea
The core of creating a new task type is writing a new `Workflow`, whose base class interface is as follows:
"""Run the workflow and return a list of Experiences."""
```
-
Developers can register their own `Workflow` through the `WORKFLOWS.register_module` method, but need to ensure that the name does not conflict with existing `Workflows`.
+
Developers can register their own `Workflow` through the `WORKFLOWS.register_module` method, but need to ensure that the name does not conflict with existing `Workflow` classes.
+
+
```python
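# import some packages
# WORKFLOWS is the registry; Workflow is the base class
# (assumed here to be exported by the same module)
from trinity.common.workflows.workflow import WORKFLOWS, Workflow


# Register the class under a name that no existing workflow uses
@WORKFLOWS.register_module("my_workflow")
class MyWorkflow(Workflow):
    pass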
```
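
To illustrate why the registered name must be unique, here is a minimal sketch of a name-keyed registry. The `Registry` class below is a hypothetical stand-in and not Trinity-RFT's actual implementation. It only assumes that `register_module` is a decorator factory keyed by name, as the usage above suggests. Raising an error on a duplicate name is likewise an assumption made for illustration.

```python
from typing import Callable, Dict, Type


class Registry:
    """Hypothetical stand-in for a name-to-class registry (not Trinity-RFT's code)."""

    def __init__(self) -> None:
        self._modules: Dict[str, Type] = {}

    def register_module(self, name: str) -> Callable[[Type], Type]:
        """Return a class decorator that stores the decorated class under `name`."""

        def decorator(cls: Type) -> Type:
            if name in self._modules:
                # Assumption: a duplicate name is rejected rather than overwritten.
                raise KeyError(f"'{name}' is already registered")
            self._modules[name] = cls
            return cls

        return decorator

    def get(self, name: str) -> Type:
        """Look up a registered class by name."""
        return self._modules[name]


# Illustrative registry instance, unrelated to the real WORKFLOWS object.
WORKFLOWS = Registry()


@WORKFLOWS.register_module("my_workflow")
class MyWorkflow:
    pass
```

With a registry of this shape, a configuration file can refer to a workflow by its string name, which is why two classes cannot share one name.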
#### Initialization Parameters
When initializing, `Workflow` receives the following parameters:
-
- `model`: Provides an API call interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`).
-
- `kwargs`: Reads one line of data from the `Task` dataset, allowing developers to initialize internal modules such as Agent and Environment within the `Workflow` based on these parameters.
+
- `model`: The model being trained, which provides an interface similar to OpenAI's, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`).
+
- `task`: An instance of `Task`, which is generated from one line of data in the `Task` dataset. The `raw_task` field contains the source data in `Dict` format, which can be used to construct the `Workflow` instance.
+
  The `rollout_args` field contains the parameters for the rollout process, such as `n`, `temperature`, `top_k`, and `top_p`.
+
- `auxiliary_models`: A list of auxiliary models, which will not be trained. All of them provide an OpenAI-compatible API.
+
+
```{tip}
+
The `model` also provides an OpenAI-compatible API. To use it, set `explorer.enable_openai_api` to `true` in your config file and call `model.get_openai_client()` to get an `openai.OpenAI` instance.
+
```
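
As a rough sketch of how these parameters might be consumed, the example below reads the dataset fields from `raw_task` and the sampling settings from `rollout_args`. The `Task` dataclass and `MathWorkflowSketch` class are hypothetical stand-ins written for this illustration. Treating `rollout_args` as a plain dictionary and the fallback values are assumptions, so check the current Trinity-RFT code for the real types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Task:
    """Hypothetical stand-in exposing only the two fields described above."""

    raw_task: Dict[str, Any]  # one line of the dataset, as a dict
    rollout_args: Dict[str, Any] = field(default_factory=dict)  # assumed to be a dict


class MathWorkflowSketch:
    """Illustrative only; a real workflow would subclass `Workflow`."""

    def __init__(
        self,
        model: Any,
        task: Task,
        auxiliary_models: Optional[List[Any]] = None,
    ) -> None:
        self.model = model
        self.auxiliary_models = auxiliary_models or []
        # Pull the problem and the reference answer out of the raw dataset line.
        self.question = task.raw_task["question"]
        self.answer = task.raw_task["answer"]
        # Rollout settings travel with the task; the defaults are arbitrary fallbacks.
        self.n = task.rollout_args.get("n", 1)
        self.temperature = task.rollout_args.get("temperature", 1.0)


task = Task(
    raw_task={"question": "1+1=", "answer": "2"},
    rollout_args={"n": 4, "temperature": 0.7},
)
workflow = MathWorkflowSketch(model=None, task=task)
print(workflow.question, workflow.answer, workflow.n, workflow.temperature)
```

Keeping the dataset fields and the rollout settings on one `task` object means the constructor signature stays the same no matter which fields a particular dataset provides.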
#### Example Code
Below is a simple example demonstrating how to implement a math problem `Workflow`:
@@ -75,10 +97,16 @@ Below is a simple example demonstrating how to implement a math problem `Workflo