You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
+4-4Lines changed: 4 additions & 4 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -20,12 +20,12 @@ Before starting development, it's important to understand several core concepts:
20
20
21
21
22
22
-**Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The content of the `Task` varies depending on the task type:
23
-
-**Math problems**: A `Task` contains the problem description and the standard answer.
23
+
-**Math problems**: A `Task` contains the problem description and the golden answer.
24
24
-**Programming scenarios**: A `Task` includes the problem description, test cases, runtime environment, and other complex information.
25
25
26
26
27
27
-**Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows:
28
-
-`MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards).
28
+
-`MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses LLM responses, and calculates scores (rewards).
29
29
-`WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with environment.
30
30
-`CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results.
31
31
- ...
@@ -44,7 +44,7 @@ To handle differences in `Task` contents, Trinity-RFT provides a unified `Task`
44
44
-**`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some workflows already include built-in reward calculation; in such cases, you can omit this field.
45
45
-**`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without relying on the following fields.
46
46
-**`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. These settings come from the YAML configuration file and can be set in `buffer.explorer_input.task_set.format`.
47
-
-**`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters that control the rollout process, such as `temperature`. his field also comes from the YAML configuration file and can be set in `buffer.explorer_input.task_set.rollout_args`.
47
+
-**`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters that control the rollout process, such as `temperature`. This field also comes from the YAML configuration file and can be set in `buffer.explorer_input.task_set.rollout_args`.
48
48
49
49
In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line contains JSON with `question` and `answer` fields representing the problem description and standard answer, respectively. For example:
50
50
@@ -190,7 +190,7 @@ class ExampleWorkflow(Workflow):
190
190
191
191
#### Avoid Re-initialization
192
192
193
-
For heavy workflows, avoid re-initializing resources every time.
193
+
For heavy workflows, re-initializing every time can incurs extra computational costs.
194
194
In this case, you can implement the `resettable` and `reset` methods to avoid re-initialization.
0 commit comments