From 8b0e15ae4da895b6b4d8fdfba6bc76691883e313 Mon Sep 17 00:00:00 2001 From: pxc Date: Mon, 26 May 2025 14:48:39 +0800 Subject: [PATCH 1/5] update developer gude --- .../tutorial/trinity_programming_guide.md | 210 +++++++++++++----- 1 file changed, 160 insertions(+), 50 deletions(-) diff --git a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md index 58e467b20a..ac0151259f 100644 --- a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md +++ b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md @@ -1,6 +1,6 @@ # Developer Guide -This guide will introduce how to add new task types to Trinity-RFT and provide relevant development guidelines. +This guide will introduce how to add new workflows to Trinity-RFT and provide relevant development guidelines. ```{note} Trinity-RFT is still under development, and the following interfaces may change. Please read this section in conjunction with the latest code. @@ -8,9 +8,9 @@ Trinity-RFT is still under development, and the following interfaces may change. --- -## Creating New Task Types +## Creating New Workflows -Trinity-RFT supports developers in registering new task types (e.g., multi-round interaction scenarios). Below are the steps for creating a new task type. +Trinity-RFT supports developers in registering new workflows (e.g., multi-round interaction scenarios). Below are the steps to create a new workflow: --- @@ -18,24 +18,35 @@ Trinity-RFT supports developers in registering new task types (e.g., multi-round Before starting development, it's important to understand several core concepts: -- **Task**: Represents a data structure that can be converted into a `Workflow`. The `Task` data format may vary significantly depending on the type of task: + +- **Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. 
The `Task` data format may vary significantly depending on the type of task: - **Math problems**: `Task` contains the problem description and the standard answer. - **Programming scenarios**: `Task` includes the problem description, test cases, runtime environment, and other complex information. -- **Workflow**: Can be understood as the running state of a `Task`, defining the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates `Experience`. Trinity-RFT has several built-in `Workflows`: - - `MathWorkflow`: For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards). + +- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`, defining the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates `Experience`. Trinity-RFT has several built-in workflows: + - `MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards). + - `WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with environment. - `CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results. - ... -- **Experience**: The output of running a `Workflow`, where the internal data format depends on the algorithm used for training. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token_ids, action_mask (identifying which tokens were generated by the LLM), logprobs, rewards, etc. 
+ +- **Experience** ({class}`trinity.common.experience.Experience`): The output of running a `Workflow`, where the internal data format depends on the algorithm used for training. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token id, action_mask (identifying which tokens were generated by the LLM), logprobs, rewards, etc. --- ### Step 1: Prepare Task Dataset -Each `Task` contains various parameters needed to initialize the `Workflow`. Due to significant differences in initialization parameters across different `Workflows`, the following example uses a math problem scenario. +The explorer load the task dataset through the `buffer.explorer_input.taskset` in configuration file. +To deal with the differences in `Task` data format, Trinity-RFT provides a unified `Task` interface, which containes the following fields. -In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line’s JSON contains `question` and `answer` fields representing the problem description and standard answer, respectively. + - **`workflow`** (`str`): The registered name of your workflow class. You can specify it in `buffer.explorer_input.taskset.default_workflow_type` of your yaml config file. + - **`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some some workflows have already integrated the reward calculation, you can ignore this field in such cases. + - **`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without the following fields. + - **`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. 
The `format_args` comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.format` of the yaml file. + - **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters to facilitate the rollout process, e.g., the `temperature`. This field also comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.rollout_args` of the yaml file. + +In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line’s JSON contains `question` and `answer` fields representing the problem description and standard answer, respectively. For example: ``` {"question": "1+1=", "answer": "2"} @@ -43,15 +54,31 @@ In the math problem scenario, the `Task` dataset can be a `jsonl` file, where ea ... ``` +```yaml +# some config +buffer: + explorer_input: + taskset: + default_workflow_type: "math_workflow" + path: "/PATH/TO/FILE/DIR" + format: + prompt_key: "question" + response_key: "answer" + rollout_args: + temperature: 1.0 + # some other configs +``` + +In this example, each task object's `raw_task` is a `Dict` with two keys (`question` and `answer`), and the `MathWorkflow` will use the `prompt_key` and `response_key` to extract the question and answer from the `raw_task` and use the `rollout_args` to generate the response. + + --- -### Step 2: Write Workflow +### Step 2: Implement a New Workflow -The core of creating a new task type is writing a new `Workflow`, whose base class interface is as follows: +The `Workflow` base class interface is as follows: ```python -# import some packages - class Workflow(ABC): def __init__( @@ -68,39 +95,47 @@ class Workflow(ABC): """Run the workflow and return a list of Experiences.""" ``` -Developers can register their own `Workflow` through the `WORKFLOWS.register_module` method, but need to ensure that the name does not conflict with existing `Workflow` classes. 
- -```python -# import some packages -from trinity.common.workflows.workflow import WORKFLOWS -@WORKFLOWS.register_module("my_workflow") -class MyWorkflow(Workflow): - pass -``` +#### Initialization Your Workflow -#### Initialization Parameters When initializing, `Workflow` receives the following parameters: -- `model`: The model being trained, which provides an interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`). -- `task`: An instance of `Task`, which is generated by one line of data from the `Task` dataset. The `raw_task` field contains the `Dict` format source data, which can be used to construct the `Workflow` instance. -The `rollout_args` field contains the parameters for the rollout process, such as `n`, `temperature`, `top_k` and `top_p`. -- `auxiliary_models`: A list of auxiliary models, which will not be trained. All of them provide OpenAI compatible API. +- `model`({class}`trinity.common.models.model.ModelWrapper`): The model being trained, which provides an interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`). +- `task`({class}`trinity.common.workflows.Task`): An data item generated by one line of data from the task dataset. +- `auxiliary_models`(`List` of `openai.OpenAI`): A list of auxiliary models, which will not be trained. All of them are provide as OpenAI compatible API. 
+ ```{tip} The `model` also provided an OpenAI compatible API, you can switch to it by setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and use `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow. ``` -#### Example Code -Below is a simple example demonstrating how to implement a math problem `Workflow`: +In the example below, we only use the `raw_task` and `rollout_args`. In more complex cases, you can use the `format_args` in `Task` to further the initialization. ```python -@WORKFLOWS.register_module("example_workflow") class ExampleWorkflow(Workflow): - def __init__(self, model: ModelWrapper, task: Task, **kwargs): - super().__init__(model, **kwargs) + def __init__(self, model: ModelWrapper, task: Task, auxiliary_models: List): + super().__init__(model, task, auxiliary_models) self.question = task.raw_task.get("question") self.answer = task.raw_task.get("answer") + self.rollout_args = task.rollout_args + # Optional: If you want to use OpenAI API in your workflow + # self.openai_client = self.model.get_openai_client() +``` + +#### Implement the `run` method + +The `run` method is the core of your workflow. It returns a list of `Experience`. +Below is a simple example demonstrating how to implement the `run` method for a math workflow. + +We first call the model to generate multiple response using the provided question and rollout arguments. +And then we use the `calculate_reward` function to calculate the reward for each response. +Finally, we construct a list of `Experience` with the responses and rewards and return it. 
+ + ```python +class ExampleWorkflow(Workflow): + + # the __init__ function def calculate_reward(self, response: str, truth: str) -> float: if response == truth: @@ -109,27 +144,48 @@ class ExampleWorkflow(Workflow): return 0.0 def run(self) -> List[Experience]: - response = self.model.chat( + # call the model to generate multiple responses + responses = self.model.chat( [ { "role": "user", "content": f"Question:\n{self.question}", } ], - n=self.task.rollout_args.n, - temperature=self.task.rollout_args.temperature, + n=self.rollout_args.n, + temperature=self.rollout_args.temperature, ) - reward: float = self.calculate_reward(response.response_text, self.answer) - return [ - Experience( - tokens=response.tokens, - prompt_length=response.prompt_length, - reward=reward, - logprobs=response.logprobs, + experiences = [] + for response in responses: + # calculate reward + reward: float = self.calculate_reward(response.response_text, self.answer) + # construct Experience + experiences.append( + Experience( + tokens=response.tokens, + prompt_length=response.prompt_length, + reward=reward, + logprobs=response.logprobs, + ) ) - ] + return experiences +``` + +#### Register Your Workflow + +Developers can register `Workflow` through the `WORKFLOWS.register_module` method, but need to ensure that the name does not conflict with existing `Workflow` classes. + +```python +# import some packages +from trinity.common.workflows.workflow import WORKFLOWS + +@WORKFLOWS.register_module("example_workflow") +class ExampleWorkflow(Workflow): + pass ``` +#### Avoid Re-initialization + For some heavy workflows, the initialization process may be time-consuming. In this case, you can implement the `resettable` and `reset` methods to avoid re-initialization. 
@@ -147,25 +203,79 @@ class ExampleWorkflow(Workflow): self.answer = task.raw_task.get("answer") ``` + +#### Full Code + +```python +@WORKFLOWS.register_module("example_workflow") +class ExampleWorkflow(Workflow): + + def __init__(self, model: ModelWrapper, task: Task, auxiliary_models: List): + super().__init__(model, task, auxiliary_models) + self.question = task.raw_task.get("question") + self.answer = task.raw_task.get("answer") + self.rollout_args = task.rollout_args + + def calculate_reward(self, response: str, truth: str) -> float: + if response == truth: + return 1.0 + else: + return 0.0 + + def run(self) -> List[Experience]: + # call the model to generate multiple responses + responses = self.model.chat( + [ + { + "role": "user", + "content": f"Question:\n{self.question}", + } + ], + n=self.rollout_args.n, + temperature=self.rollout_args.temperature, + ) + experiences = [] + for response in responses: + # calculate reward + reward: float = self.calculate_reward(response.response_text, self.answer) + # construct Experience + experiences.append( + Experience( + tokens=response.tokens, + prompt_length=response.prompt_length, + reward=reward, + logprobs=response.logprobs, + ) + ) + return experiences + + def resettable(self): + return True + + def reset(self, task: Task): + self.question = task.raw_task.get("question") + self.answer = task.raw_task.get("answer") +``` + + --- -### Step 3: Modify Configuration File +### Step 3: Use Your Workflow -After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `buffer.explorer_input` domain to the newly registered `Workflow` name. +After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `buffer.explorer_input.taskset` domain to the newly registered `Workflow` name. 
```yaml buffer: # Other fields explorer_input: taskset: - name: example_task - storage_type: file path: /path/to/taskset - # Other fields - default_workflow_type: example_workflow -# Other fields + default_workflow_type: example_workflow + # Other fields ``` +Then you can run your workflow in the RFT procesing, through `trinity run --config `. + --- ## Check Code Style From 35974454a4e2f4ee5c500cfea8afe800765a5815 Mon Sep 17 00:00:00 2001 From: pxc Date: Mon, 26 May 2025 14:52:34 +0800 Subject: [PATCH 2/5] update developer gude --- .../sphinx_doc/source/tutorial/trinity_programming_guide.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md index ac0151259f..a6938651d4 100644 --- a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md +++ b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md @@ -274,7 +274,11 @@ buffer: # Other fields ``` -Then you can run your workflow in the RFT procesing, through `trinity run --config `. +Then you can run your workflow in the RFT procesing, through the following command. + +``` +trinity run --config +``` --- From eb581f7bde9ad00892f69d2394413375f583eb5c Mon Sep 17 00:00:00 2001 From: pxc Date: Mon, 26 May 2025 15:21:01 +0800 Subject: [PATCH 3/5] polish developer guide --- .../tutorial/trinity_programming_guide.md | 66 ++++++++++--------- 1 file changed, 35 insertions(+), 31 deletions(-) diff --git a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md index a6938651d4..37b8dd5da0 100644 --- a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md +++ b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md @@ -1,6 +1,6 @@ # Developer Guide -This guide will introduce how to add new workflows to Trinity-RFT and provide relevant development guidelines. 
+This guide introduces how to add new workflows to Trinity-RFT and provides relevant development guidelines. ```{note} Trinity-RFT is still under development, and the following interfaces may change. Please read this section in conjunction with the latest code. @@ -10,7 +10,7 @@ Trinity-RFT is still under development, and the following interfaces may change. ## Creating New Workflows -Trinity-RFT supports developers in registering new workflows (e.g., multi-round interaction scenarios). Below are the steps to create a new workflow: +Trinity-RFT allows developers to register new workflows (e.g., for multi-turn interactions or agentic scenarios). Below are the steps to create a new workflow: --- @@ -19,34 +19,34 @@ Trinity-RFT supports developers in registering new workflows (e.g., multi-round Before starting development, it's important to understand several core concepts: -- **Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The `Task` data format may vary significantly depending on the type of task: - - **Math problems**: `Task` contains the problem description and the standard answer. - - **Programming scenarios**: `Task` includes the problem description, test cases, runtime environment, and other complex information. +- **Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The content of the `Task` varies depending on the task type: + - **Math problems**: A `Task` contains the problem description and the standard answer. + - **Programming scenarios**: A `Task` includes the problem description, test cases, runtime environment, and other complex information. -- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`, defining the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. 
After execution, it generates `Experience`. Trinity-RFT has several built-in workflows: +- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. . It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows: - `MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards). - `WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with environment. - `CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results. - ... -- **Experience** ({class}`trinity.common.experience.Experience`): The output of running a `Workflow`, where the internal data format depends on the algorithm used for training. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token id, action_mask (identifying which tokens were generated by the LLM), logprobs, rewards, etc. +- **Experience** ({class}`trinity.common.experience.Experience`): The output of running a `Workflow`. The internal data format depends on the training algorithm used. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token IDs, action masks (identifying which tokens were generated by the LLM), log probabilities, rewards, etc. --- ### Step 1: Prepare Task Dataset -The explorer load the task dataset through the `buffer.explorer_input.taskset` in configuration file. -To deal with the differences in `Task` data format, Trinity-RFT provides a unified `Task` interface, which containes the following fields. 
+The task dataset is loaded via the `buffer.explorer_input.taskset` configuration entry in your YAML config file. +To handle differences in `Task` contents, Trinity-RFT provides a unified `Task` interface containing the following fields. - - **`workflow`** (`str`): The registered name of your workflow class. You can specify it in `buffer.explorer_input.taskset.default_workflow_type` of your yaml config file. - - **`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some some workflows have already integrated the reward calculation, you can ignore this field in such cases. - - **`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without the following fields. - - **`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. The `format_args` comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.format` of the yaml file. - - **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters to facilitate the rollout process, e.g., the `temperature`. This field also comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.rollout_args` of the yaml file. + - **`workflow`** (`str`): The registered name of your workflow class. You can specify it in `buffer.explorer_input.taskset.default_workflow_type` of your YAML config file. + - **`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. 
Note that some workflows already include built-in reward calculation; in such cases, you can omit this field. + - **`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without relying on the following fields. + - **`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. These settings come from the YAML configuration file and can be set in `buffer.explorer_input.task_set.format`. + - **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters that control the rollout process, such as `temperature`. his field also comes from the YAML configuration file and can be set in `buffer.explorer_input.task_set.rollout_args`. -In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line’s JSON contains `question` and `answer` fields representing the problem description and standard answer, respectively. For example: +In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line contains JSON with `question` and `answer` fields representing the problem description and standard answer, respectively. For example: ``` {"question": "1+1=", "answer": "2"} @@ -54,6 +54,8 @@ In the math problem scenario, the `Task` dataset can be a `jsonl` file, where ea ... ``` +Example configuration snippet: + ```yaml # some config buffer: @@ -69,7 +71,7 @@ buffer: # some other configs ``` -In this example, each task object's `raw_task` is a `Dict` with two keys (`question` and `answer`), and the `MathWorkflow` will use the `prompt_key` and `response_key` to extract the question and answer from the `raw_task` and use the `rollout_args` to generate the response. 
+In this example, each task object's `raw_task` is a `Dict` with two keys (`question` and `answer`). The `MathWorkflow` uses the `prompt_key` and `response_key` to extract the question and answer from the `raw_task` and uses the `rollout_args` to generate the response. --- @@ -96,19 +98,20 @@ class Workflow(ABC): ``` -#### Initialization Your Workflow +#### Initializing Your Workflow + +During initialization, `Workflow` receives the following parameters: -When initializing, `Workflow` receives the following parameters: - `model`({class}`trinity.common.models.model.ModelWrapper`): The model being trained, which provides an interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`). -- `task`({class}`trinity.common.workflows.Task`): An data item generated by one line of data from the task dataset. -- `auxiliary_models`(`List` of `openai.OpenAI`): A list of auxiliary models, which will not be trained. All of them are provide as OpenAI compatible API. +- `task`({class}`trinity.common.workflows.Task`): A single data item from the task dataset. +- `auxiliary_models`(`List[openai.OpenAI]`): A list of auxiliary models not involved in training. All are provided via OpenAI-compatible APIs. ```{tip} -The `model` also provided an OpenAI compatible API, you can switch to it by setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and use `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow. +You can switch to using the OpenAI API by setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and calling `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow. ``` -In the example below, we only use the `raw_task` and `rollout_args`. 
In more complex cases, you can use the `format_args` in `Task` to further the initialization. +Here’s an example of initializing a simple workflow using only `raw_task` and `rollout_args`. In more complex cases, you can use the `format_args` for further customization. ```python class ExampleWorkflow(Workflow): @@ -122,13 +125,13 @@ class ExampleWorkflow(Workflow): # self.openai_client = self.model.get_openai_client() ``` -#### Implement the `run` method +#### Implementing the `run` method The `run` method is the core of your workflow. It returns a list of `Experience`. -Below is a simple example demonstrating how to implement the `run` method for a math workflow. +Below is a simple implementation for a math workflow. We first call the model to generate multiple response using the provided question and rollout arguments. -And then we use the `calculate_reward` function to calculate the reward for each response. +Then we calculate the reward for each response using the `calculate_reward` function. Finally, we construct a list of `Experience` with the responses and rewards and return it. @@ -171,9 +174,10 @@ class ExampleWorkflow(Workflow): return experiences ``` -#### Register Your Workflow +#### Registering Your Workflow -Developers can register `Workflow` through the `WORKFLOWS.register_module` method, but need to ensure that the name does not conflict with existing `Workflow` classes. +Register your workflow using the `WORKFLOWS.register_module` decorator. +Ensure the name does not conflict with existing workflows. ```python # import some packages @@ -186,7 +190,7 @@ class ExampleWorkflow(Workflow): #### Avoid Re-initialization -For some heavy workflows, the initialization process may be time-consuming. +For heavy workflows, avoid re-initializing resources every time. In this case, you can implement the `resettable` and `reset` methods to avoid re-initialization. 
```python @@ -204,7 +208,7 @@ class ExampleWorkflow(Workflow): ``` -#### Full Code +#### Full Code Example ```python @WORKFLOWS.register_module("example_workflow") @@ -262,7 +266,7 @@ class ExampleWorkflow(Workflow): ### Step 3: Use Your Workflow -After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `buffer.explorer_input.taskset` domain to the newly registered `Workflow` name. +After implementing and registering your workflow, you need to update the configuration file to set the `default_workflow_type` in the `buffer.explorer_input.taskset` domain to the newly registered `Workflow` name. ```yaml buffer: @@ -274,7 +278,7 @@ buffer: # Other fields ``` -Then you can run your workflow in the RFT procesing, through the following command. +Now you can run your workflow in Trinity-RFT using the command: ``` trinity run --config From fe48c80b4ff3b5b70f665b031c2c02523510414f Mon Sep 17 00:00:00 2001 From: pxc Date: Mon, 26 May 2025 15:36:19 +0800 Subject: [PATCH 4/5] fix typo --- docs/sphinx_doc/source/tutorial/trinity_programming_guide.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md index 37b8dd5da0..bc7be3d159 100644 --- a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md +++ b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md @@ -24,7 +24,7 @@ Before starting development, it's important to understand several core concepts: - **Programming scenarios**: A `Task` includes the problem description, test cases, runtime environment, and other complex information. -- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. . It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. 
After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows: +- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows: - `MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards). - `WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with environment. - `CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results. From 0d35b27bf3bad4c422f6fbd011c35965b7c3c8f0 Mon Sep 17 00:00:00 2001 From: pxc Date: Mon, 26 May 2025 15:42:08 +0800 Subject: [PATCH 5/5] fix comments --- .../source/tutorial/trinity_programming_guide.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md index bc7be3d159..2e4daeab0b 100644 --- a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md +++ b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md @@ -20,12 +20,12 @@ Before starting development, it's important to understand several core concepts: - **Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The content of the `Task` varies depending on the task type: - - **Math problems**: A `Task` contains the problem description and the standard answer. + - **Math problems**: A `Task` contains the problem description and the golden answer. 
- **Programming scenarios**: A `Task` includes the problem description, test cases, runtime environment, and other complex information. - **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows: - - `MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards). + - `MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses LLM responses, and calculates scores (rewards). - `WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with environment. - `CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results. - ... @@ -44,7 +44,7 @@ To handle differences in `Task` contents, Trinity-RFT provides a unified `Task` - **`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some workflows already include built-in reward calculation; in such cases, you can omit this field. - **`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without relying on the following fields. - **`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. 
These settings come from the YAML configuration file and can be set in `buffer.explorer_input.task_set.format`. - - **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters that control the rollout process, such as `temperature`. his field also comes from the YAML configuration file and can be set in `buffer.explorer_input.task_set.rollout_args`. + - **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters that control the rollout process, such as `temperature`. This field also comes from the YAML configuration file and can be set in `buffer.explorer_input.task_set.rollout_args`. In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line contains JSON with `question` and `answer` fields representing the problem description and standard answer, respectively. For example: @@ -190,7 +190,7 @@ class ExampleWorkflow(Workflow): #### Avoid Re-initialization -For heavy workflows, avoid re-initializing resources every time. +For heavy workflows, re-initializing every time can incur extra computational costs. In this case, you can implement the `resettable` and `reset` methods to avoid re-initialization. ```python