From 8b0e15ae4da895b6b4d8fdfba6bc76691883e313 Mon Sep 17 00:00:00 2001
From: pxc <panxuchen.pxc@alibaba-inc.com>
Date: Mon, 26 May 2025 14:48:39 +0800
Subject: [PATCH 1/5] update developer guide

---
 .../tutorial/trinity_programming_guide.md     | 210 +++++++++++++-----
 1 file changed, 160 insertions(+), 50 deletions(-)
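
Reviewer note: the `run` method added in this patch returns one `Experience` per model response. The sketch below walks through that flow with stand-ins so it can be run without Trinity-RFT installed. `StubModel`, `StubResponse`, `SketchExperience`, and `run_sketch` are hypothetical names invented for illustration; only the four `Experience` fields and the exact-match reward come from the guide text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StubResponse:
    """Hypothetical stand-in for one model response."""
    response_text: str
    tokens: List[int]
    prompt_length: int
    logprobs: List[float]


@dataclass
class SketchExperience:
    """Hypothetical stand-in for Experience, limited to the fields used in the guide."""
    tokens: List[int]
    prompt_length: int
    reward: float
    logprobs: List[float]


class StubModel:
    """Returns canned answers instead of calling an LLM."""

    def chat(self, messages: List[Dict[str, str]], n: int = 1) -> List[StubResponse]:
        canned = ["2", "3"]
        return [
            StubResponse(canned[i % len(canned)], [1, 2, 3], 2, [-0.1])
            for i in range(n)
        ]


def run_sketch(model: StubModel, raw_task: Dict[str, str], n: int) -> List[SketchExperience]:
    """Generate n responses, score each by exact match, and wrap the results."""
    question, answer = raw_task["question"], raw_task["answer"]
    responses = model.chat([{"role": "user", "content": f"Question:\n{question}"}], n=n)
    return [
        SketchExperience(
            tokens=r.tokens,
            prompt_length=r.prompt_length,
            reward=1.0 if r.response_text == answer else 0.0,
            logprobs=r.logprobs,
        )
        for r in responses
    ]


experiences = run_sketch(StubModel(), {"question": "1+1=", "answer": "2"}, n=2)
rewards = [e.reward for e in experiences]
```

With the canned answers above, the first response matches the answer and the second does not, so the two rewards differ. A real workflow would replace `StubModel` with the `model` passed to `__init__`.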

diff --git a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
index 58e467b20a..ac0151259f 100644
--- a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
+++ b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
@@ -1,6 +1,6 @@
 # Developer Guide
 
-This guide will introduce how to add new task types to Trinity-RFT and provide relevant development guidelines.
+This guide will introduce how to add new workflows to Trinity-RFT and provide relevant development guidelines.
 
 ```{note}
 Trinity-RFT is still under development, and the following interfaces may change. Please read this section in conjunction with the latest code.
@@ -8,9 +8,9 @@ Trinity-RFT is still under development, and the following interfaces may change.
 
 ---
 
-## Creating New Task Types
+## Creating New Workflows
 
-Trinity-RFT supports developers in registering new task types (e.g., multi-round interaction scenarios). Below are the steps for creating a new task type.
+Trinity-RFT supports developers in registering new workflows (e.g., multi-round interaction scenarios). Below are the steps to create a new workflow:
 
 ---
 
@@ -18,24 +18,35 @@ Trinity-RFT supports developers in registering new task types (e.g., multi-round
 
 Before starting development, it's important to understand several core concepts:
 
-- **Task**: Represents a data structure that can be converted into a `Workflow`. The `Task` data format may vary significantly depending on the type of task:
+
+- **Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The `Task` data format may vary significantly depending on the type of task:
   - **Math problems**: `Task` contains the problem description and the standard answer.
   - **Programming scenarios**: `Task` includes the problem description, test cases, runtime environment, and other complex information.
 
-- **Workflow**: Can be understood as the running state of a `Task`, defining the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates `Experience`. Trinity-RFT has several built-in `Workflows`:
-  - `MathWorkflow`: For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards).
+
+- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`, defining the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates `Experience`. Trinity-RFT has several built-in workflows:
+  - `MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards).
+  - `WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with environment.
   - `CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results.
   - ...
 
-- **Experience**: The output of running a `Workflow`, where the internal data format depends on the algorithm used for training. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token_ids, action_mask (identifying which tokens were generated by the LLM), logprobs, rewards, etc.
+
+- **Experience** ({class}`trinity.common.experience.Experience`): The output of running a `Workflow`, where the internal data format depends on the algorithm used for training. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token id, action_mask (identifying which tokens were generated by the LLM), logprobs, rewards, etc.
 
 ---
 
 ### Step 1: Prepare Task Dataset
 
-Each `Task` contains various parameters needed to initialize the `Workflow`. Due to significant differences in initialization parameters across different `Workflows`, the following example uses a math problem scenario.
+The explorer load the task dataset through the `buffer.explorer_input.taskset` in configuration file.
+To deal with the differences in `Task` data format, Trinity-RFT provides a unified `Task` interface, which containes the following fields.
 
-In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line’s JSON contains `question` and `answer` fields representing the problem description and standard answer, respectively.
+  - **`workflow`** (`str`): The registered name of your workflow class. You can specify it in `buffer.explorer_input.taskset.default_workflow_type` of your yaml config file.
+  - **`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some some workflows have already integrated the reward calculation, you can ignore this field in such cases.
+  - **`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without the following fields.
+  - **`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. The `format_args` comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.format` of the yaml file.
+  - **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters to facilitate the rollout process, e.g., the `temperature`. This field also comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.rollout_args` of the yaml file.
+
+In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line’s JSON contains `question` and `answer` fields representing the problem description and standard answer, respectively. For example:
 
 ```
 {"question": "1+1=", "answer": "2"}
@@ -43,15 +54,31 @@ In the math problem scenario, the `Task` dataset can be a `jsonl` file, where ea
 ...
 ```
 
+```yaml
+# some config
+buffer:
+  explorer_input:
+    taskset:
+      default_workflow_type: "math_workflow"
+      path: "/PATH/TO/FILE/DIR"
+      format:
+        prompt_key: "question"
+        response_key: "answer"
+      rollout_args:
+        temperature: 1.0
+      # some other configs
+```
+
+In this example, each task object's `raw_task` is a `Dict` with two keys (`question` and `answer`), and the `MathWorkflow` will use the `prompt_key` and `response_key` to extract the question and answer from the `raw_task` and use the `rollout_args` to generate the response.
+
+
 ---
 
-### Step 2: Write Workflow
+### Step 2: Implement a New Workflow
 
-The core of creating a new task type is writing a new `Workflow`, whose base class interface is as follows:
+The `Workflow` base class interface is as follows:
 
 ```python
-# import some packages
-
 class Workflow(ABC):
 
     def __init__(
@@ -68,39 +95,47 @@ class Workflow(ABC):
         """Run the workflow and return a list of Experiences."""
 ```
 
-Developers can register their own `Workflow` through the `WORKFLOWS.register_module` method, but need to ensure that the name does not conflict with existing `Workflow` classes.
-
-```python
-# import some packages
-from trinity.common.workflows.workflow import WORKFLOWS
 
-@WORKFLOWS.register_module("my_workflow")
-class MyWorkflow(Workflow):
-    pass
-```
+#### Initialization Your Workflow
 
-#### Initialization Parameters
 When initializing, `Workflow` receives the following parameters:
-- `model`: The model being trained, which provides an interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`).
-- `task`: An instance of `Task`, which is generated by one line of data from the `Task` dataset. The `raw_task` field contains the `Dict` format source data, which can be used to construct the `Workflow` instance.
-The `rollout_args` field contains the parameters for the rollout process, such as `n`, `temperature`, `top_k` and `top_p`.
-- `auxiliary_models`: A list of auxiliary models, which will not be trained. All of them provide OpenAI compatible API.
+- `model`({class}`trinity.common.models.model.ModelWrapper`): The model being trained, which provides an interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`).
+- `task`({class}`trinity.common.workflows.Task`): An data item generated by one line of data from the task dataset.
+- `auxiliary_models`(`List` of `openai.OpenAI`): A list of auxiliary models, which will not be trained. All of them are provide as OpenAI compatible API.
+
 
 ```{tip}
 The `model` also provided an OpenAI compatible API, you can switch to it by setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and use `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow.
 ```
 
-#### Example Code
-Below is a simple example demonstrating how to implement a math problem `Workflow`:
+In the example below, we only use the `raw_task` and `rollout_args`. In more complex cases, you can use the `format_args` in `Task` to further the initialization.
 
 ```python
-@WORKFLOWS.register_module("example_workflow")
 class ExampleWorkflow(Workflow):
 
-    def __init__(self, model: ModelWrapper, task: Task, **kwargs):
-        super().__init__(model, **kwargs)
+    def __init__(self, model: ModelWrapper, task: Task, auxiliary_models: List):
+        super().__init__(model, task, auxiliary_models)
         self.question = task.raw_task.get("question")
         self.answer = task.raw_task.get("answer")
+        self.rollout_args = task.rollout_args
+        # Optional: If you want to use OpenAI API in your workflow
+        # self.openai_client = self.model.get_openai_client()
+```
+
+#### Implement the `run` method
+
+The `run` method is the core of your workflow. It returns a list of `Experience`.
+Below is a simple example demonstrating how to implement the `run` method for a math workflow.
+
+We first call the model to generate multiple response using the provided question and rollout arguments.
+And then we use the `calculate_reward` function to calculate the reward for each response.
+Finally, we construct a list of `Experience` with the responses and rewards and return it.
+
+
+```python
+class ExampleWorkflow(Workflow):
+
+    # the __init__ function
 
     def calculate_reward(self, response: str, truth: str) -> float:
         if response == truth:
@@ -109,27 +144,48 @@ class ExampleWorkflow(Workflow):
             return 0.0
 
     def run(self) -> List[Experience]:
-        response = self.model.chat(
+        # call the model to generate multiple responses
+        responses = self.model.chat(
             [
                 {
                     "role": "user",
                     "content": f"Question:\n{self.question}",
                 }
             ],
-            n=self.task.rollout_args.n,
-            temperature=self.task.rollout_args.temperature,
+            n=self.rollout_args.n,
+            temperature=self.rollout_args.temperature,
         )
-        reward: float = self.calculate_reward(response.response_text, self.answer)
-        return [
-            Experience(
-                tokens=response.tokens,
-                prompt_length=response.prompt_length,
-                reward=reward,
-                logprobs=response.logprobs,
+        experiences = []
+        for response in responses:
+            # calculate reward
+            reward: float = self.calculate_reward(response.response_text, self.answer)
+            # construct Experience
+            experiences.append(
+                Experience(
+                    tokens=response.tokens,
+                    prompt_length=response.prompt_length,
+                    reward=reward,
+                    logprobs=response.logprobs,
+                )
             )
-        ]
+        return experiences
+```
+
+#### Register Your Workflow
+
+Developers can register `Workflow` through the `WORKFLOWS.register_module` method, but need to ensure that the name does not conflict with existing `Workflow` classes.
+
+```python
+# import some packages
+from trinity.common.workflows.workflow import WORKFLOWS
+
+@WORKFLOWS.register_module("example_workflow")
+class ExampleWorkflow(Workflow):
+    pass
 ```
 
+#### Avoid Re-initialization
+
 For some heavy workflows, the initialization process may be time-consuming.
 In this case, you can implement the `resettable` and `reset` methods to avoid re-initialization.
 
@@ -147,25 +203,79 @@ class ExampleWorkflow(Workflow):
         self.answer = task.raw_task.get("answer")
 ```
 
+
+#### Full Code
+
+```python
+@WORKFLOWS.register_module("example_workflow")
+class ExampleWorkflow(Workflow):
+
+    def __init__(self, model: ModelWrapper, task: Task, auxiliary_models: List):
+        super().__init__(model, task, auxiliary_models)
+        self.question = task.raw_task.get("question")
+        self.answer = task.raw_task.get("answer")
+        self.rollout_args = task.rollout_args
+
+    def calculate_reward(self, response: str, truth: str) -> float:
+        if response == truth:
+            return 1.0
+        else:
+            return 0.0
+
+    def run(self) -> List[Experience]:
+        # call the model to generate multiple responses
+        responses = self.model.chat(
+            [
+                {
+                    "role": "user",
+                    "content": f"Question:\n{self.question}",
+                }
+            ],
+            n=self.rollout_args.n,
+            temperature=self.rollout_args.temperature,
+        )
+        experiences = []
+        for response in responses:
+            # calculate reward
+            reward: float = self.calculate_reward(response.response_text, self.answer)
+            # construct Experience
+            experiences.append(
+                Experience(
+                    tokens=response.tokens,
+                    prompt_length=response.prompt_length,
+                    reward=reward,
+                    logprobs=response.logprobs,
+                )
+            )
+        return experiences
+
+    def resettable(self):
+        return True
+
+    def reset(self, task: Task):
+        self.question = task.raw_task.get("question")
+        self.answer = task.raw_task.get("answer")
+```
+
+
 ---
 
-### Step 3: Modify Configuration File
+### Step 3: Use Your Workflow
 
-After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `buffer.explorer_input` domain to the newly registered `Workflow` name.
+After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `buffer.explorer_input.taskset` domain to the newly registered `Workflow` name.
 
 ```yaml
 buffer:
   # Other fields
   explorer_input:
     taskset:
-      name: example_task
-      storage_type: file
       path: /path/to/taskset
-        # Other fields
-    default_workflow_type: example_workflow
-# Other fields
+      default_workflow_type: example_workflow
+      # Other fields
 ```
 
+Then you can run your workflow in the RFT procesing, through `trinity run --config <your_yaml_file>`.
+
 ---
 
 ## Check Code Style

From 35974454a4e2f4ee5c500cfea8afe800765a5815 Mon Sep 17 00:00:00 2001
From: pxc <panxuchen.pxc@alibaba-inc.com>
Date: Mon, 26 May 2025 14:52:34 +0800
Subject: [PATCH 2/5] update developer guide

---
 .../sphinx_doc/source/tutorial/trinity_programming_guide.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
index ac0151259f..a6938651d4 100644
--- a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
+++ b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
@@ -274,7 +274,11 @@ buffer:
       # Other fields
 ```
 
-Then you can run your workflow in the RFT procesing, through `trinity run --config <your_yaml_file>`.
+Then you can run your workflow in the RFT procesing, through the following command.
+
+```
+trinity run --config <your_yaml_file>
+```
 
 ---
 

From eb581f7bde9ad00892f69d2394413375f583eb5c Mon Sep 17 00:00:00 2001
From: pxc <panxuchen.pxc@alibaba-inc.com>
Date: Mon, 26 May 2025 15:21:01 +0800
Subject: [PATCH 3/5] polish developer guide

---
 .../tutorial/trinity_programming_guide.md     | 66 ++++++++++---------
 1 file changed, 35 insertions(+), 31 deletions(-)
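
Reviewer note: the polished "Registering Your Workflow" text asks authors to avoid name conflicts but does not say what happens on a duplicate. The sketch below shows the general decorator-registry pattern, assuming a duplicate name raises an error. `SketchRegistry` and `SKETCH_WORKFLOWS` are hypothetical and are not Trinity-RFT's `WORKFLOWS` implementation; whether the real registry raises, warns, or overwrites is not stated in this patch.

```python
from typing import Callable, Dict, Type


class SketchRegistry:
    """Hypothetical name-to-class registry used only for illustration."""

    def __init__(self) -> None:
        self._modules: Dict[str, Type] = {}

    def register_module(self, name: str) -> Callable[[Type], Type]:
        """Return a class decorator that stores the class under `name`."""
        def decorator(cls: Type) -> Type:
            # Assumption: reject duplicates so a typo cannot shadow another workflow.
            if name in self._modules:
                raise ValueError(f"workflow name already registered: {name}")
            self._modules[name] = cls
            return cls
        return decorator

    def get(self, name: str) -> Type:
        """Look up a registered class by the name used in the config file."""
        return self._modules[name]


SKETCH_WORKFLOWS = SketchRegistry()


@SKETCH_WORKFLOWS.register_module("example_workflow")
class ExampleWorkflow:
    pass


# Registering a second class under the same name triggers the conflict check.
try:
    @SKETCH_WORKFLOWS.register_module("example_workflow")
    class Duplicate:
        pass
    conflict_detected = False
except ValueError:
    conflict_detected = True
```

The lookup by string name is what lets `default_workflow_type` in the YAML file select a class without importing it explicitly in the config.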

diff --git a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
index a6938651d4..37b8dd5da0 100644
--- a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
+++ b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
@@ -1,6 +1,6 @@
 # Developer Guide
 
-This guide will introduce how to add new workflows to Trinity-RFT and provide relevant development guidelines.
+This guide introduces how to add new workflows to Trinity-RFT and provides relevant development guidelines.
 
 ```{note}
 Trinity-RFT is still under development, and the following interfaces may change. Please read this section in conjunction with the latest code.
@@ -10,7 +10,7 @@ Trinity-RFT is still under development, and the following interfaces may change.
 
 ## Creating New Workflows
 
-Trinity-RFT supports developers in registering new workflows (e.g., multi-round interaction scenarios). Below are the steps to create a new workflow:
+Trinity-RFT allows developers to register new workflows (e.g., for multi-turn interactions or agentic scenarios). Below are the steps to create a new workflow:
 
 ---
 
@@ -19,34 +19,34 @@ Trinity-RFT supports developers in registering new workflows (e.g., multi-round
 Before starting development, it's important to understand several core concepts:
 
 
-- **Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The `Task` data format may vary significantly depending on the type of task:
-  - **Math problems**: `Task` contains the problem description and the standard answer.
-  - **Programming scenarios**: `Task` includes the problem description, test cases, runtime environment, and other complex information.
+- **Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The content of the `Task` varies depending on the task type:
+  - **Math problems**: A `Task` contains the problem description and the standard answer.
+  - **Programming scenarios**: A `Task` includes the problem description, test cases, runtime environment, and other complex information.
 
 
-- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`, defining the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates `Experience`. Trinity-RFT has several built-in workflows:
+- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. . It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows:
   - `MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards).
   - `WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with environment.
   - `CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results.
   - ...
 
 
-- **Experience** ({class}`trinity.common.experience.Experience`): The output of running a `Workflow`, where the internal data format depends on the algorithm used for training. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token id, action_mask (identifying which tokens were generated by the LLM), logprobs, rewards, etc.
+- **Experience** ({class}`trinity.common.experience.Experience`): The output of running a `Workflow`. The internal data format depends on the training algorithm used. For example, for common PPO/GRPO algorithms, `Experience` includes lists of token IDs, action masks (identifying which tokens were generated by the LLM), log probabilities, rewards, etc.
 
 ---
 
 ### Step 1: Prepare Task Dataset
 
-The explorer load the task dataset through the `buffer.explorer_input.taskset` in configuration file.
-To deal with the differences in `Task` data format, Trinity-RFT provides a unified `Task` interface, which containes the following fields.
+The task dataset is loaded via the `buffer.explorer_input.taskset` configuration entry in your YAML config file.
+To handle differences in `Task` contents, Trinity-RFT provides a unified `Task` interface containing the following fields.
 
-  - **`workflow`** (`str`): The registered name of your workflow class. You can specify it in `buffer.explorer_input.taskset.default_workflow_type` of your yaml config file.
-  - **`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some some workflows have already integrated the reward calculation, you can ignore this field in such cases.
-  - **`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without the following fields.
-  - **`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. The `format_args` comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.format` of the yaml file.
-  - **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters to facilitate the rollout process, e.g., the `temperature`. This field also comes from the yaml configuration file, and you can set it in the `buffer.explorer_input.task_set.rollout_args` of the yaml file.
+  - **`workflow`** (`str`): The registered name of your workflow class. You can specify it in `buffer.explorer_input.taskset.default_workflow_type` of your YAML config file.
+  - **`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some workflows already include built-in reward calculation; in such cases, you can omit this field.
+  - **`raw_task`** (`Dict`): A record of raw data in `Dict` format. For highly customized workflows, you can directly use `raw_task` to initialize your `Workflow` instance without relying on the following fields.
+  - **`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. These settings come from the YAML configuration file and can be set in `buffer.explorer_input.taskset.format`.
+  - **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters that control the rollout process, such as `temperature`. This field also comes from the YAML configuration file and can be set in `buffer.explorer_input.taskset.rollout_args`.
 
-In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line’s JSON contains `question` and `answer` fields representing the problem description and standard answer, respectively. For example:
+In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line contains JSON with `question` and `answer` fields representing the problem description and standard answer, respectively. For example:
 
 ```
 {"question": "1+1=", "answer": "2"}
@@ -54,6 +54,8 @@ In the math problem scenario, the `Task` dataset can be a `jsonl` file, where ea
 ...
 ```
 
+Example configuration snippet:
+
 ```yaml
 # some config
 buffer:
@@ -69,7 +71,7 @@ buffer:
       # some other configs
 ```
 
-In this example, each task object's `raw_task` is a `Dict` with two keys (`question` and `answer`), and the `MathWorkflow` will use the `prompt_key` and `response_key` to extract the question and answer from the `raw_task` and use the `rollout_args` to generate the response.
+In this example, each task object's `raw_task` is a `Dict` with two keys (`question` and `answer`). The `MathWorkflow` uses the `prompt_key` and `response_key` to extract the question and answer from the `raw_task` and uses the `rollout_args` to generate the response.
 
 
 ---
@@ -96,19 +98,20 @@ class Workflow(ABC):
 ```
 
 
-#### Initialization Your Workflow
+#### Initializing Your Workflow
+
+During initialization, `Workflow` receives the following parameters:
 
-When initializing, `Workflow` receives the following parameters:
 - `model`({class}`trinity.common.models.model.ModelWrapper`): The model being trained, which provides an interface similar to OpenAI, capable of receiving a list of conversation messages and returning content generated by the LLM (including reply text `response_text`, full sequence token ids `tokens`, prompt part token length `prompt_length`, and a list of output token logprobs `logprobs`).
-- `task`({class}`trinity.common.workflows.Task`): An data item generated by one line of data from the task dataset.
-- `auxiliary_models`(`List` of `openai.OpenAI`): A list of auxiliary models, which will not be trained. All of them are provide as OpenAI compatible API.
+- `task`({class}`trinity.common.workflows.Task`): A single data item from the task dataset.
+- `auxiliary_models`(`List[openai.OpenAI]`): A list of auxiliary models not involved in training. All are provided via OpenAI-compatible APIs.
 
 
 ```{tip}
-The `model` also provided an OpenAI compatible API, you can switch to it by setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and use `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow.
+You can switch to using the OpenAI API by setting `explorer.rollout_model.enable_openai_api` to `true` in your config file and calling `model.get_openai_client()` to get an `openai.OpenAI` instance in your workflow.
 ```
 
-In the example below, we only use the `raw_task` and `rollout_args`. In more complex cases, you can use the `format_args` in `Task` to further the initialization.
+Here’s an example of initializing a simple workflow using only `raw_task` and `rollout_args`. In more complex cases, you can use the `format_args` for further customization.
 
 ```python
 class ExampleWorkflow(Workflow):
@@ -122,13 +125,13 @@ class ExampleWorkflow(Workflow):
         # self.openai_client = self.model.get_openai_client()
 ```
 
-#### Implement the `run` method
+#### Implementing the `run` method
 
 The `run` method is the core of your workflow. It returns a list of `Experience`.
-Below is a simple example demonstrating how to implement the `run` method for a math workflow.
+Below is a simple implementation for a math workflow.
 
 We first call the model to generate multiple response using the provided question and rollout arguments.
-And then we use the `calculate_reward` function to calculate the reward for each response.
+Then we calculate the reward for each response using the `calculate_reward` function.
 Finally, we construct a list of `Experience` with the responses and rewards and return it.
 
 
@@ -171,9 +174,10 @@ class ExampleWorkflow(Workflow):
         return experiences
 ```
 
-#### Register Your Workflow
+#### Registering Your Workflow
 
-Developers can register `Workflow` through the `WORKFLOWS.register_module` method, but need to ensure that the name does not conflict with existing `Workflow` classes.
+Register your workflow using the `WORKFLOWS.register_module` decorator.
+Ensure the name does not conflict with existing workflows.
 
 ```python
 # import some packages
@@ -186,7 +190,7 @@ class ExampleWorkflow(Workflow):
 
 #### Avoid Re-initialization
 
-For some heavy workflows, the initialization process may be time-consuming.
+For heavy workflows, re-initializing resources for every task can be expensive.
 In this case, you can implement the `resettable` and `reset` methods to avoid re-initialization.
 
 ```python
@@ -204,7 +208,7 @@ class ExampleWorkflow(Workflow):
 ```
 
 
-#### Full Code
+#### Full Code Example
 
 ```python
 @WORKFLOWS.register_module("example_workflow")
@@ -262,7 +266,7 @@ class ExampleWorkflow(Workflow):
 
 ### Step 3: Use Your Workflow
 
-After completing the development of the `Workflow`, you need to modify the configuration file to set the `default_workflow_type` in the `buffer.explorer_input.taskset` domain to the newly registered `Workflow` name.
+After implementing and registering your workflow, update the configuration file: set `default_workflow_type` under `buffer.explorer_input.taskset` to the newly registered workflow name.
 
 ```yaml
 buffer:
@@ -274,7 +278,7 @@ buffer:
       # Other fields
 ```
 
-Then you can run your workflow in the RFT procesing, through the following command.
+Now you can run your workflow in Trinity-RFT using the command:
 
 ```
 trinity run --config <your_yaml_file>

From fe48c80b4ff3b5b70f665b031c2c02523510414f Mon Sep 17 00:00:00 2001
From: pxc <panxuchen.pxc@alibaba-inc.com>
Date: Mon, 26 May 2025 15:36:19 +0800
Subject: [PATCH 4/5] fix typo

---
 docs/sphinx_doc/source/tutorial/trinity_programming_guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
index 37b8dd5da0..bc7be3d159 100644
--- a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
+++ b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
@@ -24,7 +24,7 @@ Before starting development, it's important to understand several core concepts:
   - **Programming scenarios**: A `Task` includes the problem description, test cases, runtime environment, and other complex information.
 
 
-- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. . It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows:
+- **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows:
   - `MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards).
   - `WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with environment.
   - `CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results.

From 0d35b27bf3bad4c422f6fbd011c35965b7c3c8f0 Mon Sep 17 00:00:00 2001
From: pxc <panxuchen.pxc@alibaba-inc.com>
Date: Mon, 26 May 2025 15:42:08 +0800
Subject: [PATCH 5/5] fix comments

---
 .../source/tutorial/trinity_programming_guide.md          | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
index bc7be3d159..2e4daeab0b 100644
--- a/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
+++ b/docs/sphinx_doc/source/tutorial/trinity_programming_guide.md
@@ -20,12 +20,12 @@ Before starting development, it's important to understand several core concepts:
 
 
 - **Task** ({class}`trinity.common.workflows.Task`): Represents a data structure that can be converted into a `Workflow`. The content of the `Task` varies depending on the task type:
-  - **Math problems**: A `Task` contains the problem description and the standard answer.
+  - **Math problems**: A `Task` contains the problem description and the golden answer.
   - **Programming scenarios**: A `Task` includes the problem description, test cases, runtime environment, and other complex information.
 
 
 - **Workflow** ({class}`trinity.common.workflows.Workflow`): Can be understood as the running state of a `Task`. It defines the interaction flow between Agents and Environments, including logic similar to _Rollout_ and _Reward_ calculations in other frameworks. After execution, it generates a list of `Experience`. Trinity-RFT includes several built-in workflows:
-  - `MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to LLM, parses results, and calculates scores (rewards).
+  - `MathWorkflow` ({class}`trinity.common.workflows.MathWorkflow`): For math scenarios, submits problems to the LLM, parses the LLM responses, and calculates scores (rewards).
   - `WebShopWorkflow` ({class}`trinity.common.workflows.WebShopWorkflow`): For webshop scenarios, it contains multi-turn interaction with environment.
   - `CodeWorkflow` (Coming soon): For coding scenarios, executes returned code, runs tests, and calculates rewards based on test results.
   - ...
@@ -44,7 +44,7 @@ To handle differences in `Task` contents, Trinity-RFT provides a unified `Task`
   - **`reward_fn`** (`Optional[str]`): The registered name of your reward function. You can specify it in `buffer.explorer_input.taskset.default_reward_fn_type`. Note that some workflows already include built-in reward calculation; in such cases, you can omit this field.
   - **`raw_task`** (`Dict`): An record of raw data in `Dict` format. For highly customized workflow, you can directly use `raw_task` to initialize your `Workflow` instance without relying on the following fields.
   - **`format_args`** ({class}`trinity.common.config.FormatConfig`): Parameters to facilitate the construction of `Workflow` instances. For example, the `prompt_key` and `response_key` can be used to get the prompt and response from `raw_task`. These settings come from the YAML configuration file and can be set in `buffer.explorer_input.task_set.format`.
-  - **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters that control the rollout process, such as `temperature`. his field also comes from the YAML configuration file and can be set in `buffer.explorer_input.task_set.rollout_args`.
+  - **`rollout_args`** ({class}`trinity.common.config.GenerationConfig`): Parameters that control the rollout process, such as `temperature`. This field also comes from the YAML configuration file and can be set in `buffer.explorer_input.taskset.rollout_args`.
 
 In the math problem scenario, the `Task` dataset can be a `jsonl` file, where each line contains JSON with `question` and `answer` fields representing the problem description and standard answer, respectively. For example:
 
@@ -190,7 +190,7 @@ class ExampleWorkflow(Workflow):
 
 #### Avoid Re-initialization
 
-For heavy workflows, avoid re-initializing resources every time.
+For heavy workflows, re-initializing every time can incur extra computational costs.
 In this case, you can implement the `resettable` and `reset` methods to avoid re-initialization.
 
 ```python