Commit d9ca49a

Merge branch 'main' into dev/cyx/enhance_exp_replay
2 parents ba34124 + d5db95a commit d9ca49a

31 files changed: +678 -153 lines changed


README.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-[**中文主页**](https://github.com/modelscope/Trinity-RFT/blob/main/README_zh.md) | [**Tutorial**](https://modelscope.github.io/Trinity-RFT/) | [**FAQ**](./docs/sphinx_doc/source/tutorial/faq.md)
+[**中文主页**](https://github.com/modelscope/Trinity-RFT/blob/main/README_zh.md) | [**Tutorial**](https://modelscope.github.io/Trinity-RFT/) | [**FAQ**](https://modelscope.github.io/Trinity-RFT/en/main/tutorial/faq.html)

 <div align="center">
 <img src="https://img.alicdn.com/imgextra/i1/O1CN01lvLpfw25Pl4ohGZnU_!!6000000007519-2-tps-1628-490.png" alt="Trinity-RFT" style="height: 120px;">

README_zh.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-[**English Homepage**](https://github.com/modelscope/Trinity-RFT/blob/main/README.md) | [**Chinese Documentation**](https://modelscope.github.io/Trinity-RFT/zh/) | [**FAQ**](./docs/sphinx_doc/source/zh/tutorial/faq.md)
+[**English Homepage**](https://github.com/modelscope/Trinity-RFT/blob/main/README.md) | [**Chinese Documentation**](https://modelscope.github.io/Trinity-RFT/zh/) | [**FAQ**](https://modelscope.github.io/Trinity-RFT/zh/main/tutorial/faq.html)

 <div align="center">
 <img src="https://img.alicdn.com/imgextra/i1/O1CN01lvLpfw25Pl4ohGZnU_!!6000000007519-2-tps-1628-490.png" alt="Trinity-RFT" style="height: 120px;">

docs/sphinx_doc/source/tutorial/example_search_email.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # Email Search Workflow


-This example shows a multi-turn email search workflow, inspired by [ART](https://openpipe.ai/blog/art-e-mail-agent?refresh=1756431423904). We implement a ReAct Agent and define tools for email search. Note that this example rewquires installing `AgentScope==0.1.6`.
+This example shows a multi-turn email search workflow, inspired by [ART](https://openpipe.ai/blog/art-e-mail-agent?refresh=1756431423904). We implement a ReAct Agent and define tools for email search. Note that this example requires installing [AgentScope](https://github.com/agentscope-ai/agentscope?tab=readme-ov-file#-installation).

 ## Core Components

docs/sphinx_doc/source/tutorial/faq.md

Lines changed: 10 additions & 0 deletions
@@ -107,6 +107,16 @@ export RAY_DEDUP_LOGS=0
 trinity run --config grpo_gsm8k/gsm8k.yaml 2>&1 | tee debug.log
 ```

+### Debugging the Workflow
+
+To debug a new workflow, use Trinity-RFT's debug mode with the following steps:
+
+1. Launch the inference model via `trinity debug --config <config_file_path> --module inference_model`
+
+2. Debug the workflow in another terminal via `trinity debug --config <config_file_path> --module workflow --output_file <output_file_path> --plugin_dir <plugin_dir>`
+
+Please refer to {ref}`Workflow Development Guide <Workflows>` section for details.
+

 ## Part 4: Other Questions
 **Q:** What's the purpose of `buffer.trainer_input.experience_buffer.path`?
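
The two debug steps added in this hunk could also be assembled from a script. The sketch below is not part of the commit: it only builds the two `trinity debug` command lines quoted in the FAQ text (using the `--config`, `--module`, `--output_file` and `--plugin_dir` flags shown there). The helper name `build_debug_commands` and the example paths are hypothetical.

```python
import shlex


def build_debug_commands(config_path, output_file, plugin_dir):
    """Return the two `trinity debug` command lines from the FAQ as argv lists.

    Hypothetical helper: it only assembles the flags quoted in the FAQ text
    and does not launch anything.
    """
    inference_cmd = [
        "trinity", "debug",
        "--config", config_path,
        "--module", "inference_model",
    ]
    workflow_cmd = [
        "trinity", "debug",
        "--config", config_path,
        "--module", "workflow",
        "--output_file", output_file,
        "--plugin_dir", plugin_dir,
    ]
    return inference_cmd, workflow_cmd


if __name__ == "__main__":
    # Example paths are placeholders, not files from the repository.
    inference_cmd, workflow_cmd = build_debug_commands(
        "my_config.yaml", "debug_output.json", "./plugins"
    )
    # Step 1: run this in the first terminal and keep it running.
    print(shlex.join(inference_cmd))
    # Step 2: run this in a second terminal.
    print(shlex.join(workflow_cmd))
```

Keeping the commands as argv lists means they can be printed for copy-paste, as here, or handed to a process launcher without re-quoting paths that contain spaces.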

docs/sphinx_doc/source/tutorial/trinity_configs.md

Lines changed: 7 additions & 1 deletion
@@ -160,7 +160,7 @@ model:

 - `model_path`: Path to the model being trained.
 - `critic_model_path`: Optional path to a separate critic model. If empty, defaults to `model_path`.
-- `max_model_len`: Maximum number of tokens in a sequence. It is recommended to set this value manually. If not set, it will be inferred from the model configuration.
+- `max_model_len`: Maximum number of tokens in a sequence. It is recommended to set this value manually. If not specified, the system will attempt to set it to `max_prompt_tokens` + `max_response_tokens`. However, this requires both values to be already set; otherwise, an error will be raised.
 - `max_response_tokens`: Maximum number of tokens allowed in generated responses. Only for `chat` and `generate` methods in `InferenceModel`.
 - `max_prompt_tokens`: Maximum number of tokens allowed in prompts. Only for `chat` and `generate` methods in `InferenceModel`.
 - `min_response_tokens`: Minimum number of tokens allowed in generated responses. Only for `chat` and `generate` methods in `InferenceModel`. Default is `1`. It must be less than `max_response_tokens`.
@@ -405,6 +405,7 @@ trainer:
   trainer_type: 'verl'
   save_interval: 100
   total_steps: 1000
+  save_strategy: "unrestricted"
   trainer_config: null
   trainer_config_path: ''
 ```
@@ -413,6 +414,11 @@ trainer:
 - `trainer_type`: Trainer backend implementation. Currently only supports `verl`.
 - `save_interval`: Frequency (in steps) at which to save model checkpoints.
 - `total_steps`: Total number of training steps.
+- `save_strategy`: The parallel strategy used when saving the model. Defaults to `unrestricted`. The available options are as follows:
+  - `single_thread`: Only one thread across the entire system is allowed to save the model; saving tasks from different threads are executed sequentially.
+  - `single_process`: Only one process across the entire system is allowed to perform saving; multiple threads within that process can handle saving tasks in parallel, while saving operations across different processes are executed sequentially.
+  - `single_node`: Only one compute node across the entire system is allowed to perform saving; processes and threads within that node can work in parallel, while saving operations across different nodes are executed sequentially.
+  - `unrestricted`: No restrictions on saving operations; multiple nodes, processes, or threads are allowed to save the model simultaneously.
 - `trainer_config`: The trainer configuration provided inline.
 - `trainer_config_path`: The path to the trainer configuration file. Only one of `trainer_config_path` and `trainer_config` should be specified.
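
One way to read the four `save_strategy` options is as a rule for when two distinct savers may write at the same time. The sketch below encodes that reading only; it is not the trainer's implementation. The `Saver` record and the function `may_save_concurrently` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Saver:
    """Identity of one saving thread (hypothetical record for illustration)."""
    node: str
    process: int
    thread: int


def may_save_concurrently(strategy: str, a: Saver, b: Saver) -> bool:
    """Return True if two distinct savers may save at the same time.

    Encodes the documented semantics only; the real trainer may enforce
    them differently.
    """
    if strategy == "unrestricted":
        # No restriction at any level.
        return True
    if strategy == "single_node":
        # Parallel within one node, sequential across nodes.
        return a.node == b.node
    if strategy == "single_process":
        # Parallel within one process, sequential across processes.
        return a.node == b.node and a.process == b.process
    if strategy == "single_thread":
        # Every save is serialized.
        return False
    raise ValueError(f"unknown save_strategy: {strategy!r}")
```

Read this way, the options form a ladder from most to least serialized: `single_thread`, `single_process`, `single_node`, `unrestricted`.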

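The new `max_model_len` fallback described in the first hunk of this file can be sketched as follows. This is a minimal illustration of the documented rule, not the project's code: the function name `resolve_max_model_len` is hypothetical, and `ValueError` is an assumption, since the text only says that an error is raised.

```python
from typing import Optional


def resolve_max_model_len(
    max_model_len: Optional[int],
    max_prompt_tokens: Optional[int],
    max_response_tokens: Optional[int],
) -> int:
    """Apply the documented fallback for an unset `max_model_len`."""
    if max_model_len is not None:
        # An explicit value always wins.
        return max_model_len
    if max_prompt_tokens is None or max_response_tokens is None:
        # Assumed error type; the docs only state that an error is raised.
        raise ValueError(
            "max_model_len is not set and cannot be derived: both "
            "max_prompt_tokens and max_response_tokens must be set"
        )
    return max_prompt_tokens + max_response_tokens
```

For example, with `max_prompt_tokens=1024` and `max_response_tokens=2048` and no explicit `max_model_len`, the fallback gives 3072.
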
docs/sphinx_doc/source_zh/tutorial/example_search_email.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Email Search Example

-This example shows a multi-turn email search workflow, based on [ART](https://openpipe.ai/blog/art-e-mail-agent?refresh=1756431423904). We implement a ReAct Agent and define tools for email search. Note: this example requires installing `AgentScope==0.1.6`
+This example shows a multi-turn email search workflow, based on [ART](https://openpipe.ai/blog/art-e-mail-agent?refresh=1756431423904). We implement a ReAct Agent and define tools for email search. Note: this example requires installing [AgentScope](https://github.com/agentscope-ai/agentscope?tab=readme-ov-file#-installation)

 ## Core Components

docs/sphinx_doc/source_zh/tutorial/faq.md

Lines changed: 12 additions & 0 deletions
@@ -106,6 +106,18 @@ export RAY_DEDUP_LOGS=0
 trinity run --config grpo_gsm8k/gsm8k.yaml 2>&1 | tee debug.log
 ```

+### Debugging the Workflow
+
+
+After implementing a new workflow, you can debug it with Trinity-RFT's debug mode as follows:
+
+1. Launch the inference model: `trinity debug --config <config_file_path> --module inference_model`
+
+2. Debug the workflow in another terminal: `trinity debug --config <config_file_path> --module workflow --output_file <output_file_path> --plugin_dir <plugin_dir>`
+
+For more details, see the {ref}`Workflow Development Guide <Workflows>` section.
+
+
 ## Part 4: Other Questions
 **Q:** What is the purpose of `buffer.trainer_input.experience_buffer.path`?

docs/sphinx_doc/source_zh/tutorial/trinity_configs.md

Lines changed: 7 additions & 1 deletion
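@@ -160,7 +160,7 @@ model:

 - `model_path`: Path to the model being trained.
 - `critic_model_path`: Optional path to a separate critic model. If empty, defaults to `model_path`.
-- `max_model_len`: Maximum number of tokens in a single sequence supported by the model.
+- `max_model_len`: Maximum number of tokens in a single sequence supported by the model. If not specified, the system will try to set it to `max_prompt_tokens` + `max_response_tokens`. This requires both values to be set already; otherwise, an error is raised
 - `max_prompt_tokens`: Maximum number of tokens allowed in the input prompt. Only takes effect for the `chat` and `generate` methods in `InferenceModel`.
 - `max_response_tokens`: Maximum number of tokens allowed in responses generated by the model. Only takes effect for the `chat` and `generate` methods in `InferenceModel`.
 - `min_response_tokens`: Minimum number of tokens allowed in responses generated by the model. Only takes effect for the `chat` and `generate` methods in `InferenceModel`.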
@@ -405,6 +405,7 @@ trainer:
   trainer_type: 'verl'
   save_interval: 100
   total_steps: 1000
+  save_strategy: "unrestricted"
   trainer_config: null
   trainer_config_path: ''
 ```
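@@ -413,6 +414,11 @@ trainer:
 - `trainer_type`: Trainer backend implementation. Currently only `verl` is supported.
 - `save_interval`: Frequency (in steps) at which model checkpoints are saved.
 - `total_steps`: Total number of training steps.
+- `save_strategy`: The parallel strategy used when saving the model. The default is `unrestricted`. The available options are:
+  - `single_thread`: Across the entire system, only one thread is allowed to save the model; different saving threads execute sequentially.
+  - `single_process`: Across the entire system, only one process is allowed to perform saving; multiple threads within that process can handle saving tasks in parallel, while different processes execute sequentially.
+  - `single_node`: Across the entire system, only one compute node is allowed to perform saving; processes and threads within that node can work in parallel, while saves on different nodes execute sequentially.
+  - `unrestricted`: Saving is not restricted; multiple nodes, processes, or threads may save the model at the same time.
 - `trainer_config`: Trainer configuration provided inline.
 - `trainer_config_path`: Path to the trainer configuration file. Only one of `trainer_config_path` and `trainer_config` may be specified.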

examples/grpo_email_search/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Email Search Workflow

-This example shows a multi-turn email search workflow, inspired by [ART](https://openpipe.ai/blog/art-e-mail-agent?refresh=1756431423904). We implement a ReAct Agent and define tools for email search. Note that this example rewquires installing `AgentScope==0.1.6`.
+This example shows a multi-turn email search workflow, inspired by [ART](https://openpipe.ai/blog/art-e-mail-agent?refresh=1756431423904). We implement a ReAct Agent and define tools for email search. Note that this example requires installing [AgentScope](https://github.com/agentscope-ai/agentscope?tab=readme-ov-file#-installation).

 For more detailed information, please refer to the [documentation](../../docs/sphinx_doc/source/tutorial/example_search_email.md).
