Commit e7d3fb3

update documents
1 parent dd8dde4 commit e7d3fb3

6 files changed, +18 -13 lines changed

README_zh.md

Lines changed: 0 additions & 4 deletions
@@ -80,10 +80,6 @@ Trinity-RFT provides functionality tailored to users with different backgrounds and goals:

 > [!NOTE]
 > For more tutorials, see the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/)
->
-> No GPU? No problem! You can still try the following:
-> 1. Follow the installation steps (you can skip GPU-specific packages such as `flash-attn`)
-> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is designed for CPU-only systems.




docs/sphinx_doc/source/tutorial/example_async_mode.md

Lines changed: 4 additions & 0 deletions
@@ -112,6 +112,10 @@ You can run this example with the following command:
 bash examples/async_gsm8k/run.sh
 ```

+```{note}
+In the current asynchronous RFT training, it is recommended to start the Trainer before starting the Explorer to avoid the situation where the Trainer cannot read the generated experience data after the Explorer process terminates prematurely. This issue will be resolved in a future version.
+```
+
 The following plot shows the learning curve of GRPO in the asynchronous mode.
 > This result should be regarded merely as a baseline, since GRPO is supposed to be an on-policy algorithm.
 > We are continuously investigating other RL algorithms (e.g., [OPMD](./example_reasoning_advanced.md)) in the asynchronous mode.

docs/sphinx_doc/source_zh/tutorial/example_async_mode.md

Lines changed: 4 additions & 0 deletions
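@@ -112,6 +112,10 @@ trainer:
 bash examples/async_gsm8k/run.sh
 ```

+```{note}
+In the current asynchronous RFT training, it is best to start the Trainer before the Explorer, so that the Trainer does not fail to read the generated Experience data after the Explorer process exits early. This issue will be resolved in a future version.
+```
+
 The following plot shows the learning curve of GRPO in the asynchronous mode:
 > This result should be regarded only as a baseline, since GRPO is inherently an on-policy algorithm.
 > We are continuously investigating other reinforcement learning algorithms suited to the asynchronous mode (e.g., [OPMD](./example_reasoning_advanced.md)).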

examples/async_gsm8k/README.md

Lines changed: 3 additions & 0 deletions
@@ -11,3 +11,6 @@ You can run this example by the following command:
 ```bash
 bash examples/async_gsm8k/run.sh
 ```
+
+> [!NOTE]
+> In the current asynchronous RFT training, it is recommended to start the Trainer before starting the Explorer to avoid the situation where the Trainer cannot read the generated experience data after the Explorer process terminates prematurely. This issue will be resolved in a future version.

examples/async_gsm8k/run.sh

Lines changed: 2 additions & 2 deletions
@@ -1,4 +1,4 @@
 #!/bin/bash
-trinity run --config examples/async_gsm8k/explorer.yaml 2>&1 | tee explorer.log &
-sleep 30
 trinity run --config examples/async_gsm8k/trainer.yaml 2>&1 | tee trainer.log &
+sleep 30
+trinity run --config examples/async_gsm8k/explorer.yaml 2>&1 | tee explorer.log &
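
Review note: the reordering in `run.sh` can also be expressed as a small launcher. The following is a minimal sketch, not part of this commit. The two command lines are copied from `run.sh`, while `launch_in_order` and its `popen` and `sleep` parameters are hypothetical names introduced here so the ordering can be exercised without the real processes. Unlike `run.sh`, the sketch does not pipe output through `tee`.

```python
import subprocess
import time
from typing import Callable, List, Sequence

# Command lines copied from run.sh in this commit (Trainer first, then Explorer).
TRAINER_CMD = ["trinity", "run", "--config", "examples/async_gsm8k/trainer.yaml"]
EXPLORER_CMD = ["trinity", "run", "--config", "examples/async_gsm8k/explorer.yaml"]


def launch_in_order(
    commands: Sequence[Sequence[str]],
    delay_seconds: float = 30.0,
    popen: Callable = subprocess.Popen,  # injectable for testing (assumption)
    sleep: Callable[[float], None] = time.sleep,  # injectable for testing (assumption)
) -> List:
    """Start each command in the given order, pausing between consecutive launches."""
    processes = []
    for index, command in enumerate(commands):
        if index > 0:
            # Give the previously started process time to come up,
            # mirroring the `sleep 30` in run.sh.
            sleep(delay_seconds)
        processes.append(popen(list(command)))
    return processes
```

Calling `launch_in_order([TRAINER_CMD, EXPLORER_CMD])` would start the Trainer, wait 30 seconds, then start the Explorer. A fixed delay is still only a heuristic; it does not confirm that the Trainer is actually ready.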

tests/trainer/trainer_test.py

Lines changed: 5 additions & 7 deletions
@@ -984,13 +984,6 @@ async def test_serve_with_trainer(self): # noqa: C901
         trainer_process = multiprocessing.Process(target=run_trainer, args=(trainer_config,))
         trainer_process.start()

-        await asyncio.sleep(5)
-        serve_config = deepcopy(config)
-        serve_config.mode = "serve"
-        serve_config.check_and_update()
-        serve_process = multiprocessing.Process(target=run_serve, args=(serve_config,))
-        serve_process.start()
-
         ray.init(ignore_reinit_error=True)
         while True:
             try:
@@ -999,6 +992,11 @@ async def test_serve_with_trainer(self): # noqa: C901
             except ValueError:
                 print("waiting for trainer to start.")
                 await asyncio.sleep(5)
+        serve_config = deepcopy(config)
+        serve_config.mode = "serve"
+        serve_config.check_and_update()
+        serve_process = multiprocessing.Process(target=run_serve, args=(serve_config,))
+        serve_process.start()

         state_manager = StateManager(
             path=serve_config.checkpoint_job_dir,
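
Review note: the test now starts the serve process only after the retry loop has stopped raising `ValueError`, rather than after a fixed 5-second sleep. The pattern can be sketched as a reusable helper. This is an illustration under assumptions, not code from the commit: `wait_until_ready`, `probe`, and the timeout are hypothetical, and `probe` stands in for whatever lookup the elided `try` body performs.

```python
import asyncio
from typing import Callable, TypeVar

T = TypeVar("T")


async def wait_until_ready(
    probe: Callable[[], T],
    interval: float = 5.0,
    timeout: float = 300.0,  # upper bound added for the sketch; the test loops forever
) -> T:
    """Call `probe` until it stops raising ValueError, then return its result."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        try:
            return probe()
        except ValueError:
            # Not ready yet: give up once the deadline has passed, else retry.
            if loop.time() >= deadline:
                raise TimeoutError("dependency did not become ready in time")
            await asyncio.sleep(interval)
```

Gating the dependent process on a successful probe removes the guess about how long startup takes, and the timeout keeps a failed startup from hanging the test indefinitely.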
