Update docs: recommend starting the Trainer before the Explorer in async mode

chenyushuo · chenyushuo · commit e7d3fb3acdd7 · 2025-12-29T14:40:53.000+08:00
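
The test change in `tests/trainer/trainer_test.py` follows one pattern: start the Trainer, wait until it is reachable, and only then launch the process that depends on it. A minimal Python sketch of that wait step follows. `wait_until_ready` and `probe` are hypothetical names used for illustration and are not part of Trinity-RFT; the sketch assumes the readiness check raises `ValueError` while the Trainer is still starting, as the retry loop in the test does.

```python
import asyncio
from typing import Callable


async def wait_until_ready(
    probe: Callable[[], object],
    interval: float = 5.0,
    timeout: float = 300.0,
) -> object:
    """Call `probe` until it stops raising ValueError, then return its result.

    `probe` is a hypothetical readiness check (an assumption of this sketch).
    Raises TimeoutError if the deadline passes before `probe` succeeds.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout  # monotonic clock of the event loop
    while True:
        try:
            return probe()
        except ValueError:
            if loop.time() >= deadline:
                raise TimeoutError("dependency did not become ready in time")
            # Not ready yet: yield to the event loop, then retry.
            await asyncio.sleep(interval)
```

In a shell script, the fixed `sleep 30` in `run.sh` plays the same role; polling only replaces the fixed delay with an explicit readiness check and a timeout.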
diff --git a/README_zh.md b/README_zh.md
@@ -80,10 +80,6 @@ Trinity-RFT provides features tailored to users with different backgrounds and goals:
 
 > [!NOTE]
 > For more tutorials, see the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/).
->
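-> No GPU? No problem! You can still try the following:
-> 1. Follow the installation steps (you can skip GPU-specific packages such as `flash-attn`)
-> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is designed for CPU-only systems.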
 
 
 
diff --git a/docs/sphinx_doc/source/tutorial/example_async_mode.md b/docs/sphinx_doc/source/tutorial/example_async_mode.md
@@ -112,6 +112,10 @@ You can run this example with the following command:
 bash examples/async_gsm8k/run.sh
 ```
 
+```{note}
+In the current asynchronous RFT training, we recommend starting the Trainer before the Explorer. Otherwise, if the Explorer process terminates early, the Trainer may be unable to read the experience data the Explorer generated. This issue will be resolved in a future version.
+```
+
 The following plot shows the learning curve of GRPO in the asynchronous mode.
 > This result should be regarded merely as a baseline, since GRPO is supposed to be an on-policy algorithm.
 > We are continuously investigating other RL algorithms (e.g., [OPMD](./example_reasoning_advanced.md)) in the asynchronous mode.
diff --git a/docs/sphinx_doc/source_zh/tutorial/example_async_mode.md b/docs/sphinx_doc/source_zh/tutorial/example_async_mode.md
@@ -112,6 +112,10 @@ trainer:
 bash examples/async_gsm8k/run.sh
 ```
 
+```{note}
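+In the current asynchronous RFT training, we recommend starting the Trainer before the Explorer. Otherwise, if the Explorer process terminates early, the Trainer may be unable to read the Experience data the Explorer generated. This issue will be resolved in a future version.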
+```
+
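 The following plot shows the learning curve of GRPO in the asynchronous mode:
 > This result should be regarded merely as a baseline, since GRPO is inherently an on-policy algorithm.
 > We are continuously investigating other RL algorithms suited to the asynchronous mode (e.g., [OPMD](./example_reasoning_advanced.md)).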
diff --git a/examples/async_gsm8k/README.md b/examples/async_gsm8k/README.md
@@ -11,3 +11,6 @@ You can run this example by the following command:
 ```bash
 bash examples/async_gsm8k/run.sh
 ```
+
+> [!NOTE]
+> In the current asynchronous RFT training, we recommend starting the Trainer before the Explorer. Otherwise, if the Explorer process terminates early, the Trainer may be unable to read the experience data the Explorer generated. This issue will be resolved in a future version.
diff --git a/examples/async_gsm8k/run.sh b/examples/async_gsm8k/run.sh
@@ -1,4 +1,4 @@
 #!/bin/bash
-trinity run --config examples/async_gsm8k/explorer.yaml 2>&1 | tee explorer.log &
-sleep 30
 trinity run --config examples/async_gsm8k/trainer.yaml 2>&1 | tee trainer.log &
+sleep 30
+trinity run --config examples/async_gsm8k/explorer.yaml 2>&1 | tee explorer.log &
diff --git a/tests/trainer/trainer_test.py b/tests/trainer/trainer_test.py
@@ -984,13 +984,6 @@ async def test_serve_with_trainer(self):  # noqa: C901
         trainer_process = multiprocessing.Process(target=run_trainer, args=(trainer_config,))
         trainer_process.start()
 
-        await asyncio.sleep(5)
-        serve_config = deepcopy(config)
-        serve_config.mode = "serve"
-        serve_config.check_and_update()
-        serve_process = multiprocessing.Process(target=run_serve, args=(serve_config,))
-        serve_process.start()
-
         ray.init(ignore_reinit_error=True)
         while True:
             try:
@@ -999,6 +992,11 @@ async def test_serve_with_trainer(self):  # noqa: C901
             except ValueError:
                 print("waiting for trainer to start.")
                 await asyncio.sleep(5)
+        serve_config = deepcopy(config)
+        serve_config.mode = "serve"
+        serve_config.check_and_update()
+        serve_process = multiprocessing.Process(target=run_serve, args=(serve_config,))
+        serve_process.start()
 
         state_manager = StateManager(
             path=serve_config.checkpoint_job_dir,