
Commit ba4b34e

update docs and apply reviews
1 parent f029cbd commit ba4b34e

File tree: 5 files changed, +23 −17 lines

- README.md
- README_zh.md
- examples/tinker/README.md
- trinity/common/models/tinker_model.py
- trinity/trainer/tinker_trainer.py

README.md

Lines changed: 5 additions & 0 deletions
@@ -42,6 +42,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
 
 ## 🚀 News
 
+* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, which enables model training on devices **without GPUs**.
 * [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
 * [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
 * [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
@@ -154,6 +155,10 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor
 
 > [!NOTE]
 > This project is currently under active development. Comments and suggestions are welcome!
+>
+> **No GPU? No problem!** You can still try it out:
+> 1. Follow the installation steps (feel free to skip GPU-specific packages like `flash-attn`)
+> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed to work on CPU-only systems.
 
 ### Step 1: installation

README_zh.md

Lines changed: 5 additions & 0 deletions
@@ -41,6 +41,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and objectives:
 
 ## 🚀 News
 
+* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, which enables model training on devices **without GPUs**.
 * [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
 * [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
 * [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
@@ -79,6 +80,10 @@ Trinity-RFT provides functionalities for users with different backgrounds and objectives:
 
 > [!NOTE]
 > For more tutorials, please refer to the [Trinity-RFT documentation](https://modelscope.github.io/Trinity-RFT/)
+>
+> No GPU? No problem! You can still try the following:
+> 1. Follow the installation steps (GPU-specific packages such as `flash-attn` can be skipped)
+> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed for CPU-only systems.

examples/tinker/README.md

Lines changed: 10 additions & 13 deletions
@@ -1,6 +1,7 @@
 # Trinity with Tinker Backend
 
-This example demonstrates how to use Trinity with the [Tinker](https://thinkingmachines.ai/tinker/) backend, which enables model training on devices without GPUs.
+> [!NOTE]
+> This example demonstrates how to use Trinity with the [Tinker](https://thinkingmachines.ai/tinker/) backend, which enables model training on devices **without GPUs**.
 
 ## Setup Instructions
 
@@ -28,7 +29,7 @@ model:
 
 ### 3. Configuration Parameters Explained
 
-- **`tinker`**: Tinker-specific configuration section. **Important**: When Tinker is enabled, any LoRA configuration settings will be ignored.
+- **`tinker`**: Tinker-specific configuration section. **Important**: When Tinker is enabled, any LoRA configuration settings (`model.lora_configs`) will be ignored.
 - **`enable`**: Whether to activate the Tinker backend. Default: `false`
 - **`base_model`**: Path to the base model for Tinker. If not specified (`null`), it defaults to the `model_path` defined elsewhere in your config
 - **`rank`**: The LoRA rank that controls the size of the adaptation matrices. Default: `32`
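
For orientation, here is a minimal sketch of how these parameters might sit in the YAML config. It is not taken from the commit: placing the `tinker` block under `model` is inferred from the `model:` hunk context above, and the model path is only a placeholder; the `tinker.yaml` example referenced below remains the authoritative version.

```yaml
# Hedged sketch only -- key placement inferred from the parameter list above;
# see examples/tinker/tinker.yaml for the real, complete configuration.
model:
  model_path: meta-llama/Llama-3.2-3B  # placeholder model; any supported path works
  tinker:
    enable: true      # activate the Tinker backend (default: false)
    base_model: null  # null falls back to model_path above
    rank: 32          # LoRA rank of the adaptation matrices (default: 32)
```
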
@@ -50,10 +51,12 @@ trinity run --config tinker.yaml # Replace with your actual config file path
 
 1. **Entropy loss** is not consistent with the veRL backend.
 2. **Algorithms requiring `compute_advantage_in_trainer=true` are NOT supported**, including:
-   - `PPOAlgorithm`
-   - `ReinforcePlusPlusAlgorithm`
-   - `RLOOAlgorithm`
-   - `OnPolicyDistillAlgorithm`
+   - PPO (`algorithm.algorithm_type=ppo`)
+   - Reinforce++ (`algorithm.algorithm_type=reinforceplusplus`)
+   - RLOO (`algorithm.algorithm_type=rloo`)
+   - On-policy distillation (`algorithm.algorithm_type=on_policy_distill`)
+
+   Algorithms like `algorithm.algorithm_type=grpo` are supported.
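
As an illustration of the constraint above (not part of the commit), a config running on the Tinker backend would pick one of the supported algorithm types, e.g. GRPO, via the `algorithm.algorithm_type` key shown in the list:

```yaml
# Hedged sketch: grpo is named above as supported with the Tinker backend;
# ppo, reinforceplusplus, rloo, and on_policy_distill require
# compute_advantage_in_trainer=true and are therefore not supported.
algorithm:
  algorithm_type: grpo
```
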
 
 > 💡 A complete example configuration file is available at [`tinker.yaml`](tinker.yaml).
 
@@ -197,10 +200,6 @@ buffer:
       storage_type: queue
       replay_buffer:
         enable: false
-        priority_fn: linear_decay
-        reuse_cooldown_time: null
-        priority_fn_args:
-          decay: 2.0
 explorer:
   runner_per_model: 16
   rollout_model:
@@ -240,8 +239,6 @@ synchronizer:
 
 ### Observations
 
-Since **Llama-3.2-3B** is a base (non-instruct-tuned) model, it has limited ability to follow formatting instructions. Additionally, we trained for only **one epoch**. As a result, both backends achieved final rewards just slightly above **0.1**.
-
-However, the training curves clearly show an **upward trend in reward**, indicating successful learning. The results are visualized below:
+Since Llama-3.2-3B is a base (non-instruct-tuned) model, it has limited ability to follow formatting instructions. Additionally, we trained for only **one epoch**. As a result, both backends achieved final rewards just slightly above 0.1. Nonetheless, the training curves show a clear upward trend in reward, indicating successful learning. The results are visualized below:
 
 ![Training Rewards on GSM8K](../../docs/sphinx_doc/assets/tinker-gsm8k.png)

trinity/common/models/tinker_model.py

Lines changed: 2 additions & 3 deletions
@@ -195,10 +195,9 @@ def get_model_version(self) -> int:
 
     def get_api_server_url(self) -> Optional[str]:
         """Get the API server URL if available."""
-        # TODO
+        # TODO: tinker will support openai api later
         return None
 
     def get_model_path(self) -> Optional[str]:
         """Get the model path"""
-        # TODO
-        return None
+        return self.config.model_path  # type: ignore [return-value]

trinity/trainer/tinker_trainer.py

Lines changed: 1 addition & 1 deletion
@@ -125,7 +125,7 @@ async def prepare(self):
                 f"global_step_{self.latest_remote_sampler_step}",
                 "remote_sampler_path.txt",
             )
-            with open(sampler_file_path, "r"):
+            with open(sampler_file_path, "r") as f:
                 self.latest_remote_sampler_path = f.read().strip()
         else:
             self.latest_remote_sampler_step = 0
