Commit a864486

Add tinker backend. (#448)
1 parent f88181c commit a864486

41 files changed: +1733 −290 lines

README.md

Lines changed: 25 additions & 7 deletions
@@ -42,6 +42,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
 
 ## 🚀 News
 
+* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, which enables model training on devices **without GPUs**.
 * [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
 * [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
 * [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
@@ -154,6 +155,10 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor
 
 > [!NOTE]
 > This project is currently under active development. Comments and suggestions are welcome!
+>
+> **No GPU? No problem!** You can still try it out:
+> 1. Follow the installation steps (feel free to skip GPU-specific packages like `flash-attn`)
+> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed to work on CPU-only systems.
 
 
 ### Step 1: installation
@@ -186,10 +191,15 @@ Choose one of the following options:
 conda create -n trinity python=3.12
 conda activate trinity
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
-# if you encounter issues when installing flash-attn, try:
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and uncomment this instead:
+# pip install -e ".[tinker]"
+
+# If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development tools such as linters and debuggers
 ```
 
 ###### Using venv
@@ -198,18 +208,26 @@ pip install -e ".[flash_attn]"
 python3.10 -m venv .venv
 source .venv/bin/activate
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
-# if you encounter issues when installing flash-attn, try:
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and uncomment this instead:
+# pip install -e ".[tinker]"
+
+# If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development tools such as linters and debuggers
 ```
 
 ###### Using `uv`
 
 [`uv`](https://github.com/astral-sh/uv) is a modern Python package installer.
 
 ```bash
-uv sync --extra dev --extra flash_attn
+uv sync --extra vllm --extra dev --extra flash_attn
+
+# If you have no GPU, use Tinker instead:
+# uv sync --extra tinker --extra dev
 ```
 
 
README_zh.md

Lines changed: 23 additions & 5 deletions
@@ -41,6 +41,7 @@ Trinity-RFT provides corresponding features for users with different backgrounds and goals:
 
 ## 🚀 News
 
+* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, which enables model training on devices **without GPUs**.
 * [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
 * [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
 * [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
@@ -154,6 +155,10 @@ Trinity-RFT provides corresponding features for users with different backgrounds and goals:
 
 > [!NOTE]
 > This project is under active development. Comments and suggestions are welcome!
+>
+> **No GPU? No problem!** You can still try it out:
+> 1. Follow the installation steps (GPU-specific packages such as `flash-attn` can be skipped)
+> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is designed for CPU-only systems.
 
 
 ### Step 1: Installation
@@ -185,10 +190,15 @@ cd Trinity-RFT
 conda create -n trinity python=3.12
 conda activate trinity
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and use Tinker instead:
+# pip install -e ".[tinker]"
+
 # If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development and debugging
 ```
 
 #### Using venv
@@ -197,18 +207,26 @@ pip install -e ".[flash_attn]"
 python3.10 -m venv .venv
 source .venv/bin/activate
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and use Tinker instead:
+# pip install -e ".[tinker]"
+
 # If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development and debugging
 ```
 
 #### Using `uv`
 
 [`uv`](https://github.com/astral-sh/uv) is a modern Python package manager.
 
 ```bash
-uv sync --extra dev --extra flash_attn
+uv sync --extra vllm --extra dev --extra flash_attn
+
+# If you have no GPU, use Tinker instead:
+# uv sync --extra tinker --extra dev
 ```
 
 ## Install via PyPI

docs/sphinx_doc/source/tutorial/example_async_mode.md

Lines changed: 4 additions & 0 deletions
@@ -112,6 +112,10 @@ You can run this example with the following command:
 bash examples/async_gsm8k/run.sh
 ```
 
+```{note}
+In the current asynchronous RFT training, it is recommended to start the Trainer before the Explorer; otherwise, if the Explorer process terminates prematurely, the Trainer may be unable to read the generated experience data. This issue will be resolved in a future version.
+```
+
 The following plot shows the learning curve of GRPO in the asynchronous mode.
 > This result should be regarded merely as a baseline, since GRPO is supposed to be an on-policy algorithm.
 > We are continuously investigating other RL algorithms (e.g., [OPMD](./example_reasoning_advanced.md)) in the asynchronous mode.

docs/sphinx_doc/source/tutorial/trinity_configs.md

Lines changed: 30 additions & 1 deletion
@@ -164,9 +164,20 @@ model:
   max_response_tokens: 16384
   min_response_tokens: 1
   enable_prompt_truncation: true
+  repetition_penalty: 1.0
+  lora_configs: null
+  rope_scaling: null
+  rope_theta: null
+  tinker:
+    enable: false
+    rank: 32
+    seed: null
+    train_mlp: true
+    train_attn: true
+    train_unembed: true
 ```
 
-- `model_path`: Path to the model being trained.
+- `model_path`: Path to the model being trained. If `tinker` is enabled, this is the path to the local tokenizer.
 - `critic_model_path`: Optional path to a separate critic model. If empty, defaults to `model_path`.
 - `custom_chat_template`: Optional custom chat template in string format. If not specified, the system will use the default chat template from the tokenizer.
 - `chat_template_path`: Optional path to a chat template file in jinja2 format; overrides `custom_chat_template` if set. If not specified, the system will use the default chat template from the tokenizer.
@@ -175,6 +186,24 @@ model:
 - `max_prompt_tokens`: Maximum number of tokens allowed in prompts. Only for `chat` and `generate` methods in `InferenceModel`.
 - `min_response_tokens`: Minimum number of tokens allowed in generated responses. Only for `chat` and `generate` methods in `InferenceModel`. Default is `1`. It must be less than `max_response_tokens`.
 - `enable_prompt_truncation`: Whether to truncate the prompt. Default is `true`. If set to `true`, the prompt is truncated to `max_prompt_tokens` tokens; if set to `false`, the prompt is not truncated, with the risk that the prompt length plus the response length exceeds `max_model_len`. This option has no effect in OpenAI API mode.
+- `repetition_penalty`: Repetition penalty factor. Default is `1.0`.
+- `lora_configs`: Optional LoRA configuration. Defaults to `null`. Currently only one LoRA configuration is supported, and it is not applied if `tinker` is enabled.
+  - `name`: Name of the LoRA adapter. Default is `None`.
+  - `path`: Path to the LoRA adapter. Default is `None`.
+  - `base_model_name`: Name of the base model for LoRA. Defaults to `None`.
+  - `lora_rank`: Rank of the LoRA adapter. Default is `32`.
+  - `lora_alpha`: Alpha value of the LoRA adapter. Default is `32`.
+  - `lora_dtype`: Data type of the LoRA adapter. Default is `auto`.
+  - `target_modules`: List of target modules for LoRA. Default is `all-linear`.
+- `rope_scaling`: Optional RoPE scaling configuration in JSON format. Defaults to `null`.
+- `rope_theta`: Optional RoPE theta value. Defaults to `null`.
+- `tinker`: Optional Tinker configuration. Note: the LoRA configuration is ignored if Tinker is enabled.
+  - `enable`: Whether to enable Tinker. Default is `false`.
+  - `rank`: LoRA rank controlling the size of the adaptation matrices. Default is `32`.
+  - `seed`: Random seed for Tinker. Defaults to `null`.
+  - `train_mlp`: Whether to train the MLP layers. Default is `true`.
+  - `train_attn`: Whether to train the attention layers. Default is `true`.
+  - `train_unembed`: Whether to train the unembedding layer. Default is `true`.
 
 ```{tip}
 If you are using the OpenAI API provided by Explorer, only `max_model_len` takes effect; the values of `max_response_tokens`, `max_prompt_tokens`, and `min_response_tokens` are ignored. When `max_tokens` is not independently specified, each API call generates up to `max_model_len - prompt_length` tokens, so please ensure that the prompt length is less than `max_model_len` when using the API.
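Putting the new keys together, a minimal `model` section that switches training to the Tinker backend might look like the sketch below. Only keys documented in the diff above are used; the concrete non-default values (tokenizer path, seed) are illustrative, not taken from the project.

```yaml
model:
  model_path: /path/to/local/tokenizer  # with tinker enabled, this points at the tokenizer
  tinker:
    enable: true        # use the Tinker backend; any lora_configs entry is then ignored
    rank: 32            # LoRA rank of the adaptation matrices (default)
    seed: 1234          # illustrative; defaults to null
    train_mlp: true
    train_attn: true
    train_unembed: true
```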

docs/sphinx_doc/source/tutorial/trinity_installation.md

Lines changed: 31 additions & 11 deletions
@@ -3,11 +3,18 @@
 
 For installing Trinity-RFT, you have three options: from source (recommended), via PyPI, or using Docker.
 
-Before installing, ensure your system meets the following requirements:
+**Before you begin**, check your system setup:
 
-- **Python**: Version 3.10 to 3.12 (inclusive)
-- **CUDA**: Version >= 12.8
-- **GPUs**: At least 2 GPUs
+### If you have GPUs and want to use them:
+Make sure your system meets these requirements:
+- **Python**: 3.10 – 3.12
+- **CUDA**: 12.8 or higher
+- **GPUs**: At least 2 available
+
+### If you don’t have GPUs (or prefer not to use them):
+You can use the `tinker` option instead, which only requires:
+- **Python**: 3.11 – 3.12
+- **GPUs**: Not required
 
 ---
 
@@ -32,10 +39,15 @@ Choose one of the following options:
 conda create -n trinity python=3.12
 conda activate trinity
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
-# if you encounter issues when installing flash-attn, try:
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and uncomment this instead:
+# pip install -e ".[tinker]"
+
+# If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development tools such as linters and debuggers
 ```
 
 #### Using venv
@@ -44,18 +56,26 @@ pip install -e ".[flash_attn]"
 python3.10 -m venv .venv
 source .venv/bin/activate
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
-# if you encounter issues when installing flash-attn, try:
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and uncomment this instead:
+# pip install -e ".[tinker]"
+
+# If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development tools such as linters and debuggers
 ```
 
 #### Using `uv`
 
 [`uv`](https://github.com/astral-sh/uv) is a modern Python package installer.
 
 ```bash
-uv sync --extra dev --extra flash_attn
+uv sync --extra vllm --extra dev --extra flash_attn
+
+# If you have no GPU, use Tinker instead:
+# uv sync --extra tinker --extra dev
 ```
 
 ---
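The two install paths above accept different Python ranges (3.10–3.12 for the GPU path, 3.11–3.12 for `tinker`). That check can be sketched as a shell function; the function name and structure are illustrative, not part of the project, and the version ranges come from the requirements listed above.

```shell
# Hypothetical pre-flight check: is a given Python minor version supported
# by the chosen install path? Ranges are taken from the requirements above.
supported() {
  path="$1"; pyver="$2"
  case "$path:$pyver" in
    gpu:3.10|gpu:3.11|gpu:3.12) echo "yes" ;;
    tinker:3.11|tinker:3.12)    echo "yes" ;;
    *)                          echo "no" ;;
  esac
}

supported gpu 3.10      # supported on the GPU path
supported tinker 3.10   # too old for the tinker path
```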

docs/sphinx_doc/source_zh/tutorial/example_async_mode.md

Lines changed: 4 additions & 0 deletions
@@ -112,6 +112,10 @@ trainer:
 bash examples/async_gsm8k/run.sh
 ```
 
+```{note}
+In the current asynchronous RFT training, it is best to start the Trainer before the Explorer; otherwise, if the Explorer process exits early, the Trainer may be unable to read the generated Experience data. This issue will be resolved in a future version.
+```
+
 The following plot shows the learning curve of GRPO in the asynchronous mode:
 > This result should be regarded only as a baseline, since GRPO is essentially an on-policy algorithm.
 > We are continuously investigating other RL algorithms suitable for the asynchronous mode (e.g., [OPMD](./example_reasoning_advanced.md)).

docs/sphinx_doc/source_zh/tutorial/trinity_configs.md

Lines changed: 30 additions & 1 deletion
@@ -164,9 +164,20 @@ model:
   max_response_tokens: 16384
   min_response_tokens: 1
   enable_prompt_truncation: true
+  repetition_penalty: 1.0
+  lora_configs: null
+  rope_scaling: null
+  rope_theta: null
+  tinker:
+    enable: false
+    rank: 32
+    seed: null
+    train_mlp: true
+    train_attn: true
+    train_unembed: true
 ```
 
-- `model_path`: Path to the model being trained.
+- `model_path`: Path to the model being trained. If `tinker` is enabled, this is the path to the local tokenizer.
 - `critic_model_path`: Optional path to a separate critic model. If empty, defaults to `model_path`.
 - `custom_chat_template`: Optional custom chat template string. If not specified, the system uses the tokenizer's default chat template.
 - `chat_template_path`: Optional path to a chat template file, typically in jinja2 format; if set, it overrides `custom_chat_template`. If not specified, the system uses the tokenizer's default chat template.
@@ -175,6 +186,24 @@ model:
 - `max_response_tokens`: Maximum number of tokens allowed in generated responses. Only effective for the `chat` and `generate` methods in `InferenceModel`.
 - `min_response_tokens`: Minimum number of tokens allowed in generated responses. Only effective for the `chat` and `generate` methods in `InferenceModel`.
 - `enable_prompt_truncation`: Whether to truncate the prompt. Default is `true`. If `true`, the prompt is truncated to `max_prompt_tokens` tokens; if `false`, the prompt is not truncated, with the risk that the combined prompt and response length exceeds `max_model_len`. Has no effect in OpenAI API mode.
+- `repetition_penalty`: Repetition penalty factor. Default is `1.0`.
+- `lora_configs`: Optional LoRA configuration. Defaults to `null`. Currently only one LoRA configuration is supported, and it is not used if `tinker` is enabled.
+  - `name`: Name of the LoRA adapter. Default is `None`.
+  - `path`: Path to the LoRA adapter. Default is `None`.
+  - `base_model_name`: Name of the base model for LoRA. Defaults to `None`.
+  - `lora_rank`: Rank of the LoRA adapter. Default is `32`.
+  - `lora_alpha`: Alpha value of the LoRA adapter. Default is `32`.
+  - `lora_dtype`: Data type of the LoRA adapter. Default is `auto`.
+  - `target_modules`: List of target modules for LoRA. Default is `all-linear`.
+- `rope_scaling`: Optional RoPE scaling configuration in JSON format. Defaults to `null`.
+- `rope_theta`: Optional RoPE theta value. Defaults to `null`.
+- `tinker`: Optional Tinker configuration. Note: the LoRA configuration is ignored if Tinker is enabled.
+  - `enable`: Whether to enable Tinker. Default is `false`.
+  - `rank`: LoRA rank controlling the size of the adaptation matrices. Default is `32`.
+  - `seed`: Random seed used by Tinker. Defaults to `null`.
+  - `train_mlp`: Whether to train the MLP layers. Default is `true`.
+  - `train_attn`: Whether to train the attention layers. Default is `true`.
+  - `train_unembed`: Whether to train the unembedding layer. Default is `true`.
 
 ```{tip}
 If you are using the OpenAI API provided by Explorer, only `max_model_len` takes effect; the values of `max_response_tokens`, `max_prompt_tokens`, and `min_response_tokens` are ignored. When `max_tokens` is not specified independently, each API call generates up to `max_model_len - prompt_length` tokens, so ensure the prompt length is less than `max_model_len` when using the API.
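To make the `rank` parameter concrete: a LoRA adapter on a `d_in x d_out` linear layer adds two low-rank factors, `A (d_in x r)` and `B (r x d_out)`, so the extra parameter count is `r * (d_in + d_out)`. A quick arithmetic sketch (the layer size is illustrative, not taken from any particular model):

```shell
# Extra parameters contributed by a LoRA adapter of rank r on a
# d_in x d_out linear layer: d_in*r + r*d_out.
lora_params() {
  d_in="$1"; d_out="$2"; r="$3"
  echo $(( d_in * r + r * d_out ))
}

# A 4096x4096 projection with the default rank 32:
lora_params 4096 4096 32   # 262144 adapter params vs 16777216 for the full matrix
```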

docs/sphinx_doc/source_zh/tutorial/trinity_installation.md

Lines changed: 29 additions & 9 deletions
@@ -3,11 +3,18 @@
 
 There are three ways to install Trinity-RFT: from source (recommended), via PyPI, or using Docker.
 
-Before installing, make sure your system meets the following requirements:
+**Before you begin**, check your system setup:
 
-- **Python**: 3.10 to 3.12 (inclusive)
-- **CUDA**: 12.8 or higher
-- **GPU**: At least 2 GPUs
+### If you have GPUs and want to use them:
+Make sure your system meets these requirements:
+- **Python**: 3.10 – 3.12
+- **CUDA**: 12.8 or higher
+- **GPU**: At least 2 available
+
+### If you don't have GPUs (or prefer not to use them):
+You can use the `tinker` option instead, which only requires:
+- **Python**: 3.11 – 3.12
+- **GPU**: Not required
 
 ---
 
@@ -32,10 +39,15 @@ cd Trinity-RFT
 conda create -n trinity python=3.12
 conda activate trinity
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and use Tinker instead:
+# pip install -e ".[tinker]"
+
 # If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development and debugging
 ```
 
 #### Using venv
@@ -44,18 +56,26 @@ pip install -e ".[flash_attn]"
 python3.10 -m venv .venv
 source .venv/bin/activate
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and use Tinker instead:
+# pip install -e ".[tinker]"
+
 # If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development and debugging
 ```
 
 #### Using `uv`
 
 [`uv`](https://github.com/astral-sh/uv) is a modern Python package manager.
 
 ```bash
-uv sync --extra dev --extra flash_attn
+uv sync --extra vllm --extra dev --extra flash_attn
+
+# If you have no GPU, use Tinker instead:
+# uv sync --extra tinker --extra dev
 ```
 
 ---
