README.md (5 additions & 0 deletions)
@@ -42,6 +42,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
## 🚀 News

+ * [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, which enables model training on devices **without GPUs**.
* [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
* [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
@@ -154,6 +155,10 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor
> [!NOTE]
> This project is currently under active development. Comments and suggestions are welcome!
+ >
+ > **No GPU? No problem!** You can still try it out:
+ > 1. Follow the installation steps (feel free to skip GPU-specific packages like `flash-attn`).
+ > 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed to work on CPU-only systems.

examples/tinker/README.md (10 additions & 13 deletions)
@@ -1,6 +1,7 @@
# Trinity with Tinker Backend

- This example demonstrates how to use Trinity with the [Tinker](https://thinkingmachines.ai/tinker/) backend, which enables model training on devices without GPUs.
+ > [!NOTE]
+ > This example demonstrates how to use Trinity with the [Tinker](https://thinkingmachines.ai/tinker/) backend, which enables model training on devices **without GPUs**.

## Setup Instructions
@@ -28,7 +29,7 @@ model:
### 3. Configuration Parameters Explained

- - **`tinker`**: Tinker-specific configuration section. **Important**: When Tinker is enabled, any LoRA configuration settings will be ignored.
+ - **`tinker`**: Tinker-specific configuration section. **Important**: When Tinker is enabled, any LoRA configuration settings (`model.lora_configs`) will be ignored.
- **`enable`**: Whether to activate the Tinker backend. Default: `false`
- **`base_model`**: Path to the base model for Tinker. If not specified (`null`), it defaults to the `model_path` defined elsewhere in your config
- **`rank`**: The LoRA rank that controls the size of the adaptation matrices. Default: `32`
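For orientation, a minimal sketch of how these fields might sit in the YAML config is shown below. The nesting under `model:` follows the hunk context above, but the exact placement and the example `model_path` are assumptions; the shipped [`tinker.yaml`](tinker.yaml) is the authoritative layout.

```yaml
model:
  model_path: meta-llama/Llama-3.2-3B   # assumed example path; any supported base model works
  tinker:
    enable: true       # activate the Tinker backend (default: false)
    base_model: null   # null falls back to model_path above
    rank: 32           # LoRA rank of the adaptation matrices (default: 32)
  # note: model.lora_configs is ignored whenever tinker.enable is true
```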
@@ -50,10 +51,12 @@ trinity run --config tinker.yaml # Replace with your actual config file path
1. **Entropy loss** values are not consistent with those of the veRL backend.
2. **Algorithms requiring `compute_advantage_in_trainer=true` are NOT supported**; algorithms like `algorithm.algorithm_type=grpo` are supported (see the sketch below).
> 💡 A complete example configuration file is available at [`tinker.yaml`](tinker.yaml).
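As an illustration of a supported choice, the algorithm section of the config might look like the sketch below; the dotted path `algorithm.algorithm_type` comes from the limitation note above, and all other algorithm options are omitted.

```yaml
algorithm:
  algorithm_type: grpo   # supported: GRPO does not require compute_advantage_in_trainer=true
```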
@@ -197,10 +200,6 @@ buffer:
  storage_type: queue
  replay_buffer:
    enable: false
-   priority_fn: linear_decay
-   reuse_cooldown_time: null
-   priority_fn_args:
-     decay: 2.0
explorer:
  runner_per_model: 16
  rollout_model:
@@ -240,8 +239,6 @@ synchronizer:
### Observations

- Since **Llama-3.2-3B** is a base (non-instruct-tuned) model, it has limited ability to follow formatting instructions. Additionally, we trained for only **one epoch**. As a result, both backends achieved final rewards just slightly above **0.1**.
-
- However, the training curves clearly show an **upward trend in reward**, indicating successful learning. The results are visualized below:
+ Since Llama-3.2-3B is a base (non-instruct-tuned) model, it has limited ability to follow formatting instructions. Additionally, we trained for only **one epoch**. As a result, both backends achieved final rewards just slightly above 0.1. Nonetheless, the training curves show a clear upward trend in reward, indicating successful learning. The results are visualized below:
