Commit a864486

Add tinker backend. (#448)
1 parent f88181c commit a864486

41 files changed: +1733 −290 lines

README.md

Lines changed: 25 additions & 7 deletions
@@ -42,6 +42,7 @@ Trinity-RFT provides functionalities for users with different backgrounds and ob
 
 ## 🚀 News
 
+* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, which enables model training on devices **without GPUs**.
 * [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
 * [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
 * [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
@@ -154,6 +155,10 @@ We list some algorithms supported by Trinity-RFT in the following table. For mor
 
 > [!NOTE]
 > This project is currently under active development. Comments and suggestions are welcome!
+>
+> **No GPU? No problem!** You can still try it out:
+> 1. Follow the installation steps (feel free to skip GPU-specific packages like `flash-attn`)
+> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is specifically designed to work on CPU-only systems.
 
 
 ### Step 1: installation
@@ -186,10 +191,15 @@ Choose one of the following options:
 conda create -n trinity python=3.12
 conda activate trinity
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
-# if you encounter issues when installing flash-attn, try:
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and uncomment this instead:
+# pip install -e ".[tinker]"
+
+# If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development tools such as linters and debuggers
 ```
 
 ###### Using venv
@@ -198,18 +208,26 @@ pip install -e ".[flash_attn]"
 python3.10 -m venv .venv
 source .venv/bin/activate
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
-# if you encounter issues when installing flash-attn, try:
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and uncomment this instead:
+# pip install -e ".[tinker]"
+
+# If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development tools such as linters and debuggers
 ```
 
 ###### Using `uv`
 
 [`uv`](https://github.com/astral-sh/uv) is a modern Python package installer.
 
 ```bash
-uv sync --extra dev --extra flash_attn
+uv sync --extra vllm --extra dev --extra flash_attn
+
+# If you have no GPU, use Tinker instead:
+# uv sync --extra tinker --extra dev
 ```
 
 
README_zh.md

Lines changed: 23 additions & 5 deletions
@@ -41,6 +41,7 @@ Trinity-RFT provides corresponding features for users with different backgrounds and goals:
 
 ## 🚀 News
 
+* [2025-12] Trinity-RFT now supports the [tinker](https://thinkingmachines.ai/tinker/) training backend, which enables model training on devices **without GPUs**.
 * [2025-12] Trinity-RFT powers the medical and health business of "Taobao Shangou", enabling the AI agent to understand vague symptoms, proactively ask follow-up questions, and provide precise recommendations ([News](https://tech.china.com.cn/sx/20251201/411376.shtml)).
 * [2025-11] [[Release Notes](https://github.com/modelscope/Trinity-RFT/releases/tag/v0.3.3)] Trinity-RFT v0.3.3 released: bug fixes.
 * [2025-11] Introducing [Learn-to-Ask](https://github.com/modelscope/Trinity-RFT/tree/main/examples/learn_to_ask): a framework for training proactive dialogue agents from offline expert data ([paper](https://arxiv.org/pdf/2510.25441)).
@@ -154,6 +155,10 @@ Trinity-RFT provides corresponding features for users with different backgrounds and goals:
 
 > [!NOTE]
 > This project is under active development. Comments and suggestions are welcome!
+>
+> **No GPU? No problem!** You can still try it out:
+> 1. Follow the installation steps (GPU-specific packages such as `flash-attn` can be skipped)
+> 2. Run the **[Tinker training example](https://github.com/modelscope/Trinity-RFT/tree/main/examples/tinker)**, which is designed for CPU-only systems.
 
 
 ### Step 1: Installation
@@ -185,10 +190,15 @@ cd Trinity-RFT
 conda create -n trinity python=3.12
 conda activate trinity
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and use Tinker instead:
+# pip install -e ".[tinker]"
+
 # If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development and debugging
 ```
 
 #### Using venv
@@ -197,18 +207,26 @@ pip install -e ".[flash_attn]"
 python3.10 -m venv .venv
 source .venv/bin/activate
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and use Tinker instead:
+# pip install -e ".[tinker]"
+
 # If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development and debugging
 ```
 
 #### Using `uv`
 
 [`uv`](https://github.com/astral-sh/uv) is a modern Python package manager.
 
 ```bash
-uv sync --extra dev --extra flash_attn
+uv sync --extra vllm --extra dev --extra flash_attn
+
+# If you have no GPU, use Tinker instead:
+# uv sync --extra tinker --extra dev
 ```
 
 ## Install via PyPI

docs/sphinx_doc/source/tutorial/example_async_mode.md

Lines changed: 4 additions & 0 deletions
@@ -112,6 +112,10 @@ You can run this example with the following command:
 bash examples/async_gsm8k/run.sh
 ```
 
+```{note}
+In the current asynchronous RFT training, it is recommended to start the Trainer before the Explorer; otherwise, if the Explorer process terminates prematurely, the Trainer may be unable to read the generated experience data. This issue will be resolved in a future version.
+```
+
 The following plot shows the learning curve of GRPO in the asynchronous mode.
 > This result should be regarded merely as a baseline, since GRPO is supposed to be an on-policy algorithm.
 > We are continuously investigating other RL algorithms (e.g., [OPMD](./example_reasoning_advanced.md)) in the asynchronous mode.

docs/sphinx_doc/source/tutorial/trinity_configs.md

Lines changed: 30 additions & 1 deletion
@@ -164,9 +164,20 @@ model:
   max_response_tokens: 16384
   min_response_tokens: 1
   enable_prompt_truncation: true
+  repetition_penalty: 1.0
+  lora_configs: null
+  rope_scaling: null
+  rope_theta: null
+  tinker:
+    enable: false
+    rank: 32
+    seed: null
+    train_mlp: true
+    train_attn: true
+    train_unembed: true
 ```
 
-- `model_path`: Path to the model being trained.
+- `model_path`: Path to the model being trained. If `tinker` is enabled, this is the path to the local tokenizer.
 - `critic_model_path`: Optional path to a separate critic model. If empty, defaults to `model_path`.
 - `custom_chat_template`: Optional custom chat template in string format. If not specified, the system will use the default chat template from the tokenizer.
 - `chat_template_path`: Optional path to a chat template file in jinja2 format; overrides `custom_chat_template` if set. If not specified, the system will use the default chat template from the tokenizer.
@@ -175,6 +186,24 @@ model:
 - `max_prompt_tokens`: Maximum number of tokens allowed in prompts. Only for `chat` and `generate` methods in `InferenceModel`.
 - `min_response_tokens`: Minimum number of tokens allowed in generated responses. Only for `chat` and `generate` methods in `InferenceModel`. Default is `1`. It must be less than `max_response_tokens`.
 - `enable_prompt_truncation`: Whether to truncate the prompt. Default is `true`. If set to `true`, the prompt is truncated to `max_prompt_tokens` tokens; if set to `false`, the prompt is not truncated, with the risk that the prompt length plus the response length exceeds `max_model_len`. This option has no effect in OpenAI API mode.
+- `repetition_penalty`: Repetition penalty factor. Default is `1.0`.
+- `lora_configs`: Optional LoRA configuration. Defaults to `null`. Currently only one LoRA configuration is supported, and it is not applied if `tinker` is enabled.
+  - `name`: Name of the LoRA adapter. Default is `None`.
+  - `path`: Path to the LoRA adapter. Default is `None`.
+  - `base_model_name`: Name of the base model for LoRA. Defaults to `None`.
+  - `lora_rank`: Rank of the LoRA adapter. Default is `32`.
+  - `lora_alpha`: Alpha value of the LoRA adapter. Default is `32`.
+  - `lora_dtype`: Data type of the LoRA adapter. Default is `auto`.
+  - `target_modules`: List of target modules for LoRA. Default is `all-linear`.
+- `rope_scaling`: Optional RoPE scaling configuration in JSON format. Defaults to `null`.
+- `rope_theta`: Optional RoPE theta value. Defaults to `null`.
+- `tinker`: Optional Tinker configuration. Note: the LoRA configuration is ignored if Tinker is enabled.
+  - `enable`: Whether to enable Tinker. Default is `false`.
+  - `rank`: LoRA rank controlling the size of the adaptation matrices. Default is `32`.
+  - `seed`: Random seed for Tinker. Defaults to `null`.
+  - `train_mlp`: Whether to train the MLP layers. Default is `true`.
+  - `train_attn`: Whether to train the attention layers. Default is `true`.
+  - `train_unembed`: Whether to train the unembedding layer. Default is `true`.
 
 ```{tip}
 If you are using the OpenAI API provided by Explorer, only `max_model_len` takes effect; the values of `max_response_tokens`, `max_prompt_tokens`, and `min_response_tokens` are ignored. When `max_tokens` is not independently specified, each API call generates up to `max_model_len - prompt_length` tokens, so please ensure that the prompt length is less than `max_model_len` when using the API.
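Putting the new keys together, a minimal `model` section that switches training to the Tinker backend might look like the sketch below. Only keys documented in the diff above are used; the concrete non-default values (tokenizer path, seed) are illustrative, not taken from the project.

```yaml
model:
  model_path: /path/to/local/tokenizer  # with tinker enabled, this points at the tokenizer
  tinker:
    enable: true        # use the Tinker backend; any lora_configs entry is then ignored
    rank: 32            # LoRA rank of the adaptation matrices (default)
    seed: 1234          # illustrative; defaults to null
    train_mlp: true
    train_attn: true
    train_unembed: true
```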

docs/sphinx_doc/source/tutorial/trinity_installation.md

Lines changed: 31 additions & 11 deletions
@@ -3,11 +3,18 @@
 
 For installing Trinity-RFT, you have three options: from source (recommended), via PyPI, or using Docker.
 
-Before installing, ensure your system meets the following requirements:
+**Before you begin**, check your system setup:
 
-- **Python**: Version 3.10 to 3.12 (inclusive)
-- **CUDA**: Version >= 12.8
-- **GPUs**: At least 2 GPUs
+### If you have GPUs and want to use them:
+Make sure your system meets these requirements:
+- **Python**: 3.10 – 3.12
+- **CUDA**: 12.8 or higher
+- **GPUs**: At least 2 available
+
+### If you don’t have GPUs (or prefer not to use them):
+You can use the `tinker` option instead, which only requires:
+- **Python**: 3.11 – 3.12
+- **GPUs**: Not required
 
 ---
 
@@ -32,10 +39,15 @@ Choose one of the following options:
 conda create -n trinity python=3.12
 conda activate trinity
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
-# if you encounter issues when installing flash-attn, try:
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and uncomment this instead:
+# pip install -e ".[tinker]"
+
+# If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development tools such as linters and debuggers
 ```
 
 #### Using venv
@@ -44,18 +56,26 @@ pip install -e ".[flash_attn]"
 python3.10 -m venv .venv
 source .venv/bin/activate
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
-# if you encounter issues when installing flash-attn, try:
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and uncomment this instead:
+# pip install -e ".[tinker]"
+
+# If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development tools such as linters and debuggers
 ```
 
 #### Using `uv`
 
 [`uv`](https://github.com/astral-sh/uv) is a modern Python package installer.
 
 ```bash
-uv sync --extra dev --extra flash_attn
+uv sync --extra vllm --extra dev --extra flash_attn
+
+# If you have no GPU, use Tinker instead:
+# uv sync --extra tinker --extra dev
 ```
 
 ---
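The two install paths above accept different Python ranges (3.10–3.12 for the GPU path, 3.11–3.12 for `tinker`). That check can be sketched as a shell function; the function name and structure are illustrative, not part of the project, and the version ranges come from the requirements listed above.

```shell
# Hypothetical pre-flight check: is a given Python minor version supported
# by the chosen install path? Ranges are taken from the requirements above.
supported() {
  path="$1"; pyver="$2"
  case "$path:$pyver" in
    gpu:3.10|gpu:3.11|gpu:3.12) echo "yes" ;;
    tinker:3.11|tinker:3.12)    echo "yes" ;;
    *)                          echo "no" ;;
  esac
}

supported gpu 3.10      # supported on the GPU path
supported tinker 3.10   # too old for the tinker path
```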

docs/sphinx_doc/source_zh/tutorial/example_async_mode.md

Lines changed: 4 additions & 0 deletions
@@ -112,6 +112,10 @@ trainer:
 bash examples/async_gsm8k/run.sh
 ```
 
+```{note}
+In the current asynchronous RFT training, it is best to start the Trainer before the Explorer; otherwise, if the Explorer process exits early, the Trainer may be unable to read the generated Experience data. This issue will be resolved in a future version.
+```
+
 The following plot shows the learning curve of GRPO in the asynchronous mode:
 > This result should be regarded only as a baseline, since GRPO is essentially an on-policy algorithm.
 > We are continuously investigating other RL algorithms suitable for the asynchronous mode (e.g., [OPMD](./example_reasoning_advanced.md)).

docs/sphinx_doc/source_zh/tutorial/trinity_configs.md

Lines changed: 30 additions & 1 deletion
@@ -164,9 +164,20 @@ model:
   max_response_tokens: 16384
   min_response_tokens: 1
   enable_prompt_truncation: true
+  repetition_penalty: 1.0
+  lora_configs: null
+  rope_scaling: null
+  rope_theta: null
+  tinker:
+    enable: false
+    rank: 32
+    seed: null
+    train_mlp: true
+    train_attn: true
+    train_unembed: true
 ```
 
-- `model_path`: Path to the model being trained.
+- `model_path`: Path to the model being trained. If `tinker` is enabled, this is the path to the local tokenizer.
 - `critic_model_path`: Optional path to a separate critic model. If empty, defaults to `model_path`.
 - `custom_chat_template`: Optional custom chat template string. If not specified, the system uses the tokenizer's default chat template.
 - `chat_template_path`: Optional path to a chat template file, typically in jinja2 format; if set, it overrides `custom_chat_template`. If not specified, the system uses the tokenizer's default chat template.
@@ -175,6 +186,24 @@ model:
 - `max_response_tokens`: Maximum number of tokens allowed in generated responses. Only effective for the `chat` and `generate` methods in `InferenceModel`.
 - `min_response_tokens`: Minimum number of tokens allowed in generated responses. Only effective for the `chat` and `generate` methods in `InferenceModel`.
 - `enable_prompt_truncation`: Whether to truncate the prompt. Default is `true`. If `true`, the prompt is truncated to `max_prompt_tokens` tokens; if `false`, the prompt is not truncated, with the risk that the combined prompt and response length exceeds `max_model_len`. Has no effect in OpenAI API mode.
+- `repetition_penalty`: Repetition penalty factor. Default is `1.0`.
+- `lora_configs`: Optional LoRA configuration. Defaults to `null`. Currently only one LoRA configuration is supported, and it is not used if `tinker` is enabled.
+  - `name`: Name of the LoRA adapter. Default is `None`.
+  - `path`: Path to the LoRA adapter. Default is `None`.
+  - `base_model_name`: Name of the base model for LoRA. Defaults to `None`.
+  - `lora_rank`: Rank of the LoRA adapter. Default is `32`.
+  - `lora_alpha`: Alpha value of the LoRA adapter. Default is `32`.
+  - `lora_dtype`: Data type of the LoRA adapter. Default is `auto`.
+  - `target_modules`: List of target modules for LoRA. Default is `all-linear`.
+- `rope_scaling`: Optional RoPE scaling configuration in JSON format. Defaults to `null`.
+- `rope_theta`: Optional RoPE theta value. Defaults to `null`.
+- `tinker`: Optional Tinker configuration. Note: the LoRA configuration is ignored if Tinker is enabled.
+  - `enable`: Whether to enable Tinker. Default is `false`.
+  - `rank`: LoRA rank controlling the size of the adaptation matrices. Default is `32`.
+  - `seed`: Random seed used by Tinker. Defaults to `null`.
+  - `train_mlp`: Whether to train the MLP layers. Default is `true`.
+  - `train_attn`: Whether to train the attention layers. Default is `true`.
+  - `train_unembed`: Whether to train the unembedding layer. Default is `true`.
 
 ```{tip}
 If you are using the OpenAI API provided by Explorer, only `max_model_len` takes effect; the values of `max_response_tokens`, `max_prompt_tokens`, and `min_response_tokens` are ignored. When `max_tokens` is not specified independently, each API call generates up to `max_model_len - prompt_length` tokens, so ensure the prompt length is less than `max_model_len` when using the API.
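To make the `rank` parameter concrete: a LoRA adapter on a `d_in x d_out` linear layer adds two low-rank factors, `A (d_in x r)` and `B (r x d_out)`, so the extra parameter count is `r * (d_in + d_out)`. A quick arithmetic sketch (the layer size is illustrative, not taken from any particular model):

```shell
# Extra parameters contributed by a LoRA adapter of rank r on a
# d_in x d_out linear layer: d_in*r + r*d_out.
lora_params() {
  d_in="$1"; d_out="$2"; r="$3"
  echo $(( d_in * r + r * d_out ))
}

# A 4096x4096 projection with the default rank 32:
lora_params 4096 4096 32   # 262144 adapter params vs 16777216 for the full matrix
```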

docs/sphinx_doc/source_zh/tutorial/trinity_installation.md

Lines changed: 29 additions & 9 deletions
@@ -3,11 +3,18 @@
 
 There are three ways to install Trinity-RFT: from source (recommended), via PyPI, or using Docker.
 
-Before installing, make sure your system meets the following requirements:
+**Before you begin**, check your system setup:
 
-- **Python**: 3.10 to 3.12 (inclusive)
-- **CUDA**: 12.8 or higher
-- **GPU**: At least 2 GPUs
+### If you have GPUs and want to use them:
+Make sure your system meets these requirements:
+- **Python**: 3.10 – 3.12
+- **CUDA**: 12.8 or higher
+- **GPU**: At least 2 available
+
+### If you don't have GPUs (or prefer not to use them):
+You can use the `tinker` option instead, which only requires:
+- **Python**: 3.11 – 3.12
+- **GPU**: Not required
 
 ---
 
@@ -32,10 +39,15 @@ cd Trinity-RFT
 conda create -n trinity python=3.12
 conda activate trinity
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and use Tinker instead:
+# pip install -e ".[tinker]"
+
 # If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development and debugging
 ```
 
 #### Using venv
@@ -44,18 +56,26 @@ pip install -e ".[flash_attn]"
 python3.10 -m venv .venv
 source .venv/bin/activate
 
-pip install -e ".[dev]"
-pip install -e ".[flash_attn]"
+pip install -e ".[vllm,flash_attn]"
+
+# If you have no GPU, comment out the line above and use Tinker instead:
+# pip install -e ".[tinker]"
+
 # If you encounter issues when installing flash-attn, try:
 # pip install flash-attn==2.8.1 --no-build-isolation
+
+pip install -e ".[dev]"  # for development and debugging
 ```
 
 #### Using `uv`
 
 [`uv`](https://github.com/astral-sh/uv) is a modern Python package manager.
 
 ```bash
-uv sync --extra dev --extra flash_attn
+uv sync --extra vllm --extra dev --extra flash_attn
+
+# If you have no GPU, use Tinker instead:
+# uv sync --extra tinker --extra dev
 ```
 
 ---
