**File: `docs/sphinx_doc/source/tutorial/trinity_configs.md`**
```yaml
model:
  # ... (other fields omitted)
  max_response_tokens: 16384
  min_response_tokens: 1
  enable_prompt_truncation: true
  repetition_penalty: 1.0
  lora_configs: null
  rope_scaling: null
  rope_theta: null
  tinker:
    enable: false
    base_model: null
    rank: 32
    seed: null
    train_mlp: true
    train_attn: true
    train_unembed: true
```
- `model_path`: Path to the model being trained. If `tinker` is enabled, this is the path to the local tokenizer.
- `critic_model_path`: Optional path to a separate critic model. If empty, defaults to `model_path`.
- `custom_chat_template`: Optional custom chat template as a string. If not specified, the system uses the default chat template from the tokenizer.
- `chat_template_path`: Optional path to a chat template file in Jinja2 format; overrides `custom_chat_template` if set. If not specified, the system uses the default chat template from the tokenizer.
- `max_prompt_tokens`: Maximum number of tokens allowed in prompts. Only for the `chat` and `generate` methods in `InferenceModel`.
- `min_response_tokens`: Minimum number of tokens allowed in generated responses. Only for the `chat` and `generate` methods in `InferenceModel`. Default is `1`. It must be less than `max_response_tokens`.
- `enable_prompt_truncation`: Whether to truncate the prompt. Default is `true`. If set to `true`, the prompt is truncated to `max_prompt_tokens` tokens; if set to `false`, the prompt is not truncated, and the prompt length plus response length may exceed `max_model_len`. This option does not work in OpenAI API mode.
- `repetition_penalty`: Repetition penalty factor. Default is `1.0`.
- `lora_configs`: Optional LoRA configuration. Defaults to `null`. Currently, only one LoRA configuration is supported; see the sketch after this list.
  - `name`: Name of the LoRA adapter. Default is `None`.
  - `path`: Path to the LoRA adapter. Default is `None`.
  - `base_model_name`: Name of the base model for LoRA. If not specified, defaults to `None`.
  - `lora_rank`: Rank of the LoRA adapter. Default is `32`.
  - `lora_alpha`: Alpha value of the LoRA adapter. Default is `32`.
  - `lora_dtype`: Data type of the LoRA adapter. Default is `auto`.
  - `target_modules`: List of target modules for LoRA. Default is `all-linear`.
- `rope_scaling`: Optional RoPE scaling configuration in JSON format. If not specified, defaults to `null`.
- `rope_theta`: Optional RoPE theta value. If not specified, defaults to `null`.
- `tinker`: Optional Tinker configuration. Note: the LoRA configuration is ignored if Tinker is enabled.
  - `enable`: Whether to enable Tinker. Default is `false`.
  - `base_model`: Path to the base model for Tinker. If not specified, defaults to `model_path`.
  - `rank`: LoRA rank controlling the size of the adaptation matrices. Default is `32`.
  - `seed`: Random seed for Tinker. If not specified, defaults to `null`.
  - `train_mlp`: Whether to train the MLP layers. Default is `true`.
  - `train_attn`: Whether to train the attention layers. Default is `true`.
  - `train_unembed`: Whether to train the unembedding layer. Default is `true`.
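For illustration, here is a minimal sketch with these optional fields filled in. The adapter name and path are hypothetical, the single-entry list shape of `lora_configs` is an assumption based on "only one LoRA configuration is supported", and the `rope_scaling` value is merely an example of the JSON format:

```yaml
model:
  lora_configs:                # assumption: a list with exactly one entry
    - name: demo_adapter       # hypothetical adapter name
      path: /path/to/adapter   # hypothetical adapter path
      lora_rank: 32
      lora_alpha: 32
      lora_dtype: auto
      target_modules: all-linear
  rope_scaling: '{"rope_type": "yarn", "factor": 4.0}'  # illustrative JSON value
  rope_theta: null
```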
```{tip}
If you are using the OpenAI API provided by Explorer, only `max_model_len` takes effect; the values of `max_response_tokens`, `max_prompt_tokens`, and `min_response_tokens` are ignored. When `max_tokens` is not specified explicitly, each API call can generate up to `max_model_len - prompt_length` tokens; for example, with `max_model_len: 4096` and a 1000-token prompt, a call generates at most 3096 tokens. Therefore, please ensure that the prompt length is less than `max_model_len` when using the API.
```

---
This example demonstrates how to use Trinity with the [Tinker](https://thinkingmachines.ai/tinker/) backend, which enables model training on devices without GPUs.
## Setup Instructions
### 1. API Key Configuration
Before starting Ray, you must set the `TRINITY_API_KEY` environment variable to your Tinker API key to enable proper access to Tinker's API:
```bash
export TRINITY_API_KEY=your_tinker_api_key
```
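In practice the ordering matters: export the key in the same shell before starting Ray, so the Ray processes inherit it. A sketch of the full sequence follows; treat the exact `ray start` flags and the `trinity run` invocation as assumptions about your setup:

```bash
# Set the key first so Ray processes inherit it
export TRINITY_API_KEY=your_tinker_api_key

# Start Ray, then launch training with a Tinker-enabled config
ray start --head
trinity run --config tinker.yaml
```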
### 2. Configuration File
Configure the Tinker backend in your YAML configuration file by setting the `model.tinker` parameters as shown below:
```yaml
model:
  tinker:
    enable: true
    base_model: null
    rank: 32
    seed: null
    train_mlp: true
    train_attn: true
    train_unembed: true
```
### 3. Configuration Parameters Explained
- **`tinker`**: Optional Tinker-specific configuration section. **Important**: when Tinker is enabled, any LoRA configuration settings will be ignored.
  - **`enable`**: Whether to activate the Tinker backend. Default: `false`
  - **`base_model`**: Path to the base model for Tinker. If not specified (`null`), it defaults to the `model_path` defined elsewhere in your config
  - **`rank`**: The LoRA rank that controls the size of the adaptation matrices. Default: `32`
  - **`seed`**: Random seed for reproducible Tinker operations. If not specified (`null`), no specific seed is set
  - **`train_mlp`**: Whether to train the MLP (feed-forward) layers. Default: `true`
  - **`train_attn`**: Whether to train the attention layers. Default: `true`
  - **`train_unembed`**: Whether to train the unembedding (output) layer. Default: `true`
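For instance, a config that overrides the defaults might look like the sketch below. The base model name is purely hypothetical; any Tinker-supported model could be substituted:

```yaml
model:
  model_path: /path/to/local/tokenizer  # with Tinker enabled, this points at the local tokenizer
  tinker:
    enable: true
    base_model: Qwen/Qwen3-8B  # hypothetical; any Tinker-supported base model
    rank: 64                   # larger adaptation matrices than the default 32
    seed: 42                   # fix the seed for reproducibility
    train_mlp: true
    train_attn: true
    train_unembed: false       # freeze the unembedding (output) layer
```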
## Usage Notes
Once configured, Trinity works with the Tinker backend just like it does with the standard veRL training backend, with two important limitations:
1. **Entropy loss** is not consistent with the veRL backend
2. Algorithms that require **`compute_advantage_in_trainer=true`** are **not supported**
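In other words, stick to algorithms whose advantages are computed outside the trainer. As a hedged sketch only — the key names below are assumptions, not confirmed by this document — selecting such an algorithm might look like:

```yaml
algorithm:
  algorithm_type: grpo  # assumption: a group-based algorithm that does not require compute_advantage_in_trainer
```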
The complete configuration file can be found at [`tinker.yaml`](tinker.yaml).
0 commit comments