-For more details about model downloading, please refer to [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download).
+For more details about model downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download).

-For more details about dataset downloading, please refer to [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) or [ModelScope](https://modelscope.cn/docs/datasets/download).
+For more details about dataset downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) or [ModelScope](https://modelscope.cn/docs/datasets/download).

### Step 3: configurations

-For convenience, Trinity-RFT provides a web interface for configuring your RFT process.
+Trinity-RFT provides a web interface for configuring your RFT process.
> [!NOTE]
> This is an experimental feature, and we will continue to improve it.

To enable *minimal* features (mainly for trainer), you can run
+
```bash
trinity studio --port 8080
```
+
Then you can configure your RFT process in the web page and generate a config file. You can save the config for later use or run it directly as described in the following section.

-Advanced users can also configure the RFT process by editing the config file directly.
-We provide a set of example config files in [`examples`](examples/).
+Advanced users can also edit the config file directly.
+We provide example config files in [`examples`](examples/).

-To enable *complete* visualization features, please refer to the monorepo for [Trinity-Studio](https://github.com/modelscope/Trinity-Studio).
+For *complete* GUI features, please refer to the monorepo for [Trinity-Studio](https://github.com/modelscope/Trinity-Studio).

<details>
@@ -250,7 +298,7 @@ To enable *complete* visualization features, please refer to the monorepo for [T
### Step 4: run the RFT process

-First, start a ray cluster with the following command:
+Start a ray cluster:

```shell
# On master node
@@ -260,35 +308,36 @@ ray start --head
ray start --address=<master_address>
```
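
As a quick sanity check after starting the cluster, the sketch below tests whether the `ray` CLI is on the `PATH` and then asks it for the cluster status. It assumes `ray status` exits with a non-zero code when no cluster is reachable; `RAY_CHECK` is a hypothetical variable name used only for illustration and is not part of Trinity-RFT.

```shell
# Sketch only: report whether a Ray cluster appears to be reachable.
# RAY_CHECK is a hypothetical variable introduced for this example.
if command -v ray >/dev/null 2>&1; then
  # Assumption: `ray status` returns non-zero when no cluster is running.
  if ray status >/dev/null 2>&1; then
    RAY_CHECK="running"
  else
    RAY_CHECK="not-running"
  fi
else
  RAY_CHECK="ray-not-installed"
fi
echo "ray cluster check: ${RAY_CHECK}"
```

If the check prints `not-running`, re-run the `ray start` commands above before launching the RFT process.
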

-Optionally, we can login into [wandb](https://docs.wandb.ai/quickstart/) to better monitor the RFT process:
+(Optional) Log in to [wandb](https://docs.wandb.ai/quickstart/) for better monitoring:

```shell
export WANDB_API_KEY=<your_api_key>
wandb login
```
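
Because the wandb step is optional, a launch script may want to skip it cleanly when no key is configured. The following is a minimal sketch under that assumption; `WANDB_READY` is a hypothetical variable name used only here.

```shell
# Sketch only: attempt the login only when a key is set and wandb is installed.
# WANDB_READY is a hypothetical variable introduced for this example.
WANDB_READY="no"
if [ -n "${WANDB_API_KEY:-}" ] && command -v wandb >/dev/null 2>&1; then
  # A failed login leaves WANDB_READY at "no" without aborting the script.
  wandb login && WANDB_READY="yes"
fi
echo "wandb ready: ${WANDB_READY}"
```
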

-Then, for command-line users, run the RFT process with the following command:
+For command-line users, run the RFT process:

```shell
trinity run --config <config_path>
```

-For example, below is the command for fine-tuning Qwen2.5-1.5B-Instruct on GSM8k dataset using GRPO algorithm:
+For example, below is the command for fine-tuning Qwen2.5-1.5B-Instruct on GSM8k with GRPO:
+
```shell
trinity run --config examples/grpo_gsm8k/gsm8k.yaml
```
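
When the run command is wrapped in a script, it can help to fail early with a clear message if the config file or the `trinity` CLI is missing. The sketch below illustrates one way to do this; `CONFIG_PATH` and `RUN_STATUS` are hypothetical names chosen for the example, and only the `trinity run --config` invocation comes from the text above.

```shell
# Sketch only: guard the run command with basic pre-flight checks.
# CONFIG_PATH and RUN_STATUS are hypothetical names introduced for this example.
CONFIG_PATH="${CONFIG_PATH:-examples/grpo_gsm8k/gsm8k.yaml}"
if [ ! -f "$CONFIG_PATH" ]; then
  RUN_STATUS="missing-config"
elif ! command -v trinity >/dev/null 2>&1; then
  RUN_STATUS="trinity-not-installed"
elif trinity run --config "$CONFIG_PATH"; then
  RUN_STATUS="ok"
else
  RUN_STATUS="failed"
fi
echo "run status: ${RUN_STATUS}"
```

Setting `CONFIG_PATH` in the environment before running the script selects a different config file.
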

-For studio users, just click the "Run" button in the web page.
+For studio users, click "Run" in the web interface.

## Further tutorials

Tutorials for running different RFT modes:

-+ [A quick example with GRPO and GSM8k](./docs/sphinx_doc/source/tutorial/example_reasoning_basic.md)
-+ [Off-policy mode of RFT](./docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md)
-+ [Fully asynchronous mode of RFT](./docs/sphinx_doc/source/tutorial/example_async_mode.md)
++ [Quick example: GRPO on GSM8k](./docs/sphinx_doc/source/tutorial/example_reasoning_basic.md)