Commit 649b049

committed
Refine
1 parent cd39f35 commit 649b049

1 file changed (+81, -33 lines)

README.md

Lines changed: 81 additions & 33 deletions
@@ -62,10 +62,55 @@ It is designed to support diverse application scenarios and serve as a unified p

 <p align="center">
 <img src="https://img.alicdn.com/imgextra/i2/O1CN01H3UbpF1yP7E1OCLbi_!!6000000006570-2-tps-1334-638.png" alt="Trinity-RFT">
-<em>Fig: The high-level design of Trinity-RFT</em>
+<em>Figure: The high-level design of Trinity-RFT</em>
 </p>


+<details>
+<summary>Figure: The architecture of RFT-core</summary>
+
+
+<p align="center">
+<img src="https://img.alicdn.com/imgextra/i1/O1CN01BFCZRV1zS9T1PoH49_!!6000000006712-2-tps-922-544.png" alt="Trinity-RFT-core-architecture">
+</p>
+
+</details>
+
+
+<details>
+<summary>Figure: Some RFT modes supported by Trinity-RFT</summary>
+
+<p align="center">
+<img src="https://img.alicdn.com/imgextra/i3/O1CN01E7NskS1FFoTI9jlaQ_!!6000000000458-2-tps-1458-682.png" alt="Trinity-RFT-modes">
+</p>
+
+
+</details>
+
+
+<details>
+<summary>Figure: The architecture of data processors</summary>
+
+<p align="center">
+<img src="https://img.alicdn.com/imgextra/i3/O1CN01hR1LCh25kpJMKmYR4_!!6000000007565-2-tps-1474-740.png" alt="Trinity-RFT-data-pipeline-buffer">
+</p>
+
+</details>
+
+
+<details>
+<summary>Figure: The high-level design of data pipelines in Trinity-RFT</summary>
+
+<p align="center">
+<img src="https://img.alicdn.com/imgextra/i4/O1CN01UvyfcZ1WoTv5t3pCp_!!6000000002835-2-tps-1166-274.png" alt="Trinity-RFT-data-pipelines">
+</p>
+
+</details>
+
+
+
+
+

 ## 🛠️ What can I use Trinity-RFT for?

@@ -84,23 +129,23 @@ It is designed to support diverse application scenarios and serve as a unified p

 * **Low-Code Usage:**

-Use graphical interfaces for easy monitoring and tracking of the learning process.
+Use graphical interfaces for easy monitoring and tracking of the learning process.


 ---

 ## Table of contents

+
 - [Getting started](#getting-started)
 - [Step 1: preparations](#step-1-preparations)
 - [Step 2: prepare dataset and model](#step-2-prepare-dataset-and-model)
 - [Step 3: configurations](#step-3-configurations)
 - [Step 4: run the RFT process](#step-4-run-the-rft-process)
-- [Further examples](#further-examples)
+- [Further tutorials](#further-tutorials)
 - [Documentation](#documentation)
 - [Advanced usage and full configurations](#advanced-usage-and-full-configurations)
 - [Programming guide for developers](#programming-guide-for-developers)
-- [Details: design and implementations](#details-design-and-implementations)
 - [Upcoming features](#upcoming-features)
 - [Contribution guide](#contribution-guide)
 - [Acknowledgements](#acknowledgements)
@@ -109,6 +154,7 @@ It is designed to support diverse application scenarios and serve as a unified p



+
 ## Getting started


@@ -119,7 +165,7 @@ It is designed to support diverse application scenarios and serve as a unified p
 ### Step 1: preparations


-Installation from source (recommended):
+**Installation from source (recommended):**

 ```shell
 # Pull the source code from GitHub
@@ -151,13 +197,13 @@ pip install -e .\[flash_attn\]
 # pip install flash-attn -v --no-build-isolation
 ```

-Installation using pip:
+**Installation using pip:**

 ```shell
 pip install trinity-rft==0.2.0
 ```

-Installation from docker:
+**Installation from docker:**
 we have provided a dockerfile for Trinity-RFT (trinity)

 ```shell
@@ -174,7 +220,7 @@ docker run -it --gpus all --shm-size="64g" --rm -v $PWD:/workspace -v <root_path
 ```


-Trinity-RFT requires
+**Requirements:**
 Python version >= 3.10,
 CUDA version >= 12.4,
 and at least 2 GPUs.
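
The requirements listed above can be checked before installation with a small preflight script. The sketch below is illustrative only and is not part of Trinity-RFT: the function names are hypothetical, GPUs are counted by parsing the output of `nvidia-smi -L`, and the CUDA version check is left out because it depends on how CUDA is installed on the machine.

```python
import shutil
import subprocess
import sys

MIN_PYTHON = (3, 10)  # from the requirements above
MIN_GPUS = 2          # from the requirements above


def python_ok(version=sys.version_info[:2]):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version) >= MIN_PYTHON


def parse_gpu_count(nvidia_smi_output):
    """Count the 'GPU <index>: ...' lines printed by `nvidia-smi -L`."""
    return sum(1 for line in nvidia_smi_output.splitlines() if line.startswith("GPU "))


def count_gpus():
    """Return the number of visible GPUs, or 0 if nvidia-smi cannot be run."""
    if shutil.which("nvidia-smi") is None:
        return 0
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, check=False, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return 0
    return parse_gpu_count(result.stdout) if result.returncode == 0 else 0


def gpus_ok(gpu_count):
    """Return True if enough GPUs are available."""
    return gpu_count >= MIN_GPUS


if __name__ == "__main__":
    gpus = count_gpus()
    print(f"Python >= 3.10: {python_ok()}")
    print(f"GPUs found: {gpus} (need at least {MIN_GPUS}): {gpus_ok(gpus)}")
```

Keeping the parsing in `parse_gpu_count` separate from the subprocess call makes the check easy to try on machines without a GPU.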
@@ -196,7 +242,7 @@ huggingface-cli download {model_name} --local-dir $MODEL_PATH/{model_name}
 modelscope download {model_name} --local_dir $MODEL_PATH/{model_name}
 ```

-For more details about model downloading, please refer to [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download).
+For more details about model downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli) or [ModelScope](https://modelscope.cn/docs/models/download).



@@ -210,29 +256,31 @@ huggingface-cli download {dataset_name} --repo-type dataset --local-dir $DATASET
 modelscope download --dataset {dataset_name} --local_dir $DATASET_PATH/{dataset_name}
 ```

-For more details about dataset downloading, please refer to [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) or [ModelScope](https://modelscope.cn/docs/datasets/download).
+For more details about dataset downloading, see [Huggingface](https://huggingface.co/docs/huggingface_hub/main/en/guides/cli#download-a-dataset-or-a-space) or [ModelScope](https://modelscope.cn/docs/datasets/download).
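
When downloads need to be scripted, the CLI patterns shown in Step 2 can be assembled programmatically. The following is a minimal sketch under stated assumptions: `download_command` is a hypothetical helper (not a Trinity-RFT API), the target root is passed explicitly by the caller, and the dataset name in the example is a placeholder. It only builds the argument list and does not run anything.

```python
import posixpath
import shlex


def download_command(name, target_root, hub="huggingface", kind="model"):
    """Build the argv for downloading a model or dataset into target_root/name.

    Mirrors the `huggingface-cli download` and `modelscope download` patterns
    from the README; this helper itself is hypothetical.
    """
    if kind not in ("model", "dataset"):
        raise ValueError(f"unknown kind: {kind!r}")
    target = posixpath.join(target_root, name)
    if hub == "huggingface":
        cmd = ["huggingface-cli", "download", name]
        if kind == "dataset":
            cmd += ["--repo-type", "dataset"]
        return cmd + ["--local-dir", target]
    if hub == "modelscope":
        cmd = ["modelscope", "download"]
        cmd += ["--dataset", name] if kind == "dataset" else [name]
        return cmd + ["--local_dir", target]
    raise ValueError(f"unknown hub: {hub!r}")


if __name__ == "__main__":
    # Print the commands instead of executing them.
    print(shlex.join(download_command("Qwen/Qwen2.5-1.5B-Instruct", "/models")))
    print(shlex.join(download_command("my-org/my-dataset", "/data", hub="modelscope", kind="dataset")))
```

Note that the two CLIs spell the target flag differently (`--local-dir` versus `--local_dir`), which is easy to miss when switching hubs.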



 ### Step 3: configurations


-For convenience, Trinity-RFT provides a web interface for configuring your RFT process.
+Trinity-RFT provides a web interface for configuring your RFT process.

 > [!NOTE]
 > This is an experimental feature, and we will continue to improve it.


 To enable *minimal* features (mainly for trainer), you can run
+
 ```bash
 trinity studio --port 8080
 ```
+
 Then you can configure your RFT process in the web page and generate a config file. You can save the config for later use or run it directly as described in the following section.

-Advanced users can also configure the RFT process by editing the config file directly.
-We provide a set of example config files in [`examples`](examples/).
+Advanced users can also edit the config file directly.
+We provide example config files in [`examples`](examples/).

-To enable *complete* visualization features, please refer to the monorepo for [Trinity-Studio](https://github.com/modelscope/Trinity-Studio).
+For *complete* GUI features, please refer to the monorepo for [Trinity-Studio](https://github.com/modelscope/Trinity-Studio).


 <details>
@@ -250,7 +298,7 @@ To enable *complete* visualization features, please refer to the monorepo for [T
 ### Step 4: run the RFT process


-First, start a ray cluster with the following command:
+Start a ray cluster:

 ```shell
 # On master node
@@ -260,35 +308,36 @@ ray start --head
 ray start --address=<master_address>
 ```

-Optionally, we can login into [wandb](https://docs.wandb.ai/quickstart/) to better monitor the RFT process:
+(Optional) Log in to [wandb](https://docs.wandb.ai/quickstart/) for better monitoring:

 ```shell
 export WANDB_API_KEY=<your_api_key>
 wandb login
 ```

-Then, for command-line users, run the RFT process with the following command:
+For command-line users, run the RFT process:

 ```shell
 trinity run --config <config_path>
 ```

-For example, below is the command for fine-tuning Qwen2.5-1.5B-Instruct on GSM8k dataset using GRPO algorithm:
+For example, below is the command for fine-tuning Qwen2.5-1.5B-Instruct on GSM8k with GRPO:
+
 ```shell
 trinity run --config examples/grpo_gsm8k/gsm8k.yaml
 ```

-For studio users, just click the "Run" button in the web page.
+For studio users, click "Run" in the web interface.
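
For users who launch runs from scripts rather than a terminal, the `trinity run --config <config_path>` command from this step can be wrapped in a few lines of Python. This is a sketch, not an official interface: `build_run_command` and `run_rft` are hypothetical names, and the wrapper defaults to a dry run that only prints the command.

```python
import shlex
import subprocess
from pathlib import Path


def build_run_command(config_path):
    """Return the argv for `trinity run --config <config_path>`."""
    return ["trinity", "run", "--config", str(config_path)]


def run_rft(config_path, dry_run=True):
    """Print the command (dry run) or execute it and return its exit code."""
    cmd = build_run_command(config_path)
    if dry_run:
        print(shlex.join(cmd))
        return 0
    if not Path(config_path).is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")
    # Assumes the `trinity` CLI is installed and a ray cluster is already running.
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Config path taken from the GRPO/GSM8k example in this step.
    run_rft("examples/grpo_gsm8k/gsm8k.yaml", dry_run=True)
```

Passing `dry_run=False` executes the command, so the config file must exist at that point.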


 ## Further tutorials


 Tutorials for running different RFT modes:

-+ [A quick example with GRPO and GSM8k](./docs/sphinx_doc/source/tutorial/example_reasoning_basic.md)
-+ [Off-policy mode of RFT](./docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md)
-+ [Fully asynchronous mode of RFT](./docs/sphinx_doc/source/tutorial/example_async_mode.md)
++ [Quick example: GRPO on GSM8k](./docs/sphinx_doc/source/tutorial/example_reasoning_basic.md)
++ [Off-policy RFT](./docs/sphinx_doc/source/tutorial/example_reasoning_advanced.md)
++ [Fully asynchronous RFT](./docs/sphinx_doc/source/tutorial/example_async_mode.md)
 + [Offline learning by DPO or SFT](./docs/sphinx_doc/source/tutorial/example_dpo.md)


@@ -321,8 +370,6 @@ Please refer to [this document](./docs/sphinx_doc/source/tutorial/trinity_config



-
-
 ### Programming guide for developers


@@ -331,40 +378,41 @@ Please refer to [this document](./docs/sphinx_doc/source/tutorial/trinity_progra



-### Details: design and implementations
+<!-- ### Details: design and implementations -->

 <!--
+[TBC]
 **The architecture of RFT-core** is shown below, demonstrating the interplay between the explorer, buffer and trainer: -->

-<p align="center">
+<!-- <p align="center">
 <img src="https://img.alicdn.com/imgextra/i1/O1CN01BFCZRV1zS9T1PoH49_!!6000000006712-2-tps-922-544.png" alt="Trinity-RFT-core-architecture">
 <em>Fig: The architecture of RFT-core</em>
-</p>
+</p> -->

 <!-- ![](./docs/sphinx_doc/assets/trinity-architecture.png) -->


-<p align="center">
+<!-- <p align="center">
 <img src="https://img.alicdn.com/imgextra/i3/O1CN01E7NskS1FFoTI9jlaQ_!!6000000000458-2-tps-1458-682.png" alt="Trinity-RFT-modes">
 <em>Fig: Some RFT modes supported by Trinity-RFT</em>
-</p>
+</p> -->

 <!-- ![](./docs/sphinx_doc/assets/trinity-mode.png) -->



-<p align="center">
+<!-- <p align="center">
 <img src="https://img.alicdn.com/imgextra/i3/O1CN01hR1LCh25kpJMKmYR4_!!6000000007565-2-tps-1474-740.png" alt="Trinity-RFT-data-pipeline-buffer">
 <em>Fig: The architecture of data processors</em>
-</p>
+</p> -->

 <!-- ![](./docs/sphinx_doc/assets/trinity-data-pipeline-buffer.png) -->


-<p align="center">
+<!-- <p align="center">
 <img src="https://img.alicdn.com/imgextra/i4/O1CN01UvyfcZ1WoTv5t3pCp_!!6000000002835-2-tps-1166-274.png" alt="Trinity-RFT-data-pipelines">
 <em>Fig: The high-level design of data pipelines in Trinity-RFT</em>
-</p>
+</p> -->


