# Offline DPO and SFT

This example describes DPO and SFT based on the Qwen-2.5-1.5B-Instruct model.

## Step 1: Model and Data Preparation

More details of model downloading can be found at [ModelScope](https://modelscope.cn).

### Data Preparation

For DPO, we download the [Human-like-DPO-dataset](https://huggingface.co/datasets/HumanLLMs/Human-Like-DPO-Dataset) to the local directory `$DATASET_PATH/human_like_dpo_dataset`:

```shell
# Using ModelScope (assumption: the dataset id mirrors the HuggingFace id;
# adjust it if the ModelScope copy is published under a different name)
modelscope download --dataset HumanLLMs/Human-Like-DPO-Dataset --local_dir $DATASET_PATH/human_like_dpo_dataset
```

More details of dataset downloading can be found at [ModelScope](https://modelscope.cn).

Note that the dataset provides the keys `prompt`, `chosen`, and `rejected`. If your dataset uses different key names, pass the proper keys to the config.
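
As an illustration of what such an override might look like, here is a hypothetical sketch; the `format` section and the `prompt_key`/`chosen_key`/`rejected_key` field names are assumptions rather than the confirmed Trinity-RFT schema, so check the schema of your version before copying it:

```yaml
# Hypothetical sketch -- field names are assumptions, not a confirmed schema.
buffer:
  trainer_input:
    experience_buffer:
      path: $DATASET_PATH/human_like_dpo_dataset
      format:
        prompt_key: question      # column holding the prompt
        chosen_key: good_answer   # column holding the preferred response
        rejected_key: bad_answer  # column holding the dispreferred response
```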

For SFT, we download the dataset to the local directory `/PATH/TO/SFT_DATASET/`, which usually contains message-based data.
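
To make "message-based" concrete, a single SFT record is typically a list of chat messages. The YAML rendering below shows the common structure; the surrounding field name `messages` is the usual convention, though specific datasets may differ:

```yaml
# One SFT record in the common chat-message format.
messages:
  - role: system
    content: You are a helpful assistant.
  - role: user
    content: What does DPO stand for?
  - role: assistant
    content: DPO stands for Direct Preference Optimization.
```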

## Step 2: Setup Configuration

### Configuration for DPO

We use the configurations in [`dpo.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_humanlike/dpo.yaml) and [`train_dpo.yaml`](https://github.com/modelscope/Trinity-RFT/tree/main/examples/dpo_humanlike/train_dpo.yaml) for this experiment. Some important setups are listed below:
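
To give a feel for the shape of these files before going through the individual setups, here is a minimal hypothetical sketch; every field name is an assumption, and the linked `dpo.yaml` and `train_dpo.yaml` in the repository are authoritative:

```yaml
# Illustrative sketch only -- consult the linked YAML files for the real schema.
model:
  model_path: /PATH/TO/Qwen2.5-1.5B-Instruct     # assumed field for the base model
algorithm:
  algorithm_type: dpo                            # assumed switch selecting the DPO loss
buffer:
  trainer_input:
    experience_buffer:
      path: $DATASET_PATH/human_like_dpo_dataset # assumed pointer to the DPO data
```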