You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/source_en/Customization/Custom-dataset.md
+18-4Lines changed: 18 additions & 4 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -33,7 +33,7 @@ There are three ways to integrate a custom dataset, with increasing control over
33
33
34
34
## Recommended Dataset Format
35
35
36
-
Here is the recommended dataset format for ms-swift:
36
+
The following provides the recommended dataset format for ms-swift, where the system field is optional and defaults to the `default_system` defined in the template.
37
37
38
38
### Pre-training
39
39
@@ -68,11 +68,25 @@ Here is the recommended dataset format for ms-swift:
68
68
69
69
### Multimodal
70
70
71
-
For multimodal datasets, the format is the same as above. The difference is that it includes the keys `images`, `videos`, and `audios`, which represent multimodal resources:
71
+
For multimodal datasets, the format is the same as the tasks mentioned above. The difference is the addition of several keys: `images`, `videos`, and `audios`, which represent multimodal resources. The tags `<image>`, `<video>`, and `<audio>` indicate the positions where images, videos, and audio are inserted, respectively. The four examples provided below demonstrate the data format for pure text, as well as formats that include image, video, and audio data.
72
+
73
+
74
+
Pre-training:
75
+
```jsonl
76
+
{"messages": [{"role": "assistant", "content": "Pre-trained text goes here"}]}
77
+
{"messages": [{"role": "assistant", "content": "<image> is a puppy, <image> is a kitten"}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
78
+
{"messages": [{"role": "assistant", "content": "<audio> describes how nice the weather is today"}], "audios": ["/xxx/x.wav"]}
79
+
{"messages": [{"role": "assistant", "content": "<image> is an elephant, <video> is a lion running"}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
80
+
```
81
+
82
+
Supervised Fine-tuning:
83
+
72
84
```jsonl
73
-
{"messages": [{"role": "system", "content": "You are a useful and harmless assistant"}, {"role": "user", "content": "<image> What is in the image? <video> What is in the video?"}, {"role": "assistant", "content": "An elephant and a lion"}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
85
+
{"messages": [{"role": "user", "content": "<image><image>What is the difference between the two images?"}, {"role": "assistant", "content": "The first one is a kitten, and the second one is a puppy."}], "images": ["/xxx/x.jpg", "xxx/x.png"]}
86
+
{"messages": [{"role": "user", "content": "<audio>What did the audio say?"}, {"role": "assistant", "content": "The weather is really nice today."}], "audios": ["/xxx/x.mp3"]}
87
+
{"messages": [{"role": "system", "content": "You are a helpful and harmless assistant."}, {"role": "user", "content": "<image>What is in the image, <video>What is in the video?"}, {"role": "assistant", "content": "The image shows an elephant, and the video shows a puppy running on the grass."}], "images": ["/xxx/x.jpg"], "videos": ["/xxx/x.mp4"]}
74
88
```
75
-
The `<image>`, `<video>`, and `<audio>` tags indicate where to insert images/videos/audios.
89
+
The data format for RLHF can refer to the format used for pure text large models.
0 commit comments