Here we explain some technical details on our data.

- single-image stage data mixture

We have placed our single-image stage data in [single-image-yaml](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/single_image.yaml) for users to review. You can download each subset from [onevision-data](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-Data).
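
If you prefer to script the download, a minimal sketch using the Hugging Face `datasets` library is shown below. It only lists the subsets (configs) the Hub actually exposes and loads one of them, so it does not assume any particular subset name.

```python
# Sketch: list the subsets of the onevision single-image data and download one of them.
from datasets import get_dataset_config_names, load_dataset

# Each subset of the mixture is exposed as a dataset config on the Hub.
configs = get_dataset_config_names("lmms-lab/LLaVA-OneVision-Data")
print(configs)

# Pick whichever subset is referenced in single_image.yaml; here we simply take the first.
subset = load_dataset("lmms-lab/LLaVA-OneVision-Data", configs[0])
print(subset)
```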
Inside the data yaml, the first entry refers to the previous LLaVA-1.6/NeXT 790K data; you can download it from [llava-next-data](https://huggingface.co/datasets/lmms-lab/LLaVA-NeXT-Data).

Inside the yaml, the naming may differ from the figure in our paper for writing reasons. If you want to explore our dataset, you can check the [upload script](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/0070d0ae4931c9b19d9cc57c38e16a87c270a61c/playground/upload_data.py#L175) to find the mapping from our local datasets to the HF version.
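
For orientation, the data yaml is essentially a list of per-dataset entries. The sketch below shows one way to inspect such a file in Python; the key names (`json_path`, `sampling_strategy`) are an assumption based on the linked yaml files, so verify them against the actual file in the repo.

```python
# Rough sketch: inspect a data-mixture yaml such as single_image.yaml or onevision.yaml.
# Assumed layout: a top-level "datasets" list whose entries carry a json_path and a
# sampling_strategy -- double-check the key names against the yaml in the repository.
import yaml

with open("scripts/train/single_image.yaml") as f:
    mixture = yaml.safe_load(f)

for entry in mixture.get("datasets", []):
    print(entry.get("json_path"), "->", entry.get("sampling_strategy"))
```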
- onevision stage data mixture

Our onevision stage data is available in [onevision-yaml](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/train/onevision.yaml). The single-image portion can be downloaded from the Hugging Face link above for the onevision data. Here's a breakdown of each part:
- Around 800K higher-quality data re-sampled from the previous stage (yes, it's data replay!).
- 65,595 re-annotated samples. The data sources are a collection of academic datasets, including YouCook2 (32,267), Charades (19,851), NExT-QA (7,653), ActivityNet (5,153), and Ego4D (671). The instructions and responses are generated via GPT-4o provided by Azure AI. More details on the data annotation pipeline will be introduced in Yuanhan's subsequent work on a video-specific model (it's brilliant, stay tuned!).
- Video data: We have released the video part in [llava-video-data](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K). Users can download the data; we utilize the subset used in LLaVA-OneVision (see the loading sketch after this list):
  - For LLaVA-OneVision, we have included the captions and open-ended questions from the 0_30_s_academic_v0_1 split, along with 240,000 open-ended QA items and 15,000 caption entries from the LLaVA-Hound portion of the video data.
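
The video subsets can be fetched from the Hub in the same way as the image data. Below is a rough sketch for loading the 0_30_s_academic_v0_1 portion mentioned above, assuming it is exposed as a dataset config under that name; check the dataset card for the exact config list before relying on it.

```python
# Sketch: load the 0_30_s_academic_v0_1 portion of LLaVA-Video-178K, if it is
# exposed as a config with that exact name (verify against the dataset card).
from datasets import get_dataset_config_names, load_dataset

configs = get_dataset_config_names("lmms-lab/LLaVA-Video-178K")
print(configs)

name = "0_30_s_academic_v0_1"  # taken from the text above; confirm it appears in configs
if name in configs:
    academic = load_dataset("lmms-lab/LLaVA-Video-178K", name)
    print(academic)
```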