
Commit cd223e3

committed: updated
1 parent b673295 commit cd223e3

File tree: 1 file changed (+11, -5 lines)

README.md

Lines changed: 11 additions & 5 deletions
@@ -289,15 +289,21 @@ CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch \

## Fully Reproducing Guide

> [!TIP]
> More detailed reproduction steps for the complete process will be provided after the dataset upload is completed.

### Mid-Training

To improve model training efficiency, we implement offline sample packing:

1. Download the [**Mid-Training-85M Dataset**](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M)
2. Pack the mid-training data into webdataset format. For detailed instructions, refer to [**Offline Padding-Free Data Packing**](examples/llava_ov_1_5/sample_packing/README.md); a minimal sketch is shown after this list.

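For orientation, here is a minimal sketch of the two steps above using the `huggingface_hub` and `webdataset` packages. The directory layout, field names, and shard size are illustrative assumptions; the authoritative padding-free packing recipe is the one documented in `examples/llava_ov_1_5/sample_packing`.

```python
# Sketch only, not the repo's packing script: download the mid-training data,
# then repack image/annotation pairs into webdataset .tar shards.
# Assumed layout: each sample is an <id>.jpg with an <id>.json next to it.
from pathlib import Path

import webdataset as wds
from huggingface_hub import snapshot_download

# Step 1: fetch the Mid-Training-85M dataset from the Hugging Face Hub.
data_dir = Path(
    snapshot_download(
        repo_id="lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M",
        repo_type="dataset",
        local_dir="data/mid_training_85m",
    )
)

# Step 2: write sequentially numbered shards, up to 10k samples each.
Path("packed").mkdir(exist_ok=True)
with wds.ShardWriter("packed/mid-training-%06d.tar", maxcount=10_000) as sink:
    for img_path in sorted(data_dir.rglob("*.jpg")):
        ann_path = img_path.with_suffix(".json")
        if not ann_path.exists():
            continue
        sink.write({
            "__key__": img_path.stem,      # shared key groups the files below
            "jpg": img_path.read_bytes(),  # raw JPEG bytes
            "json": ann_path.read_bytes(), # annotation / conversation record
        })
```

The resulting `packed/mid-training-000000.tar`, `000001.tar`, … shards can be streamed back with `webdataset.WebDataset` during training; note that the real recipe additionally packs multiple samples into fixed-length sequences to avoid padding, which this sketch does not do.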

### Instruct

1. Download the [**LLaVA-OneVision-1.5-Insturct-Data**](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Insturct-Data)
2. Convert the instruct data into webdataset format. For detailed instructions, refer to [**WebDataset Conversion for Mixed Instruction Data**](docs/sft_data_preprocessing.md); a hedged conversion sketch follows the list.

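Similarly, a sketch of the instruct-data conversion. The annotation file name (`instruct.json`) and record fields (`image`, `conversations`) are assumptions in the style of LLaVA SFT data, not the dataset's confirmed schema; the authoritative procedure is the one in `docs/sft_data_preprocessing.md`.

```python
# Sketch only: convert LLaVA-style instruction records into webdataset shards.
# The annotation file name and field names below are assumptions.
import json
from pathlib import Path

import webdataset as wds
from huggingface_hub import snapshot_download

# Download the instruct data from the Hugging Face Hub.
root = Path(
    snapshot_download(
        repo_id="lmms-lab/LLaVA-OneVision-1.5-Insturct-Data",
        repo_type="dataset",
        local_dir="data/instruct",
    )
)

# Load the (assumed) single JSON file holding all instruction records.
records = json.loads((root / "instruct.json").read_text())

# Write shards of up to 5k samples; text-only records get no "jpg" entry.
Path("shards").mkdir(exist_ok=True)
with wds.ShardWriter("shards/instruct-%06d.tar", maxcount=5_000) as sink:
    for idx, rec in enumerate(records):
        sample = {
            "__key__": f"{idx:09d}",
            "json": json.dumps(rec["conversations"]).encode("utf-8"),
        }
        if rec.get("image"):
            sample["jpg"] = (root / rec["image"]).read_bytes()
        sink.write(sample)
```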
## Roadmaps
