Data preparation

We also provide the processed data below. The links point to Baidu Disk.

| Data group | Usage | Link |
|---|---|---|
| LLaVA-PT | Stage 1 | LLaVA 1.5-558k |
| Hybrid-FT | Stage 2 | SViT-157k, LVIS-220k, LRV-331k, MIMIC-IT-256k |
| LLaVA-FT | Stage 3 | LLaVA 1.5-mix-665k |

If you cannot easily access Baidu Disk, you can download the data from Hugging Face.

After downloading all of them, organize the data in `IMAGE_FOLDER` as follows.

```
IMAGE_FOLDER
├── llava_image
├── llava_image_tune
├── lvis_tune
├── lrv_tune
├── svit_tune
└── mimicit_tune
    └── LA
```

Training

Specify your `IMAGE_FOLDER` and `JSON_FOLDER` according to the data preparation.

For training at 384 resolution, we use google/siglip-so400m-patch14-384 as the image tower. Note that if you pass `--image_tower google/siglip-so400m-patch14-384`, you should upgrade transformers to version 4.37.0 or later.
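A quick guard for the requirement above can catch a version mismatch before a long run starts. This is a minimal sketch, assuming only that 4.37.0 is the floor stated above; the helper names are ours:

```python
# Minimal sketch: check the installed transformers version before passing
# --image_tower google/siglip-so400m-patch14-384. The 4.37.0 floor comes
# from the note above.

def version_tuple(version):
    """Parse a dotted version string like '4.37.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split(".")[:3])

def meets_minimum(installed_version, minimum="4.37.0"):
    """True if installed_version is at least the required minimum."""
    return version_tuple(installed_version) >= version_tuple(minimum)
```

In practice you would feed it `transformers.__version__` and raise an error (or print an upgrade hint) when the check fails.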

We provide training scripts for the following LLM backbones:

- Qwen
- Phi2
- StableLM
- OpenChat