**LLaVA-OneVision-1.5** introduces a novel family of **fully open-source** Large Multimodal Models (LMMs) that achieves **state-of-the-art performance** with substantially **lower cost** through training on **native resolution** images.
**Superior Performance**
A family of fully open-source large multimodal models demonstrating superior performance across multiple multimodal benchmarks.
Meticulously curated **pre-training and SFT data** with rigorous filtering and quality control:
- Comprehensive instruction fine-tuning data covering a wide range of tasks
**Ultra-Efficient Training Framework** Complete end-to-end training framework designed for maximum efficiency:
- $16000 total budget for full model training on A100 GPUs ($0.6 per GPU hour)
- Built on **Megatron-LM** with support for **MoE**, **FP8**, and **long sequence parallelization**
- Optimized codebase for cost-effective scaling
### Mid-Training
To improve model training efficiency, we implement offline sample packing:
1. Download the [**Mid-Training-85M Dataset**](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M)
2. Pack the data into WebDataset format; refer to [**Examples offlinepacking**](examples_offline_packing) and [**Offline Padding-Free Data Packing**](examples/llava_ov_1_5/sample_packing/README.md). A minimal illustrative sketch of the packing idea follows below.
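The linked packing guides define the exact output layout used by the training code. Purely as a rough, non-authoritative illustration of the idea, the sketch below greedily groups samples so their combined token count stays under a maximum sequence length and writes each packed group as a single WebDataset sample with `webdataset.ShardWriter`. The `length` field, the `json` payload key, the 32k token budget, and the shard naming pattern are assumptions for illustration, not the format produced by the repository's scripts.

```python
# Minimal sketch of offline padding-free sample packing into WebDataset shards.
# Assumed input: dicts with a precomputed token "length"; the real layout is
# defined by the scripts in examples/llava_ov_1_5/sample_packing.
import json

import webdataset as wds

MAX_SEQ_LEN = 32768  # assumed per-pack token budget


def greedy_pack(samples, max_len=MAX_SEQ_LEN):
    """Group samples so each group's summed token length fits within max_len."""
    packs, current, used = [], [], 0
    for sample in sorted(samples, key=lambda s: s["length"], reverse=True):
        if current and used + sample["length"] > max_len:
            packs.append(current)
            current, used = [], 0
        current.append(sample)  # oversized singletons still get their own pack
        used += sample["length"]
    if current:
        packs.append(current)
    return packs


def write_shards(packs, pattern="packed-%06d.tar", maxcount=1000):
    """Write each packed group as one WebDataset sample in sharded tar files."""
    with wds.ShardWriter(pattern, maxcount=maxcount) as sink:
        for idx, pack in enumerate(packs):
            sink.write({
                "__key__": f"pack{idx:09d}",
                "json": json.dumps(pack).encode("utf-8"),  # grouped records
            })


if __name__ == "__main__":
    # Toy records standing in for tokenized mid-training captions.
    toy = [{"text": f"caption {i}", "length": 4096 * (i % 5 + 1)} for i in range(100)]
    write_shards(greedy_pack(toy))
```

Packing samples this way removes padding tokens from each training batch, which is where the efficiency gain of offline sample packing comes from.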
### Instruct
1. Download the [**LLaVA-OneVision-1.5-Instruct-Data**](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Instruct-Data)
2. Convert the data into WebDataset format; refer to [**Conversion for Mixed Instruction Data**](docs/sft_data_preprocessing.md). A brief conversion sketch follows below.
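The conversion document linked above specifies the actual schema. As a hedged sketch only, the snippet below assumes a simple instruction JSON list with optional `image` paths and a `conversations` field, and writes each record as one WebDataset sample; the real field names and shard layout may differ.

```python
# Hedged sketch: convert instruction-tuning JSON (assumed schema:
# [{"image": "path.jpg", "conversations": [...]}, ...]) into WebDataset shards.
# docs/sft_data_preprocessing.md describes the actual conversion.
import json
from pathlib import Path

import webdataset as wds


def convert(json_path, image_root, pattern="instruct-%06d.tar", maxcount=2000):
    records = json.loads(Path(json_path).read_text())
    with wds.ShardWriter(pattern, maxcount=maxcount) as sink:
        for idx, rec in enumerate(records):
            sample = {
                "__key__": f"sample{idx:09d}",
                "json": json.dumps(rec["conversations"]).encode("utf-8"),
            }
            image = rec.get("image")
            if image:  # text-only records simply omit the image entry
                sample["jpg"] = (Path(image_root) / image).read_bytes()
            sink.write(sample)


if __name__ == "__main__":
    convert("instruct_data.json", image_root="images")
```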
## Roadmap
Q4 2025 Key Deliverables:
If you find *LLaVA-OneVision-1.5* useful in your research, please consider citing:
@inproceedings{LLaVA-OneVision-1.5,
title={LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training},
author={An, Xiang and Xie, Yin and Yang, Kaicheng and Zhang, Wenkang and Zhao, Xiuwei and Cheng, Zheng and Wang, Yirui and Xu, Songcen and Chen, Changrui and Wu, Chunsheng and Tan, Huajie and Li, Chunyuan and Yang, Jing and Yu, Jie and Wang, Xiyao and Qin, Bin and Wang, Yumeng and Yan, Zizhen and Feng, Ziyong and Liu, Ziwei and Li, Bo and Deng, Jiankang},