
Commit 2e2adda

Update README.md
1 parent 84dbe03 commit 2e2adda

File tree

1 file changed (+15, -21 lines)


README.md

Lines changed: 15 additions & 21 deletions
@@ -86,27 +86,21 @@
## Introduction
**LLaVA-OneVision-1.5** introduces a novel family of **fully open-source** Large Multimodal Models (LMMs) that achieves **state-of-the-art performance** with substantially **lower cost** through training on **native resolution** images.

-- **Superior Performance**
-  A family of fully open-source large multimodal models demonstrating
-  - Superior performance across multiple multimodal benchmarks
-  - outperforming **Qwen2.5-VL** in most evaluation tasks.
-
-- **High-Quality Data at Scale**
-  Meticulously curated **pre-training and SFT data** with rigorous filtering and quality control.
-  - Concept-balanced, highly diverse, high-quality caption data
-  - Comprehensive instruction fine-tuning data covering a wide range of tasks
-
-- **Ultra-Efficient Training Framework** Complete end-to-end training framework designed for maximum efficiency:
-  - $16000 total budget for full model training on A100 GPUs ($0.6 per GPU hour)
-  - Built on **Megatron-LM** with support for **MoE**, **FP8**, and **long sequence parallelization**
-  - Optimized codebase for cost-effective scaling
-
-
-- **Fully Open Framework** for community access and reproducibility:
-  - High-quality pre-training & SFT data
-  - Complete training framework & code
-  - Training recipes & configurations
-  - Comprehensive training logs & metrics
+#### **Superior Performance**
+- The model leads on multiple multimodal benchmarks and generally surpasses Qwen2.5-VL.
+- Training on native-resolution images significantly improves its visual understanding.
+
+#### **High-Quality Data at Scale**
+- The pretraining corpus comprises large-scale, concept-balanced, diverse, and high-quality captions curated with strict filtering and quality control.
+- The instruction-tuning dataset is comprehensive and covers a wide range of tasks.
+
+#### **Ultra-Efficient Training Framework**
+- The end-to-end training cost is about $16,000 on A100 GPUs at roughly $0.60 per GPU-hour.
+- The system is built on Megatron-LM with support for MoE, FP8, and long-sequence parallelism, and the codebase is optimized for cost-effective scaling.
+
+#### **Fully Open Framework**
+- The project releases high-quality pretraining and SFT datasets along with the complete training framework, configurations, and recipes.
+- It also provides detailed training logs and metrics to enable reproducibility and community adoption.


## Models
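For scale, the budget quoted in the added lines works out roughly as follows (the 128-GPU split is illustrative only, not stated in the commit):

$$
\frac{\$16{,}000}{\$0.60\ \text{per GPU-hour}} \approx 26{,}667\ \text{A100 GPU-hours} \approx 128\ \text{GPUs} \times 8.7\ \text{days}
$$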
