
Commit bf92c00

Update README.md
1 parent 98ffc2a commit bf92c00

1 file changed: +20 −20 lines


README.md

Lines changed: 20 additions & 20 deletions
@@ -2,7 +2,7 @@
 <picture>
 <source media="(prefers-color-scheme: dark)" srcset="asset/llava_onevision_black.png">
 <source media="(prefers-color-scheme: light)" srcset="asset/llava_onevision_white.png">
-<img alt="LLaVA-OneVision 1.5" src="output/llava_onevision_white.png" width="600" style="max-width: 100%;">
+<img alt="LLaVA-OneVision-1.5" src="output/llava_onevision_white.png" width="600" style="max-width: 100%;">
 </picture>
 </p>

@@ -74,7 +74,7 @@
 - [Models](#models)
 - [Datasets](#datasets)
 - [Results](#evaluation-results)
-- [Quick Start with HuggingFace](#quick-start-with-huggingface)
+- [Quick Start with Hugging Face](#quick-start-with-huggingface)
 - [Evaluation](#evaluation)
 - [Quick Start For Training](#quick-start-guide)
 - [Fully Reproducing Guide](#fully-reproducing-guide)
@@ -83,7 +83,7 @@


 ## Introduction
-**LLaVA-OneVision1.5** introduces a novel family of **fully open-source** Large Multimodal Models (LMMs) that achieves **state-of-the-art performance** with substantially **lower cost** through training on **native resolution** images.
+**LLaVA-OneVision-1.5** introduces a novel family of **fully open-source** Large Multimodal Models (LMMs) that achieves **state-of-the-art performance** with substantially **lower cost** through training on **native resolution** images.

 - **Superior Performance**
 A family of fully open-source large multimodal models demonstrating
@@ -96,8 +96,8 @@ Meticulously curated **pre-training and SFT data** with rigorous filtering and q
 - Comprehensive instruction fine-tuning data covering a wide range of tasks

 - **Ultra-Efficient Training Framework** Complete end-to-end training framework designed for maximum efficiency:
-- $16000 total budget for full model training on A100 GPUs ($0.6 per GPU/Hour)
-- Built on **MegatronLM** with support for **MoE**, **FP8**, and **long sequence parallelization**
+- $16000 total budget for full model training on A100 GPUs ($0.6 per GPU hour)
+- Built on **Megatron-LM** with support for **MoE**, **FP8**, and **long sequence parallelization**
 - Optimized codebase for cost-effective scaling


@@ -112,23 +112,23 @@ Meticulously curated **pre-training and SFT data** with rigorous filtering and q

 | Model | HF Link | Training Log |
 |--------------------------|--------------------------------------------------------------------------------------------------------|-------------|
-| LLaVA-OV-1.5-4B-Instruct | [🤗 HF / 4B-Instruct](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Instruct) | [📈 Tensorboard](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Instruct/tensorboard) |
-| LLaVA-OV-1.5-8B-Instruct | [🤗 HF / 8B-Instruct](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct) | [📈 Tensorboard](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct/tensorboard) |
-| LLaVA-OV-1.5-4B-Base | [🤗 HF / 4B-Base](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Base) | [📈 Tensorboard](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Instruct/tensorboard) |
-| LLaVA-OV-1.5-8B-Base | [🤗 HF / 8B-Base](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Base) | [📈 Tensorboard](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct/tensorboard) |
+| LLaVA-OV-1.5-4B-Instruct | [🤗 HF / 4B-Instruct](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Instruct) | [📈 TensorBoard](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Instruct/tensorboard) |
+| LLaVA-OV-1.5-8B-Instruct | [🤗 HF / 8B-Instruct](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct) | [📈 TensorBoard](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct/tensorboard) |
+| LLaVA-OV-1.5-4B-Base | [🤗 HF / 4B-Base](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Base) | [📈 TensorBoard](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-Instruct/tensorboard) |
+| LLaVA-OV-1.5-8B-Base | [🤗 HF / 8B-Base](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Base) | [📈 TensorBoard](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-8B-Instruct/tensorboard) |
 ## Datasets

 ![Dataset Visualization](asset/dataset.jpg)
 <p align="left">
 <strong>(a)</strong> The vocabulary coverage proportion in the LLaVA-OneVision-1.5 Mid-Training dataset before and after concept balancing.
 <strong>(b)</strong> Distribution of data sources within the LLaVA-OneVision-1.5 Mid-Training dataset.
-<strong>(c)</strong> Distribution of data sources within the LLaVA-OneVision-1.5 Insturct dataset.
+<strong>(c)</strong> Distribution of data sources within the LLaVA-OneVision-1.5 Instruct dataset.
 </p>

 | Description | Link | Status |
 |--------------------|--------------------------------------------------------------------------------------------------------|-------------|
 | LLaVA-OV-1.5-Mid-Training-85M | [🤗HF / Mid-Training 85M](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M) | Uploading… |
-| LLaVA-OV-1.5-Instruct | [🤗HF / Insturct-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Instruct-Data) | Uploading… |
+| LLaVA-OV-1.5-Instruct | [🤗HF / Instruct-Data](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Instruct-Data) | Uploading… |


 ## Evaluation Results
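As a side note on the datasets listed in the hunk above: both repos can be prefetched with `huggingface_hub` before training. A minimal sketch, assuming the `huggingface_hub` package is installed; the local directory is an illustrative choice, not a path from the repo:

```python
# Hedged sketch: prefetch one of the datasets listed above.
# Assumes `pip install huggingface_hub`; local_dir is illustrative.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="lmms-lab/LLaVA-OneVision-1.5-Instruct-Data",
    repo_type="dataset",  # dataset repo, not a model repo
    local_dir="data/LLaVA-OneVision-1.5-Instruct-Data",
)
```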
@@ -233,8 +233,8 @@ docker run -it --gpus all \

 You have two options to get started with LLaVA-OneVision-1.5-stage-0:

-#### Option 1: Download pre-trained model from HuggingFace
-Download our `LLaVA-OneVision-1.5-4B-stage0` model directly from [HuggingFace](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-stage0).
+#### Option 1: Download pre-trained model from Hugging Face
+Download our `LLaVA-OneVision-1.5-4B-stage0` model directly from [Hugging Face](https://huggingface.co/lmms-lab/LLaVA-OneVision-1.5-4B-stage0).

 #### Option 2: Merge initial weights yourself
 Alternatively, you can merge the initial weights from the original ViT and LLM:
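For Option 1 in the hunk above, the download amounts to a single `huggingface_hub` call. A minimal sketch, assuming `huggingface_hub` is installed; the checkpoint directory name is an illustrative assumption:

```python
# Hedged sketch: fetch the stage-0 checkpoint referenced in Option 1.
# Assumes `pip install huggingface_hub`; local_dir is an illustrative path.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="lmms-lab/LLaVA-OneVision-1.5-4B-stage0",
    local_dir="checkpoints/LLaVA-OneVision-1.5-4B-stage0",
)
print(f"Checkpoint downloaded to {ckpt_dir}")
```

The `huggingface-cli download` command should achieve the same from the shell, if you prefer not to write Python.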
@@ -246,7 +246,7 @@ python ds/merge_model.py \
 ```
 Note: When merging weights, the adapter component will be initialized with default values.

-Convert the model from HuggingFace format to Megatron format:
+Convert the model from Hugging Face format to Megatron format:

 ```bash
 AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 bash examples/llava_ov_1_5/convert/convert_4b_hf_to_mcore.sh \
@@ -315,7 +315,7 @@ bash examples/llava_ov_1_5/quick_start/stage_2_instruct_llava_ov_4b.sh
 ```


-### 6. Convert mcore to huggingface
+### 6. Convert mcore to Hugging Face
 ```bash
 AIAK_TRAINING_PATH=/workspace/LLaVA-OneVision-1.5 \
 bash examples/llava_ov_1_5/convert/convert_4b_mcore_to_hf.sh \
@@ -345,14 +345,14 @@ CUDA_VISIBLE_DEVICES=4,5,6,7 accelerate launch \
 To improve model training efficiency, we implement offline sample packing:

 1. Download the [**Mid-Training-85M Dataset**](https://huggingface.co/datasets/lmms-lab/LLaVA-One-Vision-1.5-Mid-Training-85M)
-2. Pack the data into webdataset format, refer to [**Examples offlinepacking**](examples_offline_packing) and [**Offline Padding-Free Data Packing**](examples/llava_ov_1_5/sample_packing/README.md)
+2. Pack the data into WebDataset format, refer to [**Examples offlinepacking**](examples_offline_packing) and [**Offline Padding-Free Data Packing**](examples/llava_ov_1_5/sample_packing/README.md)


 ### Instruct
-1. Download the [**LLaVA-OneVision-1.5-Insturct-Data**](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Insturct-Data)
-2. Convert the data into webdataset format, refer to [**Conversion for Mixed Instruction Data**](docs/sft_data_preprocessing.md)
+1. Download the [**LLaVA-OneVision-1.5-Instruct-Data**](https://huggingface.co/datasets/lmms-lab/LLaVA-OneVision-1.5-Instruct-Data)
+2. Convert the data into WebDataset format, refer to [**Conversion for Mixed Instruction Data**](docs/sft_data_preprocessing.md)

-## Roadmaps
+## Roadmap

 Q4 2025 Key Deliverables:

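The packing steps in the hunk above point to the repo's own scripts; purely as a generic illustration of the WebDataset format they target, here is a minimal sketch using the `webdataset` package. The function name, shard pattern, and sample fields are assumptions for illustration, not the repo's actual pipeline:

```python
# Hedged sketch: write (image, annotation) samples as WebDataset tar shards.
# This is NOT the repo's packing script; field names and the shard pattern
# are illustrative. Assumes `pip install webdataset`.
import json
import webdataset as wds

def pack_to_webdataset(samples, pattern="packed/shard-%06d.tar", maxcount=10000):
    """samples: iterable of dicts with 'key', 'image_path', 'annotation'."""
    with wds.ShardWriter(pattern, maxcount=maxcount) as sink:
        for s in samples:
            with open(s["image_path"], "rb") as f:
                image_bytes = f.read()
            sink.write({
                "__key__": s["key"],        # unique sample id within the shard set
                "jpg": image_bytes,         # raw image bytes, stored as-is
                # pre-encode to bytes so the writer stores the JSON unchanged
                "json": json.dumps(s["annotation"]).encode("utf-8"),
            })
```

At training time the resulting `shard-%06d.tar` files can be streamed back with `wds.WebDataset`.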
@@ -445,7 +445,7 @@ If you find *LLaVA-OneVision-1.5* useful in your research, please consider to ci
 @inproceedings{LLaVA-OneVision-1.5,
 title={LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training},
 author={An, Xiang and Xie, Yin and Yang, Kaicheng and Zhang, Wenkang and Zhao, Xiuwei and Cheng, Zheng and Wang, Yirui and Xu, Songcen and Chen, Changrui and Wu, Chunsheng and Tan, Huajie and Li, Chunyuan and Yang, Jing and Yu, Jie and Wang, Xiyao and Qin, Bin and Wang, Yumeng and Yan, Zizhen and Feng, Ziyong and Liu, Ziwei and Li, Bo and Deng, Jiankang},
-booktitle={arxiv},
+booktitle={arXiv},
 year={2025}
 }
