README.md: 1 addition & 2 deletions
@@ -86,13 +86,12 @@ A family of fully open-source large multimodal models demonstrating
   - outperforming **Qwen2.5-VL** in most evaluation tasks.
 
 - **High-Quality Data at Scale**
-  Meticulously curated **pre-training and SFT data** with rigorous filtering and quality control, achieving **superior data efficiency** with only **64B tokens**.
+  Meticulously curated **pre-training and SFT data** with rigorous filtering and quality control.
   - Concept-balanced, highly diverse, high-quality caption data
   - Comprehensive instruction fine-tuning data covering a wide range of tasks
 
 - **Ultra-Efficient Training Framework** Complete end-to-end training framework designed for maximum efficiency:
   - $16000 total budget for full model training on A100 GPUs ($0.6 per GPU/Hour)
-  - 45% HFU efficiency in 8k context length
   - Built on **MegatronLM** with support for **MoE**, **FP8**, and **long sequence parallelization**
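
As a sanity check on the "$16000 total budget" bullet above, here is a minimal back-of-envelope sketch of the compute it implies. It assumes only the two figures quoted in the diff ($16,000 total at $0.6 per A100 GPU-hour); the 128-GPU cluster size is a hypothetical illustration, not a number from the README:

```python
# Back-of-envelope: GPU-hours implied by the quoted training budget.
# Figures from the diff above: $16,000 total, $0.6 per A100 GPU-hour.
TOTAL_BUDGET_USD = 16_000
COST_PER_GPU_HOUR_USD = 0.6

gpu_hours = TOTAL_BUDGET_USD / COST_PER_GPU_HOUR_USD  # ~26,667 GPU-hours

# Hypothetical cluster size, used only to translate GPU-hours
# into wall-clock time; the README does not state a GPU count.
NUM_GPUS = 128
wall_clock_days = gpu_hours / NUM_GPUS / 24  # ~8.7 days on 128 A100s

print(f"{gpu_hours:,.0f} GPU-hours ≈ {wall_clock_days:.1f} days on {NUM_GPUS} GPUs")
```

Under these assumptions the budget buys roughly 26,667 A100 GPU-hours, i.e. about nine days of wall-clock training on a 128-GPU cluster.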