Commit ae002d2

Merge pull request #1 from yeshaokai/shaokai/dev
Shaokai/dev
2 parents 8b5970f + 819d73a commit ae002d2

42 files changed (+3508, -967 lines)

.gitignore

Lines changed: 2 additions & 1 deletion

@@ -73,4 +73,5 @@ data_processing/
 
 
 experiments/
-*.out
+*.out
+pretrained_models/

.vscode/launch.json

Lines changed: 8 additions & 8 deletions

@@ -7,7 +7,7 @@
 "request": "launch",
 "module": "torch.distributed.run",
 "env": {
-"CUDA_VISIBLE_DEVICES": "1,2",
+"CUDA_VISIBLE_DEVICES": "1,2,3",
 "OMP_NUM_THREADS": "8",
 "NCCL_IB_DISABLE": "0",
 "NCCL_IB_GID_INDEX": "3",
@@ -18,7 +18,7 @@
 "WANDB_API_KEY": "65aeda82a75f1eed29c8e9250b175fcc73dca0d7",
 },
 "args": [
-"--nproc_per_node=2",
+"--nproc_per_node=3",
 "--nnodes=1",
 "--node_rank=0",
 "--master_addr=127.0.0.1",
@@ -31,6 +31,7 @@
 // "--image_folder", "/mediaPFM/data/haozhe/onevision/llava_data",
 "--image_folder", "/mediaPFM/data/haozhe/onevision/llava_data/geo3k/",
 "--video_folder", "/mediaPFM/data/haozhe/onevision/llava_video",
+// "--video_folder", "/home/haozhe/kitchen/AVION/datasets",
 "--mm_tunable_parts", "mm_vision_tower,mm_mlp_adapter,mm_language_model",
 "--mm_vision_tower_lr", "2e-6",
 "--vision_tower", "google/siglip-so400m-patch14-384",
@@ -89,13 +90,12 @@
 // "request": "launch",
 // "program": "docs/LLaVA_OneVision_Tutorials.py",
 // "console": "integratedTerminal",
-// "env":{"CUDA_VISIBLE_DEVICES":"0",
-// "LD_PRELOAD": "/usr/lib/x86_64-linux-gnu/libffi.so.7"},
+// "env":{
+// "CUDA_VISIBLE_DEVICES":"0",
+// // "HF_HOME": "/mnt/SV_storage/VFM/huggingface",
+// // "LD_PRELOAD": "/usr/lib/x86_64-linux-gnu/libffi.so.7"
+// },
 // "justMyCode": false,
-// // "args": [
-// // "--run_dir_name", "test",
-// // // "--use_big_decoder"
-// // ]
 // }
 // ]
 // }
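
The launch.json change widens the debug launcher from two to three worker processes and GPUs; since the entry sets "module": "torch.distributed.run", it is the VS Code equivalent of invoking python -m torch.distributed.run with the listed env and args. Below is a minimal sketch (not part of this commit) of the consistency the edit restores, assuming --nproc_per_node is meant to match the number of devices in CUDA_VISIBLE_DEVICES:

```python
# Minimal sketch (not part of this commit): check that the updated debug config
# is self-consistent, i.e. --nproc_per_node matches the number of GPUs exposed
# through CUDA_VISIBLE_DEVICES. Values are copied from the new launch.json.
cuda_visible_devices = "1,2,3"
nproc_per_node = 3

num_visible_gpus = len([d for d in cuda_visible_devices.split(",") if d])
assert nproc_per_node == num_visible_gpus, (
    f"--nproc_per_node={nproc_per_node}, but {num_visible_gpus} GPUs are visible"
)
print(f"launcher spawns {nproc_per_node} processes, one per visible GPU")
```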

README.md

Lines changed: 17 additions & 2 deletions

@@ -3,21 +3,36 @@
 </p>
 
 # LLaVA-NeXT: Open Large Multimodal Models
+[![Static Badge](https://img.shields.io/badge/llava_video-paper-green)](http://arxiv.org/abs/2410.02713)
 [![Static Badge](https://img.shields.io/badge/llava_onevision-paper-green)](https://arxiv.org/abs/2408.03326)
 [![llava_next-blog](https://img.shields.io/badge/llava_next-blog-green)](https://llava-vl.github.io/blog/)
 
 [![llava_onevision-demo](https://img.shields.io/badge/llava_onevision-demo-red)](https://llava-onevision.lmms-lab.com/)
+[![llava_next-video_demo](https://img.shields.io/badge/llava_video-demo-red)](https://huggingface.co/spaces/WildVision/vision-arena)
 [![llava_next-interleave_demo](https://img.shields.io/badge/llava_next-interleave_demo-red)](https://huggingface.co/spaces/lmms-lab/LLaVA-NeXT-Interleave-Demo)
-[![llava_next-video_demo](https://img.shields.io/badge/llava_next-video_demo-red)](https://huggingface.co/spaces/WildVision/vision-arena)
 [![Openbayes Demo](https://img.shields.io/static/v1?label=Demo&message=OpenBayes%E8%B4%9D%E5%BC%8F%E8%AE%A1%E7%AE%97&color=green)](https://openbayes.com/console/public/tutorials/gW0ng9jKXfO)
 
+[![llava_video-checkpoints](https://img.shields.io/badge/llava_video-checkpoints-blue)](https://huggingface.co/collections/lmms-lab/llava-next-video-661e86f5e8dabc3ff793c944)
 [![llava_onevision-checkpoints](https://img.shields.io/badge/llava_onevision-checkpoints-blue)](https://huggingface.co/collections/lmms-lab/llava-onevision-66a259c3526e15166d6bba37)
 [![llava_next-interleave_checkpoints](https://img.shields.io/badge/llava_next-interleave_checkpoints-blue)](https://huggingface.co/collections/lmms-lab/llava-next-interleave-66763c55c411b340b35873d1)
-[![llava_next-video_checkpoints](https://img.shields.io/badge/llava_next-video_checkpoints-blue)](https://huggingface.co/collections/lmms-lab/llava-next-video-661e86f5e8dabc3ff793c944)
 [![llava_next-image_checkpoints](https://img.shields.io/badge/llava_next-image_checkpoints-blue)](https://huggingface.co/lmms-lab)
 
 ## Release Notes
 
+- **[2024/10/04] 🔥 LLaVA-Video** (formerly LLaVA-NeXT-Video) has undergone a major upgrade! We are excited to release **LLaVA-Video-178K**, a high-quality synthetic dataset for video instruction tuning. This dataset includes:
+
+  - 178,510 caption entries
+  - 960,792 open-ended Q&A pairs
+  - 196,198 multiple-choice Q&A items
+
+  Along with this, we're also releasing the **LLaVA-Video 7B/72B models**, which deliver competitive performance on the latest video benchmarks, including [Video-MME](https://video-mme.github.io/home_page.html#leaderboard), [LongVideoBench](https://longvideobench.github.io/), and [Dream-1K](https://tarsier-vlm.github.io/).
+
+  📄 **Explore more**:
+  - [LLaVA-Video-178K Dataset](https://huggingface.co/datasets/lmms-lab/LLaVA-Video-178K): Download the dataset.
+  - [LLaVA-Video Models](https://huggingface.co/collections/lmms-lab/llava-video-661e86f5e8dabc3ff793c944): Access model checkpoints.
+  - [Paper](http://arxiv.org/abs/2410.02713): Detailed information about LLaVA-Video.
+  - [LLaVA-Video Documentation](https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/docs/LLaVA_Video_1003.md): Guidance on training, inference and evaluation.
+
 - [2024/09/13] 🔥 **🚀 [LLaVA-OneVision-Chat](docs/LLaVA_OneVision_Chat.md)**. The new LLaVA-OV-Chat (7B/72B) significantly improves the chat experience of LLaVA-OV. 📄
 
 ![](docs/ov_chat_images/chat_results.png)
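
The README changes headline the LLaVA-Video-178K release. As a minimal sketch (not part of this commit), the dataset linked in the new release note can be mirrored locally from the Hugging Face Hub; snapshot_download fetches the entire dataset repository, which is large, and the subset layout is documented on the dataset card rather than in this diff:

```python
# Minimal sketch (not part of this commit): download the LLaVA-Video-178K
# dataset repository announced in the README. Subset/configuration names are
# not listed in this diff, so the snapshot is mirrored as-is; see the dataset
# card for how the files are organized.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="lmms-lab/LLaVA-Video-178K",
    repo_type="dataset",
)
print("dataset files downloaded to", local_dir)
```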
