10 changes: 8 additions & 2 deletions README.md
@@ -34,8 +34,14 @@
- [2024/10/15] 🔥 Video-XL is released, including model, training and evaluation code.

## Model weights
Please download our pre-trained and finetuned model weights from the [link](https://huggingface.co/sy1998/Video_XL/tree/main), and update the model paths in the demo code to point to them.
```bash
git lfs install
git clone https://huggingface.co/sy1998/Video_XL
git clone https://huggingface.co/openai/clip-vit-large-patch14-336
```
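After cloning, the demo expects local paths to both checkpoints. A minimal sketch of the path variables to set in `demo.py`, assuming the repositories were cloned into the current working directory (adjust to your actual locations):

```python
# Hypothetical local paths, assuming the two `git clone` commands above
# were run in the current directory; edit to match where you cloned them.
model_path = "./Video_XL"
clip_path = "./clip-vit-large-patch14-336"
```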


## Installation
```bash
conda create -n videoxl python=3.10 -y && conda activate videoxl
5 changes: 3 additions & 2 deletions demo.py
@@ -8,13 +8,14 @@
# fix seed
torch.manual_seed(0)


# Please change the following paths to your own paths
model_path = "/share/junjie/shuyan/VideoXL_weight_8"
clip_path = "/share/junjie/shuyan/clip-vit-large-patch14-336"
video_path="/share/junjie/shuyan/test_demo/ad2_watch_15min.mp4"

max_frames_num = 900  # you can increase this to several thousand, as long as your GPU memory can handle it :)
gen_kwargs = {"do_sample": True, "temperature": 1, "top_p": None, "num_beams": 1, "use_cache": True, "max_new_tokens": 1024}
tokenizer, model, image_processor, _ = load_pretrained_model(model_path, None, "llava_qwen", device_map="cuda:0", mm_vision_tower=clip_path)

model.config.beacon_ratio = [8]  # delete this line to enable random compression with ratios drawn from {2, 4, 8}
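As a rough illustration of the random-compression behavior the comment above refers to (an assumption for clarity, not the actual Video-XL internals), a single-element `beacon_ratio` pins the compression ratio, while multiple candidates imply one is sampled per pass:

```python
import random

def pick_beacon_ratio(candidates=(2, 4, 8)):
    """Illustrative only: choose one compression ratio from the candidates.

    A single-element list (e.g. [8]) pins the ratio; several candidates
    mimic random {2, 4, 8} compression.
    """
    return random.choice(list(candidates))
```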
