
Commit d75a346

Updated SmolVLM2 fine-tuning notebook link and code format (#2693)
1 parent 64c8a88 commit d75a346

1 file changed: +3 -4 lines changed

smolvlm2.md

Lines changed: 3 additions & 4 deletions
```diff
@@ -255,10 +255,9 @@ messages = [
     {
         "role": "user",
         "content": [
-          {"type": "text", "text": "What are the differences between these two images?"},
+            {"type": "text", "text": "What are the differences between these two images?"},
             {"type": "image", "path": "image_1.png"},
-          {"type": "image", "path": "image_2.png"}
-
+            {"type": "image", "path": "image_2.png"}
         ]
     },
 ]
```
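
For context, a minimal sketch (not part of the diff) of how a `messages` list like the one above is typically consumed. The checkpoint name, dtype, and generation settings are assumptions based on the standard transformers image-text-to-text API, not taken from this commit:

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

# Assumed checkpoint; swap in the SmolVLM2 variant you actually use.
model_path = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForImageTextToText.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the differences between these two images?"},
            {"type": "image", "path": "image_1.png"},
            {"type": "image", "path": "image_2.png"}
        ]
    },
]

# apply_chat_template resolves the image paths and builds model-ready tensors.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```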
```diff
@@ -344,7 +343,7 @@ If you integrate SmolVLM2 in your apps using MLX and Swift, we'd love to know ab
 ### Fine-tuning SmolVLM2
 
 You can fine-tune SmolVLM2 on videos using transformers 🤗
-We have fine-tuned 500M variant in Colab on video-caption pairs in [VideoFeedback](https://huggingface.co/datasets/TIGER-Lab/VideoFeedback) dataset for demonstration purposes. Since 500M variant is small, it's better to apply full fine-tuning instead of QLoRA or LoRA, meanwhile you can try to apply QLoRA on 2.2B variant. You can find the fine-tuning notebook [here](https://github.com/huggingface/smollm/blob/main/vision/finetuning/Smol_VLM_FT.ipynb).
+We have fine-tuned 500M variant in Colab on video-caption pairs in [VideoFeedback](https://huggingface.co/datasets/TIGER-Lab/VideoFeedback) dataset for demonstration purposes. Since 500M variant is small, it's better to apply full fine-tuning instead of QLoRA or LoRA, meanwhile you can try to apply QLoRA on 2.2B variant. You can find the fine-tuning notebook [here](https://github.com/huggingface/smollm/blob/main/vision/finetuning/SmolVLM2_Video_FT.ipynb).
 
 
 ## Read More
```
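
As a hedged illustration of the "QLoRA on the 2.2B variant" suggestion in the changed paragraph (the linked notebook is the authoritative recipe; the checkpoint name, target modules, and hyperparameters below are assumptions):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForImageTextToText, BitsAndBytesConfig

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"  # assumed 2.2B checkpoint name

# 4-bit base weights -- the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Low-rank adapters on the attention projections (module names are assumptions).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, pass `model` to a Trainer with a collator that turns
# VideoFeedback video/caption pairs into chat-template batches.
```

For the 500M variant, the paragraph instead recommends full fine-tuning without quantization or adapters, since the model is small enough to train all parameters directly.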
