
Commit d75a346

Updated SmolVLM2 fine-tuning notebook link and code format (#2693)
1 parent 64c8a88 commit d75a346

1 file changed: +3 -4 lines changed

smolvlm2.md

Lines changed: 3 additions & 4 deletions
```diff
@@ -255,10 +255,9 @@ messages = [
     {
         "role": "user",
         "content": [
-          {"type": "text", "text": "What are the differences between these two images?"},
+            {"type": "text", "text": "What are the differences between these two images?"},
             {"type": "image", "path": "image_1.png"},
-          {"type": "image", "path": "image_2.png"}
-
+            {"type": "image", "path": "image_2.png"}
         ]
     },
 ]
```
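
For context, a minimal sketch (not part of the diff) of how a `messages` list like the one above is typically consumed. The checkpoint name, dtype, and generation settings are assumptions based on the standard transformers image-text-to-text API, not taken from this commit:

```python
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

# Assumed checkpoint; swap in the SmolVLM2 variant you actually use.
model_path = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"
processor = AutoProcessor.from_pretrained(model_path)
model = AutoModelForImageTextToText.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What are the differences between these two images?"},
            {"type": "image", "path": "image_1.png"},
            {"type": "image", "path": "image_2.png"}
        ]
    },
]

# apply_chat_template resolves the image paths and builds model-ready tensors.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```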
```diff
@@ -344,7 +343,7 @@ If you integrate SmolVLM2 in your apps using MLX and Swift, we'd love to know ab
 ### Fine-tuning SmolVLM2
 
 You can fine-tune SmolVLM2 on videos using transformers 🤗
-We have fine-tuned 500M variant in Colab on video-caption pairs in [VideoFeedback](https://huggingface.co/datasets/TIGER-Lab/VideoFeedback) dataset for demonstration purposes. Since 500M variant is small, it's better to apply full fine-tuning instead of QLoRA or LoRA, meanwhile you can try to apply QLoRA on 2.2B variant. You can find the fine-tuning notebook [here](https://github.com/huggingface/smollm/blob/main/vision/finetuning/Smol_VLM_FT.ipynb).
+We have fine-tuned 500M variant in Colab on video-caption pairs in [VideoFeedback](https://huggingface.co/datasets/TIGER-Lab/VideoFeedback) dataset for demonstration purposes. Since 500M variant is small, it's better to apply full fine-tuning instead of QLoRA or LoRA, meanwhile you can try to apply QLoRA on 2.2B variant. You can find the fine-tuning notebook [here](https://github.com/huggingface/smollm/blob/main/vision/finetuning/SmolVLM2_Video_FT.ipynb).
 
 
 ## Read More
```
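
As a hedged illustration of the "QLoRA on the 2.2B variant" suggestion in the changed paragraph (the linked notebook is the authoritative recipe; the checkpoint name, target modules, and hyperparameters below are assumptions):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForImageTextToText, BitsAndBytesConfig

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"  # assumed 2.2B checkpoint name

# 4-bit base weights -- the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)

# Low-rank adapters on the attention projections (module names are assumptions).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

# From here, pass `model` to a Trainer with a collator that turns
# VideoFeedback video/caption pairs into chat-template batches.
```

For the 500M variant, the paragraph instead recommends full fine-tuning without quantization or adapters, since the model is small enough to train all parameters directly.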
