`smolvlm2.md`: 5 additions & 5 deletions
````diff
@@ -298,7 +298,7 @@ python -m mlx_vlm.generate \
   --prompt "Can you describe this image?"
 ```

-We also created a simple script for video understanding. You can use it like follows:
+We also created a simple script for video understanding. You can use it as follows:

 ```bash
 python -m mlx_vlm.smolvlm_video_generate \
````
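The video command in the hunk above is cut off at the hunk boundary. For context, here is a minimal sketch of the same video-understanding step using the transformers Python API instead of the `mlx_vlm` CLI. It is not taken from the post; the checkpoint id, the local video path, and the generation settings are assumptions.

```python
# Hedged sketch: video understanding with transformers instead of the mlx_vlm CLI.
# The checkpoint id and the local video path are assumptions, not from the diff.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "HuggingFaceTB/SmolVLM2-500M-Video-Instruct"  # assumed checkpoint id

processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).to("cuda")

messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "my_video.mp4"},  # hypothetical local file
            {"type": "text", "text": "Can you describe this video?"},
        ],
    }
]

# The processor samples frames from the video and builds the multimodal prompt.
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)

generated_ids = model.generate(**inputs, do_sample=False, max_new_tokens=64)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```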
````diff
@@ -315,7 +315,7 @@ Note that the system prompt is important to bend the model to the desired behavi

 The Swift language is also supported through the [mlx-swift-examples repo](https://github.com/ml-explore/mlx-swift-examples), which is what we used to build our iPhone app.

-Until [our in-progress PR](https://github.com/ml-explore/mlx-swift-examples/pull/206) is finalized and merged, you have to compile the project [from this fork](https://github.com/cyrilzakka/mlx-swift-examples), and then you can use the `llm-tool` CLI on your Mac like follows.
+Until [our in-progress PR](https://github.com/ml-explore/mlx-swift-examples/pull/206) is finalized and merged, you have to compile the project [from this fork](https://github.com/cyrilzakka/mlx-swift-examples), and then you can use the `llm-tool` CLI on your Mac as follows.

 For image inference:

````
````diff
@@ -343,14 +343,14 @@ If you integrate SmolVLM2 in your apps using MLX and Swift, we'd love to know ab
 ### Fine-tuning SmolVLM2

 You can fine-tune SmolVLM2 on videos using transformers 🤗
-We have fine-tuned 500M variant in Colab on video-caption pairs in [VideoFeedback](https://huggingface.co/datasets/TIGER-Lab/VideoFeedback) dataset for demonstration purposes. Since 500M variant is small, it's better to apply full fine-tuning instead of QLoRA or LoRA, meanwhile you can try to apply QLoRA on 2.2B variant. You can find the fine-tuning notebook [here](https://github.com/huggingface/smollm/blob/main/vision/finetuning/SmolVLM2_Video_FT.ipynb).
+We have fine-tuned the 500M variant in Colab on video-caption pairs in [VideoFeedback](https://huggingface.co/datasets/TIGER-Lab/VideoFeedback) dataset for demonstration purposes. Since the 500M variant is small, it's better to apply full fine-tuning instead of QLoRA or LoRA, meanwhile you can try to apply QLoRA on the 2.2B variant. You can find the fine-tuning notebook [here](https://github.com/huggingface/smollm/blob/main/vision/finetuning/SmolVLM2_Video_FT.ipynb).


 ## Read More

 We would like to thank Raushan Turganbay, Arthur Zucker and Pablo Montalvo Leroux for their contribution of the model to transformers.

-We are looking forward to see all the things you'll build with SmolVLM2!
-If you'd like to learn more about SmolVLM family of models, feel free to read the following:
+We are looking forward to seeing all the things you'll build with SmolVLM2!
+If you'd like to learn more about the SmolVLM family of models, feel free to read the following:

 [SmolVLM2 - Collection with Models and Demos](https://huggingface.co/collections/HuggingFaceTB/smolvlm2-smallest-video-lm-ever-67ab6b5e84bf8aaa60cb17c7)
````
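To make the fine-tuning paragraph in the hunk above concrete, here is a minimal, hedged sketch of full fine-tuning of the 500M variant on video-caption pairs with transformers, written in the spirit of the linked notebook rather than copied from it. The checkpoint id, the VideoFeedback config and column names, and all hyperparameters are assumptions; the linked notebook is the authoritative recipe.

```python
# Hedged sketch of full fine-tuning (no LoRA/QLoRA) for the 500M variant.
# Checkpoint id, dataset config/column names, and hyperparameters are assumptions.
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForImageTextToText,
    AutoProcessor,
    Trainer,
    TrainingArguments,
)

model_id = "HuggingFaceTB/SmolVLM2-500M-Video-Instruct"  # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Add a config name if the dataset defines several; column names below are assumed.
train_ds = load_dataset("TIGER-Lab/VideoFeedback", split="train")

def collate_fn(examples):
    # One example per step in this sketch (per_device_train_batch_size=1).
    ex = examples[0]
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "video", "path": ex["video_path"]},  # assumed column
                {"type": "text", "text": "Caption this video."},
            ],
        },
        {
            "role": "assistant",
            "content": [{"type": "text", "text": ex["text"]}],  # assumed column
        },
    ]
    batch = processor.apply_chat_template(
        messages, tokenize=True, return_dict=True, return_tensors="pt"
    )
    labels = batch["input_ids"].clone()
    labels[labels == processor.tokenizer.pad_token_id] = -100  # mask padding in the loss
    batch["labels"] = labels
    return batch

args = TrainingArguments(
    output_dir="smolvlm2-500m-videofeedback-ft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=1e-5,
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
    remove_unused_columns=False,  # keep raw columns so the collator can read them
)

trainer = Trainer(model=model, args=args, train_dataset=train_ds, data_collator=collate_fn)
trainer.train()
```

For demonstration the loss only masks padding; masking the prompt tokens as well is a further refinement shown in the notebook-style recipes.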