Please see this commit for a rough implementation: https://github.com/Haven-hvn/SmolVLM-Finetune/commit/f5a96b3d5b164f07c57a61f9cd0fe56171a0474b