In this notebook(https://huggingface.co/learn/cookbook/fine_tuning_vlm_trl), it uses messages column to finetune. However, when I try to reproduce it using Qwen2.5-VL and my own dataset, the model converges to a local minima. I then read from the dataset sturcture and saw that I should use prompt and completion and it works. I think we should make this clear in the notebook.
huggingface/trl#4077