Prompt for ScreenSpot and ScreenSpot-v2

Thank you for this awesome work! However, I cannot reproduce the Qwen2.5-VL 3B/7B baseline results on ScreenSpot and ScreenSpot-v2 reported in the paper. I used the prompt from the [Qwen2.5-VL computer-use cookbook](https://github.com/QwenLM/Qwen2.5-VL/blob/main/cookbooks/computer_use.ipynb). Could you share the prompt used for evaluation on these benchmarks? Thank you!