Hi, thanks for your great work!
[stage2]:
Following the scripts in this repo, we trained stage 2 and evaluated it on ScreenSpot, but only got 45.6%, which is lower than the 77.4% reported in the paper.
In addition, when we evaluated the provided Aguvis-7B-720P checkpoint on ScreenSpot, it got 79.4%, which is also lower than the 84.4% reported in the paper.
Could the authors provide the evaluation prompt used for ScreenSpot?
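For context, ScreenSpot accuracy is commonly computed as the fraction of predicted click points that fall inside the ground-truth bounding box. Below is a minimal sketch of that metric, assuming predictions are (x, y) points and boxes are (left, top, right, bottom) in the same coordinate system. The function names and data layout are hypothetical and not taken from this repo's evaluation script.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]              # (x, y)
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (edges inclusive)."""
    x, y = point
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def click_accuracy(preds: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Fraction of predicted points that land inside their target box.

    Assumes preds[i] and boxes[i] refer to the same sample and use the
    same coordinate convention (both normalized or both in pixels).
    """
    if len(preds) != len(boxes):
        raise ValueError("preds and boxes must have the same length")
    if not preds:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(preds, boxes))
    return hits / len(preds)
```

If the official evaluation uses a different coordinate convention (for example pixel versus normalized coordinates, or width/height boxes instead of corner boxes), that alone could change the score, so knowing the exact prompt and post-processing would help.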
[stage1]:
Another question: have you ever evaluated intermediate checkpoints (e.g. at 1000 or 2000 steps) on ScreenSpot during stage 1 SFT? In our experiments, the model performs very poorly on ScreenSpot (e.g. <30%) after 1000 training steps.
Many thanks!