About the performance on ScreenSpot

Hi, thanks for your great work!
[stage2]:
Following the scripts in this repo, we trained stage 2 and evaluated it on ScreenSpot, but only got 45.6%, which is lower than the 77.4% reported in the paper.
Besides, when we evaluate the provided Aguvis-7B-720P on ScreenSpot, it gets 79.4%, which is also lower than the 84.4% reported in the paper.
Could the authors provide the evaluation prompt used for ScreenSpot?
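To make the comparison concrete, here is a minimal sketch of the scoring we have in mind. It assumes the usual ScreenSpot criterion (a prediction counts as correct when the predicted click point falls inside the ground-truth box) and assumes that points and boxes are in the same coordinate system, e.g. both normalized to [0, 1]. The names `point_in_box` and `click_accuracy` and the tuple layouts are hypothetical and not taken from this repo; if the official evaluation differs (prompt, coordinate scaling, box format), that may explain part of the gap.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]              # (x, y) predicted click point
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the click point lies inside the box (edges inclusive)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def click_accuracy(predictions: Sequence[Point], boxes: Sequence[Box]) -> float:
    """Fraction of predictions that land inside their ground-truth box."""
    if len(predictions) != len(boxes):
        raise ValueError("predictions and boxes must have the same length")
    if not predictions:
        return 0.0
    hits = sum(point_in_box(p, b) for p, b in zip(predictions, boxes))
    return hits / len(predictions)


if __name__ == "__main__":
    # Toy data only: one point inside its box, one outside.
    preds = [(0.5, 0.5), (0.9, 0.9)]
    gts = [(0.4, 0.4, 0.6, 0.6), (0.0, 0.0, 0.1, 0.1)]
    print(click_accuracy(preds, gts))
```

A mismatch between pixel and normalized coordinates, or between `(x, y, w, h)` and `(x_min, y_min, x_max, y_max)` boxes, would silently lower the score under this kind of check, so knowing the exact prompt and output format would help.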
[stage1]:
Another question: have you evaluated intermediate checkpoints (e.g. at 1000 or 2000 steps) on ScreenSpot during stage 1 SFT? In our experiments, the checkpoint at 1000 steps performs very poorly on ScreenSpot (e.g. <30%).
 
Many thanks!
