Can't reproduce the OSWorld-G results

Hi, thank you for releasing such an amazing model!
I tried the model and it works really well.
While I was able to reproduce the results for ScreenSpot-v2, I only got 30.1% on OSWorld-G.
I was wondering whether I did something wrong or whether something has changed in the code.
Thank you so much!


The command I ran, from the evaluation folder:

```
python qwen25_vllm_osworld_g_jedi.py --annotation_path ../benchmark/OSWorld-G.json --image_dir ../benchmark/images/ --model_path xlangai/Jedi-7B-1080p
```
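
In case it helps narrow this down, here is a small sanity check that can be run from the same folder before the evaluation. It is only a sketch: `preflight` is a hypothetical helper, not part of the repo, and it assumes nothing about the annotation schema beyond the file being valid JSON. It reports how many top-level entries the annotation file has and how many files sit in the image directory, so an obviously wrong path or an incomplete download would show up.

```python
import json
from pathlib import Path


def preflight(annotation_path, image_dir):
    """Hypothetical helper (not part of the repo): summarize the evaluation inputs."""
    annotation_path = Path(annotation_path)
    image_dir = Path(image_dir)

    # Fails loudly if the path is wrong or the file is not valid JSON.
    with annotation_path.open(encoding="utf-8") as f:
        data = json.load(f)

    # Count top-level entries without assuming any particular schema.
    n_entries = len(data) if isinstance(data, (list, dict)) else 1

    # Count regular files directly inside the image directory (non-recursive).
    n_files = sum(1 for p in image_dir.iterdir() if p.is_file())

    return {"annotation_entries": n_entries, "image_files": n_files}


# Example call with the same paths as the command above:
# print(preflight("../benchmark/OSWorld-G.json", "../benchmark/images/"))
```

If the two counts look plausible, the inputs are probably not the cause and the difference is more likely in the evaluation code or its settings.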

Can't reproduce the OSWorld-G results #13
