Thank you for the excellent work on ASMv2.
In the paper, you mention that when creating the AS-V2 dataset, the bounding boxes of objects were used as part of the prompt for GPT-4V. However, the paper does not explain how these bounding boxes were obtained.
Could you describe the workflow for acquiring the bounding boxes?