Great job. Could you also provide the scripts for converting the training and test splits of the datasets shown in the image: VSI-Bench, MMSI, ERQA, RoboSpatial, EgoTaskQA, EgoTextVQA_indoor, Open-X VQA, QAEgo4D, and MindCube? Since all of these are evaluation benchmarks listed in Table 4 of the paper, the scripts would help the community understand exactly how you used them.