Hi, thank you very much for your impressive work and for open-sourcing this project.
I have a question regarding the experimental setup. Before training such a large-scale model on massive datasets, did you conduct any preliminary or pilot experiments to validate the design choices?
In addition, I was wondering whether, in the early stages of development, you compared against traditional VLMs as baselines under more strictly controlled or "fair" settings (e.g., similar data scale, training budget, or model capacity).
Thanks again for the great work, and I'd really appreciate any insights you're willing to share.
Best regards