Hi there,
I have a question about https://huggingface.co/datasets/OpenGVLab/AS-V2/blob/main/as_pretrain_10m.json, which is described as "as_pretrain_10m.json: the filtered 10M samples in AS-1B, which are used in the pretraining phase of Stage 2."
What filtering strategy did you use? Are there any shortcomings in AS-1B?
Thank you for the awesome work.