For AudioCaps, ClothoV2, CoCoCap, MSRVTTQA, MSVDQA, NextQA, OpenEQA, PointCap, and ShapeNet, you need to download the source data manually, since these datasets are not directly supported by HuggingFace. Refer to the benchmark/ folder for the expected file structure of each benchmark.
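After downloading, a quick check along the following lines can flag benchmarks whose data is still missing. This is only a sketch and not part of the repository: the helper `missing_benchmarks` is hypothetical, and it assumes each benchmark lives in a subfolder of benchmark/ named exactly as in the list above. Adjust the names to match the actual layout in benchmark/.

```python
from pathlib import Path

# Benchmarks that require a manual download (taken from the list above).
MANUAL_BENCHMARKS = [
    "AudioCaps", "ClothoV2", "CoCoCap", "MSRVTTQA", "MSVDQA",
    "NextQA", "OpenEQA", "PointCap", "ShapeNet",
]


def missing_benchmarks(root="benchmark", names=MANUAL_BENCHMARKS):
    """Return the names whose folder under `root` is absent or empty.

    ASSUMPTION: one subfolder per benchmark, named exactly as in `names`.
    The real layout may differ; check the benchmark/ folder.
    """
    root = Path(root)
    missing = []
    for name in names:
        folder = root / name
        # A folder counts as missing if it does not exist or has no entries.
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_benchmarks():
        print(f"missing or empty: {name}")
```

Running the script from the repository root prints one line per benchmark that still needs to be downloaded and prints nothing when every folder is populated.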