Shengli Zhou, Yang Liu, Feng Zheng📧
This repository is the official implementation of the ACM MM 2025 paper "Learn 3D VQA Better with Active Selection and Reannotation".
In our paper, we conduct comparative experiments (i.e., the "Lazy Oracle Experiment" and the "Diligent Oracle Experiment") and an ablation study to validate our methods. This repository contains the code for these experiments.
For ScanQA, we modify the code from the official implementation of ScanQA. Please refer to the official repository for dependency installation and data preparation.
python scripts/train.py --use_color --tag <tag_name> --AL_mode <AL_strategy> [--AL_oracle]

Options:
- --AL_mode sets the strategy used for active learning; the available strategies are [random, entropy, infogain, variance].
- Adding --AL_oracle enables the Hierarchical Reannotation Strategy; otherwise, the "lazy oracle" is applied.
- For more training options, please run python scripts/train.py -h.
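
To give a feel for what an active learning strategy such as random or entropy does, here is a minimal, self-contained sketch. It is not the code used in this repository: the function names, the `budget` parameter, and the toy logits are all hypothetical, and the entropy criterion shown is the generic textbook form (pick the samples whose predicted answer distribution is most uncertain), which may differ from the paper's exact formulation.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_samples(pool_logits, budget, mode="entropy", seed=0):
    """Return the sorted indices of `budget` pool samples to annotate next.

    Hypothetical helper for illustration only; not part of this repository.
    """
    indices = list(range(len(pool_logits)))
    if mode == "random":
        # Baseline: pick samples uniformly at random.
        return sorted(random.Random(seed).sample(indices, budget))
    if mode == "entropy":
        # Uncertainty sampling: highest predictive entropy first.
        scores = [entropy(softmax(logits)) for logits in pool_logits]
        ranked = sorted(indices, key=lambda i: scores[i], reverse=True)
        return sorted(ranked[:budget])
    raise ValueError(f"unsupported mode: {mode}")


# Made-up answer logits for four unlabeled questions.
pool = [
    [4.0, 0.1, 0.1],  # confident prediction
    [1.0, 1.0, 1.0],  # maximally uncertain
    [2.0, 1.9, 0.0],  # torn between two answers
    [5.0, 0.0, 0.0],  # very confident prediction
]
picked = select_samples(pool, budget=2, mode="entropy")
print(picked)  # [1, 2]
```

The two least confident predictions are selected, which is the behavior one would expect from an uncertainty-driven strategy; the random mode serves as the usual baseline for comparison.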
- Evaluation of trained ScanQA models with the val dataset:
  python scripts/eval.py --folder <folder_name> --qa --force
  <folder_name> corresponds to the folder under outputs/ named with the timestamp + <tag_name>.
- Scoring with the val dataset:
  python scripts/score.py --folder <folder_name>
- Prediction with the test dataset:
  python scripts/predict.py --folder <folder_name> --test_type <test_type>
  <test_type> can be test_w_obj or test_wo_obj.
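
The evaluation, scoring, and prediction steps above can be chained from a small helper. The sketch below is only an illustration: `find_run_folder` and `build_commands` are hypothetical names that do not exist in this repository. It assumes that run folders under outputs/ begin with a timestamp that sorts lexicographically, and it matches the tag case-insensitively in case the training script normalizes it. Only the flags listed above are used.

```python
from pathlib import Path

TEST_TYPES = ("test_w_obj", "test_wo_obj")


def find_run_folder(outputs_dir, tag_name):
    """Return the name of the newest run folder whose name ends with tag_name.

    Assumption: folder names start with a sortable timestamp, so the
    lexicographically largest match is the most recent run.
    """
    matches = sorted(
        p.name
        for p in Path(outputs_dir).iterdir()
        if p.is_dir() and p.name.lower().endswith(tag_name.lower())
    )
    if not matches:
        raise FileNotFoundError(f"no run tagged {tag_name!r} in {outputs_dir}")
    return matches[-1]


def build_commands(folder_name, test_type="test_w_obj"):
    """Build the eval, score, and predict commands as argument lists."""
    if test_type not in TEST_TYPES:
        raise ValueError(f"test_type must be one of {TEST_TYPES}")
    return [
        ["python", "scripts/eval.py", "--folder", folder_name, "--qa", "--force"],
        ["python", "scripts/score.py", "--folder", folder_name],
        ["python", "scripts/predict.py", "--folder", folder_name,
         "--test_type", test_type],
    ]


# Print the commands for a placeholder folder name without running them.
for cmd in build_commands("<folder_name>"):
    print(" ".join(cmd))
```

To actually run the steps, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failing step stops the chain.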
For 3D-VisTA, we modify the code from the official implementation of 3D-VisTA. Please refer to the official repository for dependency installation and data preparation. Before running the model, the path configurations on line 3 of ./dataset/path_config.py and line 5 of ./model/language/lang_encoder.py need to be modified.
python3 run.py --config project/vista/train_scanqa_config.yml

Options: in train_scanqa_config.yml,
- AL_mode sets the strategy used for active learning; the available strategies are [random, variance].
- AL_oracle controls whether the Hierarchical Reannotation Strategy is used.
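
Because these options live in a YAML file rather than on the command line, a typo in AL_mode is easy to miss. The following sketch shows one way to sanity-check the two entries after the config has been parsed into a dictionary. It rests on assumptions that the source does not confirm: that both keys sit at the top level of the config, and that AL_oracle is a boolean. `check_al_options` is a hypothetical helper, not part of 3D-VisTA or this repository.

```python
ALLOWED_AL_MODES = ("random", "variance")


def check_al_options(config):
    """Validate the active-learning entries of an already-parsed config dict.

    Assumptions (not confirmed by the repository): AL_mode and AL_oracle
    are top-level keys, and AL_oracle is a boolean that defaults to False.
    """
    mode = config.get("AL_mode")
    if mode not in ALLOWED_AL_MODES:
        raise ValueError(
            f"AL_mode must be one of {ALLOWED_AL_MODES}, got {mode!r}"
        )
    oracle = config.get("AL_oracle", False)
    if not isinstance(oracle, bool):
        raise TypeError(f"AL_oracle must be true or false, got {oracle!r}")
    return mode, oracle


# Example with a hand-written dict standing in for the parsed YAML file.
mode, oracle = check_al_options({"AL_mode": "variance", "AL_oracle": True})
print(mode, oracle)  # variance True
```

If the real file nests these keys under another section, the lookup would need to be adapted accordingly.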
python3 run.py --config project/vista/eval_scanqa_config.yml

We would like to thank the authors of ScanQA and 3D-VisTA for their open-source release.
If you find this project useful in your research, please consider citing:
@inproceedings{10.1145/3746027.3755515,
author = {Zhou, Shengli and Liu, Yang and Zheng, Feng},
title = {Learn 3D VQA Better with Active Selection and Reannotation},
year = {2025},
isbn = {9798400720352},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3746027.3755515},
doi = {10.1145/3746027.3755515},
booktitle = {Proceedings of the 33rd ACM International Conference on Multimedia},
pages = {4610–4618},
numpages = {9},
keywords = {3d visual question-answering, active learning, online learning},
location = {Dublin, Ireland},
series = {MM '25}
}