Code for the paper Believing without Seeing: Quality Scores for Contextualizing Vision-Language Model Explanations (https://arxiv.org/abs/2509.25844)
- Init submodules:

  ```shell
  git submodule update --init --recursive
  ```

- Create and activate a virtual environment, then install deps:

  ```shell
  python3 -m venv .venv && source .venv/bin/activate
  pip install -r requirements.txt
  ```

- Configure environment variables:
  - `OPENAI_API_KEY`: your OpenAI API key.
  - `VLMQS_DATASETS_DIR`: path to the datasets directory (default: `data/`).
  - `VLMQS_OUTPUTS_DIR`: path to the outputs directory (default: `model_outputs/`).
  - `VLMQS_COST_FILE`: path to a file that tracks API costs (default: `total_cost.txt`).
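A minimal sketch of how these environment variables might be resolved with the defaults listed above. The `load_config` helper is illustrative only, not a function from this repo:

```python
import os

# Hypothetical helper: read each VLMQS_* variable, falling back to the
# defaults documented in this README when the variable is unset.
def load_config() -> dict:
    return {
        "datasets_dir": os.environ.get("VLMQS_DATASETS_DIR", "data/"),
        "outputs_dir": os.environ.get("VLMQS_OUTPUTS_DIR", "model_outputs/"),
        "cost_file": os.environ.get("VLMQS_COST_FILE", "total_cost.txt"),
    }

cfg = load_config()
print(cfg)
```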
- Download subsets and materialize CSVs and images:

  ```shell
  python3 load_dataset.py --dataset all
  ```

  This creates `data/AOKVQA/AOKVQA.csv` and `data/VizWiz/VizWiz.csv` with local image paths.
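The `<datasets_dir>/<DATASET>/<DATASET>.csv` layout above can be sketched as a small path helper; `dataset_csv_path` is an assumption for illustration, not part of the repo's API:

```python
import os

# Hypothetical sketch: build the path to a CSV materialized by load_dataset.py,
# honoring the VLMQS_DATASETS_DIR override documented above.
def dataset_csv_path(dataset: str) -> str:
    datasets_dir = os.environ.get("VLMQS_DATASETS_DIR", "data/")
    return os.path.join(datasets_dir, dataset, f"{dataset}.csv")

for name in ("AOKVQA", "VizWiz"):
    print(dataset_csv_path(name))
```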
- Generate predictions and rationales. Example (all models, both datasets):

  ```shell
  python3 vqa_infer.py --dataset all --model all --rewrite_file
  ```

  Outputs are saved under `model_outputs/<DATASET>/<MODEL>.csv`.

  For a quick test run (20 samples):

  ```shell
  python3 vqa_infer.py --dataset VizWiz --model qwen --test --rewrite_file
  ```

- Run the support/contrastiveness, visual fidelity, informativeness, and commonsense quality analyses:

  ```shell
  python3 rationale_quality_analysis.py --dataset all --quality all
  ```

  This updates each `model_outputs/<DATASET>/<MODEL>.csv` with new columns.
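The column-append step above can be sketched in miniature. This is not the repo's code: the column name `informativeness` merely echoes one of the qualities listed above, the score is a dummy value, and an in-memory CSV stands in for a real `model_outputs/<DATASET>/<MODEL>.csv`:

```python
import csv
import io

# Stand-in for an existing per-model output CSV.
existing = "question,rationale\nWhat is shown?,A dog on grass.\n"

# Read the rows and attach a hypothetical quality-score column.
reader = csv.DictReader(io.StringIO(existing))
rows = [dict(row, informativeness="0.8") for row in reader]  # dummy score

# Write the file back with the new column added.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["question", "rationale", "informativeness"])
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```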