We propose STRIDE-QA, a large-scale VQA dataset for ego-centric spatiotemporal reasoning in urban driving scenes, accompanied by a benchmark suite for evaluation.
- [2025-08-22] Code and dataset for STRIDE-QA Bench are released. Please refer to the README for more details.
- [2025-08-19] arXiv paper released. Dataset/Benchmark/Models are coming soon. Please stay tuned! ☕
- [2025-07-13] Our short paper was accepted to the ICCV End-to-End 3D Learning Workshop.
We provide STRIDE-QA Bench as an official framework for evaluating the spatiotemporal reasoning abilities of VLMs in urban driving contexts. The toolkit includes inference runners, evaluation scripts, and visualization utilities.
- Toolkit: benchmarks/STRIDE-QA-Bench
- Dataset: turing-motors/STRIDE-QA-Bench
See the toolkit's README.md for installation, usage, and examples.
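For quick experimentation, the benchmark data can be pulled directly from the Hugging Face Hub. The snippet below is a minimal sketch, assuming the dataset loads with the `datasets` library; the split name and field names (`question`, `answer`) are illustrative assumptions, so consult the dataset card for the actual schema.

```python
# Minimal sketch: load STRIDE-QA Bench from the Hugging Face Hub.
# Assumes the dataset is published under turing-motors/STRIDE-QA-Bench.
# The split and field names below are illustrative and may differ from
# the actual schema -- check the dataset card before use.
from datasets import load_dataset

bench = load_dataset("turing-motors/STRIDE-QA-Bench", split="test")

# Inspect one QA pair (field names are assumptions).
sample = bench[0]
print(sample["question"])
print(sample["answer"])
```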
This project is released under the CC BY-NC-SA 4.0 License.
This project is based on results obtained from a project, JPNP20017, subsidized by the New Energy and Industrial Technology Development Organization (NEDO).
If you find STRIDE-QA useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
@misc{ishihara2025strideqa,
      title={STRIDE-QA: Visual Question Answering Dataset for Spatiotemporal Reasoning in Urban Driving Scenes},
      author={Keishi Ishihara and Kento Sasaki and Tsubasa Takahashi and Daiki Shiono and Yu Yamaguchi},
      year={2025},
      eprint={2508.10427},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.10427},
}