Skip to content

vbdi/Ego3D-Bench

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

17 Commits
Β 
Β 
Β 
Β 

Repository files navigation

Spatial Reasoning with Vision-Language Models in Ego-Centric Multi-view Scenes (Code Comes Soon!)

Benchmarking and Improving 3D Spatial Reasoning in Vision-Language Models

arXiv Website HF Dataset: Ego3D-Bench

Sample

πŸ“Œ Key Highlights

  • πŸ“Š Ego3D-Bench: A benchmark of 8,600+ human-verified QA pairs for evaluating VLMs in ego-centric, multi-view outdoor environments.
  • 🧠 Ego3D-VLM: A post-training framework that builds cognitive maps from global 3D coordinates, achieving +12% QA accuracy and +56% distance estimation improvements.
  • πŸš€ Impact: Together, Ego3D-Bench and Ego3D-VLM move VLMs closer to human-level 3D spatial understanding in real-world settings.

βš–οΈ Ego3D-Bench

Benchmark Overview: We introduce Ego3D-Bench, a benchmark designed to evaluate the spatial understanding of VLMs in ego-centric multi-view scenarios. Images are collected from three different datasets: NuScenes, Argoverse, and Waymo. Questions are designed to require cross-view reseasoning. We define question from the ego-perspective and from the perspective of objects in the scene. To clearly indicate the perspective of each question, we categorize them into ego-centric or object-centric. In total we have 10 questions: 8 multi-choice QAs and 2 exact number QAs. Figure

Sample


🧠 Ego3D-VLM

Ego3D-VLM is a post-training framework that enhances 3D spatial reasoning of VLMs. Ego3D-VLM generates cognitive map based on estimated global 3D coordinates of object in the input prompt and create a textual cognitive map baed on the input images and the question. This approach results in 12% average improvement on multi-choice QA and 56% average improvement on absolute distance estimation. Sample


πŸ“Š Results

Sample


About

Spatial Reasoning with Vision-Language Models

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages