SceneVGGT: VGGT-based online 3D semantic SLAM for indoor scene understanding and navigation

Anna Gelencsér-Horváth^*^† · Gergely Dinya^* · Péter Halász · Dorka Erős · Islam Muhammad Muqsit · Kristóf Karacs

^* Equal contribution. ^† Corresponding author.

SceneVGGT is a spatio-temporal 3D scene understanding framework that combines SLAM with semantic mapping for autonomous and assistive navigation. It supports online, real-time processing of streamed data (e.g., from an iPhone Pro). The pipeline’s GPU memory usage remains under 17 GB regardless of the length of the input sequence, and it achieves competitive point-cloud performance on the ScanNet++ benchmark. Overall, SceneVGGT ensures robust semantic identification and is fast enough to support interactive assistive navigation with audio feedback.

News

[2026/4/30] Paper accepted for the IEEE ICIP 2026 conference.
[2026/2/13] Paper released on arXiv.
[2026/2/12] Code release.

Overview

SceneVGGT enables temporally coherent 3D semantic mapping by lifting 2D instance masks into 3D and tracking instances with the VGGT tracking head. Persistent object identities combined with timestamps provide computationally efficient, temporally consistent change detection, while floor-plane projection of object locations supports downstream assistive navigation, including a proof-of-concept navigation module.
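To make the two geometric steps mentioned above more concrete, here is a minimal NumPy sketch of lifting a 2D instance mask into 3D and projecting the resulting object location onto a floor plane. It is an illustration only, not code from this repository: the function names `lift_mask_to_3d` and `project_to_floor` are hypothetical, and the sketch assumes a pinhole camera with known intrinsics `K`, a per-pixel metric depth map, and a floor plane that is already known (given by a normal and a point on the plane).

```python
import numpy as np

def lift_mask_to_3d(mask, depth, K):
    """Back-project the pixels of a 2D instance mask to camera-frame 3D points.

    Hypothetical helper: assumes a pinhole model with intrinsics K (3x3)
    and a depth map in metres, where depth <= 0 marks invalid pixels.
    """
    valid = mask.astype(bool) & (depth > 0)
    v, u = np.nonzero(valid)              # pixel rows (v) and columns (u)
    z = depth[v, u]
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)    # shape (N, 3)

def project_to_floor(points, plane_normal, plane_point):
    """Orthogonally project 3D points onto a plane (assumed to be the floor)."""
    n = plane_normal / np.linalg.norm(plane_normal)
    dist = (points - plane_point) @ n     # signed distance of each point to the plane
    return points - dist[:, None] * n

# Toy example: a 4x4 frame at constant depth with a 2x2 object mask.
depth = np.full((4, 4), 2.0)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
K = np.array([[100.0, 0.0, 1.5],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])

points = lift_mask_to_3d(mask, depth, K)
centroid = points.mean(axis=0)

# Assumed floor: 1.5 m below the camera, with the image y-axis pointing down.
floor_normal = np.array([0.0, -1.0, 0.0])
floor_point = np.array([0.0, 1.5, 0.0])
floor_location = project_to_floor(centroid[None, :], floor_normal, floor_point)[0]
```

In this toy setup the object centroid lies on the optical axis 2 m in front of the camera, and its floor location keeps the same horizontal coordinates while dropping to the assumed floor height. In a real pipeline the depth, camera pose, and floor plane would have to be estimated rather than given.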

3D semantic SLAM and navigation from Streaming Inputs

Installation

Clone SceneVGGT

git clone git@github.com:HBVC-AI/SceneVGGT.git
cd SceneVGGT

Create conda environment

conda create -n scenevggt python=3.10
conda activate scenevggt

Install requirements

pip install -r requirements.txt

Download Checkpoints

Please download the VGGT model from here.

Evaluation codes

Coming soon.

Citation

If you find this project helpful, please consider citing the following paper:

@misc{scenevggt,
      title={SceneVGGT: VGGT-based online 3D semantic SLAM for indoor scene understanding and navigation}, 
      author={Anna Gelencsér-Horváth and Gergely Dinya and Dorka Boglárka Erős and Péter Halász and Islam Muhammad Muqsit and Kristóf Karacs},
      year={2026},
      eprint={2602.15899},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2602.15899}, 
}
