This web demo explores how VGGT's geometry emerges across layers using a "logit lens"-style ablation, inspired by the paper Understanding Multi-View Transformers (Stary and Gaubil et al., 2025).
This demo:

- visualizes the point clouds and camera poses encoded in intermediate hidden layers (by feeding earlier layers' features into the DPT head in place of the final layer's, and likewise applying the camera head)
- canonicalizes poses, aligns scales, and subsamples colored point clouds
- visualizes each result in Gradio, and combined (interpolated) motions via Viser
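The canonicalization, scale alignment, and subsampling steps above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the function names, the camera-to-world pose convention, and the median-spread scale heuristic are all assumptions.

```python
import numpy as np

def canonicalize_poses(c2w):
    """Express all camera-to-world poses relative to the first camera,
    so the first pose becomes the identity (hypothetical convention)."""
    ref_inv = np.linalg.inv(c2w[0])
    return np.einsum("ij,njk->nik", ref_inv, c2w)

def align_scale(points, ref_points):
    """Scale `points` so the median distance from its centroid matches
    that of `ref_points` (one simple choice of scale statistic)."""
    def spread(p):
        return np.median(np.linalg.norm(p - p.mean(axis=0), axis=1))
    return points * (spread(ref_points) / spread(points))

def subsample(points, colors, n, seed=0):
    """Randomly keep at most n colored points for lighter visualization."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=min(n, len(points)), replace=False)
    return points[idx], colors[idx]
```

With poses canonicalized and scales aligned, point clouds decoded from different layers live in a common frame and can be compared or interpolated directly.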
The point clouds and camera poses are refined step by step.
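This layer-by-layer refinement can be probed in logit-lens fashion: run the trunk, keep the hidden state at every layer, and decode each one with the same frozen output head. The toy sketch below shows only the pattern; the matrices stand in for VGGT's layers and heads, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                    # toy hidden dimension
layers = [np.eye(D) + 0.1 * rng.normal(size=(D, D)) for _ in range(4)]
head = rng.normal(size=(D, 3))           # stand-in for a frozen decode head

x = rng.normal(size=(5, D))              # toy tokens
decoded_per_layer = []
for W in layers:
    x = x @ W                            # advance one "layer"
    decoded_per_layer.append(x @ head)   # decode the intermediate state
                                         # with the final head (logit lens)
```

Plotting `decoded_per_layer` entry by entry is what lets the demo show geometry sharpening as depth increases.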
Setup

```bash
pip install -r requirements.txt
pip install git+https://github.com/facebookresearch/vggt.git
```

Run the demo and open the page in a web browser:

```bash
python run_g2.py
```
Other Examples