This web demo explores how VGGT's geometry emerges across layers using a "logit lens"-style ablation, inspired by the paper Understanding Multi-View Transformers (Stary and Gaubil et al., 2025).
This demo:

- visualizes the point clouds and camera poses encoded in intermediate hidden layers (by feeding earlier layers' features into the DPT head in place of the final layer's, and likewise applying the camera head)
- canonicalizes poses, aligns scales, and subsamples colored point clouds
- visualizes each result in Gradio, and combined (interpolated) motions via Viser
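The canonicalization, scale alignment, and subsampling steps above can be sketched roughly as follows. This is a minimal illustration, not the demo's actual code: the function names, the camera-to-world pose convention, and the median-spread scale heuristic are all assumptions.

```python
import numpy as np

def canonicalize_poses(c2w):
    """Express all camera-to-world poses relative to the first camera,
    so the first pose becomes the identity (hypothetical convention)."""
    ref_inv = np.linalg.inv(c2w[0])
    return np.einsum("ij,njk->nik", ref_inv, c2w)

def align_scale(points, ref_points):
    """Scale `points` so the median distance from its centroid matches
    that of `ref_points` (one simple choice of scale statistic)."""
    def spread(p):
        return np.median(np.linalg.norm(p - p.mean(axis=0), axis=1))
    return points * (spread(ref_points) / spread(points))

def subsample(points, colors, n, seed=0):
    """Randomly keep at most n colored points for lighter visualization."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(points), size=min(n, len(points)), replace=False)
    return points[idx], colors[idx]
```

With poses canonicalized and scales aligned, point clouds decoded from different layers live in a common frame and can be compared or interpolated directly.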
The point clouds and camera poses are refined step by step.
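This layer-by-layer refinement can be probed in logit-lens fashion: run the trunk, keep the hidden state at every layer, and decode each one with the same frozen output head. The toy sketch below shows only the pattern; the matrices stand in for VGGT's layers and heads, which are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8                                    # toy hidden dimension
layers = [np.eye(D) + 0.1 * rng.normal(size=(D, D)) for _ in range(4)]
head = rng.normal(size=(D, 3))           # stand-in for a frozen decode head

x = rng.normal(size=(5, D))              # toy tokens
decoded_per_layer = []
for W in layers:
    x = x @ W                            # advance one "layer"
    decoded_per_layer.append(x @ head)   # decode the intermediate state
                                         # with the final head (logit lens)
```

Plotting `decoded_per_layer` entry by entry is what lets the demo show geometry sharpening as depth increases.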
Setup

```bash
pip install -r requirements.txt
pip install git+https://github.com/facebookresearch/vggt.git
```

Run the demo and open the page in a web browser:

```bash
python run_g2.py
```
Other Examples