2 changes: 2 additions & 0 deletions .gitignore
@@ -1,6 +1,8 @@
.hydra/
output/
ckpt/
onnx_exports/
datasets/
# Byte-compiled / optimized / DLL files
__pycache__/
**/__pycache__/
File renamed without changes.
50 changes: 50 additions & 0 deletions README.md
@@ -138,6 +138,56 @@ Furthermore, if certain pixels in the input frames are unwanted (e.g., reflectiv

</details>

## Point Cloud Reconstruction Pipeline

The repository ships with a compact CLI that mirrors the `demo_colmap.py`
workflow and produces two point clouds (VGGT metric depth + Depth Anything
rectified depth):

```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path path/to/image_folder \
--output-dir pcd_out/session01
```

### Highlights

- **VGGT bootstrap**: loads `vggt_model.pt`, predicts cameras/depth, and
unprojects the metric depth maps to `vggt_point_cloud.ply`.
- **Depth Anything rectification** *(optional)*: runs the ONNX/TensorRT export under
`onnx_exports/depth_anything/`, fits a scale+shift using VGGT depth, and
writes `depth_anything_rectified_point_cloud.ply`.
- **Metadata**: every run stores per-frame intrinsics, extrinsics, and the
fitted scale/shift values in `metadata.json`.
- **Backend choice**: `--depth-backend {auto|tensorrt|onnxruntime}` lets you
select TensorRT (fastest when available) or onnxruntime for Depth Anything.
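
The scale+shift rectification can be sketched as a closed-form least-squares fit over valid pixels. This is an illustrative sketch of the idea, not the exact code in `reconstruction/pcd/depth_anything.py`; the function name `fit_scale_shift` and the masking convention are assumptions:

```python
import numpy as np

def fit_scale_shift(pred: np.ndarray, ref: np.ndarray, mask: np.ndarray):
    """Least-squares (s, t) minimizing ||s * pred + t - ref||^2 over mask."""
    x = pred[mask].astype(np.float64)
    y = ref[mask].astype(np.float64)
    A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(s), float(t)
```

Because the system has only two parameters, `np.linalg.lstsq` solves it in one call, keeping the per-frame CPU cost of the fit negligible.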

Example for a video sequence:

```bash
python -m reconstruction.simple_pipeline \
--input-type video \
--path path/to/cam_0.mp4,path/to/cam_1.mp4 \
--stride 2 \
--batch-size 8 \
--output-dir pcd_out/video_session
```

Pass `--depth-anything off` to skip the second stage, or override paths via
`--vggt-weights` and `--depth-anything-engine`. The legacy
`reconstruction/tools/pcd_inference.py` wrapper now simply forwards to this
module (emitting a deprecation warning).
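
The forwarding wrapper pattern can be sketched as follows. This is an assumed structure, not the actual `reconstruction/tools/pcd_inference.py` source:

```python
"""Sketch of a deprecation-forwarding CLI wrapper (assumed structure)."""
import runpy
import warnings

TARGET = "reconstruction.simple_pipeline"

def warn_deprecated() -> None:
    # Emit a DeprecationWarning so callers notice the module moved.
    warnings.warn(
        f"this wrapper is deprecated; invoke `python -m {TARGET}` directly",
        DeprecationWarning,
        stacklevel=2,
    )

def main() -> None:
    warn_deprecated()
    # Re-run the real module as if it were launched with `python -m`.
    runpy.run_module(TARGET, run_name="__main__")

if __name__ == "__main__":
    main()
```

`runpy.run_module` with `run_name="__main__"` makes the target module behave exactly as if it had been invoked directly, so existing scripts keep working while the warning nudges users to the new entry point.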

Additional knobs and outputs:

- `--depth-backend`: choose `auto` (default), `tensorrt`, or `onnxruntime`.
- `--depth-workers`: set the number of Depth Anything workers (`0` = auto).
- `--log-level`: surface per-stage timings by switching to `DEBUG` / `INFO`.
- Results are written per frame: `frame_<idx>_vggt.ply` for the reference frame
and `frame_<idx>_depth_anything.ply` for subsequent frames, plus `metadata.json`
summarizing intrinsics/extrinsics, scale/shift, and runtime stats.
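
Given the intrinsics and extrinsics stored in `metadata.json`, the unprojection step can be reproduced offline. A minimal sketch, assuming a 3x3 intrinsic matrix `K` and a world-to-camera extrinsic `[R|t]` (check the exact field names and conventions in the file your run produces):

```python
import numpy as np

def unproject(depth: np.ndarray, K: np.ndarray, cam_from_world: np.ndarray) -> np.ndarray:
    """Lift a metric depth map (H, W) to world-space points (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Pixel rays in camera coordinates, scaled by metric depth.
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    pts_cam = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Invert the world->camera extrinsic (R | t) to map back to world space.
    R, t = cam_from_world[:3, :3], cam_from_world[:3, 3]
    return (pts_cam - t) @ R  # row-wise equivalent of R.T @ (p - t)
```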


## Interactive Demo

60 changes: 60 additions & 0 deletions agents.md
@@ -0,0 +1,60 @@
Feel free to remove the content beyond this line when it’s no longer needed.

# Project Snapshot for Follow-on Agents

- **Mission**: near-real-time multi-view reconstruction. VGGT handles metric bootstrap; Depth Anything v2 supplies fast depth updates aligned via scale/shift + EKF smoothing.
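
The scale/shift EKF mentioned above reduces, for a scalar state, to a standard Kalman filter over the fitted scale. This is an illustrative sketch, not the pipeline's actual filter; the noise variances are made-up defaults:

```python
class ScalarKalman:
    """1-D Kalman filter smoothing a noisy per-chunk scale estimate."""

    def __init__(self, x0: float, p0: float = 1.0,
                 q: float = 1e-4, r: float = 1e-2) -> None:
        self.x, self.p = x0, p0   # state estimate and its variance
        self.q, self.r = q, r     # process and measurement noise variances

    def update(self, z: float) -> float:
        # Predict: random-walk model, variance grows by q.
        p_pred = self.p + self.q
        # Correct: blend the prediction with the new measurement z.
        k = p_pred / (p_pred + self.r)      # Kalman gain
        self.x = self.x + k * (z - self.x)
        self.p = (1.0 - k) * p_pred
        return self.x
```

Feeding each chunk's fitted scale through `update()` damps frame-to-frame jitter while still tracking genuine drift, at negligible cost.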

- **Key entry points**
- `reconstruction/simple_pipeline.py`
- Args: `--image-loops` repeats a static frame set to emulate a longer stream; `--no-save-ply` disables per-frame/aggregate PLY writes to isolate compute cost.
- Chunk logs now read `Chunk XXX | VGGT 6f → … | Depth …` so timings are per 6-frame batch (batch size = reconstruction latency unit).
- `reconstruction/pcd/depth_anything.py`
- TensorRT worker pool with `suggest_workers()` (auto picks pool size) and optional backend metadata.
- `onnx/tools/benchmark_trt_engines.py`
- Compares TensorRT, ONNX Runtime, and PyTorch baselines. Use `--norm zero_center` to match VGGT preprocessing.
- `onnx/tools/vggt_to_trt.py`
- Exports VGGT to sanitized ONNX + TensorRT engines (float64/bfloat16 tensors downcast). Keep using this for reliable ONNX inputs.

- **Environment facts**
- Conda env `compvis`. CUDA available.
- Session shell (zsh) pre-exports CUDA/TensorRT paths. When reproducing runs, mirror:
```bash
export CUDA_HOME=/usr/local/cuda-12.9
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
export PATH="/usr/src/tensorrt/bin:$PATH"
```
- PyCUDA works when `LD_LIBRARY_PATH` includes the newer `libstdc++.so` (configured in `.zshrc`).
- Use `python -m …`; key modules (`simple_pipeline`, `depth_anything`) pass `py_compile`.

- **Throughput testing recipe**
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--image-loops 50 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--no-save-ply \
--output-dir out/fps_sweep \
--log-level INFO
```
- Outputs per-chunk timings (VGGT + Depth) with no PLY overhead.

- **Standalone engine benchmark**
```bash
python onnx/tools/benchmark_trt_engines.py \
--images-dir datasets/cam_snaps/demo \
--num-views 6 \
--hf-weights vggt_model.pt \
--onnx-models onnx_exports/vggt_core_6.onnx \
--trt-engines onnx_exports/six_cameras_pcd/vggt-6x3x518x518-pcd_fp16.engine \
--norm zero_center
```
- Reports raw inference FPS/latency and depth sanity checks.

- **Current focus**
- Hitting ≥100 FPS per 6-frame batch end-to-end by trimming CPU post-processing (scale-fit + point-cloud unprojection). Use the throughput recipe above for measurements.
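
A cheap win along these lines is caching the per-pixel ray grid so the unprojection inner loop collapses to a single broadcasted multiply per frame. An illustrative sketch assuming fixed intrinsics per camera, not current pipeline code:

```python
import numpy as np

def make_ray_grid(K: np.ndarray, h: int, w: int) -> np.ndarray:
    """Precompute per-pixel camera rays (H, W, 3) at unit depth."""
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    x = (u - K[0, 2]) / K[0, 0]
    y = (v - K[1, 2]) / K[1, 1]
    return np.stack([x, y, np.ones_like(x)], axis=-1)

def unproject_fast(depth: np.ndarray, rays: np.ndarray) -> np.ndarray:
    """Per-frame work: one broadcasted multiply plus a reshape."""
    return (rays * depth[..., None]).reshape(-1, 3)
```

The grid is built once per camera; every subsequent frame pays only the multiply, which removes the meshgrid and division from the hot path.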
45 changes: 42 additions & 3 deletions demo_colmap.py
@@ -21,6 +21,7 @@
from pathlib import Path
import trimesh
import pycolmap
import shutil


from vggt.models.vggt import VGGT
@@ -42,8 +43,10 @@
def parse_args():
parser = argparse.ArgumentParser(description="VGGT Demo")
parser.add_argument("--scene_dir", type=str, required=True, help="Directory containing the scene images")
parser.add_argument("--output_dir", type=str, required=True, help="Directory to save the reconstruction results")
parser.add_argument("--seed", type=int, default=42, help="Random seed for reproducibility")
parser.add_argument("--use_ba", action="store_true", default=False, help="Use BA for reconstruction")
parser.add_argument("--overwrite", action="store_true", help="Allow overwriting existing output directory.")
######### BA parameters #########
parser.add_argument(
"--max_reproj_error", type=float, default=8.0, help="Maximum reprojection error for reconstruction"
@@ -94,6 +97,21 @@ def demo_fn(args):
# Print configuration
print("Arguments:", vars(args))

# Check for existing output and handle overwriting
sparse_dir_exists = os.path.isdir(os.path.join(args.output_dir, "sparse"))
if sparse_dir_exists and not args.overwrite:
raise FileExistsError(
f"Output directory '{args.output_dir}' already contains a 'sparse' reconstruction. "
"Use the --overwrite flag to overwrite existing files."
)

# If overwriting, remove the old sparse directory to ensure a clean slate
if sparse_dir_exists and args.overwrite:
print(f"Overwriting existing reconstruction in {args.output_dir}")
shutil.rmtree(os.path.join(args.output_dir, "sparse"))

os.makedirs(args.output_dir, exist_ok=True)

# Set seed for reproducibility
np.random.seed(args.seed)
torch.manual_seed(args.seed)
@@ -240,13 +258,34 @@ def demo_fn(args):
shared_camera=shared_camera,
)

print(f"Saving reconstruction to {args.scene_dir}/sparse")
sparse_reconstruction_dir = os.path.join(args.scene_dir, "sparse")
sparse_reconstruction_dir = os.path.join(args.output_dir, "sparse")
os.makedirs(sparse_reconstruction_dir, exist_ok=True)

# Write COLMAP sparse model
print(f"Saving reconstruction to {sparse_reconstruction_dir}")
reconstruction.write(sparse_reconstruction_dir)

# Save point cloud for fast visualization
trimesh.PointCloud(points_3d, colors=points_rgb).export(os.path.join(args.scene_dir, "sparse/points.ply"))
# Extract points and colors directly from the final reconstruction for consistency
points3D = reconstruction.points3D
if points3D:
ply_path = os.path.join(sparse_reconstruction_dir, "points.ply")
points_for_ply = np.array([p.xyz for p in points3D.values()])
colors_for_ply = np.array([p.color for p in points3D.values()])
trimesh.PointCloud(points_for_ply, colors=colors_for_ply).export(ply_path)
print(f"Saved point cloud visualization to {ply_path}")

# Copy images to the output directory to create a self-contained project, if necessary
if os.path.abspath(args.scene_dir) != os.path.abspath(args.output_dir):
output_image_dir = os.path.join(args.output_dir, "images")
os.makedirs(output_image_dir, exist_ok=True)
print(f"Copying images to {output_image_dir}...")
for src_path, base_name in zip(image_path_list, base_image_path_list):
shutil.copy(src_path, os.path.join(output_image_dir, base_name))
else:
print("Output directory is the same as the scene directory. Skipping image copy.")

print("Reconstruction complete.")

return True

190 changes: 190 additions & 0 deletions docs/COMMAND_RECIPES.md
@@ -0,0 +1,190 @@
# Reconstruction Command Recipes

The recipes below collect the common command combinations so you can launch the pipeline in whichever mode you need without re-deriving the flag set each time. Every command assumes you are in the project root, have the CUDA/TensorRT paths exported (see `agents.md`), and are running inside the `compvis` conda env.

Replace the paths/engine files as necessary for your dataset.

---

## Baseline (VGGT only, no depth rectification)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--image-loops 1 \
--vggt-weights vggt_model.pt \
--depth-anything off \
--output-dir out/vggt_only \
--log-level INFO
```

## VGGT + Depth Anything (TensorRT) Throughput Sweep
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--image-loops 50 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--no-save-ply \
--output-dir out/fps_sweep \
--log-level INFO
```

## VGGT + Depth Anything (ONNX Runtime)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend onnxruntime \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.onnx \
--output-dir out/onnx_depth \
--log-level INFO
```

## Live Visualization (Open3D) – VGGT cloud
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything off \
--live-viz o3d \
--live-viz-source vggt \
--live-viz-max-points 200000 \
--output-dir out/viz_vggt \
--log-level INFO
```

## Live Visualization – Depth Anything rectified cloud
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--live-viz o3d \
--live-viz-source depth \
--no-save-ply \
--output-dir out/viz_depth \
--log-level INFO
```

## Gaussian Field Export (VGGT points only)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything off \
--gaussian-init vggt \
--gaussian-voxel-size 0.01 \
--gaussian-min-points 8 \
--output-dir out/gaussian_vggt \
--log-level INFO
```

## Gaussian Field Export (Depth rectified points only)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--gaussian-init depth \
--gaussian-voxel-size 0.01 \
--gaussian-min-points 8 \
--output-dir out/gaussian_depth \
--log-level INFO
```

## Gaussian Field Export (Blended VGGT + Depth)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--gaussian-init both \
--gaussian-voxel-size 0.01 \
--gaussian-min-points 12 \
--output-dir out/gaussian_both \
--log-level INFO
```

## Optimized Live View (Geometry Reuse) + Live Viz
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--image-loops 50 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--optimized-live-view \
--optimized-depth-threshold 0.02 \
--optimized-color-threshold 0.12 \
--optimized-confidence-threshold 0.25 \
--gaussian-init both \
--live-viz o3d \
--live-viz-source both \
--live-viz-max-points 150000 \
--no-save-ply \
--output-dir out/optimized_view \
--log-level INFO
```

## Optimized Live View (no viz, minimal output)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--image-loops 50 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--optimized-live-view \
--optimized-depth-threshold 0.01 \
--optimized-color-threshold 0.08 \
--optimized-confidence-threshold 0.3 \
--gaussian-init none \
--no-save-ply \
--output-dir out/optimized_headless \
--log-level INFO
```

---

### Additional Toggles

- `--input-type video --path "cam1.mp4,cam2.mp4"`: load frames from comma-separated video list.
- `--stride N`: sample every Nth frame.
- `--max-frames K`: hard cap on processed frames.
- `--device cuda:0` or `--device cpu`: override VGGT torch device.
- `--depth-workers 4`: set explicit worker count for Depth Anything TensorRT pool.

Combine any of the toggles with the recipes above to suit different datasets or hardware constraints.

These commands mirror the latest pipeline features (optimized reuse, Gaussian export, live viz), so you can mix and match them without digging back into the code.