2 changes: 2 additions & 0 deletions .gitignore
@@ -1,6 +1,8 @@
.hydra/
output/
ckpt/
onnx_exports/
datasets/
# Byte-compiled / optimized / DLL files
__pycache__/
**/__pycache__/
File renamed without changes.
50 changes: 50 additions & 0 deletions README.md
@@ -138,6 +138,56 @@ Furthermore, if certain pixels in the input frames are unwanted (e.g., reflectiv

</details>

## Point Cloud Reconstruction Pipeline

The repository ships with a compact CLI that mirrors the `demo_colmap.py`
workflow and produces two point clouds (VGGT metric depth + Depth Anything
rectified depth):

```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path path/to/image_folder \
--output-dir pcd_out/session01
```

### Highlights

- **VGGT bootstrap**: loads `vggt_model.pt`, predicts cameras/depth, and
unprojects the metric depth maps to `vggt_point_cloud.ply`.
- **Depth Anything rectification** *(optional)*: runs the ONNX/TensorRT export under
`onnx_exports/depth_anything/`, fits a scale+shift using VGGT depth, and
writes `depth_anything_rectified_point_cloud.ply`.
- **Metadata**: every run stores per-frame intrinsics, extrinsics, and the
fitted scale/shift values in `metadata.json`.
- **Backend choice**: `--depth-backend {auto|tensorrt|onnxruntime}` lets you
select TensorRT (fastest when available) or onnxruntime for Depth Anything.
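
The scale+shift rectification can be sketched as a closed-form least-squares fit over valid pixels. This is an illustrative sketch of the idea, not the exact code in `reconstruction/pcd/depth_anything.py`; the function name `fit_scale_shift` and the masking convention are assumptions:

```python
import numpy as np

def fit_scale_shift(pred: np.ndarray, ref: np.ndarray, mask: np.ndarray):
    """Least-squares (s, t) minimizing ||s * pred + t - ref||^2 over mask."""
    x = pred[mask].astype(np.float64)
    y = ref[mask].astype(np.float64)
    A = np.stack([x, np.ones_like(x)], axis=1)  # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(s), float(t)
```

Because the system has only two parameters, `np.linalg.lstsq` solves it in one call, keeping the per-frame CPU cost of the fit negligible.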

Example for a video sequence:

```bash
python -m reconstruction.simple_pipeline \
--input-type video \
--path path/to/cam_0.mp4,path/to/cam_1.mp4 \
--stride 2 \
--batch-size 8 \
--output-dir pcd_out/video_session
```

Pass `--depth-anything off` to skip the second stage, or override paths via
`--vggt-weights` and `--depth-anything-engine`. The legacy
`reconstruction/tools/pcd_inference.py` wrapper now simply forwards to this
module (emitting a deprecation warning).
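
The forwarding wrapper pattern can be sketched as follows. This is an assumed structure, not the actual `reconstruction/tools/pcd_inference.py` source:

```python
"""Sketch of a deprecation-forwarding CLI wrapper (assumed structure)."""
import runpy
import warnings

TARGET = "reconstruction.simple_pipeline"

def warn_deprecated() -> None:
    # Emit a DeprecationWarning so callers notice the module moved.
    warnings.warn(
        f"this wrapper is deprecated; invoke `python -m {TARGET}` directly",
        DeprecationWarning,
        stacklevel=2,
    )

def main() -> None:
    warn_deprecated()
    # Re-run the real module as if it were launched with `python -m`.
    runpy.run_module(TARGET, run_name="__main__")

if __name__ == "__main__":
    main()
```

`runpy.run_module` with `run_name="__main__"` makes the target module behave exactly as if it had been invoked directly, so existing scripts keep working while the warning nudges users to the new entry point.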

Additional knobs and outputs:

- `--depth-backend`: choose `auto` (default), `tensorrt`, or `onnxruntime`.
- `--depth-workers`: set the number of Depth Anything workers (`0` = auto).
- `--log-level`: surface per-stage timings by switching to `DEBUG` / `INFO`.
- Results are written per frame: `frame_<idx>_vggt.ply` for the reference frame
and `frame_<idx>_depth_anything.ply` for subsequent frames, plus `metadata.json`
summarizing intrinsics/extrinsics, scale/shift, and runtime stats.
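
Given the intrinsics and extrinsics stored in `metadata.json`, the unprojection step can be reproduced offline. A minimal sketch, assuming a 3x3 intrinsic matrix `K` and a world-to-camera extrinsic `[R|t]` (check the exact field names and conventions in the file your run produces):

```python
import numpy as np

def unproject(depth: np.ndarray, K: np.ndarray, cam_from_world: np.ndarray) -> np.ndarray:
    """Lift a metric depth map (H, W) to world-space points (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Pixel rays in camera coordinates, scaled by metric depth.
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    pts_cam = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Invert the world->camera extrinsic (R | t) to map back to world space.
    R, t = cam_from_world[:3, :3], cam_from_world[:3, 3]
    return (pts_cam - t) @ R  # row-wise equivalent of R.T @ (p - t)
```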


## Interactive Demo

60 changes: 60 additions & 0 deletions agents.md
@@ -0,0 +1,60 @@
Feel free to remove the content beyond this line when it’s no longer needed.

# Project Snapshot for Follow-on Agents

- **Mission**: near-real-time multi-view reconstruction. VGGT handles metric bootstrap; Depth Anything v2 supplies fast depth updates aligned via scale/shift + EKF smoothing.
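
The scale/shift EKF mentioned above reduces, for a scalar state, to a standard Kalman filter over the fitted scale. This is an illustrative sketch, not the pipeline's actual filter; the noise variances are made-up defaults:

```python
class ScalarKalman:
    """1-D Kalman filter smoothing a noisy per-chunk scale estimate."""

    def __init__(self, x0: float, p0: float = 1.0,
                 q: float = 1e-4, r: float = 1e-2) -> None:
        self.x, self.p = x0, p0   # state estimate and its variance
        self.q, self.r = q, r     # process and measurement noise variances

    def update(self, z: float) -> float:
        # Predict: random-walk model, variance grows by q.
        p_pred = self.p + self.q
        # Correct: blend the prediction with the new measurement z.
        k = p_pred / (p_pred + self.r)      # Kalman gain
        self.x = self.x + k * (z - self.x)
        self.p = (1.0 - k) * p_pred
        return self.x
```

Feeding each chunk's fitted scale through `update()` damps frame-to-frame jitter while still tracking genuine drift, at negligible cost.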

- **Key entry points**
- `reconstruction/simple_pipeline.py`
- Args: `--image-loops` repeats a static frame set to emulate a longer stream; `--no-save-ply` disables per-frame/aggregate PLY writes to isolate compute cost.
- Chunk logs now read `Chunk XXX | VGGT 6f → … | Depth …` so timings are per 6-frame batch (batch size = reconstruction latency unit).
- `reconstruction/pcd/depth_anything.py`
- TensorRT worker pool with `suggest_workers()` (auto picks pool size) and optional backend metadata.
- `onnx/tools/benchmark_trt_engines.py`
- Compares TensorRT, ONNX Runtime, and PyTorch baselines. Use `--norm zero_center` to match VGGT preprocessing.
- `onnx/tools/vggt_to_trt.py`
- Exports VGGT to sanitized ONNX + TensorRT engines (float64/bfloat16 tensors downcast). Keep using this for reliable ONNX inputs.

- **Environment facts**
- Conda env `compvis`. CUDA available.
- Session shell (zsh) pre-exports CUDA/TensorRT paths. When reproducing runs, mirror:
```bash
export CUDA_HOME=/usr/local/cuda-12.9
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
export PATH="/usr/src/tensorrt/bin:$PATH"
```
- PyCUDA works when `LD_LIBRARY_PATH` includes the newer `libstdc++.so` (configured in `.zshrc`).
- Use `python -m …`; key modules (`simple_pipeline`, `depth_anything`) pass `py_compile`.

- **Throughput testing recipe**
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--image-loops 50 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--no-save-ply \
--output-dir out/fps_sweep \
--log-level INFO
```
- Outputs per-chunk timings (VGGT + Depth) with no PLY overhead.

- **Standalone engine benchmark**
```bash
python onnx/tools/benchmark_trt_engines.py \
--images-dir datasets/cam_snaps/demo \
--num-views 6 \
--hf-weights vggt_model.pt \
--onnx-models onnx_exports/vggt_core_6.onnx \
--trt-engines onnx_exports/six_cameras_pcd/vggt-6x3x518x518-pcd_fp16.engine \
--norm zero_center
```
- Reports raw inference FPS/latency and depth sanity checks.

- **Current focus**
- Hitting ≥100 FPS per 6-frame batch end-to-end by trimming CPU post-processing (scale-fit + point-cloud unprojection). Use the throughput recipe above for measurements.
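
A cheap win along these lines is caching the per-pixel ray grid so the unprojection inner loop collapses to a single broadcasted multiply per frame. An illustrative sketch assuming fixed intrinsics per camera, not current pipeline code:

```python
import numpy as np

def make_ray_grid(K: np.ndarray, h: int, w: int) -> np.ndarray:
    """Precompute per-pixel camera rays (H, W, 3) at unit depth."""
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    x = (u - K[0, 2]) / K[0, 0]
    y = (v - K[1, 2]) / K[1, 1]
    return np.stack([x, y, np.ones_like(x)], axis=-1)

def unproject_fast(depth: np.ndarray, rays: np.ndarray) -> np.ndarray:
    """Per-frame work: one broadcasted multiply plus a reshape."""
    return (rays * depth[..., None]).reshape(-1, 3)
```

The grid is built once per camera; every subsequent frame pays only the multiply, which removes the meshgrid and division from the hot path.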
45 changes: 42 additions & 3 deletions demo_colmap.py
@@ -21,6 +21,7 @@
from pathlib import Path
import trimesh
import pycolmap
import shutil


from vggt.models.vggt import VGGT
@@ -42,8 +43,10 @@
def parse_args():
parser = argparse.ArgumentParser(description="VGGT Demo")
parser.add_argument("--scene_dir", type=str, required=True, help="Directory containing the scene images")
parser.add_argument("--output_dir", type=str, required=True, help="Directory to save the reconstruction results")
parser.add_argument("--seed", type=int, default=42, help="Random seed for reproducibility")
parser.add_argument("--use_ba", action="store_true", default=False, help="Use BA for reconstruction")
parser.add_argument("--overwrite", action="store_true", help="Allow overwriting existing output directory.")
######### BA parameters #########
parser.add_argument(
"--max_reproj_error", type=float, default=8.0, help="Maximum reprojection error for reconstruction"
@@ -94,6 +97,21 @@ def demo_fn(args):
# Print configuration
print("Arguments:", vars(args))

# Check for existing output and handle overwriting
sparse_dir_exists = os.path.isdir(os.path.join(args.output_dir, "sparse"))
if sparse_dir_exists and not args.overwrite:
raise FileExistsError(
f"Output directory '{args.output_dir}' already contains a 'sparse' reconstruction. "
"Use the --overwrite flag to overwrite existing files."
)

# If overwriting, remove the old sparse directory to ensure a clean slate
if sparse_dir_exists and args.overwrite:
print(f"Overwriting existing reconstruction in {args.output_dir}")
shutil.rmtree(os.path.join(args.output_dir, "sparse"))

os.makedirs(args.output_dir, exist_ok=True)

# Set seed for reproducibility
np.random.seed(args.seed)
torch.manual_seed(args.seed)
@@ -240,13 +258,34 @@ def demo_fn(args):
shared_camera=shared_camera,
)

print(f"Saving reconstruction to {args.scene_dir}/sparse")
sparse_reconstruction_dir = os.path.join(args.scene_dir, "sparse")
sparse_reconstruction_dir = os.path.join(args.output_dir, "sparse")
os.makedirs(sparse_reconstruction_dir, exist_ok=True)

# Write COLMAP sparse model
print(f"Saving reconstruction to {sparse_reconstruction_dir}")
reconstruction.write(sparse_reconstruction_dir)

# Save point cloud for fast visualization
trimesh.PointCloud(points_3d, colors=points_rgb).export(os.path.join(args.scene_dir, "sparse/points.ply"))
# Extract points and colors directly from the final reconstruction for consistency
points3D = reconstruction.points3D
if points3D:
ply_path = os.path.join(sparse_reconstruction_dir, "points.ply")
points_for_ply = np.array([p.xyz for p in points3D.values()])
colors_for_ply = np.array([p.color for p in points3D.values()])
trimesh.PointCloud(points_for_ply, colors=colors_for_ply).export(ply_path)
print(f"Saved point cloud visualization to {ply_path}")

# Copy images to the output directory to create a self-contained project, if necessary
if os.path.abspath(args.scene_dir) != os.path.abspath(args.output_dir):
output_image_dir = os.path.join(args.output_dir, "images")
os.makedirs(output_image_dir, exist_ok=True)
print(f"Copying images to {output_image_dir}...")
for src_path, base_name in zip(image_path_list, base_image_path_list):
shutil.copy(src_path, os.path.join(output_image_dir, base_name))
else:
print("Output directory is the same as the scene directory. Skipping image copy.")

print("Reconstruction complete.")

return True

190 changes: 190 additions & 0 deletions docs/COMMAND_RECIPES.md
@@ -0,0 +1,190 @@
# Reconstruction Command Recipes

The recipes below collect the common command combinations so you can launch the pipeline in whichever mode you need without re-deriving the flag set each time. Every command assumes you are in the project root, have the CUDA/TensorRT paths exported (see `agents.md`), and are running inside the `compvis` conda env.

Replace the paths/engine files as necessary for your dataset.

---

## Baseline (VGGT only, no depth rectification)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--image-loops 1 \
--vggt-weights vggt_model.pt \
--depth-anything off \
--output-dir out/vggt_only \
--log-level INFO
```

## VGGT + Depth Anything (TensorRT) Throughput Sweep
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--image-loops 50 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--no-save-ply \
--output-dir out/fps_sweep \
--log-level INFO
```

## VGGT + Depth Anything (ONNX Runtime)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend onnxruntime \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.onnx \
--output-dir out/onnx_depth \
--log-level INFO
```

## Live Visualization (Open3D) – VGGT cloud
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything off \
--live-viz o3d \
--live-viz-source vggt \
--live-viz-max-points 200000 \
--output-dir out/viz_vggt \
--log-level INFO
```

## Live Visualization – Depth Anything rectified cloud
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--live-viz o3d \
--live-viz-source depth \
--no-save-ply \
--output-dir out/viz_depth \
--log-level INFO
```

## Gaussian Field Export (VGGT points only)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything off \
--gaussian-init vggt \
--gaussian-voxel-size 0.01 \
--gaussian-min-points 8 \
--output-dir out/gaussian_vggt \
--log-level INFO
```

## Gaussian Field Export (Depth rectified points only)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--gaussian-init depth \
--gaussian-voxel-size 0.01 \
--gaussian-min-points 8 \
--output-dir out/gaussian_depth \
--log-level INFO
```

## Gaussian Field Export (Blended VGGT + Depth)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--gaussian-init both \
--gaussian-voxel-size 0.01 \
--gaussian-min-points 12 \
--output-dir out/gaussian_both \
--log-level INFO
```

## Optimized Live View (Geometry Reuse) + Live Viz
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--image-loops 50 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--optimized-live-view \
--optimized-depth-threshold 0.02 \
--optimized-color-threshold 0.12 \
--optimized-confidence-threshold 0.25 \
--gaussian-init both \
--live-viz o3d \
--live-viz-source both \
--live-viz-max-points 150000 \
--no-save-ply \
--output-dir out/optimized_view \
--log-level INFO
```

## Optimized Live View (no viz, minimal output)
```bash
python -m reconstruction.simple_pipeline \
--input-type images \
--path datasets/cam_snaps/demo \
--batch-size 6 \
--image-loops 50 \
--vggt-weights vggt_model.pt \
--depth-anything auto \
--depth-backend tensorrt \
--depth-anything-engine onnx_exports/depth_anything/depth_fp16.engine \
--optimized-live-view \
--optimized-depth-threshold 0.01 \
--optimized-color-threshold 0.08 \
--optimized-confidence-threshold 0.3 \
--gaussian-init none \
--no-save-ply \
--output-dir out/optimized_headless \
--log-level INFO
```

---

### Additional Toggles

- `--input-type video --path "cam1.mp4,cam2.mp4"`: load frames from comma-separated video list.
- `--stride N`: sample every Nth frame.
- `--max-frames K`: hard cap on processed frames.
- `--device cuda:0` or `--device cpu`: override VGGT torch device.
- `--depth-workers 4`: set explicit worker count for Depth Anything TensorRT pool.

Combine any of the toggles with the recipes above to suit different datasets or hardware constraints.

These commands mirror the latest pipeline features (optimized reuse, Gaussian export, live viz), so you can mix and match them without digging back into the code.