Bowei Zhang1,2*, Lei Ke1*, Adam W. Harley3, Katerina Fragkiadaki1
1Carnegie Mellon University 2Peking University 3Stanford University
* Equal Contribution
TAPIP3D is a method for long-term feed-forward 3D point tracking in monocular RGB and RGB-D video sequences. It introduces a 3D feature cloud representation that lifts image features into a persistent world coordinate space, canceling out camera motion and enabling accurate trajectory estimation across frames.
You can install the package directly from source:
```bash
git clone https://github.com/tapip3d/tapip3d.git
cd tapip3d
pip install -e .
```

For development, install with optional dependencies:

```bash
pip install -e ".[dev]"
```

After installation, you can use TAPIP3D in your Python code:
```python
import tapip3d

# Run inference on a video file
result_path = tapip3d.run_inference(
    input_path="path/to/your/video.mp4",
    checkpoint="path/to/checkpoint.pth",
    output_dir="outputs/my_results",
    device="cuda",
    num_iters=6,
    resolution_factor=2,
)
print(f"Results saved to: {result_path}")

# Visualize the results
tapip3d.visualize(result_path, open_browser=True)
```

The package also provides command-line tools:
```bash
# Run inference
tapip3d-inference path/to/video.mp4 --checkpoint path/to/checkpoint.pth --output_dir outputs

# Visualize results
tapip3d-visualize path/to/results.npz --port 8080
```

The `run_inference` function accepts the following parameters:
- `input_path` (str): Path to input video (`.mp4`, `.avi`, `.mov`, `.webm`) or NPZ file
- `output_dir` (str, optional): Directory to save results (default: `"outputs/inference"`)
- `checkpoint` (str, optional): Path to model checkpoint
- `device` (str, optional): Device to run inference on (default: `"cuda"`)
- `num_iters` (int, optional): Number of iterations for inference (default: 6)
- `support_grid_size` (int, optional): Grid size for support points (default: 16)
- `num_threads` (int, optional): Number of threads for parallel processing (default: 8)
- `resolution_factor` (int, optional): Resolution scaling factor (default: 2)
- `vis_threshold` (float, optional): Visibility threshold (default: 0.9)
- `depth_model` (str, optional): Depth model to use if depths are not provided (default: `"moge"`)
The `visualize` function accepts the following parameters:

- `npz_file` (str or Path): Path to the input `.result.npz` file
- `width` (int, optional): Target width for visualization (default: 256)
- `height` (int, optional): Target height for visualization (default: 192)
- `fps` (int, optional): Base frame rate for playback (default: 4)
- `port` (int, optional): Port to serve on (default: random available port)
- `open_browser` (bool, optional): Whether to automatically open the browser (default: True)
- `block` (bool, optional): Whether to block until the server is stopped (default: True)
- Video files: `.mp4`, `.avi`, `.mov`, `.webm`
- NPZ files with pre-computed depths and camera parameters
The `run_inference` function returns a `Path` object pointing to the saved results NPZ file, which contains:
- `video`: Original video frames
- `depths`: Depth maps
- `intrinsics`: Camera intrinsic parameters
- `extrinsics`: Camera extrinsic parameters
- `coords`: Tracked 3D coordinates
- `visibs`: Visibility information
- `query_points`: Query points used for tracking
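A results file can be inspected directly with NumPy. The snippet below is a minimal sketch: the key names come from the list above, but the array shapes and dtypes in the comments are illustrative assumptions, and the tiny stand-in file is created only so the example is self-contained.

```python
import numpy as np

# Build a tiny stand-in results file so the snippet runs anywhere;
# a real .result.npz from run_inference carries the same keys
# (shapes here are illustrative, not guaranteed by the package).
T, N = 4, 5  # frames, tracked points
np.savez(
    "demo.result.npz",
    video=np.zeros((T, 32, 32, 3), dtype=np.uint8),
    depths=np.zeros((T, 32, 32), dtype=np.float32),
    intrinsics=np.eye(3, dtype=np.float32)[None].repeat(T, axis=0),
    extrinsics=np.eye(4, dtype=np.float32)[None].repeat(T, axis=0),
    coords=np.zeros((T, N, 3), dtype=np.float32),
    visibs=np.ones((T, N), dtype=bool),
    query_points=np.zeros((N, 4), dtype=np.float32),
)

data = np.load("demo.result.npz")
print(sorted(data.files))

# Keep only the 3D coordinates of points marked visible in frame 0
visible_coords = data["coords"][0][data["visibs"][0]]
print(visible_coords.shape)
```

Filtering `coords` by `visibs`, as in the last step, is a common pattern before computing statistics over trajectories, since occluded points may carry unreliable coordinates.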
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
If you use this code in your research, please cite:
```bibtex
@misc{tapip3d,
  title={TAPIP3D: 3D Point Tracking and Inference},
  author={TAPIP3D Team},
  url={https://tapip3d.github.io/},
  year={2024}
}
```