This repository contains an implementation of a monocular visual odometry (VO) pipeline for camera pose estimation and 3D landmark tracking. The pipeline was inspired by the project from the Vision Algorithms for Mobile Robotics lecture taught at ETH Zurich and UZH by Prof. Scaramuzza. It includes several advanced features to improve robustness and accuracy.
Figure: Visualization of the pipeline in action, showing tracked features, camera trajectory, and landmarks.
Visual Odometry (VO) is the process of estimating the egomotion of a camera by analyzing the changes that motion induces on images. This implementation follows a feature-based approach with the following components:
- Initialization: Bootstrap the system by establishing initial 3D landmarks and camera poses
- Continuous Operation:
  - Track keypoints across frames
  - Estimate camera pose using 2D-3D correspondences
  - Triangulate new landmarks to maintain tracking
Core features:
- Monocular visual odometry pipeline (no stereo information used)
- KLT feature tracking with forward-backward verification
- P3P RANSAC for robust pose estimation
- Dynamic landmark triangulation with parallax verification
- Visualization of trajectory and landmarks
Beyond the core pipeline, several advanced features improve robustness and accuracy:
- Local Bundle Adjustment for combating scale drift
  - Optimizes camera poses and 3D landmarks jointly
  - Reduces accumulation of drift over time
  - Implements a sliding window approach for computational efficiency
- Keyframe-based Tracking for improved robustness (a sketch of one possible keyframe test follows this feature list)
  - Identifies keyframes based on feature tracking quality and parallax
  - Uses keyframes as reference for triangulation and scale correction
  - Reduces the risk of drift during quick rotations
- Quantitative Feature Tracker Analysis
  - Compares different feature tracking methods (KLT, SIFT, ORB)
  - Analyzes tracking quality, computational efficiency, and robustness
  - Generates comparative visualizations for evaluation
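To make the keyframe idea concrete, here is a minimal sketch of one possible keyframe test. The function name, the thresholds, and the use of median keypoint displacement as a cheap parallax proxy are illustrative assumptions, not the repository's exact criterion:

```python
import numpy as np

def is_keyframe(kps_ref, kps_cur, n_tracked, n_ref,
                min_median_disp_px=20.0, min_track_ratio=0.6):
    """Hypothetical keyframe test on Nx2 keypoint arrays: promote the current
    frame when tracked features have moved far enough (median displacement as
    a parallax proxy) or when too few reference-keyframe features survive."""
    median_disp = np.median(np.linalg.norm(kps_cur - kps_ref, axis=1))
    track_ratio = n_tracked / max(n_ref, 1)  # fraction of reference features still tracked
    return median_disp > min_median_disp_px or track_ratio < min_track_ratio
```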
Requirements:
- Python 3.8+
- OpenCV 4.5+
- NumPy
- Matplotlib
- SciPy (for bundle adjustment and KD-tree)
Installation:

```bash
# Clone the repository
git clone https://github.com/ben-du-pont/monocular-visual-odometry-pipeline.git
cd monocular-visual-odometry-pipeline

# Install dependencies
pip install numpy opencv-python matplotlib scipy
```

Run the pipeline with:

```bash
python main.py --dataset [kitti|malaga|parking] --path /path/to/dataset
```

Command-line options:
- `--start N`: Start processing from frame N (default: 0)
- `--end N`: Stop processing at frame N (default: -1, process all frames)
- `--save`: Save results to the output directory
- `--no_display`: Run without visualization
- `--feature_comparison`: Run the feature tracker comparison
```bash
# Run on KITTI dataset with visualization
python main.py --dataset kitti --path /path/to/kitti_dataset --save

# Run on Malaga dataset with feature comparison
python main.py --dataset malaga --path /path/to/malaga_dataset --save --feature_comparison
```

Initialization proceeds as follows:
- Select two frames with sufficient baseline
- Detect and track keypoints using KLT through intermediate frames
- Estimate fundamental matrix using RANSAC to filter outliers
- Calculate essential matrix from fundamental matrix and calibration
- Recover relative pose and triangulate initial 3D landmarks
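These steps map closely onto standard OpenCV calls. Below is a minimal two-view bootstrap sketch, assuming Nx2 float keypoint arrays already matched via KLT; the function name and RANSAC thresholds are illustrative, not the repository's exact implementation:

```python
import cv2
import numpy as np

def bootstrap(pts0, pts1, K):
    """Two-view initialization: pts0/pts1 are Nx2 float keypoints matched
    between two frames with sufficient baseline, K is the 3x3 intrinsics."""
    F, mask = cv2.findFundamentalMat(pts0, pts1, cv2.FM_RANSAC,
                                     ransacReprojThreshold=1.0, confidence=0.999)
    pts0, pts1 = pts0[mask.ravel() == 1], pts1[mask.ravel() == 1]  # keep inliers
    E = K.T @ F @ K                                   # essential from fundamental
    _, R, t, _ = cv2.recoverPose(E, pts0, pts1, K)    # relative pose (up to scale)
    P0 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # first camera at the origin
    P1 = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P0, P1, pts0.T, pts1.T)  # 4xN homogeneous points
    return R, t, X[:3] / X[3]                          # 3xN initial landmarks
```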
Continuous operation then processes each incoming frame:
- Track keypoints from previous to current frame using KLT
- Filter tracked keypoints using forward-backward verification
- Estimate current camera pose using P3P RANSAC
- Update existing landmarks and track candidate keypoints
- Triangulate new landmarks when sufficient parallax is achieved
- Apply bundle adjustment periodically to optimize poses and landmarks
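The periodic bundle adjustment step can be posed as robust nonlinear least squares over the reprojection error. Here is a minimal sliding-window sketch built on `scipy.optimize.least_squares`; the function and parameter names are illustrative, and a production implementation would typically also exploit Jacobian sparsity:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(params, n_cams, n_pts, K, cam_idx, pt_idx, obs):
    """Residuals between observed pixels obs[k] and the projection of
    landmark pt_idx[k] into camera cam_idx[k]."""
    cams = params[: n_cams * 6].reshape(n_cams, 6)   # axis-angle + translation
    pts = params[n_cams * 6 :].reshape(n_pts, 3)     # 3D landmarks
    rot = Rotation.from_rotvec(cams[cam_idx, :3])
    p_cam = rot.apply(pts[pt_idx]) + cams[cam_idx, 3:]  # world -> camera frame
    proj = (K @ p_cam.T).T
    proj = proj[:, :2] / proj[:, 2:3]                # perspective division
    return (proj - obs).ravel()

def local_bundle_adjustment(cams, pts, K, cam_idx, pt_idx, obs):
    """Jointly refine the poses and landmarks of one sliding window."""
    x0 = np.hstack([cams.ravel(), pts.ravel()])
    res = least_squares(reprojection_residuals, x0, method="trf",
                        loss="huber",  # robust loss to dampen outlier residuals
                        args=(len(cams), len(pts), K, cam_idx, pt_idx, obs))
    n = len(cams) * 6
    return res.x[:n].reshape(-1, 6), res.x[n:].reshape(-1, 3)
```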
The state S_i at each frame contains:
- `keypoints`: 2D keypoints in the current frame (2xK)
- `landmarks`: Associated 3D landmarks (3xK)
- `candidates`: Candidate keypoints for future triangulation (2xM)
- `first_obs`: First observations of candidate keypoints (2xM)
- `first_poses`: Camera poses at first observations (16xM)
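One way to hold this state is a small dataclass of NumPy arrays. This is a sketch: the field names follow the description above, but the repository's actual container may differ:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class State:
    """Per-frame VO state S_i; shapes follow the description above."""
    keypoints: np.ndarray    # 2xK pixel coordinates tracked in the current frame
    landmarks: np.ndarray    # 3xK 3D points, column-aligned with keypoints
    candidates: np.ndarray   # 2xM keypoints not yet triangulated
    first_obs: np.ndarray    # 2xM pixel coords where each candidate was first seen
    first_poses: np.ndarray  # 16xM flattened 4x4 camera poses at first observation
```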
Feature tracking:
- Uses Lucas-Kanade optical flow (KLT) with forward-backward verification
- Parameters optimized for each dataset type
- Maintains a quality threshold to ensure reliable tracking
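A minimal version of the forward-backward check with OpenCV is sketched below; the window size, pyramid depth, and 1 px round-trip threshold are illustrative defaults, not the per-dataset tuned values:

```python
import cv2
import numpy as np

def track_klt_fb(img_prev, img_cur, pts_prev, fb_threshold=1.0):
    """Track pts_prev (Nx1x2 float32) with pyramidal Lucas-Kanade, re-track
    the result backwards, and keep only points whose round trip returns to
    within fb_threshold pixels of where they started."""
    lk = dict(winSize=(21, 21), maxLevel=3,
              criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 30, 0.01))
    p1, st_f, _ = cv2.calcOpticalFlowPyrLK(img_prev, img_cur, pts_prev, None, **lk)
    p0r, st_b, _ = cv2.calcOpticalFlowPyrLK(img_cur, img_prev, p1, None, **lk)
    fb_err = np.linalg.norm((pts_prev - p0r).reshape(-1, 2), axis=1)
    good = (st_f.ravel() == 1) & (st_b.ravel() == 1) & (fb_err < fb_threshold)
    return p1[good], good  # surviving points and the boolean keep-mask
```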
Pose estimation:
- Uses P3P algorithm with RANSAC for outlier rejection
- Filters correspondences based on reprojection error
- Maintains motion consistency using previous pose when estimation fails
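In OpenCV this maps onto `cv2.solvePnPRansac` with the P3P flag. A sketch, with an illustrative reprojection threshold and a `None` return that lets the caller fall back to the previous pose as described above:

```python
import cv2

def estimate_pose_p3p(landmarks_3d, keypoints_2d, K, max_reproj_err=3.0):
    """Estimate the camera pose from Nx3 landmarks and Nx2 keypoints using
    P3P inside a RANSAC loop; returns (R, t, inlier indices) or None."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        landmarks_3d, keypoints_2d, K, None,          # None: undistorted inputs
        flags=cv2.SOLVEPNP_P3P,
        reprojectionError=max_reproj_err, iterationsCount=1000)
    if not ok or inliers is None:
        return None                                   # caller reuses previous pose
    R, _ = cv2.Rodrigues(rvec)                        # axis-angle -> 3x3 rotation
    return R, tvec, inliers.ravel()
```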
Landmark management:
- Triangulates new landmarks when parallax angle exceeds threshold
- Verifies depth and reprojection error to ensure quality
- Maintains persistent landmark IDs across frames for bundle adjustment
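The parallax test reduces to the angle subtended at the candidate landmark by the two camera centers; a minimal sketch (this value would be compared against the `alpha_threshold` parameter listed further below):

```python
import numpy as np

def parallax_deg(X, c1, c2):
    """Angle (degrees) at 3D point X between camera centers c1 and c2,
    all given as length-3 arrays in the world frame."""
    v1, v2 = c1 - X, c2 - X
    cos_a = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
    return np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0)))
```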
The live visualization shows:
- Current frame with tracked features and candidates
- Recent trajectory (last 20 frames) with visible landmarks
- Feature count history
- Full trajectory overview
The pipeline has been tested on three datasets:
- KITTI dataset: Outdoor driving sequences with large translations
- Malaga dataset: Urban environment with various motion patterns
- Parking dataset: More complex motion with significant rotations
Performance metrics:
- Tracking success rate: 85-95% on most sequences
- Pose estimation accuracy: Local consistency maintained well
- Processing speed: about 5 frames per second (depending on parameters)
Several optimizations have been implemented to improve performance:
- KD-tree for efficient landmark association (see the sketch after this list)
- Selective keyframe processing for bundle adjustment
- Adaptive feature detection based on tracking quality
- Parallel processing for feature extraction and matching
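For the KD-tree association, SciPy's `cKDTree` gives logarithmic-time nearest-neighbour queries instead of a brute-force scan; a minimal sketch, with an illustrative function name and pixel gate:

```python
import numpy as np
from scipy.spatial import cKDTree

def associate_landmarks(proj_2d, detected_2d, max_px_dist=4.0):
    """Match projected landmarks (Nx2) to freshly detected keypoints (Mx2)
    by nearest neighbour in pixel space."""
    tree = cKDTree(detected_2d)              # build once per frame
    dist, idx = tree.query(proj_2d, k=1)     # nearest detection per landmark
    matched = dist < max_px_dist             # gate on pixel distance
    return np.flatnonzero(matched), idx[matched]  # (landmark idx, detection idx)
```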
- Poor initialization: Try different initial frames with more distinct motion
- Tracking failures: Adjust KLT parameters or reduce forward-backward threshold
- Drift in rotation: Increase keyframe frequency and bundle adjustment frequency
- Scale drift: Monocular VO has an inherent scale ambiguity; implement absolute scale recovery if ground truth is available
The most important parameters to tune are:
- `forward_backward_threshold`: Controls keypoint tracking quality (higher = more keypoints, potentially more noise)
- `alpha_threshold`: Minimum parallax angle for triangulation (lower = more landmarks, potentially less accurate)
- `max_reprojection_error`: Maximum allowed reprojection error (higher = more landmarks, potentially more outliers)
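As a hypothetical starting configuration (the values below are illustrative, not the repository's tuned defaults):

```python
# Hypothetical tuning values; adjust per dataset and sequence.
params = dict(
    forward_backward_threshold=1.0,  # px: higher -> more keypoints, more noise
    alpha_threshold=2.0,             # deg: lower -> more landmarks, less accurate
    max_reprojection_error=3.0,      # px: higher -> more landmarks, more outliers
)
```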
This project is licensed under the MIT License - see the LICENSE file for details.
- The project structure is based on the assignment from the University of Zurich's Robotics and Perception Group.
- Datasets from KITTI, Malaga, and the Parking sequences.
