Instructor: Rohit
Date: June 2, 2025 (☀️ Afternoon Session)
This session extended our foundational understanding of multiple-view geometry by addressing how cameras relate across different viewpoints — a crucial building block for structure-from-motion, visual odometry, and SLAM systems.
We began by revisiting the core mathematical tool of homographies, which capture planar transformations between views. This naturally led to a study of Zhang’s method for camera calibration using homographies, offering a practical route to estimate intrinsic parameters from multiple images of a known planar scene.
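To make the linear algebra concrete, here is a minimal NumPy sketch of homography estimation via the DLT, the building block Zhang's method applies to each image of the planar target. The function name is ours, and the sketch omits Hartley normalization and RANSAC, which any production version should include:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate the 3x3 homography H mapping src -> dst via the DLT.

    src, dst: (N, 2) arrays of corresponding points, N >= 4.
    A minimal sketch for illustration only: no coordinate normalization,
    no outlier handling.
    """
    A = []
    for (x, y), (u, v) in zip(np.asarray(src, float), np.asarray(dst, float)):
        # each correspondence contributes two rows of the constraint A h = 0
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # h is the right singular vector with the smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

Zhang's method then stacks constraints on the image of the absolute conic from several such homographies to solve for the intrinsics.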
To handle noisy correspondences and outliers, we introduced RANSAC — a robust estimation framework critical for real-world matching tasks. With this in place, we explored local feature matching, comparing classical methods like SIFT with Lowe’s ratio test to modern deep learning-based descriptors such as SuperPoint, SuperGlue, and LoFTR. The transition from handcrafted to learned representations highlights a shift toward end-to-end optimized pipelines in vision.
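Lowe's ratio test itself is only a few lines. The sketch below runs brute-force nearest-neighbour matching on raw descriptor arrays in pure NumPy (in a real pipeline you would use OpenCV's `cv2.SIFT_create` and `BFMatcher.knnMatch`, or an ANN index for speed); the 0.75 threshold is the commonly cited default:

```python
import numpy as np

def ratio_test_match(desc1, desc2, ratio=0.75):
    """Brute-force descriptor matching with Lowe's ratio test.

    desc1: (N, D) and desc2: (M, D) descriptor arrays (e.g. 128-D SIFT).
    Returns (i, j) index pairs whose best match is clearly better than
    the second best. Pure-NumPy sketch for illustration.
    """
    matches = []
    for i, d in enumerate(desc1):
        dists = np.linalg.norm(desc2 - d, axis=1)
        j1, j2 = np.argsort(dists)[:2]
        # accept only unambiguous matches
        if dists[j1] < ratio * dists[j2]:
            matches.append((i, j1))
    return matches
```

The surviving matches are exactly what one would then feed to RANSAC for geometric verification.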
We then studied Perspective-n-Point (PnP) problems and compared them with the Direct Linear Transform (DLT). While DLT applies to general projection problems using 2D–3D correspondences (with unknown intrinsics), PnP specifically estimates camera pose (given intrinsics) from known 3D landmarks and their image projections — a key subroutine in visual localization.
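The distinction can be made concrete with a small DLT sketch: it recovers the full 3x4 projection matrix (intrinsics folded in) from 2D–3D correspondences, whereas a PnP solver would take K as given and return only (R, t). Pure NumPy, unnormalized, for illustration only:

```python
import numpy as np

def dlt_projection(X, x):
    """Estimate the 3x4 projection matrix P from n >= 6 2D-3D correspondences.

    X: (n, 3) world points; x: (n, 2) image points. Solves x ~ P [X; 1]
    linearly, so the intrinsics need not be known, which is what separates
    DLT from PnP. Sketch only: no normalization, no RANSAC.
    """
    A = []
    for (Xw, Yw, Zw), (u, v) in zip(np.asarray(X, float), np.asarray(x, float)):
        A.append([Xw, Yw, Zw, 1, 0, 0, 0, 0, -u * Xw, -u * Yw, -u * Zw, -u])
        A.append([0, 0, 0, 0, Xw, Yw, Zw, 1, -v * Xw, -v * Yw, -v * Zw, -v])
    # the solution is the null vector of A (smallest singular value)
    _, _, Vt = np.linalg.svd(np.asarray(A))
    return Vt[-1].reshape(3, 4)
```

For pose with known intrinsics, OpenCV's `cv2.solvePnP` is the standard entry point.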
To further reinforce the geometry between views, we encourage self-study of the following topics (resources provided below):
- The Fundamental Matrix, its geometric meaning, and estimation via the 8-point algorithm.
- The Essential Matrix, and its decomposition to recover relative rotation and translation between cameras.
- Monocular Visual Odometry, showing how sequential pose estimation can be achieved using only a single moving camera.
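As a concrete reference for the first of these self-study topics, here is a NumPy sketch of the 8-point algorithm, including Hartley normalization and the rank-2 enforcement step (RANSAC is omitted; `cv2.findFundamentalMat` wraps all of this in practice):

```python
import numpy as np

def _normalize(pts):
    # Hartley normalization: zero mean, average distance sqrt(2)
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.mean(np.linalg.norm(pts - c, axis=1))
    T = np.array([[s, 0, -s * c[0]], [0, s, -s * c[1]], [0, 0, 1]])
    return (pts - c) * s, T

def eight_point(x1, x2):
    """Estimate the fundamental matrix F from n >= 8 correspondences.

    x1, x2: (n, 2) matching pixel coordinates in two images.
    Sketch for illustration; robust estimation (RANSAC) is omitted.
    """
    n1, T1 = _normalize(np.asarray(x1, float))
    n2, T2 = _normalize(np.asarray(x2, float))
    # each correspondence gives one row of the epipolar constraint A f = 0
    A = np.array([[u2 * u1, u2 * v1, u2, v2 * u1, v2 * v1, v2, u1, v1, 1]
                  for (u1, v1), (u2, v2) in zip(n1, n2)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)
    # enforce rank 2: a valid F satisfies det F = 0
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    F = U @ np.diag(S) @ Vt
    F = T2.T @ F @ T1          # undo the normalization
    return F / np.linalg.norm(F)
```

Every valid correspondence should then satisfy the epipolar constraint x2ᵀ F x1 ≈ 0.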
Each of these topics builds upon the previous, forming a coherent pipeline from calibration to correspondence to motion — essential for any modern vision or robotics system.
---
Implement a full monocular visual odometry pipeline on a short KITTI sequence. The steps include:
- Estimating the Fundamental Matrix using the 8-point algorithm.
- Computing the Essential Matrix using the known intrinsic calibration matrix K.
- Recovering relative rotation and translation (R, t) from the Essential Matrix.
- Comparing the estimated trajectory with the ground truth using `evo_ape` or other trajectory evaluation tools.
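The middle steps of the assignment, recovering (R, t) from the Essential Matrix, can be sketched in NumPy as follows. OpenCV's `cv2.recoverPose` does the same in one call; this version makes the four-fold ambiguity and the cheirality check explicit. It is a sketch under the usual assumptions (calibrated cameras, t recovered only up to scale):

```python
import numpy as np

def decompose_essential(E, K, x1, x2):
    """Recover (R, t) from an essential matrix E.

    The SVD of E yields four (R, t) candidates; the cheirality check keeps
    the one that places the most triangulated points in front of BOTH
    cameras. x1, x2: (n, 2) pixel correspondences; K: 3x3 intrinsics.
    t is returned as a unit vector (scale is unobservable monocularly).
    """
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
    candidates = [(U @ W @ Vt, U[:, 2]), (U @ W @ Vt, -U[:, 2]),
                  (U @ W.T @ Vt, U[:, 2]), (U @ W.T @ Vt, -U[:, 2])]
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    best, best_count = None, -1
    for R, t in candidates:
        P2 = K @ np.hstack([R, t.reshape(3, 1)])
        count = 0
        for (u1, v1), (u2, v2) in zip(x1, x2):
            # linear triangulation of one point from both views
            A = np.array([u1 * P1[2] - P1[0], v1 * P1[2] - P1[1],
                          u2 * P2[2] - P2[0], v2 * P2[2] - P2[1]])
            Xh = np.linalg.svd(A)[2][-1]
            X = Xh[:3] / Xh[3]
            if X[2] > 0 and (R @ X + t)[2] > 0:  # in front of both cameras?
                count += 1
        if count > best_count:
            best, best_count = (R, t), count
    return best
```

Chaining these relative poses frame-to-frame (with a scale convention) yields the monocular VO trajectory to compare against KITTI ground truth.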
---
📘 Visual Features (From Classical to Deep Learning):
This notebook guides you through the evolution of keypoint-based feature extraction and matching:
- Classical methods including Shi-Tomasi and the Harris Corner Detector.
- Scale-invariant features using SIFT with Lowe’s ratio test.
- Deep learning-based matchers such as SuperPoint, SuperGlue, and LoFTR — highlighting modern, robust alternatives for correspondence estimation.
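As a taste of the classical end of that spectrum, here is a minimal Harris response in pure NumPy. It uses central-difference gradients and a 3x3 box window instead of the usual Gaussian smoothing, and k = 0.04 is the conventional choice; the notebook's OpenCV equivalent is `cv2.cornerHarris`:

```python
import numpy as np

def harris_response(img, k=0.04):
    """Per-pixel Harris corner response R = det(M) - k * trace(M)^2.

    img: 2D float array. M is the structure tensor accumulated over a
    3x3 window. Corners give R >> 0, edges R < 0, flat regions R ~ 0.
    Minimal sketch: box window instead of Gaussian weighting.
    """
    Iy, Ix = np.gradient(img)
    Ixx, Iyy, Ixy = Ix * Ix, Iy * Iy, Ix * Iy

    def box(a):
        # 3x3 box filter as a sum of shifted copies (zero-padded borders)
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))

    Sxx, Syy, Sxy = box(Ixx), box(Iyy), box(Ixy)
    return Sxx * Syy - Sxy ** 2 - k * (Sxx + Syy) ** 2
```

Shi-Tomasi differs only in the score: it thresholds the smaller eigenvalue of M rather than this det/trace combination.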
Courtesy: Kumaraditya
💬 Post any questions or progress updates in the #module-4-multiview-geometry Slack channel.
| Topic | Link |
|---|---|
| Lecture Slides - Homography, PnP, Triangulation | lec-13-mvg-homography-pnp-triangulation.pdf |
| Mobile Sensing & Robotics II - Cyrill Stachniss (Playlist) | |
| Zhang’s Method and P3P (Lectures 29–31) | |
| Fundamental and Essential Matrix Intro (Lec 32) | |
| Relative Orientation and F, E Properties (Lec 33) | |
| Epipolar Geometry Construction (Lec 35) | |
| Estimating F, E (8-Point & Nister’s 5-Point) (Lec 36–37) | |
| PnP: Perspective-n-Point Problem – Steven LaValle (NPTEL) | |