- Understanding the difference between 2D (images as arrays of pixels) and 3D (geometry with depth)
- Linear array of 3D coordinates (x, y, z) with optional attributes like color and normals
- Sample the surface and are linear in storage complexity
- Advantages: simple structure, direct sensor output
- Disadvantages: no connectivity information, sparse
- Cubic storage, storing values everywhere in 3D space, not just on surfaces
- Advantages: regular structure, easy to implement algorithms
- Disadvantages: cubic storage complexity, limited resolution
- Piecewise linear surfaces composed of vertices and faces
- They can represent the same geometry with different topologies
- Common file formats: OBJ, STL, PLY, OFF
- Advantages: efficient for rendering, widely supported
- Disadvantages: can represent invalid surfaces
- Higher-order surfaces used in CAD modeling and animation
- Offering non-linear representation with continuous normals
- Often using cubic degrees for animation
- Advantages: smooth representation, compact
- Disadvantages: cannot be directly rendered on GPU
- Defined by a function that represents distance to the surface
- Storing a floating-point value per voxel indicating the proximity to the surface
- Advantages: easy to combine multiple surfaces, robust topology
- Disadvantages: harder to render directly
- Surfaces defined by an explicit function, such as height fields in game design
- Advantages: simple to work with for specific applications
- Disadvantages: limited in representable geometries
- Images where each pixel stores depth information
- Typically represents distance along the z-axis, not raw ray length
- Type: Passive
- Advantages: Works indoors and outdoors
- Disadvantages: Computationally expensive due to feature matching
- Type: Active
- Advantages: Excels in featureless regions
- Disadvantages: Fails outdoors due to reliance on IR spectrum
- Requires precise calibration between projector and sensor
- Type: Active
- Advantages: Precise
- Disadvantages: Often slow for high quality
- Trivial feature matching
- Type: Active
- Advantages: Can handle featureless regions
- Disadvantages: Sensitive to scattering light, fails outdoors
- Uses phase differences to measure distance
- Focal length, principal point, skew, and distortion
- Projects from camera space to screen space
- 6DoF pose (rotation and translation)
- Defines the transformation to global camera space
- Using multiple images to reconstruct 3D structures
- Reconstructing 3D structures from a set of 2D images
- Widely used in practical applications
- Involves:
- Feature extraction (SIFT, SURF)
- Feature matching
- Bundle adjustment
- Optimizing camera poses and 3D point positions to minimize reprojection error
- Using multi-view setups to carve out free space in a voxel grid
- Linear: When the problem can be represented as a linear combination of parameters
- Non-Linear: When the problem involves non-linear functions of parameters
- Conjugate Gradient Descent (CG)
- Preconditioned Conjugate Gradient Descent (PCG)
- Good for large, sparse systems
- QR Decomposition
- LU Decomposition
- Cholesky Decomposition
- SVD (Singular Value Decomposition)
- Good for smaller, dense systems
- Central differences
- Forward differences
- Using tools like Ceres to automatically compute derivatives
- Dual numbers can be implemented using C++ templates
- Minimizing the distance between projected 3D points and their corresponding image locations
- Aligning images based on color and intensity values
- Aligning two point clouds by iteratively finding correspondences and minimizing distance
- Key steps:
- Correspondence finding using nearest neighbor search
- Error metric computation and minimization
- Transformation application and iteration
- Point-to-Point ICP: Minimizes distance between corresponding points
- Point-to-Plane ICP: Minimizes distance from point to tangent plane
- Symmetric ICP: Uses an average normal of both points for improved alignment
- Integrating depth data into a volumetric grid
- The sign indicates in front of (positive) or behind (negative) the surface
- Extending ICP to handle deformations
- Widely used in character animation
- A method for mesh deformation that preserves local rigidity
- Uses cotangent weights in discrete settings
- Solved with alternating optimization ("flip-flop"):
- Fix vertices, solve rotations
- Fix rotations, solve vertices
- Aligning a template mesh to a sequence of depth images in a non-rigid fashion
- Algorithm to extract a polygonal mesh from a volume
- Identifies triangle intersections within a voxel based on sign changes
- Can be parallelized by independently processing each voxel
- A rendering technique used with distance fields
- Steps along a ray and checks for sign changes to detect surface crossings
- Step size can be dynamically adapted using sphere tracing
- Converts point clouds into surfaces by solving a Poisson equation
- Requires a point cloud with normals as input
- The output is an implicit indicator function from which a mesh can be extracted
- Combines frame-to-model and frame-to-frame camera tracking
- Uses SIFT features for global pose estimation
- Can perform loop closures and relocalization
- Uses point-based representations based on surfels (surface elements)
- Updates surfaces and corrects drift
- Methods to describe local image features (SIFT, SURF, ORB, FREAK)
- Good feature descriptors should be scaling, view, and lighting invariant
- FAST (Features from Accelerated Segment Test): Looks for points where a circle of pixels has contiguous pixels brighter or darker than center
- Harris Corner Detector: Based on local gradient changes, analyzing eigenvalues
- Finding corresponding features in different images
- Data structure for efficient nearest neighbor search
- Used for feature matching and correspondence finding in ICP
- Building a map of an environment while simultaneously localizing within it
- Selecting keyframes to reduce computational cost
- Identifying previously visited locations to correct drift
- SLAM using RGB-D cameras, providing both color and depth information
- CNN for camera pose regression from a single RGB image
- Estimates 6DOF camera pose
- Iterative optimization moving in steepest descent direction
- Requires determining appropriate step size
- Second-order optimization using the Hessian matrix
- Faster convergence but computationally expensive
- Approximates Hessian using the Jacobian
- For non-linear least squares problems
- Combines gradient descent and Gauss-Newton
- Uses damping factor to interpolate between methods
- Typically the preferred solver ("LM" in practice)
- Quasi-Newton methods approximating Hessian
- L-BFGS (Limited-memory BFGS) is memory-efficient for large problems
- "L-BFGS is the way to go actually in practice"
- Method for solving optimization problems with robust norms
- Iteratively solves weighted least squares problems
- Simplified representations used to control mesh deformations
- Examples include cages and harmonic coordinates
- Aligning a template mesh to a sequence of depth images
- Aligning two surfaces non-rigidly
- Combining IMU data with visual data to improve tracking accuracy
- Estimating pose by fusing IMU and camera data
- Algorithms for fusing data from multiple sensors
Q: Explain the differences between point clouds and voxel grids in terms of storage and information.
A: Voxel grids require cubic storage, storing values throughout 3D space, whereas point clouds linearly store sampled surface points.
Q: How would you represent a complex, curved surface? What are the advantages and disadvantages of your choice?
A: Parametric surfaces can represent complex, curved surfaces with higher-order continuity. However, they cannot be directly rendered on a GPU and must be converted to a triangle mesh.
Q: Describe a scenario where an implicit surface representation would be preferred over a polygonal mesh.
A: Implicit surfaces are useful when you need to know how close you are to a surface, as they store a floating-point value per voxel indicating the distance to the surface.
Q: Discuss how different triangulations of a surface would affect the solution you find during reconstruction.
A: Different triangulations mean multiple solutions, requiring careful handling in the reconstruction pipeline.
Q: Compare and contrast stereo and structured light techniques for depth sensing.
A: Stereo is passive and works in various environments but is computationally intensive. Structured light is active, good for featureless regions, but limited outdoors.
Q: What are intrinsic and extrinsic camera parameters, and how are they used in 3D reconstruction?
A: Intrinsic parameters define the camera's internal characteristics, while extrinsic parameters define its position and orientation in the world.
Q: How do you correct lens distortion?
A: Polynomial approximations are used, optimized with camera calibration toolchains using checkerboard patterns.
Q: Explain the structure from motion pipeline.
A: It starts with unstructured images, extracts features, matches features across images, performs bundle adjustment, and generates a sparse or dense reconstruction.
Q: What is bundle adjustment, and why is it important in 3D reconstruction?
A: Bundle adjustment refines camera poses and 3D point positions by minimizing the reprojection error, leading to more accurate reconstructions.
Q: What are the limitations of virtual hull carving?
A: It requires expensive, calibrated multi-view setups, is sensitive to noise, and may not produce high-quality results.
Q: When should you use iterative solvers like PCG over direct methods like Cholesky decomposition?
A: Iterative solvers are preferred for large, sparse systems, while direct methods are suitable for smaller, dense systems.
Q: How does automatic differentiation simplify the optimization process?
A: It automates the computation of derivatives, reducing the chance of human error and speeding up development.
Q: Describe the Gauss-Newton method and its application in bundle adjustment.
A: The Gauss-Newton method linearizes the non-linear least squares problem and iteratively refines the solution, widely used in bundle adjustment for camera pose and structure estimation.
Q: Discuss a scenario when different data terms might be more difficult to optimize than others.
A: Photometric alignment, due to its complex and non-convex energy landscape with many local minima, is often more challenging to optimize for compared to reprojection errors or depth terms.
Q: Explain the ICP algorithm and its variations (point-to-point, point-to-plane).
A: ICP iteratively finds closest points between two datasets, computes a transformation, and applies it, with variations differing in error metrics and constraints.
Q: How does non-rigid ICP differ from standard ICP?
A: Non-rigid ICP accounts for deformations by solving for deformation parameters in addition to rigid transformations.
Q: What is ARAP, and how does it preserve local rigidity during mesh deformation?
A: ARAP minimizes the difference between local rotations of mesh elements, preserving the original shape as much as possible while allowing deformation.
Q: Describe the key challenges in non-rigid 3D reconstruction compared to rigid reconstruction.
A: Key challenges include resolving ambiguities between surface motion and deformation and needing effective priors to guide the reconstruction.
Q: Explain the marching cubes algorithm and its applications.
A: Marching Cubes is used to create a triangle mesh from a volume by interpolating the intersections based on whether cube vertices are inside or outside the surface.
Q: How does ray marching work, and what are its advantages?
A: Ray marching steps along a ray to detect surface crossings by checking for sign changes. The step size can be dynamically adapted using sphere tracing to accelerate the process.
Q: What is Poisson surface reconstruction, and when is it useful?
A: Poisson surface reconstruction converts point clouds into surfaces by solving a Poisson equation, requiring a point cloud with normals as input.
Q: Describe the key components of Bundle Fusion and how it achieves real-time performance.
A: Bundle Fusion combines frame-to-model and frame-to-frame camera tracking with global RGB bundling, using SIFT features for global pose estimation.
Q: What makes a good feature descriptor?
A: A good feature descriptor should be scaling, view, and lighting invariant.
Q: Explain how KD-trees are used in feature matching.
A: KD-trees are used to efficiently find the nearest neighbors in feature space, which speeds up the matching process.
Q: Explain the concept of loop closure in SLAM.
A: Loop closure involves identifying previously visited locations to correct accumulated drift in the map.
Q: How does RGB-D SLAM differ from traditional SLAM?
A: RGB-D SLAM uses RGB-D cameras, providing both color and depth information, which can improve the accuracy and robustness of the SLAM system.
Q: Explain the difference between gradient descent and Newton's method.
A: Gradient descent uses the first derivative (gradient) to update parameters, while Newton's method uses the second derivative (Hessian) for faster convergence.
Q: What is the Levenberg-Marquardt algorithm, and why is it useful?
A: The Levenberg-Marquardt algorithm combines gradient descent and the Gauss-Newton method, using a damping factor to interpolate between the two methods.
Q: Describe the BFGS and L-BFGS algorithms.
A: BFGS is a quasi-Newton method that approximates the Hessian matrix, while L-BFGS is a memory-efficient version for large problems.
Q: What is iteratively reweighted least squares (IRLS), and when is it used?
A: IRLS is a method for solving optimization problems with robust norms, iteratively solving a weighted least squares problem.
Q: What are deformation proxies, and how are they used in non-rigid deformation?
A: Deformation proxies are simplified representations used to control mesh deformations, such as cages and harmonic coordinates.
Q: Explain how ARAP regularization preserves local rigidity during deformation.
A: ARAP regularization minimizes the difference between local rotations of mesh elements, preserving the original shape as much as possible while allowing deformation.
Q: How does non-rigid ICP work?
A: Non-rigid ICP extends the standard ICP algorithm to account for deformations by solving for deformation parameters in addition to rigid transformations.
Q: What are the challenges of tracking deformable objects?
A: Challenges include resolving ambiguities between surface motion and deformation, and the need for effective priors to guide the reconstruction.
Q: How can IMU data be used to improve tracking accuracy in visual SLAM?
A: IMU data provides high-frequency measurements of acceleration and angular velocity, which can be fused with visual data to improve tracking accuracy.
Q: Explain the concept of visual-inertial odometry (VIO).
A: Visual-inertial odometry (VIO) estimates the pose of a device by fusing IMU and camera data.
Q: Implement a basic ICP algorithm.
A: This involves writing code for nearest neighbor search, transformation estimation, and iterative refinement.
Q: Design a system for real-time 3D reconstruction using Kinect.
A: Select appropriate algorithms for tracking, reconstruction, and rendering, and optimize performance for real-time operation.
Q: Write code to estimate normals from a point cloud using PCA.
A: Implement PCA on the neighborhood of each point and extract the normal from the eigenvector corresponding to the smallest eigenvalue.
Q: How would you parallelize conjugate gradient updates on GPU?
A: Think through mapping the algorithm to GPU architecture, considering memory access patterns and thread synchronization.