Performance of different front-ends #754

@sarlinpe

Hi folks,

I've read your GTSfM paper - nice work, thanks for pushing this to arXiv. I enjoyed reading it and appreciate the huge effort that went into building it. I am very surprised by the conclusion that SuperPoint+SuperGlue/LightGlue is not as good as SIFT; in fact, we've always observed the exact opposite with incremental SfM (COLMAP) on various easy and difficult datasets (ETH3D, IMC 2020/1/2/3). I went through the code but didn't find anything obvious.

  1. The point clouds of SP+SG/LG look pretty sparse on several datasets, as do the matches in Fig. 3.

the shorter image side is resized to at most 760 pixels in length

So that'd give a 1351×760 px image for a 1920×1080 input - this seems fine.
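
For concreteness, a minimal sketch of that arithmetic (the function name and rounding behavior are my assumptions, not GTSfM's exact code):

```python
# Sketch of the resize rule quoted above: scale so the shorter side is at most
# 760 px, preserving aspect ratio. Assumed behavior, not GTSfM's implementation.
def resize_to_max_shorter_side(width: int, height: int, max_short: int = 760) -> tuple[int, int]:
    short = min(width, height)
    if short <= max_short:
        return width, height  # images already below the limit are left untouched
    scale = max_short / short
    return round(width * scale), round(height * scale)

print(resize_to_max_shorter_side(1920, 1080))  # -> (1351, 760)
```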

A maximum of 5000 keypoints are used for each of the following front-ends

Do you know how many points are effectively extracted by SuperPoint per image? How often is the limit of 5k hit compared to SIFT?
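
If it's not already logged, a quick diagnostic could look like the sketch below; `detect_keypoints` and `images` are hypothetical placeholders for whichever detector wrapper you use, not GTSfM API:

```python
# Hypothetical check: how many keypoints the detector actually returns per image,
# and how often it saturates the 5k cap.
counts = [len(detect_keypoints(img)) for img in images]  # placeholder call
saturated = sum(n >= 5000 for n in counts)
print(f"mean keypoints per image: {sum(counts) / len(counts):.0f}")
print(f"images hitting the 5k cap: {saturated}/{len(counts)}")
```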

```python
self._config = {"weights_path": weights_path}
self._model = SuperPoint(self._config).eval()
```

Do I understand correctly that you use the default settings? Did you try to tweak them? With the defaults, SuperPoint cannot return 5k keypoints on images of this size, unlike SIFT. I recommend the following:

  • decrease the detection threshold: keypoint_threshold=0.001
  • decrease the NMS radius: nms_radius=3
  • if images are smaller than the limit (760px), upsample them

This should make SuperPoint competitive with SIFT in terms of keypoint detection; a configuration sketch is below.
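
A minimal sketch of what that could look like in the snippet above, assuming the wrapper forwards magicleap-style config keys (`keypoint_threshold`, `nms_radius`, `max_keypoints`); I haven't verified that these keys are actually consumed by the GTSfM wrapper:

```python
# Sketch only: non-default SuperPoint detection settings. The extra config keys
# are assumed to be forwarded to the underlying SuperPoint model.
self._config = {
    "weights_path": weights_path,
    "keypoint_threshold": 0.001,  # lower threshold -> more detections (upstream default: 0.005)
    "nms_radius": 3,              # smaller NMS radius -> denser keypoints (upstream default: 4)
    "max_keypoints": 5000,        # same 5k budget as the other front-ends
}
self._model = SuperPoint(self._config).eval()
```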

  2. We do know that these deep matchers are more easily tricked by symmetries, as you point out in Fig. 3. This seems confirmed by Table 3: compared to SIFT, the mean of the front-end errors is much higher than their median, and they have many more view-graph (VG) outliers, especially on South Building and Crane.
  • Did you try tuning the filtering thresholds (minimum number of inliers, cycle consistency) for each front-end? 15 inliers and 7° seem pretty loose for front-ends with high recall.
  • Did you try running the averaging+BA only on edges that are inliers according to the GT poses? (A small diagnostic sketch follows this list.)
  • It seems that the motion averaging does not have any robustness built in. Zhang et al. (ICCV 2023) show that using a robust cost function is critical (their Table 5) and that weighting by inlier count or two-view covariance can often help. Did you try this? That paper actually shows that SuperPoint+SuperGlue can work perfectly fine for global SfM.
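
On the second bullet, here is a rough diagnostic sketch; the containers, names, and rotation convention are assumptions for illustration, not GTSfM API:

```python
# Hypothetical diagnostic: keep only view-graph edges whose estimated relative
# rotation agrees with the ground-truth poses, then run averaging + BA on that
# subgraph to separate front-end errors from back-end robustness.
import numpy as np

def rotation_angle_deg(R: np.ndarray) -> float:
    """Geodesic angle of a rotation matrix, in degrees."""
    cos = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos)))

def gt_inlier_edges(edges: dict, gt_rotations: dict, max_err_deg: float = 5.0) -> dict:
    """edges: {(i, j): estimated R_ij}; gt_rotations: {i: world-to-camera R_i}."""
    inliers = {}
    for (i, j), R_ij in edges.items():
        R_ij_gt = gt_rotations[j] @ gt_rotations[i].T  # GT relative rotation i -> j
        if rotation_angle_deg(R_ij @ R_ij_gt.T) <= max_err_deg:
            inliers[(i, j)] = R_ij
    return inliers
```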

Thanks!
cc @Phil26AT @ducha-aiki
