Description
Hi folks,
I've read your GTSfM paper - nice work, thanks for pushing this to arxiv. I enjoyed reading it and appreciate the huge effort that went into building it. I am very surprised by the conclusion that SuperPoint+Super/LightGlue is not as good as SIFT - in fact we've always observed the exact opposite with incremental SfM (COLMAP) on different easy and difficult datasets (ETH3D, IMC 2020/1/2/3). I went through the code but didn't find anything obvious.
- The point clouds of SP+SG/LG look pretty sparse on several datasets, so do the matches in fig 3.
> the shorter image side is resized to at most 760 pixels in length

So that'd give a 1351×760 px image for a 1920×1080 input - this seems fine.

> A maximum of 5000 keypoints are used for each of the following front-ends

Do you know how many points are effectively extracted by SuperPoint per image? How often is the limit of 5k hit compared to SIFT?
gtsfm/gtsfm/frontend/detector_descriptor/superpoint.py, lines 45 to 46 in 1b55b76:

    self._config = {"weights_path": weights_path}
    self._model = SuperPoint(self._config).eval()

Do I understand correctly that you use the default settings? Did you try to tweak them? As is, it cannot return 5k keypoints on these kinds of images, unlike SIFT. I recommend the following:
- decrease the detection threshold: `keypoint_threshold=0.001`
- decrease the NMS radius: `nms_radius=3`
- if images are smaller than the limit (760px), upsample them
This should make SuperPoint competitive with SIFT in terms of keypoint detection.
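
To make the suggestions above concrete, here is a minimal sketch. It is not GTSfM code: `build_superpoint_config` and `upsample_factor` are hypothetical helpers, and I'm assuming your SuperPoint wrapper accepts the `keypoint_threshold`, `nms_radius`, and `max_keypoints` keys of the reference SuperGlue implementation alongside `weights_path`.

```python
def build_superpoint_config(weights_path, max_keypoints=5000):
    """Return a SuperPoint config with a lower detection threshold and NMS radius.

    Assumption: the wrapped SuperPoint model reads these keys from its config dict.
    """
    return {
        "weights_path": weights_path,
        "keypoint_threshold": 0.001,  # lower threshold, so more detections survive
        "nms_radius": 3,              # smaller radius, so denser keypoints
        "max_keypoints": max_keypoints,
    }


def upsample_factor(width, height, target_short_side=760):
    """Scale factor that brings the shorter side up to the target.

    Returns 1.0 for images that already meet or exceed the target;
    downsampling of larger images is assumed to be handled elsewhere.
    """
    short_side = min(width, height)
    return max(1.0, target_short_side / short_side)
```

The config would then be passed where `self._config` is built today, and images below the limit would be resized by `upsample_factor` before detection.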
- We do know that these deep matchers are more easily tricked by symmetries, as you point out in fig 3. This seems confirmed by table 3: compared to SIFT, the mean of the front-end errors is much higher than their median and they have many more VG outliers, especially on South Building and Crane.
- Did you try tuning the filtering threshold (minimum number of inliers, cycle consistency) for each front-end? 15 and 7° seem pretty loose for front-ends that have a high recall.
- Did you try running the averaging+BA on edges that are inliers according to the GT poses?
- It seems that the motion averaging does not have any robustness built-in. Zhang et al. (ICCV 2023) show that using a robust cost function is critical (table 5) and that weighting by inlier count or two-view covariance can often help. Did you try this? This paper actually shows that SuperPoint+SuperGlue can work perfectly fine for global SfM.
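
To illustrate what I mean by robustness plus inlier-count weighting, here is a toy sketch. It is not the method of Zhang et al. nor anything in GTSfM: it runs iteratively reweighted least squares with a Huber cost on scalar measurements, where each scalar stands in for one two-view edge, and `huber_weight` and `robust_weighted_mean` are names made up for this example.

```python
from statistics import median


def huber_weight(residual, delta):
    """IRLS weight for the Huber cost: 1 inside the inlier band, delta/|r| outside."""
    r = abs(residual)
    return 1.0 if r <= delta else delta / r


def robust_weighted_mean(values, inlier_counts, delta=1.0, num_iters=10):
    """Average measurements with Huber IRLS, using inlier counts as prior weights."""
    estimate = median(values)  # robust initialization
    for _ in range(num_iters):
        weights = [
            count * huber_weight(value - estimate, delta)
            for value, count in zip(values, inlier_counts)
        ]
        estimate = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    return estimate


# Three consistent edges and one outlier edge with few inliers.
measurements = [1.0, 1.1, 0.9, 10.0]
counts = [100, 100, 100, 20]
robust = robust_weighted_mean(measurements, counts)
plain = sum(measurements) / len(measurements)
```

In this toy case the plain mean is pulled far toward the outlier, while the robust estimate stays close to the three consistent measurements.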
Thanks!
cc @Phil26AT @ducha-aiki