Evaluating on KITTI Improved Ground Truth

Hi, first of all, congratulations on this great work!

I'm evaluating recent depth estimation techniques and I'm wondering if you could help me validate my results.

I downloaded your [SwinLarge predictions](https://dl.cv.ethz.ch/idisc/predictions/kitti_swinlarge.tar) and wanted to evaluate them against the KITTI Improved Ground Truth [1] by comparing your output maps directly with the GT.

I followed your instructions by dividing the predictions by 256 (as with the GT data), and I interpolated them the same way your code does on the model output, using `F.interpolate` with `mode="bicubic"` and `align_corners=True`.

I'm following the Monodepth2 evaluation procedure for the comparison, so I'm not applying Garg's crop here.
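For reference, here is a minimal sketch of the crop I'm leaving out, assuming the crop fractions commonly used in Monodepth2-style evaluation scripts. `garg_crop_mask` is a hypothetical helper name for illustration, not something from this repository, and the 375x1242 image size is just an example.

```python
import numpy as np


def garg_crop_mask(height, width):
    """Boolean mask that is True inside the Garg crop region (hypothetical helper)."""
    # Crop fractions as commonly used in Monodepth2-style evaluation (assumption).
    top = int(0.40810811 * height)
    bottom = int(0.99189189 * height)
    left = int(0.03594771 * width)
    right = int(0.96405229 * width)
    mask = np.zeros((height, width), dtype=bool)
    mask[top:bottom, left:right] = True
    return mask


# Example image size; actual KITTI frames vary slightly in resolution.
crop = garg_crop_mask(375, 1242)
print(crop.shape, crop.mean())
```

If the crop were applied, this mask would be combined with the valid-depth mask via `np.logical_and` before computing the metrics.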

The results are the following:

| abs_rel | sq_rel | rmse | rmse_log | a1 | a2 | a3 |
|---|---|---|---|---|---|---|
| 0.086 | 0.539 | 4.228 | 0.153 | 0.913 | 0.979 | 0.991 |

I was expecting much lower errors. Can you validate these steps, please? Are the SwinLarge predictions giving the correct outcome?

The code is quite simple, and I'll share it below so you can check it (if you want).


```python
import cv2
import numpy as np
import torch
import torch.nn.functional as F


def compute_errors(gt, pred):
    """Compute the standard depth metrics on arrays of valid pixels."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    a2 = (thresh < 1.25 ** 2).mean()
    a3 = (thresh < 1.25 ** 3).mean()

    rmse = np.sqrt(((gt - pred) ** 2).mean())
    rmse_log = np.sqrt(((np.log(gt) - np.log(pred)) ** 2).mean())

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)

    return abs_rel, sq_rel, rmse, rmse_log, a1, a2, a3


MIN_DEPTH = 1e-3
MAX_DEPTH = 80

# pred_path / gt_path: a prediction PNG and its matching ground-truth PNG.
pred = cv2.imread(pred_path, -1) / 256
gt_depth = cv2.imread(gt_path, -1) / 256

# Keep only pixels with valid ground truth (0 means "no measurement").
mask = np.logical_and(gt_depth > MIN_DEPTH, gt_depth < MAX_DEPTH)

# Resize the prediction to the ground-truth resolution.
pred_depth = F.interpolate(
    torch.from_numpy(pred).unsqueeze(0).unsqueeze(0),
    gt_depth.shape[:2],
    mode="bicubic",
    align_corners=True,
)
# Back to a 2-D NumPy array, clamped to the evaluated depth range.
pred_depth = np.clip(pred_depth.squeeze().numpy(), MIN_DEPTH, MAX_DEPTH)

print(compute_errors(gt_depth[mask], pred_depth[mask]))
```
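As a quick sanity check of the metric definitions, a synthetic case with a known answer may help: if every prediction is exactly 10% above the ground truth, Abs Rel should come out as 0.1 and a1 as 1.0. This is only a sketch; `abs_rel_and_a1` is a hypothetical helper that repeats two of the metrics so the snippet runs on its own, and the depth values are made up.

```python
import numpy as np


def abs_rel_and_a1(gt, pred):
    """Abs Rel and the delta < 1.25 accuracy, using the same formulas as compute_errors."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()
    abs_rel = np.mean(np.abs(gt - pred) / gt)
    return abs_rel, a1


gt = np.array([5.0, 10.0, 20.0, 40.0])  # made-up depths in metres
pred = 1.1 * gt                          # uniform 10% over-estimate
abs_rel, a1 = abs_rel_and_a1(gt, pred)
print(abs_rel, a1)
```

If a check like this passes, the gap is more likely to come from the data pipeline (file pairing, masking, resizing) than from the metric formulas.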


I was expecting values lower than those reported in the paper (e.g. Abs Rel probably below 0.05), but I actually got much higher ones (e.g. Abs Rel of 0.086).

Thanks again for your work!

Ref.

[1] Uhrig, Jonas, et al. "Sparsity Invariant CNNs." 2017 International Conference on 3D Vision (3DV). IEEE, 2017.
