🐛 Bug
It seems that the evaluation pipeline computes an incorrect precision. When the evaluated detector never predicts a category C (e.g. a closed-set detector with fewer classes than LVIS), the current implementation computes a precision of 0 for that category, while it should ignore it (no prediction = no precision).
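For a concrete way to hit this, one can evaluate the output of a detector that only covers a subset of the LVIS vocabulary. A rough sketch of how I would reproduce it through the usual evaluation entry point (the file paths are placeholders, and I am assuming the standard `LVISEval` usage):

```python
from lvis import LVISEval

# Detections from a closed-set detector (e.g. trained on 80 COCO classes):
# every LVIS category outside that set has zero predictions.
ANN_PATH = "lvis_v1_val.json"            # placeholder ground-truth file
DT_PATH = "closed_set_detections.json"   # placeholder results file

lvis_eval = LVISEval(ANN_PATH, DT_PATH, iou_type="bbox")
lvis_eval.run()
lvis_eval.print_results()
# Categories that never appear in DT_PATH are scored with a precision of 0
# instead of being left at -1 and excluded, which lowers the reported AP.
```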
To Reproduce
Lines 370 to 405 in 7d7f07d:

```python
for iou_thr_idx, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
    tp = np.array(tp)
    fp = np.array(fp)
    num_tp = len(tp)
    rc = tp / num_gt
    if num_tp:
        recall[iou_thr_idx, cat_idx, area_idx] = rc[
            -1
        ]
    else:
        recall[iou_thr_idx, cat_idx, area_idx] = 0

    # np.spacing(1) ~= eps
    pr = tp / (fp + tp + np.spacing(1))
    pr = pr.tolist()

    # Replace each precision value with the maximum precision
    # value to the right of that recall level. This ensures
    # that the calculated AP value will be less suspectable
    # to small variations in the ranking.
    for i in range(num_tp - 1, 0, -1):
        if pr[i] > pr[i - 1]:
            pr[i - 1] = pr[i]

    rec_thrs_insert_idx = np.searchsorted(
        rc, self.params.rec_thrs, side="left"
    )

    pr_at_recall = [0.0] * num_recalls

    try:
        for _idx, pr_idx in enumerate(rec_thrs_insert_idx):
            pr_at_recall[_idx] = pr[pr_idx]
    except:
        pass
    precision[iou_thr_idx, :, cat_idx, area_idx] = np.array(pr_at_recall)
```
This code snippet computes the precision at all recall thresholds. When the current category (with index `cat_idx`) has never been predicted anywhere in the dataset, `tp_sum` and `fp_sum` have shape `[num_IoU_thresholds, 0]`, because the length of the second dimension is the number of predictions for that category over the whole dataset.
In this scenario, the try/except statement at line 400 in 7d7f07d always ends up in the `except` branch, because `pr` is empty and indexing it raises an `IndexError`. The precision for that category is therefore left at the all-zero default defined at line 398 in 7d7f07d (`pr_at_recall = [0.0] * num_recalls`), and this zero vector is then written into the precision array, replacing its default of -1.
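For concreteness, here is a minimal standalone sketch of this code path for a category that has ground truth but zero detections (`num_gt`, `num_recalls` and the recall thresholds are illustrative values, not taken from the repository):

```python
import numpy as np

num_gt = 5            # ground-truth instances do exist for this category
num_recalls = 101     # illustrative length of params.rec_thrs
rec_thrs = np.linspace(0.0, 1.0, num_recalls)

tp = np.array([])     # the detector never predicted this category,
fp = np.array([])     # so both arrays are empty

rc = tp / num_gt                                  # empty array -> recall is set to 0
pr = (tp / (fp + tp + np.spacing(1))).tolist()    # empty list

rec_thrs_insert_idx = np.searchsorted(rc, rec_thrs, side="left")
print(rec_thrs_insert_idx[:5])                    # [0 0 0 0 0]: every threshold maps to index 0

pr_at_recall = [0.0] * num_recalls
try:
    for _idx, pr_idx in enumerate(rec_thrs_insert_idx):
        pr_at_recall[_idx] = pr[pr_idx]           # pr[0] raises IndexError on the first iteration
except IndexError:                                # the original code uses a bare except
    pass

print(pr_at_recall[:5])   # [0.0, 0.0, 0.0, 0.0, 0.0], which overwrites the -1 default
```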
Expected behavior
I would expect the precision to remain at -1 (i.e. ignored in the final computation of the precision) in this scenario: the detector has not predicted the class at all, so it is unfair for it to receive a precision of 0.
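For context on why the -1 matters: the precision array is initialised to -1 and, as far as I can tell, the final AP is averaged only over entries greater than -1 (the usual COCO-style convention). A rough illustration of the difference between scoring a never-predicted category as 0 versus leaving it at -1 (shapes and the helper below are illustrative, not the repository's exact code):

```python
import numpy as np

def mean_ap(precision):
    # Average only the entries that were actually evaluated (-1 means "ignore").
    valid = precision[precision > -1]
    return float(np.mean(valid)) if valid.size else float("nan")

# [num_iou_thrs, num_recalls, num_cats, num_areas], initialised to -1
p = -np.ones((2, 3, 4, 1))
p[:, :, 0, :] = 0.8     # a category the detector handles well

p[:, :, 1, :] = 0.0     # current behaviour: never-predicted category scored as 0
print(mean_ap(p))       # 0.4 -> the never-predicted category halves the result

p[:, :, 1, :] = -1.0    # expected behaviour: leave it at -1
print(mean_ap(p))       # 0.8 -> the category is simply ignored
```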
Proposed fix
A simple fix would be the following change at lines 375-380:
```python
if num_tp:
    recall[iou_thr_idx, cat_idx, area_idx] = rc[
        -1
    ]
else:
    recall[iou_thr_idx, cat_idx, area_idx] = 0
    # If there are no detections for that category, the precision is undefined.
    continue
```
If `num_tp = len(tp) = 0`, there were no detections for that category, which is exactly the scenario described above. In this case the recall is 0, and we stop without computing the precision, so it stays at its default of -1.
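As a quick sanity check of that branch (again a standalone sketch with illustrative values):

```python
import numpy as np

num_recalls = 101
# the slice precision[iou_thr_idx, :, cat_idx, area_idx] starts out at -1
precision_slice = -np.ones(num_recalls)

tp = np.array([])          # no detections for this category
num_tp = len(tp)

if num_tp:
    pass                   # ...unchanged recall/precision computation...
else:
    recall_value = 0.0     # recall recorded as 0, as before
    # proposed `continue`: the loop moves on, precision_slice is never overwritten

print(precision_slice[:3])  # [-1. -1. -1.] -> excluded by the "> -1" filter downstream
```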
Let me know what you think of this finding, or if I made a mistake in my reasoning.