Potential wrong computation of the precision #44

@rpautrat

🐛 Bug

It seems that the evaluation pipeline computes an incorrect precision. When the evaluated detector never predicts a category C (e.g. a closed-set detector with fewer classes than LVIS), the current implementation computes a precision of 0 for that category, while it should ignore it (no prediction = no precision).

To Reproduce

lvis-api/lvis/eval.py, lines 370 to 405 at 7d7f07d:

                for iou_thr_idx, (tp, fp) in enumerate(zip(tp_sum, fp_sum)):
                    tp = np.array(tp)
                    fp = np.array(fp)
                    num_tp = len(tp)
                    rc = tp / num_gt
                    if num_tp:
                        recall[iou_thr_idx, cat_idx, area_idx] = rc[
                            -1
                        ]
                    else:
                        recall[iou_thr_idx, cat_idx, area_idx] = 0

                    # np.spacing(1) ~= eps
                    pr = tp / (fp + tp + np.spacing(1))
                    pr = pr.tolist()

                    # Replace each precision value with the maximum precision
                    # value to the right of that recall level. This ensures
                    # that the calculated AP value will be less suspectable
                    # to small variations in the ranking.
                    for i in range(num_tp - 1, 0, -1):
                        if pr[i] > pr[i - 1]:
                            pr[i - 1] = pr[i]

                    rec_thrs_insert_idx = np.searchsorted(
                        rc, self.params.rec_thrs, side="left"
                    )

                    pr_at_recall = [0.0] * num_recalls

                    try:
                        for _idx, pr_idx in enumerate(rec_thrs_insert_idx):
                            pr_at_recall[_idx] = pr[pr_idx]
                    except:
                        pass
                    precision[iou_thr_idx, :, cat_idx, area_idx] = np.array(pr_at_recall)

This is the code that computes the precision at all recall thresholds. When the current category (with index cat_idx) has never been predicted in the dataset, tp_sum and fp_sum will have a shape of [num_IoU_thresholds, 0], because their second dimension is the number of predictions for the current category over the whole dataset.

In this scenario, the lookup inside the try/except block (pr_at_recall[_idx] = pr[pr_idx]) will fail, because pr is empty. The precision for that category is therefore left at the default of 0 defined by pr_at_recall = [0.0] * num_recalls.
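
To make the failure mode concrete, here is a small standalone sketch (plain numpy only, with made-up num_gt and recall thresholds rather than the actual lvis-api objects) of what happens when a category has zero predictions:

    import numpy as np

    # Toy reproduction of the zero-prediction case (all values are made up).
    tp = np.array([])                     # no detections -> empty TP cumsum
    fp = np.array([])                     # ... and empty FP cumsum
    num_gt = 5                            # some ground-truth boxes exist
    num_recalls = 101                     # assumed number of recall thresholds
    rec_thrs = np.linspace(0.0, 1.0, num_recalls)

    rc = tp / num_gt                                   # empty array
    pr = (tp / (fp + tp + np.spacing(1))).tolist()     # empty list

    rec_thrs_insert_idx = np.searchsorted(rc, rec_thrs, side="left")  # all zeros

    pr_at_recall = [0.0] * num_recalls
    try:
        for _idx, pr_idx in enumerate(rec_thrs_insert_idx):
            pr_at_recall[_idx] = pr[pr_idx]            # IndexError on the first iteration
    except IndexError:
        pass

    print(pr_at_recall[:5])   # [0.0, 0.0, 0.0, 0.0, 0.0] -> stored as a precision of 0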

Expected behavior

I would expect the precision to remain at -1 (i.e. ignored in the final computation of the precision) in this scenario: since the detector has not predicted the class at all, it is unfair for it to receive a precision of 0.
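
For reference, a toy example of why the stored value matters downstream. I am assuming the summarization step averages only entries strictly greater than -1 (the usual s[s > -1] masking pattern in the COCO/LVIS summarize code); under that assumption, a 0 drags the mean down while a -1 is simply skipped:

    import numpy as np

    # Illustrative per-category precisions; the third category was never predicted.
    precision_per_cat = np.array([0.80, 0.60, -1.0])

    # Current behaviour: the missing category is written as 0.0 and averaged in.
    current = precision_per_cat.copy()
    current[2] = 0.0
    print(current.mean())                            # ~0.467, unfairly penalized

    # Expected behaviour: keep -1 and mask it out before averaging.
    valid = precision_per_cat[precision_per_cat > -1]
    print(valid.mean())                              # 0.70, only evaluated categories count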

Proposed fix

A simple fix would be to do the following at lines 375-380:

                    if num_tp:
                        recall[iou_thr_idx, cat_idx, area_idx] = rc[
                            -1
                        ]
                    else:
                        recall[iou_thr_idx, cat_idx, area_idx] = 0
                        # If there are no detections for that category, the precision is undefined.
                        continue

If num_tp = len(tp) = 0, this means that there were no detections for that category, which is exactly the scenario I am describing here. In this case, the recall is 0, and we stop there without computing the precision, which stays at its default of -1.
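
As a quick sanity check of the idea (toy shapes, not the real 4-D precision array), the continue simply leaves the pre-initialized -1 column untouched:

    import numpy as np

    # Toy check: precision is initialised to -1 and only overwritten when
    # a category actually has detections (shapes are illustrative only).
    num_recalls, num_cats = 101, 3
    precision = -np.ones((num_recalls, num_cats))

    detections_per_cat = [4, 7, 0]            # third category: no detections at all
    for cat_idx, num_dets in enumerate(detections_per_cat):
        if num_dets == 0:
            continue                          # proposed fix: skip, keep the -1 column
        precision[:, cat_idx] = 0.5           # stand-in for the real PR computation

    print(np.unique(precision[:, 2]))         # [-1.] -> ignored by the summarization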

Let me know what you think of this finding, or if I made a mistake in my reasoning.
