Object detection evaluation improvements (inspired by YOLO evaluation) #322

@dpascualhe

Description

Ultralytics provides a well-established framework for object detection evaluation. After comparing the results reported by Ultralytics and DetectionMetrics for a YOLO model, we noticed some differences that we believe should be addressed on our side:

  • Confidence threshold: When computing mAP or PR curves, there should be no confidence threshold imposed. This way, we can keep all predicted boxes and generate results across the full range of possible confidence values. Currently, the confidence threshold in the model configuration file is applied right after inference, and metrics are then computed only on boxes above that threshold.
  • Precision, recall, and confusion matrix: For these metrics, we could keep a confidence threshold imposed by the user (the current approach), or automatically use the one that maximizes the F1 score (not implemented yet; see the first sketch after this list). This way, we would report the best possible performance while also informing users of the optimal confidence threshold for their models.
  • Confusion matrices: Ultralytics adds an imaginary background class. This means that if a target box is not matched during evaluation, it is counted as if the model had predicted background for it, and, conversely, an unmatched predicted box is counted against a background target (see the second sketch below).
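
As a rough illustration of the first two points, here is a minimal sketch (made-up function and argument names, not actual DetectionMetrics or Ultralytics code): the PR curve for a class is built from all predictions with no confidence cut-off, and the confidence that maximizes F1 can then be read off that curve.

```python
import numpy as np

def pr_curve_and_best_threshold(confidences, is_tp, num_gt):
    """Sketch only. confidences: (N,) predicted scores for one class;
    is_tp: (N,) bool, True if the box was matched to a ground-truth box;
    num_gt: total number of ground-truth boxes for that class."""
    order = np.argsort(-confidences)          # sort predictions by descending confidence
    conf = confidences[order]
    matched = is_tp[order].astype(bool)
    tp = np.cumsum(matched)                   # cumulative true positives
    fp = np.cumsum(~matched)                  # cumulative false positives
    precision = tp / np.maximum(tp + fp, 1e-12)
    recall = tp / max(num_gt, 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    best = int(np.argmax(f1))                 # confidence value that maximizes F1
    return precision, recall, conf[best], f1[best]
```

Per-class AP (and hence mAP) would then be computed as the area under such a curve, without discarding low-confidence boxes beforehand.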

The details above are not clearly stated in the YOLO docs, but we found them after debugging and inspecting their code. That said, their docs do describe all reported metrics and might be useful:
https://docs.ultralytics.com/guides/yolo-performance-metrics/
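
And a sketch of the background-class bookkeeping described in the third point (again with assumed names, not the actual Ultralytics implementation): the confusion matrix gets one extra row and column for the imaginary background class, and unmatched boxes are accumulated against it.

```python
import numpy as np

def update_confusion_matrix(cm, matched_pairs, unmatched_gt_classes, unmatched_pred_classes):
    """Sketch only. cm: (C+1, C+1) array indexed as cm[pred_class, gt_class];
    index C is the background class. matched_pairs: iterable of
    (pred_class, gt_class) pairs for boxes matched during evaluation."""
    background = cm.shape[0] - 1
    for pred_cls, gt_cls in matched_pairs:
        cm[pred_cls, gt_cls] += 1         # matched prediction/target pair
    for gt_cls in unmatched_gt_classes:
        cm[background, gt_cls] += 1       # missed target -> "predicted" background (false negative)
    for pred_cls in unmatched_pred_classes:
        cm[pred_cls, background] += 1     # unmatched prediction -> background target (false positive)
    return cm
```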
