Ultralytics provides a well-established framework for object detection evaluation. After comparing the results reported by Ultralytics and DetectionMetrics for a YOLO model, we noticed some differences and believe the following should be updated on our side:
- Confidence threshold: When computing mAP or PR curves, no confidence threshold should be imposed. That way, all predicted boxes are kept and results are generated across the full range of confidence values. Currently, the confidence threshold from the model configuration file is applied right after inference, and metrics are then computed only on boxes above that threshold (see the first sketch after this list).
- Precision, recall, and confusion matrix: For these metrics, we could keep the confidence threshold imposed by the user (the current approach) or automatically use the threshold that maximizes the F1 score (not implemented yet). That way, we would report the best achievable performance while also informing users of the optimal confidence threshold for their models (also covered in the first sketch below).
- Confusion matrices: Ultralytics adds an extra background class. Any predicted box that is not matched to a ground-truth box during evaluation is counted against the background class (a false positive), and any unmatched ground-truth box is counted as being predicted as background (a false negative); see the second sketch below.
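
For reference, here is a minimal sketch of the intended behavior for the first two points, assuming predictions have already been matched to ground truth (the names `pr_curve`, `confidences`, `is_tp`, and `num_gt` are hypothetical, not DetectionMetrics' actual API). It builds the PR curve over all predictions with no confidence cutoff, then picks the confidence that maximizes F1:

```python
import numpy as np

def pr_curve(confidences, is_tp, num_gt):
    """Precision/recall over all predictions, with no confidence cutoff.

    confidences: (N,) predicted box confidences (unfiltered).
    is_tp:       (N,) bool, True if the prediction matched a ground-truth box.
    num_gt:      total number of ground-truth boxes.
    """
    order = np.argsort(-confidences)        # sort by decreasing confidence
    tp = np.cumsum(is_tp[order])            # cumulative true positives
    fp = np.cumsum(~is_tp[order])           # cumulative false positives
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / max(num_gt, 1)
    return confidences[order], precision, recall

def best_f1_threshold(sorted_confidences, precision, recall):
    """Confidence value along the PR curve that maximizes F1."""
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    best = np.argmax(f1)
    return sorted_confidences[best], f1[best]

# Toy usage: 5 predictions, 4 ground-truth boxes.
conf = np.array([0.9, 0.8, 0.7, 0.4, 0.2])
matched = np.array([True, True, False, True, False])
c_sorted, p, r = pr_curve(conf, matched, num_gt=4)
thr, f1 = best_f1_threshold(c_sorted, p, r)
```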
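
And a sketch of the background-class bookkeeping for the confusion matrix (rows are ground truth, columns are predictions; the function and argument names are illustrative, not Ultralytics' or DetectionMetrics' actual API):

```python
import numpy as np

def detection_confusion_matrix(matches, unmatched_pred_classes,
                               unmatched_gt_classes, num_classes):
    """Confusion matrix with an extra background row/column at index num_classes.

    matches:                (gt_class, pred_class) pairs for matched boxes.
    unmatched_pred_classes: classes of predictions with no matching ground truth.
    unmatched_gt_classes:   classes of ground-truth boxes with no matching prediction.
    """
    bg = num_classes
    cm = np.zeros((num_classes + 1, num_classes + 1), dtype=int)
    for gt_c, pred_c in matches:
        cm[gt_c, pred_c] += 1
    for pred_c in unmatched_pred_classes:   # false positives: background -> predicted class
        cm[bg, pred_c] += 1
    for gt_c in unmatched_gt_classes:       # false negatives: ground-truth class -> background
        cm[gt_c, bg] += 1
    return cm
```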
The details above are not clearly stated in the YOLO docs; we found them by debugging and inspecting their code. That said, their docs do describe all reported metrics and might be useful:
https://docs.ultralytics.com/guides/yolo-performance-metrics/