Ultralytics provides a well-established framework for object detection evaluation. After comparing the results reported by Ultralytics and DetectionMetrics for a YOLO model, we noticed some differences and believe the following should be updated on our side:
- Confidence threshold: When computing mAP or PR curves, no confidence threshold should be imposed. That way, all predicted boxes are kept and results are generated across the full range of confidence values. Currently, the confidence threshold from the model configuration file is applied right after inference, and metrics are then computed only on boxes above that threshold (see the first sketch after this list).
- Precision, recall, and confusion matrix: For these metrics, we could keep the confidence threshold imposed by the user (the current approach) or automatically use the threshold that maximizes the F1 score (not implemented yet). That way, we would report the best achievable performance while also informing users of the optimal confidence threshold for their models (also covered in the first sketch below).
- Confusion matrices: Ultralytics adds an extra background class. Any predicted box that is not matched to a ground-truth box during evaluation is counted against the background class (a false positive), and any unmatched ground-truth box is counted as being predicted as background (a false negative); see the second sketch below.
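
For reference, here is a minimal sketch of the intended behavior for the first two points, assuming predictions have already been matched to ground truth (the names `pr_curve`, `confidences`, `is_tp`, and `num_gt` are hypothetical, not DetectionMetrics' actual API). It builds the PR curve over all predictions with no confidence cutoff, then picks the confidence that maximizes F1:

```python
import numpy as np

def pr_curve(confidences, is_tp, num_gt):
    """Precision/recall over all predictions, with no confidence cutoff.

    confidences: (N,) predicted box confidences (unfiltered).
    is_tp:       (N,) bool, True if the prediction matched a ground-truth box.
    num_gt:      total number of ground-truth boxes.
    """
    order = np.argsort(-confidences)        # sort by decreasing confidence
    tp = np.cumsum(is_tp[order])            # cumulative true positives
    fp = np.cumsum(~is_tp[order])           # cumulative false positives
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / max(num_gt, 1)
    return confidences[order], precision, recall

def best_f1_threshold(sorted_confidences, precision, recall):
    """Confidence value along the PR curve that maximizes F1."""
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    best = np.argmax(f1)
    return sorted_confidences[best], f1[best]

# Toy usage: 5 predictions, 4 ground-truth boxes.
conf = np.array([0.9, 0.8, 0.7, 0.4, 0.2])
matched = np.array([True, True, False, True, False])
c_sorted, p, r = pr_curve(conf, matched, num_gt=4)
thr, f1 = best_f1_threshold(c_sorted, p, r)
```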
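
And a sketch of the background-class bookkeeping for the confusion matrix (rows are ground truth, columns are predictions; the function and argument names are illustrative, not Ultralytics' or DetectionMetrics' actual API):

```python
import numpy as np

def detection_confusion_matrix(matches, unmatched_pred_classes,
                               unmatched_gt_classes, num_classes):
    """Confusion matrix with an extra background row/column at index num_classes.

    matches:                (gt_class, pred_class) pairs for matched boxes.
    unmatched_pred_classes: classes of predictions with no matching ground truth.
    unmatched_gt_classes:   classes of ground-truth boxes with no matching prediction.
    """
    bg = num_classes
    cm = np.zeros((num_classes + 1, num_classes + 1), dtype=int)
    for gt_c, pred_c in matches:
        cm[gt_c, pred_c] += 1
    for pred_c in unmatched_pred_classes:   # false positives: background -> predicted class
        cm[bg, pred_c] += 1
    for gt_c in unmatched_gt_classes:       # false negatives: ground-truth class -> background
        cm[gt_c, bg] += 1
    return cm
```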
The details above are not clearly stated in the YOLO docs; we found them by debugging and inspecting their code. That said, their docs do describe all reported metrics and might be useful:
https://docs.ultralytics.com/guides/yolo-performance-metrics/