Hi, thanks for your question. This sounds like an interesting use-case.


> Why is the F1 score so good, yet the AUROC so low?

In Anomalib, we report the F1 score at the optimal threshold value, i.e. the best F1 score that can be achieved by varying the threshold applied to the model's raw predictions. A low AUROC combined with a high F1 score therefore means that the model performs very well at that one optimal threshold, but poorly at most other threshold values. To get a better idea of the threshold-dependent behaviour of your model, you could visually inspect the ROC curve, which is generated automatically.
