Description
Hi, thanks for the great benchmark!
I noticed that some methods show negative mean-squared error (MSE) scores. Since MSE should be non-negative, could you clarify how this is possible? Is there a normalization or scaling step involved?
Thanks for your work on this project!
Note: I initially overlooked that this is already explained on the website:
"Results table of the scores per method, dataset and metric (after scaling). Use the filters to make a custom subselection of methods and datasets. The “Overall mean” dataset is the mean value across all datasets."
It might be helpful to clarify this in the interface by adjusting the column label to “Scaled score (higher is better)” or adding a short tooltip/footnote, since the current presentation could be confusing to readers unfamiliar with the scaling.
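For other readers who run into the same confusion, here is a minimal sketch of how a scaled MSE can end up negative. This is only my assumption about what "after scaling" might mean, not the benchmark's actual code: I assume each raw score is rescaled linearly so that a "worst" reference maps to 0 and a "best" reference maps to 1. The function `scale_score` and all numbers are hypothetical.

```python
def scale_score(raw, worst, best):
    """Hypothetical linear rescaling: `worst` maps to 0, `best` maps to 1.

    This is an assumed scheme for illustration, not the benchmark's code.
    """
    return (raw - worst) / (best - worst)


# MSE is lower-is-better, so the "best" reference has the smaller raw value.
worst_mse = 2.0   # hypothetical raw MSE of a weak reference method
best_mse = 0.5    # hypothetical raw MSE of a strong reference method
raw_mse = 2.75    # hypothetical method that does worse than the weak reference

scaled = scale_score(raw_mse, worst_mse, best_mse)
print(scaled)  # negative, although the raw MSE itself is positive
```

Under this assumption, the raw MSE is never negative; only the scaled score is, and only for a method that performs worse than the "worst" reference. The scaled score is also higher-is-better even though raw MSE is lower-is-better, which is why an explicit column label would help.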