Open
Description
We should implement an order-statistics-based diagnostic check in compare(), similar to the one in loo (https://github.com/stan-dev/loo/blob/d6fe380161fcd3ba07065ce0a525146abdb2c1d7/R/loo_compare.R#L300), to warn users when the differences between models may be due to chance rather than genuine differences in predictive performance.
According to this paper (https://arxiv.org/pdf/2309.03742), when many models are compared using cross-validation, the "best" model can appear better simply because of random variation in the ELPD estimates. The more models are compared, the higher the probability that one of them scores highest by chance alone.
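For discussion, here is a rough sketch of what such a check could look like. It is not a port of the loo code, and several things are assumptions made for illustration: the function names `order_stat_heuristic` and `order_stat_check`, the plain list of total ELPD values as input, the worst model as the baseline, the minimum of 12 models, and the `1 - 1/(2K)` normal quantile as an approximation of the expected maximum of K noise-only differences.

```python
import math
from statistics import NormalDist, median


def order_stat_heuristic(n_candidates, candidate_sd):
    """Approximate the expected maximum of n_candidates draws from N(0, candidate_sd**2)."""
    # Assumption: the 1 - 1/(2K) quantile serves as a cheap approximation.
    p = 1.0 - 1.0 / (2.0 * n_candidates)
    return NormalDist(mu=0.0, sigma=candidate_sd).inv_cdf(p)


def order_stat_check(elpds, min_models=12):
    """Return True if the best model's lead could plausibly be due to chance.

    elpds: total ELPD estimate of each model (hypothetical input format).
    """
    # Assumption: with few models the heuristic is not informative, so skip it.
    if len(elpds) < min_models:
        return False

    # ELPD differences relative to the worst model (assumed baseline).
    baseline = min(elpds)
    diffs = [e - baseline for e in elpds]

    # Scale estimated from the upper half of the differences,
    # treating them as draws from a half-normal centered at zero.
    cutoff = median(diffs)
    upper = [d for d in diffs if d >= cutoff]
    candidate_sd = math.sqrt(sum(d * d for d in upper) / len(upper))
    if candidate_sd == 0.0:
        # All models have identical ELPD, so no model stands out.
        return True

    # Largest difference expected among the candidates if all were pure noise.
    threshold = order_stat_heuristic(len(elpds) - 1, candidate_sd)
    return max(diffs) <= threshold
```

In compare(), a `True` result could trigger a `warnings.warn(...)` telling the user that the apparent winner may not be genuinely better than the other models.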
OriolAbril and aloctavodia