Add order statistic diagnostic for compare()
#237
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #237 +/- ##
==========================================
+ Coverage 84.72% 84.77% +0.05%
==========================================
Files 41 41
Lines 4950 4973 +23
==========================================
+ Hits 4194 4216 +22
- Misses 756 757 +1
Removed a test that is essentially testing the same thing as the first order stat test. With the
This implements the order statistic diagnostic for detecting selection-induced bias when comparing many models in compare(). When more than 11 models are compared using full PSIS-LOO-CV, the diagnostic estimates whether the observed performance difference could be due to chance by comparing the best model's ELPD difference against the expected maximum order statistic under a null hypothesis of equal performance.

This diagnostic is not done for the subsampling case. As far as I can tell, it isn't done in the R loo package's subsampling comparison either: https://github.com/stan-dev/loo/blob/d6fe380161fcd3ba07065ce0a525146abdb2c1d7/R/loo_compare.psis_loo_ss_list.R. I'm not exactly sure why this is, but my guess is that the theoretical assumptions underlying the test don't account for the additional variance and approximation bias introduced by subsampling, making it unclear whether the null distribution would be properly calibrated in that setting.
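For readers unfamiliar with the idea, here is a rough standalone sketch of this kind of check. It is not the code in this PR. The function name `order_stat_check`, the use of the median model as baseline, the half-normal scale estimate, and the normal-quantile approximation of the expected maximum are all assumptions made for illustration. It works on total ELPD values only and uses just the Python standard library.

```python
import math
from statistics import NormalDist, median, median_low


def order_stat_check(elpds, min_models=12):
    """Hypothetical helper: flag a best-model ELPD gain that could be chance.

    elpds: total ELPD estimate per model (higher is better).
    Returns None when fewer than `min_models` models are given,
    i.e. the check only runs when more than 11 models are compared.
    """
    n_models = len(elpds)
    if n_models < min_models:
        return None

    # Assumption: differences are taken relative to the median model.
    baseline = median_low(elpds)
    diffs = [e - baseline for e in elpds]

    # Assumption: estimate the null scale from the upper half of the
    # differences, treated as draws from a half-normal distribution.
    diff_median = median(diffs)
    upper = [d for d in diffs if d >= diff_median]
    candidate_sd = math.sqrt(sum(d * d for d in upper) / len(upper))

    # Assumption: approximate the expected maximum of K null differences
    # by the normal quantile at 1 - 1 / (2K).
    k = n_models - 1
    expected_max = candidate_sd * NormalDist().inv_cdf(1.0 - 1.0 / (2.0 * k))

    best_diff = max(diffs)
    return {
        "best_diff": best_diff,
        "expected_max": expected_max,
        # True means the best model's gain is within what chance could produce.
        "possible_selection_bias": best_diff <= expected_max,
    }


# Twelve nearly identical models: the best one is not clearly better.
similar = [-100.0 + 0.1 * i for i in range(12)]
print(order_stat_check(similar)["possible_selection_bias"])

# Eleven similar models plus one clear winner.
clear_winner = [-100.0 + 0.1 * i for i in range(11)] + [-50.0]
print(order_stat_check(clear_winner)["possible_selection_bias"])
```

In the first call the best difference falls below the expected maximum under the null, so the flag is raised. In the second call the winning model's gain exceeds it, so no flag is raised.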
Resolves #234