Add order statistic diagnostic for `compare()` #237

jordandeklerk · 2025-10-30T19:38:22Z

This implements the order statistic diagnostic for detecting selection-induced bias when comparing many models in `compare()`. When more than 11 models are compared using full PSIS-LOO-CV, the diagnostic checks whether the best model's apparent advantage could be due to chance by comparing its ELPD difference against the expected maximum order statistic under a null hypothesis of equal performance.
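To make the idea concrete, here is a minimal sketch of this kind of check. It is not the code in this PR: the function names, the root-mean-square estimate of the spread, and the use of Blom's approximation for the expected maximum of `K` normal draws are all assumptions chosen for illustration, and the input is assumed to be each candidate's ELPD difference from a common baseline model.

```python
import numpy as np
from scipy import stats


def expected_max_under_null(n_candidates, candidate_sd, alpha=0.375):
    """Approximate E[max] of n_candidates iid N(0, candidate_sd^2) draws.

    Uses Blom's approximation for the largest normal order statistic
    (an illustrative choice, not necessarily what compare() does).
    """
    p = (n_candidates - alpha) / (n_candidates - 2 * alpha + 1)
    return candidate_sd * stats.norm.ppf(p)


def selection_bias_suspected(elpd_diffs):
    """Return True if the best difference is no larger than chance predicts.

    elpd_diffs: hypothetical ELPD differences of each candidate from a
    common baseline. Returns None when the spread is degenerate.
    """
    elpd_diffs = np.asarray(elpd_diffs, dtype=float)
    # Assumption: under the null the differences are centred at zero,
    # so the spread is estimated as a root mean square.
    candidate_sd = np.sqrt(np.mean(elpd_diffs**2))
    if candidate_sd == 0 or not np.isfinite(candidate_sd):
        return None  # nothing to compare against
    threshold = expected_max_under_null(elpd_diffs.size, candidate_sd)
    return bool(elpd_diffs.max() <= threshold)
```

With twelve candidates whose differences are all of similar size, the largest one falls below the expected maximum and the check flags possible selection bias; a single candidate that is far ahead of the rest does not trigger it.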

This diagnostic is not run for the subsampling case. As far as I can tell, the R loo package doesn't run it there either: https://github.com/stan-dev/loo/blob/d6fe380161fcd3ba07065ce0a525146abdb2c1d7/R/loo_compare.psis_loo_ss_list.R. I'm not exactly sure why, but my guess is that the theoretical assumptions underlying the test don't account for the additional variance and approximation bias introduced by subsampling, so it is unclear whether the null distribution would be properly calibrated in that setting.

Resolves #234

jordandeklerk · 2025-10-30T19:39:17Z

src/arviz_stats/loo/compare.py

+    if candidate_sd == 0 or not np.isfinite(candidate_sd):
+        warnings.warn(
+            "All models have nearly identical performance.",
+            UserWarning,
+        )
+        return


Not sure if we want to add more to this warning? The check is mostly there to avoid the SciPy runtime error that occurs when sd = 0. That's probably not very realistic in practice, but it is theoretically possible.

codecov-commenter · 2025-10-30T19:39:56Z

Codecov Report

❌ Patch coverage is 96.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.78%. Comparing base (d6c4f58) to head (7c9560e).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
src/arviz_stats/loo/compare.py	96.66%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #237      +/-   ##
==========================================
+ Coverage   84.72%   84.78%   +0.05%     
==========================================
  Files          41       41              
  Lines        4950     4974      +24     
==========================================
+ Hits         4194     4217      +23     
- Misses        756      757       +1

read-the-docs-community · 2025-10-30T19:40:42Z

Documentation build overview

📚 arviz-stats | 🛠️ Build #30151928 | 📁 Comparing 7c9560e against latest (85cca24)

🔍 Preview build

Show files changed (12 files in total): 📝 12 modified | ➕ 0 added | ➖ 0 deleted

File	Status
_modules/arviz_stats/sampling_diagnostics.html	📝 modified
api/generated/arviz_stats.compare.html	📝 modified
api/generated/arviz_stats.eti.html	📝 modified
api/generated/arviz_stats.hdi.html	📝 modified
api/generated/arviz_stats.histogram.html	📝 modified
api/generated/arviz_stats.kde.html	📝 modified
api/generated/arviz_stats.loo_kfold.html	📝 modified
api/generated/arviz_stats.mode.html	📝 modified
api/generated/arviz_stats.qds.html	📝 modified
api/generated/arviz_stats.rhat.html	📝 modified
_modules/arviz_stats/loo/compare.html	📝 modified
_modules/arviz_stats/loo/loo_moment_match.html	📝 modified

feat: add order-statistic check for model comparison

ea9f2c0

jordandeklerk marked this pull request as ready for review October 30, 2025 19:48

jordandeklerk requested a review from aloctavodia October 30, 2025 19:48

aloctavodia reviewed Oct 31, 2025

docs: make diagnostic check less technical

7c9560e
