Add empirical coverage diagnostic plots #579

lasse-meixner · 2025-09-23T11:41:18Z

Resolves #577.

Description

Creates coverage plots showing empirical coverage of posterior credible intervals.
For a well-calibrated model, empirical coverage matches the nominal interval width (i.e., a 95% credible interval contains the true value 95% of the time), which shows up as a diagonal line in the plot.
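
To make the quantity being plotted concrete, here is a minimal sketch of how empirical coverage could be computed from posterior draws. It assumes central (equal-tailed) intervals and uses a hypothetical helper `empirical_coverage`; it is not the `compute_empirical_coverage` function added in this PR, whose signature may differ.

```python
import numpy as np


def empirical_coverage(draws, truths, widths):
    """Fraction of datasets whose central interval contains the true value.

    draws:  (num_datasets, num_draws) posterior draws for one parameter
    truths: (num_datasets,) ground-truth parameter values
    widths: (num_widths,) nominal interval widths in (0, 1)
    """
    draws = np.asarray(draws, dtype=float)
    truths = np.asarray(truths, dtype=float)
    widths = np.asarray(widths, dtype=float)
    # Central interval bounds, each of shape (num_widths, num_datasets).
    lower = np.quantile(draws, (1.0 - widths) / 2.0, axis=1)
    upper = np.quantile(draws, (1.0 + widths) / 2.0, axis=1)
    inside = (lower <= truths) & (truths <= upper)
    return inside.mean(axis=1)


# Toy check with a conjugate Gaussian model whose exact posterior is known:
# theta ~ N(0, 1), x | theta ~ N(theta, 1)  =>  theta | x ~ N(x / 2, 1 / 2).
rng = np.random.default_rng(0)
num_datasets, num_draws = 2000, 1000
theta = rng.normal(size=num_datasets)
x = theta + rng.normal(size=num_datasets)
draws = rng.normal(
    loc=x[:, None] / 2.0, scale=np.sqrt(0.5), size=(num_datasets, num_draws)
)

widths = np.linspace(0.05, 0.95, 19)
coverage = empirical_coverage(draws, theta, widths)
# Deviations from the diagonal; these should be small for an exact posterior.
print(np.round(coverage - widths, 3))
```

Because the draws come from the exact posterior, the coverage values should stay close to the nominal widths. A posterior that is too wide would push them above the diagonal.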

The coverage for the provided simulated datasets is accompanied by credible intervals for the coverage proportion (in the form of a gray ribbon). These are calculated via the standard (conjugate) Beta-Binomial model for binomial proportions with a uniform prior.
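
For reference, with a uniform Beta(1, 1) prior and k of n datasets covered, the posterior for the coverage proportion is Beta(1 + k, 1 + n - k). A minimal sketch of the resulting band, assuming an equal-tailed interval and a hypothetical helper name `coverage_interval` (the PR's implementation may differ in detail):

```python
from scipy.stats import beta


def coverage_interval(num_covered, num_datasets, level=0.95):
    """Equal-tailed credible interval for a coverage proportion.

    Uses the conjugate update of a uniform Beta(1, 1) prior with a
    binomial likelihood: posterior = Beta(1 + k, 1 + n - k).
    """
    a = 1 + num_covered
    b = 1 + num_datasets - num_covered
    lower, upper = beta.ppf([(1 - level) / 2, (1 + level) / 2], a, b)
    return float(lower), float(upper)


# Example: 95 of 100 simulated datasets fall inside their 95% interval.
lower, upper = coverage_interval(95, 100)
print(lower, upper)
```

Evaluating this for every nominal width gives the lower and upper edges of the ribbon.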

Changes

adds module bayesflow.diagnostics.plots.coverage implementing coverage plot and coverage diff plot
adds auxiliary function compute_empirical_coverage in bayesflow.utils.plot_utils

Example
Here is an example of a model that is "too uncertain", i.e. too conservative: it over-covers.

stefanradev93 · 2025-09-23T15:42:19Z

Hi Lasse, I agree that this would be a very nice feature (which we implicitly care about when we compute the expected calibration error / ECE metric). And thank you for the PR, which I will review asap.

stefanradev93 · 2025-09-23T23:43:26Z

Hi Lasse, thanks for the PR and for following all contribution guidelines! I think the PR is almost merge-ready up to a small discussion on the following point:

Since the two new functions coverage and coverage_diff share a lot of code and logic, I wonder if it would make sense to merge them into a single function with an optional difference boolean flag, akin to what we have for the ECDF plots?

Also, do you happen to have a reference for the beta formulation of the coverage confidence bands which we can include in the docstring?

lasse-meixner · 2025-09-24T11:09:11Z

@stefanradev93 sure!
I agree with your first point and have merged the two as suggested. Regarding a source for the Beta-Binomial model, I added a docstring reference to Chapter 2 of Bayesian Data Analysis (3rd ed., 2013) by Gelman et al., where this is covered.
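
As an illustration of the merged interface discussed above, here is a rough sketch of how a single boolean flag can switch between the plain coverage curve and the difference view. The name `coverage_curve` and its return values are hypothetical and only show the idea; they are not the actual signature in `bayesflow.diagnostics.plots.coverage`.

```python
import numpy as np


def coverage_curve(widths, coverage, difference=False):
    """Return the y-values to plot and the reference line for perfect calibration.

    difference=False: plot coverage against the diagonal y = width.
    difference=True:  plot coverage - width against the horizontal line y = 0.
    """
    widths = np.asarray(widths, dtype=float)
    coverage = np.asarray(coverage, dtype=float)
    if difference:
        return coverage - widths, np.zeros_like(widths)
    return coverage, widths


values, reference = coverage_curve([0.5, 0.9], [0.6, 0.95], difference=True)
print(values, reference)
```

Keeping both views behind one flag means the coverage computation and the ribbon logic live in one place, and only the final transformation differs.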

codecov · 2025-09-24T13:59:40Z

Codecov Report

❌ Patch coverage is 93.58974% with 5 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
bayesflow/utils/plot_utils.py	89.18%	4 Missing ⚠️
bayesflow/diagnostics/plots/coverage.py	97.50%	1 Missing ⚠️

Files with missing lines	Coverage Δ
bayesflow/diagnostics/plots/__init__.py	`100.00% <100.00%> (ø)`
bayesflow/utils/__init__.py	`100.00% <ø> (ø)`
bayesflow/diagnostics/plots/coverage.py	`97.50% <97.50%> (ø)`
bayesflow/utils/plot_utils.py	`76.47% <89.18%> (+3.53%)`	⬆️

... and 5 files with indirect coverage changes

add empirical coverage diagnostic plots

7430d8c

stefanradev93 self-assigned this Sep 23, 2025

merging the plotting functions via difference arg and add beta binomial reference

bbf3da7

adjust tests and diagnostics init

c956778

stefanradev93 merged commit 8ef46f6 into bayesflow-org:dev Sep 24, 2025
9 checks passed

stefanradev93 mentioned this pull request Oct 1, 2025

Include both coverage and ECDF in default diagnostics #583

Closed
