Add Pareto shape plot #356

jordandeklerk · 2025-10-21T03:23:48Z

Closes #353

from arviz_plots import plot_khat
from arviz_base import load_arviz_data
from arviz_stats import loo

dt = load_arviz_data("radon")
elpd_data = loo(dt, pointwise=True)
plot_khat(elpd_data)

plot_khat(elpd_data, show_bins=True, show_hlines=True)

📚 Documentation preview 📚: https://arviz-plots--356.org.readthedocs.build/en/356/

jordandeklerk · 2025-10-21T03:26:03Z

src/arviz_plots/plots/utils.py

I'm not sure whether the new helpers here should be part of the public API. I don't know how often they will be used outside of the Pareto shape plot.

codecov-commenter · 2025-10-21T03:40:14Z

Codecov Report

❌ Patch coverage is 66.90141% with 94 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.30%. Comparing base (39bf455) to head (706cddb).

Files with missing lines	Patch %	Lines
src/arviz_plots/plots/utils.py	27.83%	70 Missing ⚠️
src/arviz_plots/plots/khat_plot.py	88.13%	21 Missing ⚠️
src/arviz_plots/visuals/__init__.py	62.50%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #356      +/-   ##
==========================================
- Coverage   86.14%   85.30%   -0.85%     
==========================================
  Files          54       55       +1     
  Lines        6078     6361     +283     
==========================================
+ Hits         5236     5426     +190     
- Misses        842      935      +93

OriolAbril

I have to leave now and will continue the review tomorrow. Hopefully it is somewhat helpful already.

src/arviz_plots/plots/khat_plot.py

OriolAbril · 2025-10-24T16:20:46Z

src/arviz_plots/plots/khat_plot.py

+        if isinstance(color, str) and color in distribution.dims:
+            pc_kwargs["aes"].setdefault("color", [color])
+            color = None


I will check other plots to see how we handle similar situations. I don't think we want to use setdefault here, as it would mean pc_kwargs takes priority over the color argument.
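The priority concern can be reproduced with plain dictionaries. The following is a minimal sketch, assuming a hypothetical `resolve_color_aes` helper that stands in for the quoted lines; the helper and the dimension names are illustrative and not part of arviz-plots.

```python
def resolve_color_aes(color, dims, pc_kwargs, use_setdefault):
    """Map the ``color`` argument to an aesthetic when it names a dimension.

    Hypothetical stand-in for the quoted diff; not part of arviz-plots.
    """
    aes = dict(pc_kwargs.get("aes", {}))
    if isinstance(color, str) and color in dims:
        if use_setdefault:
            # Existing entry wins: setdefault never overwrites a present key.
            aes.setdefault("color", [color])
        else:
            # Explicit argument wins: plain assignment overwrites the entry.
            aes["color"] = [color]
    return aes


dims = ("county", "floor")  # illustrative dimension names
pc_kwargs = {"aes": {"color": ["floor"]}}

with_setdefault = resolve_color_aes("county", dims, pc_kwargs, use_setdefault=True)
with_assignment = resolve_color_aes("county", dims, pc_kwargs, use_setdefault=False)
print(with_setdefault)  # {'color': ['floor']}
print(with_assignment)  # {'color': ['county']}
```

Which of the two should win is still the open question here; the sketch only shows that setdefault gives priority to whatever is already in pc_kwargs.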

src/arviz_plots/plots/khat_plot.py

jordandeklerk · 2025-10-24T16:55:54Z

> I have to leave now and will continue the review tomorrow. Hopefully it is somewhat helpful already.

Very useful, thank you! I will need to rework some things you've mentioned already and some that @aloctavodia mentioned over Slack as well. Hoping to get these changes in early next week!

jordandeklerk · 2025-10-26T17:04:25Z

src/arviz_plots/plots/utils.py

    return plot_collection
+
+
+def format_coords_as_labels(data, skip_dims=None):


This is the same formatting util from legacy ArviZ. Do we need to adapt this in any way for arviz-plots? This seems to work as it is now.

I think ideally we'd take a labeller argument and use sel_to_str (I think it would be the adequate method for this particular case). That will allow users to configure the generated labels a bit (for instance with substitutions, using indexes instead/in addition to coordinate values) and I think the default case will stay the same.

jordandeklerk · 2025-10-26T17:06:59Z

src/arviz_plots/plots/utils.py

+    return np.array([f"{s}" for s in coord_labels], dtype=object)
+
+
+def calculate_khat_bin_edges(values, thresholds, tolerance=1e-9):


Should this live in arviz-stats since it's computational? It could go here if we need to move it: https://github.com/arviz-devs/arviz-stats/blob/main/src/arviz_stats/utils.py

I think it is ok to leave it here, unless we are basically using the same thing already to generate the repr of the ELPDData class, in which case it would be nice to use the same thing in both places.

jordandeklerk · 2025-10-26T17:12:23Z

src/arviz_plots/backend/matplotlib/core.py

 def hline(y, target, *, color=unset, alpha=unset, width=unset, linestyle=unset, **artist_kws):
    """Interface to matplotlib for a horizontal line spanning the whole axes."""
-    artist_kws.setdefault("zorder", 0)
+    artist_kws.setdefault("zorder", 3)


Made this change to place the horizontal lines in front and khat values in the back.

We should set zorder to 2 in order to define what goes on top of what with the plotting order. We must have missed that zorder when reviewing the addition of hline, we do have a note about it: https://github.com/arviz-devs/arviz-plots/blob/main/src/arviz_plots/backend/matplotlib/core.py#L4. Otherwise this will only work for matplotlib, and you can see in the preview that as the lines are plotted last, bokeh and plotly already render them on top: https://arviz-plots--356.org.readthedocs.build/en/356/gallery/plot_khat.html

As an example of the opposite behaviour, when we add the alternate row shading in plot_forest we plot the gray stripes first, then all the other elements, so the stripes end up at the back in all backends.

OriolAbril

I am a bit unsure about the things we'd want to support and the things we don't want to allow. There are several things we could integrate more with plotcollection machinery which would give more flexibility to the plot, but plotcollection also usually gives more flexibility than what is really needed which can end up meaning more work and less clear behaviour.

OriolAbril · 2025-10-29T11:35:07Z

src/arviz_plots/plots/khat_plot.py

+        default_color = khat_kwargs.get("color", color)
+
+        if default_color is None and "color" not in khat_aes:
+            default_color = "C0"
+
+        if "color" not in khat_aes and default_color is not None:
+            khat_kwargs.setdefault("color", default_color)


Suggested change

-        default_color = khat_kwargs.get("color", color)
-
-        if default_color is None and "color" not in khat_aes:
-            default_color = "C0"
-
-        if "color" not in khat_aes and default_color is not None:
-            khat_kwargs.setdefault("color", default_color)
+        if "color" not in khat_aes:
+            khat_kwargs.setdefault("color", "C0")

I think this is equivalent, with the exception of visuals={"khat": {"color": None}}, which I think is ok to let fail.
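One way to sanity check the claimed equivalence is to model both versions as pure functions and compare them on a few inputs. This is only a sketch under stated assumptions: the top-level `color` argument is taken to be None by the time these lines run, `khat_aes` is modelled as a tuple of aesthetic names, and the function names `original` and `suggested` are made up for the comparison.

```python
def original(khat_kwargs, khat_aes, color=None):
    """Model of the lines being replaced (``color`` assumed to be None)."""
    kwargs = dict(khat_kwargs)
    default_color = kwargs.get("color", color)
    if default_color is None and "color" not in khat_aes:
        default_color = "C0"
    if "color" not in khat_aes and default_color is not None:
        kwargs.setdefault("color", default_color)
    return kwargs


def suggested(khat_kwargs, khat_aes):
    """Model of the suggested replacement."""
    kwargs = dict(khat_kwargs)
    if "color" not in khat_aes:
        kwargs.setdefault("color", "C0")
    return kwargs


# (khat_kwargs, khat_aes) pairs; the aes tuples are an assumed representation.
cases = [
    ({}, ()),
    ({"color": "red"}, ()),
    ({}, ("color",)),
    ({"color": "red"}, ("color",)),
    ({"color": None}, ()),
]
results = [(original(k, a), suggested(k, a)) for k, a in cases]
for (kwargs, aes), (old, new) in zip(cases, results):
    print(kwargs, aes, "->", old, new)
```

In this isolated model the two versions agree on every case, including an explicit `{"color": None}`, which both leave untouched because setdefault does not overwrite an existing key. The code around the quoted lines is not shown here and may change that picture.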

OriolAbril · 2025-10-29T11:37:13Z

src/arviz_plots/plots/khat_plot.py

+    show_hlines=False,
+    show_bins=False,
+    hover_label=False,


I'm not completely sure whether we want to keep these as top-level arguments or defer to visuals={"xyz": True/False} to activate or deactivate them.

OriolAbril · 2025-10-29T11:43:24Z

src/arviz_plots/plots/khat_plot.py

+            for idx, value in enumerate(hline_values):
+                h_kwargs = hlines_kwargs.copy()
+                if "linestyle" not in hlines_aes:
+                    h_kwargs.setdefault("linestyle", f"C{idx}")
+                if "color" not in hlines_aes:
+                    h_kwargs.setdefault("color", f"C{idx + 1}")
+                if "alpha" not in hlines_aes:
+                    h_kwargs.setdefault("alpha", 0.7)
+
+                h_ds = xr.Dataset({"pareto_k": xr.DataArray(value)})
+                plot_collection.map(
+                    hline,
+                    f"hline_{idx}",
+                    data=h_ds,
+                    ignore_aes="all",
+                    **h_kwargs,
+                )


We should define a dataarray/dataset with a bin- or hline-related dimension (kind of like a concatenated version of the h_ds ones), then add an overlay aesthetic to it. Otherwise, if we manually loop here, I don't see how it would be possible to add an aesthetic mapping to the lines.

Another option is deciding we won't allow aesthetic mappings for this element, in which case we can simplify the aes_by_visual dict and docs, as well as all the "in hlines_aes" checks, because we do ignore_aes="all".
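The first option could start from a dataset like the one below. This is a minimal sketch, assuming a dimension named `hline` and illustrative threshold values and labels; none of these names or numbers are taken from the PR, and how the dataset is then passed to the plot collection is left out.

```python
import xarray as xr

# Illustrative threshold values and labels, not the ones plot_khat computes.
hline_values = [0.7, 1.0]
hline_labels = ["good", "bad"]

# One dataset holding every line along an explicit "hline" dimension,
# instead of one 0-d dataset per loop iteration.
h_ds = xr.Dataset(
    {
        "pareto_k": xr.DataArray(
            hline_values,
            dims=["hline"],
            coords={"hline": hline_labels},
        )
    }
)

print(h_ds["pareto_k"].dims)  # ('hline',)
print(h_ds["pareto_k"].sel(hline="bad").item())  # 1.0
```

With the lines indexed by a named dimension, properties such as color or linestyle can in principle be tied to that dimension rather than set inside a Python loop.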

OriolAbril · 2025-10-29T11:57:06Z

src/arviz_plots/plots/khat_plot.py

+                    **h_kwargs,
+                )
+
+    if show_bins:


Similar idea here.

Here is an example edge case which might help us decide whether we want to do this or hardcode things to be always off. We can have a multidimensional khat dataarray. Take the rugby example, but instead of defining two observational variables for home/away points, we stack them and end up with a (match, field) array for observations, khats, posterior predictive...

If we integrate all the binning with the plotcollection machinery, we could do a cols/rows=["field"] to split the plot in two and share the x-axis between the two plots of the figure, in order to easily see if the bad khats for the away subset happen at the same matches as they do for the home subset. I think in a case like that each plot would have its own independent binning, which should not be difficult if using our xarray-based histogram along with the filter_aes helper to get the dimensions to reduce.

jordandeklerk added 4 commits October 20, 2025 21:53

feat: add pareto shape plot

9abb859

refactor: move helpers around and optimize

c4880e3

refactor: fix docstring and hover capability

84354e7

refactor: fix import

c8e492a

jordandeklerk commented Oct 21, 2025


docs: add axis helpers to visual docs

b7d19f6

docs: add more tests, gallery, and fix docstring

b7e4d10

jordandeklerk marked this pull request as ready for review October 21, 2025 13:35

jordandeklerk requested review from OriolAbril and aloctavodia October 21, 2025 13:35

fix: remove forced dtypes

4fa3a08

OriolAbril reviewed Oct 24, 2025


jordandeklerk added 2 commits October 25, 2025 15:36

refactor: make more arvizian and update coordinate label formatting

3361e5f

refactor: update zorder settings and clean up khat plot logic

706cddb

jordandeklerk commented Oct 26, 2025


jordandeklerk mentioned this pull request Oct 28, 2025

Add Pareto khat plot to model comparison chapter arviz-devs/EABM#179

Open

OriolAbril reviewed Oct 29, 2025

3 participants
