Skip to content

Conversation

@jordandeklerk
Copy link
Member

@jordandeklerk jordandeklerk commented Oct 21, 2025

Closes #353

from arviz_plots import plot_khat
from arviz_base import load_arviz_data
from arviz_stats import loo

dt = load_arviz_data("radon")
elpd_data = loo(dt, pointwise=True)
plot_khat(elpd_data)
image
plot_khat(elpd_data, show_bins=True, show_hlines=True)
image

📚 Documentation preview 📚: https://arviz-plots--356.org.readthedocs.build/en/356/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if the new helpers here should be in the public API or not. Not too sure how frequently these will be used outside of the Pareto shape plot.

@codecov-commenter
Copy link

codecov-commenter commented Oct 21, 2025

Codecov Report

❌ Patch coverage is 66.90141% with 94 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.30%. Comparing base (39bf455) to head (706cddb).

Files with missing lines Patch % Lines
src/arviz_plots/plots/utils.py 27.83% 70 Missing ⚠️
src/arviz_plots/plots/khat_plot.py 88.13% 21 Missing ⚠️
src/arviz_plots/visuals/__init__.py 62.50% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #356      +/-   ##
==========================================
- Coverage   86.14%   85.30%   -0.85%     
==========================================
  Files          54       55       +1     
  Lines        6078     6361     +283     
==========================================
+ Hits         5236     5426     +190     
- Misses        842      935      +93     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jordandeklerk jordandeklerk marked this pull request as ready for review October 21, 2025 13:35
Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to leave now, will continue the review tomorrow hopefully it is somewhat helpful already

Comment on lines 295 to 297
if isinstance(color, str) and color in distribution.dims:
pc_kwargs["aes"].setdefault("color", [color])
color = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check other plots to see how we handle similar situations, I don't think we want to use setdefault here as it would mean pc_kwargs takes priority over color argument

@jordandeklerk
Copy link
Member Author

I have to leave now, will continue the review tomorrow hopefully it is somewhat helpful already

Very useful, thank you! I will need to re-work somethings you've mentioned already and some that @aloctavodia mentioned as well over slack. Hoping to get these changes in early next week!

return plot_collection


def format_coords_as_labels(data, skip_dims=None):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the same formatting util from legacy ArviZ. Do we need to adpat this in any way for arviz-plots? This seems to work as it is now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think ideally we'd take a labeller argument and use sel_to_str (I think it would be the adequate method for this particular case). That will allow users to configure the generated labels a bit (for instance with substitutions, using indexes instead/in addition to coordinate values) and I think the default case will stay the same.

return np.array([f"{s}" for s in coord_labels], dtype=object)


def calculate_khat_bin_edges(values, thresholds, tolerance=1e-9):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this live in arviz-stats since it's computational? Could go here if we need to move it https://github.com/arviz-devs/arviz-stats/blob/main/src/arviz_stats/utils.py.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is ok to leave here unless we are basically using the same thing already to generate the repr of the ELPDData class in which case it would be nice to use the same thing in both places.

def hline(y, target, *, color=unset, alpha=unset, width=unset, linestyle=unset, **artist_kws):
"""Interface to matplotlib for a horizontal line spanning the whole axes."""
artist_kws.setdefault("zorder", 0)
artist_kws.setdefault("zorder", 3)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made this change to place the horizontal lines in front and khat values in the back.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should set zorder to 2 in order to define what goes on top of what with the plotting order. We must have missed that zorder when reviewing the addition of hline, we do have a note about it: https://github.com/arviz-devs/arviz-plots/blob/main/src/arviz_plots/backend/matplotlib/core.py#L4. Otherwise this will only work for matplotlib, and you can see in the preview that as the lines are plotted last, bokeh and plotly already render them on top: https://arviz-plots--356.org.readthedocs.build/en/356/gallery/plot_khat.html

As an example of the opposite behaviour, when we add the the alternate row shading in plot_forest we plot the gray stripes first, then all the other elements so the stripes end up at the back in all backends.

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit unsure about the things we'd want to support and the things we don't want to allow. There are several things we could integrate more with plotcollection machinery which would give more flexibility to the plot, but plotcollection also usually gives more flexibility than what is really needed which can end up meaning more work and less clear behaviour.

Comment on lines +307 to +313
default_color = khat_kwargs.get("color", color)

if default_color is None and "color" not in khat_aes:
default_color = "C0"

if "color" not in khat_aes and default_color is not None:
khat_kwargs.setdefault("color", default_color)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
default_color = khat_kwargs.get("color", color)
if default_color is None and "color" not in khat_aes:
default_color = "C0"
if "color" not in khat_aes and default_color is not None:
khat_kwargs.setdefault("color", default_color)
if "color" not in khat_aes:
khat_kwargs.setdefault("color", "C0")

I think this is equivalent with the exception of visuals={"khat": {"color": None}} which I think is ok to let it fail

Comment on lines +36 to +38
show_hlines=False,
show_bins=False,
hover_label=False,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not completely sure if we want to keep these as top level arguments or defer to visuals={"xyz": True/False} to activate or deactivate them

Comment on lines +329 to +345
for idx, value in enumerate(hline_values):
h_kwargs = hlines_kwargs.copy()
if "linestyle" not in hlines_aes:
h_kwargs.setdefault("linestyle", f"C{idx}")
if "color" not in hlines_aes:
h_kwargs.setdefault("color", f"C{idx + 1}")
if "alpha" not in hlines_aes:
h_kwargs.setdefault("alpha", 0.7)

h_ds = xr.Dataset({"pareto_k": xr.DataArray(value)})
plot_collection.map(
hline,
f"hline_{idx}",
data=h_ds,
ignore_aes="all",
**h_kwargs,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should define a dataarray/dataset with a bin or hline related dimension (kind of like a concatenated version of the h_ds ones). Then add an overlay aesthetic to it. Otherwise if we manually loop here I don't see how it would be possible to add an aesthetic mapping to the lines.

Another option is deciding we won't allow aesthetic mappings for this element in which case we can simplify the aes_by_visual dict and docs as well as all the in hline_aes checks because we do ignore_aes="all"

**h_kwargs,
)

if show_bins:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar idea here.

Example edge case which might help us think if we want to do this or hardcode things to be always off. We can have multidimensional khat dataarray. Take for example the rugby example but instead of defining two observational variables for points home/away we stack that and end up with a (match, field) array for observations, khats, posterior predictive...

If we integrate all the binning with the plotcollection machinery we could do a cols/rows=["field"] to split the plot in two, share the xaxis between the two plots of the figure in order to easily see if the bad khats for the away subset happen at the same matches they do for the home subset. I think in a case like that each plot would have its own independent binning which should not be difficult if using our xarray based histogram along with filter_aes helper to get the dimensions to reduce

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Pareto shape plot

3 participants