Refactor plot_lm #343

aloctavodia · 2025-09-11T10:00:31Z

I started working on adding a smooth option to plot_lm as presented in #317. But after playing a little with the function, I decided to incorporate other changes, which may be a better fit for the design and expectations of the other plots in ArviZ. Still, plot_lm has always been a weird plot for us, and I may be too focused on what I want to do and miss other uses. So, to make things more concrete, I will show the steps for the two main use cases I have in mind. Notice the combine function I am adding in this PR; I may be overcomplicating things there, or not making the correct assumptions (~~e.g., it fails when ci_prob is a list instead of a float~~).

A common pattern in ArviZ-plots is to create the DataTree first and plot second, so I follow that pattern here. For instance, the arguments "x" and "y" are equivalent to "var_names" in other plots.

The most straightforward example is a regression with only one predictor. Here I am using the bikes dataset and a NegativeBinomial family; the model is essentially:

rented ~ temperature
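Written out, and assuming the same log link that the np.exp calls in the two-predictor example further down use, the single-predictor model is roughly (priors omitted):

$$
\mu_i = \exp(\alpha + \beta \, \text{temperature}_i), \qquad
\text{rented}_i \sim \text{NegativeBinomial}(\mu_i, \text{scale})
$$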

Once the model is sampled, we add the predictor as a constant_data group:

idata_lb.add_groups({"constant_data": xr.DataArray(bikes.temperature)})

Then we can plot the predictions with:

azp.plot_lm(idata_lb)
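Conceptually, the band in this default plot summarizes, for each observed x, the spread of the posterior predictive draws. The sketch below only illustrates that idea and is not the lm_plot.py implementation: it sorts by x, computes equal-tailed intervals (a stand-in; the PR may use HDI or another interval), and applies a simple moving average as a placeholder for the smooth option. The helper lm_band and its arguments are hypothetical.

```python
import numpy as np


def lm_band(x, y_draws, ci_prob=0.94, smooth_window=None):
    """Hypothetical helper: equal-tailed band of y_draws, sorted by x.

    x has shape (n_obs,); y_draws has shape (n_samples, n_obs),
    e.g. posterior predictive draws with chain and draw stacked.
    """
    order = np.argsort(x)
    x_sorted = x[order]
    y_sorted = y_draws[:, order]

    # Equal-tailed interval per observation (stand-in for HDI).
    tail = (1 - ci_prob) / 2
    lower, upper = np.quantile(y_sorted, [tail, 1 - tail], axis=0)
    center = y_sorted.mean(axis=0)

    if smooth_window is not None:
        if smooth_window % 2 == 0:
            raise ValueError("smooth_window must be odd")
        # Centered moving average with edge padding keeps the length n_obs.
        kernel = np.ones(smooth_window) / smooth_window
        pad = smooth_window // 2

        def smooth(values):
            padded = np.pad(values, pad, mode="edge")
            return np.convolve(padded, kernel, mode="valid")

        lower, center, upper = smooth(lower), smooth(center), smooth(upper)

    return x_sorted, lower, center, upper


# Usage with synthetic data (not the bikes dataset).
rng = np.random.default_rng(0)
x = rng.uniform(0, 30, size=200)
y_draws = rng.poisson(np.exp(1.0 + 0.05 * x), size=(1000, 200))
x_sorted, lower, center, upper = lm_band(x, y_draws, ci_prob=0.9, smooth_window=5)
```

Because the moving average is a positive-weight linear filter, it keeps lower below upper after smoothing.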

Multiple bands

pc = azp.plot_lm(
    idata_lb,
    ci_prob=[0.5, 0.9, 0.95],
    visuals={"observed_scatter": False, "pe_line": False},
)

Notice the y-axis labels are not correct yet.
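Accepting ci_prob as either a float or a list (the case struck through in the PR description) can be handled by normalizing the input first. A minimal sketch, with a hypothetical nested_bands helper and equal-tailed intervals assumed in place of whatever interval plot_lm actually computes:

```python
import numpy as np


def nested_bands(y_draws, ci_prob):
    """Hypothetical helper: {prob: (lower, upper)} for a float or a list of probs.

    y_draws has shape (n_samples, n_obs); intervals are equal-tailed.
    """
    # np.atleast_1d turns 0.9 into array([0.9]) and leaves lists as 1D arrays.
    probs = np.atleast_1d(np.asarray(ci_prob, dtype=float))
    bands = {}
    for prob in np.sort(probs):
        tail = (1 - prob) / 2
        lower, upper = np.quantile(y_draws, [tail, 1 - tail], axis=0)
        bands[float(prob)] = (lower, upper)
    return bands


rng = np.random.default_rng(1)
y_draws = rng.normal(size=(2000, 50))
bands = nested_bands(y_draws, [0.5, 0.9, 0.95])
single = nested_bands(y_draws, 0.9)
```

Sorting the probabilities means the bands come out nested from narrowest to widest, which is also a convenient drawing order when the wider bands are more transparent.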

More often than not, we have regression models with more than one covariate. In those cases, what we want to plot are the marginal predictions (or the marginal eta/linear terms).

rented ~ temperature + humidity
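In terms of the posterior variables used in the code below (α, and β with β_dim_0 = 0 for temperature and 1 for humidity), writing t for temperature, h for humidity, and bars for sample means, the model and the two marginal means it stores as μ_t and μ_h are:

$$
\begin{aligned}
\mu_i &= \exp(\alpha + \beta_0\, t_i + \beta_1\, h_i) \\
\mu_t(t_i) &= \exp(\alpha + \beta_0\, t_i + \beta_1\, \bar{h}) \\
\mu_h(h_i) &= \exp(\alpha + \beta_0\, \bar{t} + \beta_1\, h_i)
\end{aligned}
$$

The marginal predictions y_pred_t and y_pred_h are then draws from NegativeBinomial(μ_t, scale) and NegativeBinomial(μ_h, scale).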

For such models, we need to compute the marginals somehow. PyMC-BART has custom functions for this that take advantage of the tree structure to efficiently compute predictions, and Bambi has the interpret module for slopes and predictions. I mention those for context, but also because, at least in the PyMC-BART case, it would be nice to replace the plotting part with arviz-plots.

Assuming we want to compute the marginals manually, we could do something like this:

import numpy as np
import xarray as xr
import arviz_base as azb
import preliz as pz
from xarray_einstats.stats import XrDiscreteRV

# Compute marginal means: vary one predictor, hold the other at its mean
alpha = idata_lb.posterior["α"]
beta = idata_lb.posterior["β"]
scale = idata_lb.posterior["scale"]


mut = np.exp(alpha + beta.sel(β_dim_0=0) * xr.DataArray(bikes.temperature, dims="μ_t_dim_0") + beta.sel(β_dim_0=1) * bikes.humidity.mean())
muh = np.exp(alpha + beta.sel(β_dim_0=0) * bikes.temperature.mean() + beta.sel(β_dim_0=1) * xr.DataArray(bikes.humidity, dims="μ_h_dim_0"))

idata_lb.posterior["μ_t"] = mut
idata_lb.posterior["μ_h"] = muh

# Compute marginal predictions
y_pred_t = XrDiscreteRV(pz.NegativeBinomial, mut, scale).rvs()
y_pred_h = XrDiscreteRV(pz.NegativeBinomial, muh, scale).rvs()

idata_lb.posterior_predictive["y_pred_t"] = y_pred_t
idata_lb.posterior_predictive["y_pred_h"] = y_pred_h

dt_cd = azb.from_dict(
    {"constant_data": {
        "temperature": bikes.temperature.to_numpy(),
        "humidity": bikes.humidity.to_numpy(),
    }},
    dims={"temperature": ["dim_0"], "humidity": ["dim_0"]},
)
idata_lb["constant_data"] = dt_cd["constant_data"].to_dataset()
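The xarray code above relies on named dimensions to broadcast the posterior draws against the predictor values. The same arithmetic with plain NumPy arrays makes the shapes explicit; the draws and data here are synthetic stand-ins (not the bikes dataset), and marginal_mu is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(2)
n_chains, n_draws, n_obs = 4, 250, 100

# Synthetic stand-ins for the posterior draws and the predictors.
alpha = rng.normal(4.0, 0.1, size=(n_chains, n_draws))
beta = rng.normal([0.05, -0.01], 0.01, size=(n_chains, n_draws, 2))
temperature = rng.uniform(0, 35, size=n_obs)
humidity = rng.uniform(20, 90, size=n_obs)


def marginal_mu(alpha, beta, x_vary, i_vary, fixed_value, i_fixed):
    """Hypothetical helper: exp(alpha + beta[i_vary] * x + beta[i_fixed] * fixed).

    Returns an array of shape (chain, draw, n_obs).
    """
    linear = (
        alpha[..., None]                        # (chain, draw, 1)
        + beta[..., i_vary, None] * x_vary      # (chain, draw, n_obs)
        + beta[..., i_fixed, None] * fixed_value
    )
    return np.exp(linear)


# Same roles as mut and muh: vary one predictor, hold the other at its mean.
mu_t = marginal_mu(alpha, beta, temperature, 0, humidity.mean(), 1)
mu_h = marginal_mu(alpha, beta, humidity, 1, temperature.mean(), 0)
```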

After that, we can plot the marginal values, i.e., how the predicted number of rented bikes changes with temperature when we keep humidity at its mean value, and the other way around:

pc = azp.plot_lm(
    idata_lb,
    x=["temperature", "humidity"],
    y=["y_pred_t", "y_pred_h"],
    y_obs="y_pred",
    visuals={"ci_band": {}, "pe_line": {}, "observed_scatter": False},
)

We may instead want to plot the posterior means rather than the predictions:

pc = azp.plot_lm(
    idata_lb,
    group="posterior",
    x=["temperature", "humidity"],
    y=["μ_t", "μ_h"],
    y_obs="y_pred",
    visuals={"ci_bounds": True, "pe_line": {}, "observed_scatter": False},
)
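The posterior band only reflects uncertainty in μ, while the predictive band also includes the NegativeBinomial sampling noise, so the latter is wider. A NumPy sketch with synthetic draws illustrates this, assuming that scale plays the role of PreliZ's alpha (dispersion) parameter in XrDiscreteRV(pz.NegativeBinomial, mut, scale); nb_rvs and band_width are hypothetical helpers that convert (mu, alpha) to NumPy's (n, p) parameterization and measure equal-tailed interval widths:

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples, n_obs = 4000, 30

# Synthetic posterior draws: mu per (sample, observation), scale per sample.
mu = np.exp(rng.normal(4.5, 0.05, size=(n_samples, 1)) + np.linspace(0, 0.5, n_obs))
scale = rng.gamma(shape=50, scale=0.1, size=(n_samples, 1))  # centered near 5


def nb_rvs(rng, mu, alpha):
    """NegativeBinomial(mu, alpha) draws via NumPy's (n, p) parameterization.

    With n = alpha and p = alpha / (alpha + mu), the mean is mu and the
    variance is mu + mu**2 / alpha.
    """
    p = alpha / (alpha + mu)
    return rng.negative_binomial(alpha, p)


def band_width(draws, prob=0.9):
    """Width of the equal-tailed interval for each observation."""
    lower, upper = np.quantile(draws, [(1 - prob) / 2, (1 + prob) / 2], axis=0)
    return upper - lower


y_pred = nb_rvs(rng, mu, scale)   # analogous to plotting predictions
mean_width = band_width(mu)       # analogous to group="posterior"
pred_width = band_width(y_pred)
```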

📚 Documentation preview 📚: https://arviz-plots--343.org.readthedocs.build/en/343/

codecov-commenter · 2025-09-11T10:11:11Z

Codecov Report

❌ Patch coverage is 3.47826% with 111 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.29%. Comparing base (5dd3aef) to head (ad0c50e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/arviz_plots/plots/lm_plot.py | 3.47% | 111 Missing ⚠️ |


@@            Coverage Diff             @@
##             main     #343      +/-   ##
==========================================
- Coverage   85.94%   85.29%   -0.66%     
==========================================
  Files          48       48              
  Lines        6014     6060      +46     
==========================================
  Hits         5169     5169              
- Misses        845      891      +46


juanitorduz · 2025-09-17T08:29:23Z

This looks awesome!

So this is doing almost the same as the old plot_hdi? ~~Do you want to include the smoothing option?~~ Ok, you did :D

aloctavodia · 2025-09-17T09:30:09Z

It is doing similar things. Maybe we don't need to have a separate function like plot_hdi if this is easy to use and flexible enough.

juanitorduz · 2025-09-17T09:31:19Z

Agree! This looks great! Thank you!

OriolAbril

I like most of the changes, but we need more examples, both to showcase them in the docs and for user testing.