Skip to content

Add simulator function to SBC#44

Merged
aloctavodia merged 8 commits intoarviz-devs:mainfrom
currocam:main
Sep 4, 2025
Merged

Add simulator function to SBC#44
aloctavodia merged 8 commits intoarviz-devs:mainfrom
currocam:main

Conversation

@currocam
Copy link
Contributor

Description

This PR adds an optional argument simulator to provide a more flexible user interface by allowing SBC to be used with (1) models with no observed variables and (2) custom simulators that do not match the probabilistic model. We discussed this possibility at #4 . It’s my first time using numpyro and contributing to some PyMC ecosystem library, so I’m happy to learn.

I haven't updated the documentation, but test cases include new functionality, and I've written the following toy example.

Toy example for simulator

Let's say we are interested in modeling the annual dispersal of a species across a (1-dimensional) landscape. For example, we might want to estimate the average dispersal (sigma) from a dataset of start and end coordinates.

from arviz_plots import plot_ecdf_pit
import pymc as pm
import numpy as np
import simuk
start_points = np.asarray([0., -5., 10., 15., 30.])
end_points = np.asarray([-10., 15., 20., 25., 20.])
displacement = end_points - start_points
# A very simple model could be
with pm.Model() as model:
    sigma = pm.HalfNormal("sigma", sigma=5)
    y_obs = pm.Normal("y", mu=0, sigma=sigma, observed=displacement)

The custom simulator function is useful when (a) having a model that does not have a built-in simulator (because it was a custom Potential call, for example) or (b) when we want to simulate data from a model that is not a probabilistic model.

Let's say we have a mechanistic model that describes the dispersal process. Starting and ending points were taken with a 1-year interval. Every day, an individual moves a distance drawn from an unknown distribution with finite variance. According to the central limit theorem, the total sum of the displacements (if independent) should follow a normal distribution with mean 0 and standard deviation sqrt(time) * sigma as time increases. We might wonder if 1 year is enough time for this approximation to be accurate, and we can test this using SMC for different dispersal distributions with a custom simulator function.

def simulator_uniform(seed, sigma):
    rng = np.random.default_rng(seed)
    time_in_days, n_obs = 365, 5
    a = -np.sqrt(3 * sigma**2 / time_in_days)
    b =  np.sqrt(3 * sigma**2 / time_in_days)
    steps = rng.uniform(a, b, size=(time_in_days, n_obs))
    return {'y': steps.sum(axis=0)}

sbc = simuk.SBC(model, num_simulations=100, simulator=simulator_uniform)
sbc.run_simulations()
plot_ecdf_pit(sbc.simulations)

Copy link
Contributor

@aloctavodia aloctavodia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution, and sorry for taking so much time to answer. I added a few comments that help simplify the code (assuming they work as intended).

@aloctavodia
Copy link
Contributor

I will check who is using numba. We are using it in a couple of places in arviz, but it should be optional there.

We should have a simple example on how to use a simulator. But that can be done in a separated PR. We could also have examples here https://arviz-devs.github.io/EABM/Chapters/Simulation_based_calibration.html

@aloctavodia aloctavodia changed the title [WIP] Add simulator function to SBC Add simulator function to SBC Sep 3, 2025
@currocam
Copy link
Contributor Author

currocam commented Sep 3, 2025

Bambi dependency issue

The test for bambi fails in the current version fails with bambi==0.13.0 (in CI) but passes with 0.15.0 (in my machine).
However, I'm not really sure what's going on under the hood.

Current error at 0.13.0 says:

ValueError: Error generating prior predictive sample with parameters {'x': array(-2.04538266), 'Intercept': array(0.45027041), 'y_sigma': array(1.25691733), 'seed': np.int64(745490419)}: bmb_simulator() missing 2 required positional arguments: 'mu' and 'sigma'.

This happens because with bambi==0.13.0 the PyMC model has variables named as {'x', ‘Intercept', 'y_sigma'} (and not mu and sigma, as the test expects).

With bambi==0.15.0 the PyMC model has variables named as
{'x', ‘Intercept', 'sigma', 'mu', 'x'}

I think I definitely misunderstood the Bambi model when writing the text. I guess x is the coefficient corresponding to the 'x' variable and not mu as I thought initially?

Do you have any input on this? I've never used Bambi (and I see you're a maintainer). Is there a way we can provide a better interface? I can imagine this being very confusing to use (at least, more confusing than the vanilla pymc version).

@aloctavodia
Copy link
Contributor

yes, x is the coefficient for the x variable. There has been a change in how auxiliary variables are named. It used to be nameresponse_nameparameter (like "y_sigma"), now it's just "nameparameter", like sigma. I think it's ok to ask for Bambi >= 0.14. Actually there are reasons to set the lower bound at 0.16 once it gets released.

@currocam
Copy link
Contributor Author

currocam commented Sep 4, 2025

CI now fails with

ERROR: Ignored the following versions that require a different python version: 0.14.0 Requires-Python >=3.10,<3.13; 0.15.0 Requires-Python >=3.10,<3.13

Perhaps most pragmatic decision would be to simply skip the test?

@aloctavodia
Copy link
Contributor

Sounds good to me. It will work with the upcoming release of bambi.

@codecov-commenter
Copy link

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@aloctavodia aloctavodia merged commit c3c9052 into arviz-devs:main Sep 4, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants