Merged
Changes from 14 commits
Commits
27 commits
9c1f811
format for joss
aloctavodia Dec 20, 2025
f324c68
add figures folder
aloctavodia Dec 22, 2025
35907c5
add mention to all contributors, trim trailing white space
aloctavodia Jan 5, 2026
47f580c
d
aloctavodia Jan 5, 2026
a254820
fix trailing whitespace
aloctavodia Jan 5, 2026
e8bccaa
trim trailing whitespace
aloctavodia Jan 5, 2026
2c05a1b
Minor edits
avehtari Jan 8, 2026
e75d34a
Update Paananen reference to 2021 version
avehtari Jan 8, 2026
3e2743d
Update Paanen reference year
avehtari Jan 8, 2026
0a40a9f
update acknowledgements
aloctavodia Jan 8, 2026
d568622
Apply suggestions from code review
aloctavodia Jan 8, 2026
497b0ee
Apply suggestions from code review
aloctavodia Jan 9, 2026
a08a693
add comment about dimension order
aloctavodia Jan 9, 2026
752f7c4
add comments
aloctavodia Jan 13, 2026
d795f26
add gha for compiling paper
aloctavodia Jan 13, 2026
abe207e
add missing sections, fix missing DOIs
aloctavodia Jan 17, 2026
c3761ee
update AI usage disclosure and move to the bottom
aloctavodia Jan 19, 2026
f3b14ba
remove prefix from DOIs
aloctavodia Jan 23, 2026
8c7ce78
Apply suggestions from code review
aloctavodia Jan 24, 2026
4e27959
update stan reference
aloctavodia Jan 24, 2026
88449c8
fix urls
aloctavodia Jan 24, 2026
1be8f4f
address reviewer comments
aloctavodia Jan 28, 2026
9975e80
tweaks related to review comments
OriolAbril Feb 3, 2026
b818746
update example to use qds, add missing references, small tweaks
aloctavodia Feb 3, 2026
4962735
move text from caption to body
aloctavodia Feb 3, 2026
bda05d1
make statement of need more concrete
aloctavodia Feb 3, 2026
378f772
Edits to JOSS paper (#2537)
matt-graham Feb 24, 2026
Binary file added paper/figures/figure_0.png
124 changes: 124 additions & 0 deletions paper/paper.md
@@ -0,0 +1,124 @@
---
title: 'ArviZ: a modular and flexible library for exploratory analysis of Bayesian models'
tags:
  - Python
  - Bayesian statistics
  - Bayesian workflow
authors:
  - name: Osvaldo A Martin
    orcid: 0000-0001-7419-8978
    equal-contrib: true
    corresponding: true
    affiliation: 1
  - name: Oriol Abril-Pla
    orcid: 0000-0002-1847-9481
    equal-contrib: true
    corresponding: true
    affiliation: 2
  - name: Jordan Deklerk
    affiliation: 3
  - name: Seth D. Axen
    orcid: 0000-0003-3933-8247
    affiliation: 4
  - name: Colin Carroll
    orcid: 0000-0001-6977-0861
    affiliation: 2
  - name: Ari Hartikainen
    orcid: 0000-0002-4569-569X
    affiliation: 2
  - name: Aki Vehtari
    orcid: 0000-0003-2164-9469
    affiliation: 5,1
affiliations:
  - name: Aalto University, Espoo, Finland
    index: 1
  - name: arviz-devs
    index: 2
  - name: DICK's Sporting Goods, Coraopolis, Pennsylvania
    index: 3
  - name: University of Tübingen
    index: 4
  - name: ELLIS Institute Finland
    index: 5
date: 22 December 2025
bibliography: references.bib
---

# Summary

`ArviZ` [@Kumar_2019] is a Python package for exploratory analysis of Bayesian models that has been widely used in academia and industry since its introduction in 2019, with over 700 citations and 75 million downloads. Its goal is to integrate seamlessly with established probabilistic programming languages and statistical interfaces, such as PyMC [@Abril-pla_2023], Stan (via the cmdstanpy interface) [@stan], Pyro, NumPyro [@Phan_2019; @Bingham_2019], emcee [@emcee], and Bambi [@Capretto_2022], among others.

`ArviZ` is part of the broader ArviZ project, which develops tools for exploratory analysis of Bayesian models. The organization also maintains other initiatives, including ArviZ.jl (for Julia), PreliZ [@icazatti_2023], educational resources [@eabm_2025], and additional packages that are still in an experimental phase.

In this work, we present a redesigned version of `ArviZ` that emphasizes greater user control and modularity. This redesign delivers a more flexible and efficient toolkit for exploratory analysis of Bayesian models. With its renewed focus on modularity and usability, `ArviZ` is well-positioned to remain an essential tool for Bayesian modelers in both research and applied settings.

# Statement of need

Probabilistic programming has emerged as a powerful paradigm for statistical modeling, accompanied by a growing ecosystem of tools for model specification and inference. Effective modeling also requires robust support for sampling diagnostics, model comparison, and model checking [@Gelman_2020; @Martin_2024; @Guo_2024]. `ArviZ` meets this need with a unified, backend-agnostic library for these tasks. The original `ArviZ` paper [@Kumar_2019] describes the landscape of probabilistic programming tools at the time and the motivation for such a library, a need that has only grown as the ecosystem has expanded.

The methods implemented in `ArviZ` are grounded in well-established statistical principles and provide robust, interpretable diagnostics and visualizations [@Vehtari_2017; @Gelman_2019; @Paananen_2021; @Vehtari_2021; @Dimitriadis_2021; @Sailynoja_2022; @Kallioinen_2023; @Sailynoja_2025]. The redesigned version furthers these goals by introducing an easier-to-use interface for regular users and more powerful tooling for power users and developers of Bayesian tools. These updates align with recent developments in the probabilistic programming field. Additionally, the new design facilitates the use of components as modular building blocks for custom analyses. This frequent user request was difficult to accommodate under the old framework.

# Description

The redesign emphasizes user control and modularity. The new architecture lets users install and use only the components they need. The previous `ArviZ` design divided the package into three submodules; these are now three independently installable packages, `arviz-base`, `arviz-stats`, and `arviz-plots`, each with an improved design as described next.

General functionality, data processing, and data input/output have been streamlined and enhanced for greater versatility. Previously, `ArviZ` used the custom `InferenceData` class to organize and store the high-dimensional outputs of Bayesian inference in a structured, labeled format, enabling efficient analysis, metadata persistence, and serialization. This class has been replaced with the `DataTree` class from xarray [@Hoyer_2017]. Additionally, converters allow more flexibility in the dimensionality, naming, and indexing of their generated outputs.

Statistical functions are now accessible through two distinct interfaces:

* A low-level array interface with minimal dependencies, intended for advanced users and developers of third-party libraries.
* A higher-level xarray interface designed for end users, which simplifies usage by automating common tasks and handling metadata.
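The relationship between the two interfaces can be sketched as a thin labeled layer that delegates to an array layer. The following is only an illustration of that layering under simplifying assumptions, not the `ArviZ` implementation: `pooled_mean`, `Labeled`, and `pooled_mean_labeled` are hypothetical names, and a plain dataclass stands in for an xarray object.

```python
from dataclasses import dataclass

import numpy as np


def pooled_mean(values, chain_axis=0, draw_axis=1):
    """Low-level layer: plain arrays in, plain arrays out; axes are explicit."""
    values = np.asarray(values)
    return values.mean(axis=(chain_axis, draw_axis))


@dataclass
class Labeled:
    """Hypothetical stand-in for a labeled array: data plus dimension names."""

    values: np.ndarray
    dims: tuple


def pooled_mean_labeled(labeled):
    """High-level layer: look up the sample axes by name, then delegate."""
    chain_axis = labeled.dims.index("chain")
    draw_axis = labeled.dims.index("draw")
    result = pooled_mean(labeled.values, chain_axis, draw_axis)
    # Keep the names of the dimensions that were not reduced.
    kept = tuple(d for d in labeled.dims if d not in ("chain", "draw"))
    return Labeled(result, kept)


rng = np.random.default_rng(0)
raw = rng.normal(size=(2, 4, 1000))  # (variable, chain, draw)

low = pooled_mean(raw, chain_axis=1, draw_axis=2)
high = pooled_mean_labeled(Labeled(raw, ("variable", "chain", "draw")))
```

Both calls compute the same numbers; the labeled version only removes the need to remember where the `chain` and `draw` axes are and keeps the remaining dimension names attached to the result.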

Plotting functions have also been redesigned to support modularity at multiple levels:

* At a high level, `ArviZ` offers a collection of “batteries-included” plots. These are built-in plotting functions providing sensible defaults for common tasks like MCMC sampling diagnostics, predictive checks, and model comparison.
* At an intermediate level, the API enables easier customization of batteries-included plots and simplifies the creation of new plots. This is achieved through the `PlotCollection` class, which enables developers and advanced users to focus solely on the plotting logic, without needing to handle faceting or aesthetics.
* At a lower level, we have improved the separation between computational and plotting logic, reducing code duplication and enhancing modular design. These changes also facilitate support for multiple plotting backends, improving extensibility and maintainability. Currently, `ArviZ` supports three plotting backends: matplotlib [@Hunter_2007], Bokeh [@Bokeh_2018], and plotly [@plotly_2015].
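The division of labor at the intermediate level can be illustrated with a toy example. This is a conceptual sketch only: `MiniCollection` and `describe_hist` are hypothetical names and do not reproduce the `PlotCollection` API. The collection owns the mapping from a dimension to an aesthetic, while the plotting function only receives the data subset and the resolved style.

```python
from itertools import cycle


class MiniCollection:
    """Hypothetical toy: owns the subsets and the aesthetic mappings."""

    def __init__(self, subsets, aes):
        # subsets: list of (coords, values); coords is a dict such as {"school": "Choate"}
        # aes: maps an aesthetic name to (dimension, list of values to cycle through)
        self.subsets = subsets
        self.mappings = {}
        for aes_name, (dim, options) in aes.items():
            levels = sorted({coords[dim] for coords, _ in subsets if dim in coords})
            self.mappings[aes_name] = (dim, dict(zip(levels, cycle(options))))

    def map(self, draw, **kwargs):
        """Call `draw` once per subset with the aesthetics already resolved."""
        results = []
        for coords, values in self.subsets:
            style = {
                name: table[coords[dim]]
                for name, (dim, table) in self.mappings.items()
                if dim in coords  # subsets without the dimension keep the default
            }
            results.append(draw(values, **style, **kwargs))
        return results


def describe_hist(values, color="gray", alpha=1.0):
    """Plotting logic only: no knowledge of dimensions or mappings."""
    return f"hist(n={len(values)}, color={color}, alpha={alpha})"


subsets = [
    ({"school": "Choate"}, [1.0, 2.0, 3.0]),
    ({"school": "Deerfield"}, [0.5, 1.5]),
    ({}, [4.0]),  # a variable without the school dimension
]
pc = MiniCollection(subsets, aes={"color": ("school", ["C0", "C1"])})
calls = pc.map(describe_hist, alpha=0.3)
```

In this sketch a new plot type only needs a new `draw` function; the mapping from `school` to color, and the neutral default for subsets without that dimension, are handled once by the collection.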


## Examples

For the first example, we construct an array resembling data from MCMC sampling. We have 4 chains and 1000 draws for two posterior variables. We can compute the effective sample sizes for this array using the stats interface. For this, we need to specify which axes represent the chains and which the draws.

```python
import numpy as np
from arviz import array_stats

rng = np.random.default_rng()
samples = rng.normal(size=(4, 1000, 2))  # (chain, draw, variable)
array_stats.ess(samples, chain_axis=0, draw_axis=1)
```
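The role of the two axis arguments can be illustrated with a simpler diagnostic written directly in NumPy. The sketch below computes the classic potential scale reduction factor, without the splitting and rank normalization described in [@Vehtari_2021]; `basic_rhat` is a hypothetical helper, not part of `ArviZ`, and it is not the effective sample size computed above. It shows that once the axes are named explicitly, the result no longer depends on the memory layout of the array.

```python
import numpy as np


def basic_rhat(values, chain_axis=0, draw_axis=1):
    """Classic potential scale reduction factor, one value per remaining axis."""
    # Move the sample axes to the front so the layout is (chain, draw, ...).
    x = np.moveaxis(np.asarray(values, dtype=float), (chain_axis, draw_axis), (0, 1))
    n_draws = x.shape[1]
    # Mean of the within-chain variances.
    within = x.var(axis=1, ddof=1).mean(axis=0)
    # Variance of the chain means, scaled by the number of draws.
    between = n_draws * x.mean(axis=1).var(axis=0, ddof=1)
    pooled = (n_draws - 1) / n_draws * within + between / n_draws
    return np.sqrt(pooled / within)


rng = np.random.default_rng(0)
samples = rng.normal(size=(4, 1000, 2))  # (chain, draw, variable)
rhat = basic_rhat(samples, chain_axis=0, draw_axis=1)

# Same data stored as (variable, draw, chain): only the axis arguments change.
transposed = samples.transpose(2, 1, 0)
rhat_transposed = basic_rhat(transposed, chain_axis=2, draw_axis=1)
```

For independent draws such as these, the values are close to 1 for both variables, and the two layouts give the same result.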

We now contrast the array interface with the xarray interface. When converting the NumPy array to a `DataTree`, `ArviZ` assigns `chain` and `draw` as named dimensions based on the assumed dimension order, so this information is already encoded in the resulting object and does not need to be specified explicitly when calling other functions.

```python
import arviz as az

dt_samples = az.convert_to_datatree(samples)
az.ess(dt_samples)
```

The only required argument for batteries-included plots is the input data, typically a `DataTree` (`dt`). In the following example we also apply optional customizations.

```python
az.style.use('arviz-variat')
dt = az.load_arviz_data("centered_eight")
pc = az.plot_dist(
    dt,
    kind="hist",
    visuals={"hist": {"alpha": 0.3}},
    aes={"color": ["school"]},
)
pc.add_legend("school", loc="outside right upper")
```

![plot_dist with color mapped to school dimension.](figures/figure_0.png "`plot_dist` is a built-in plot. Here we show an example of further customization. The color is mapped to the school dimension. A neutral color is automatically assigned to the variables without the school dimension (mu and tau). The histograms have been made translucent."){width=4.5in}

We have shown two small examples. For a more comprehensive overview, see the [`ArviZ` documentation](https://python.arviz.org/en/latest/) and the [EABM guide](https://arviz-devs.github.io/EABM/) [@eabm_2025]. These resources include a wide range of examples designed for all types of users, from casual users to advanced analysts and developers looking to use `ArviZ` in their projects or libraries.

# Acknowledgements

We thank our fiscal sponsor, NumFOCUS, a nonprofit 501(c)(3) public charity, for their operational and financial support. We also thank all the contributors to the `arviz`, `arviz-base`, `arviz-stats`, and `arviz-plots` repositories, including code contributors, documentation writers, issue reporters, and users who have provided feedback and suggestions.

This research was supported by:

* The Research Council of Finland Flagship Program "Finnish Center for Artificial Intelligence" (FCAI)
* Research Council of Finland grant 340721
* Essential Open Source Software Round 4 grant by the Chan Zuckerberg Initiative (CZI)
* Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC number 2064/1 – Project number 390727645

# References
261 changes: 261 additions & 0 deletions paper/references.bib
@@ -0,0 +1,261 @@
@article{Kumar_2019,
doi = {10.21105/joss.01143},
url = {https://doi.org/10.21105/joss.01143},
year = {2019},
publisher = {The Open Journal},
volume = {4},
number = {33},
pages = {1143},
author = {Ravin Kumar and Colin Carroll and Ari Hartikainen and Osvaldo Martin},
title = {ArviZ a unified library for exploratory analysis of Bayesian models in Python},
journal = {Journal of Open Source Software}
}

@article{Abril-pla_2023,
title = {{PyMC}: a modern, and comprehensive probabilistic programming framework in {Python}},
volume = {9},
issn = {2376-5992},
shorttitle = {{PyMC}},
url = {https://peerj.com/articles/cs-1516},
doi = {10.7717/peerj-cs.1516},
language = {en},
urldate = {2023-10-26},
journal = {PeerJ Computer Science},
author = {Abril-Pla, Oriol and Andreani, Virgile and Carroll, Colin and Dong, Larry and Fonnesbeck, Christopher J. and Kochurov, Maxim and Kumar, Ravin and Lao, Junpeng and Luhmann, Christian C. and Martin, Osvaldo A. and Osthege, Michael and Vieira, Ricardo and Wiecki, Thomas and Zinkov, Robert},
month = sep,
year = {2023},
note = {Publisher: PeerJ Inc.},
pages = {e1516},
}

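@article{stan,
title = {Stan: {A} Probabilistic Programming Language},
author = {Carpenter, Bob and Gelman, Andrew and Hoffman, Matthew D. and Lee, Daniel and Goodrich, Ben and Betancourt, Michael and Brubaker, Marcus and Guo, Jiqiang and Li, Peter and Riddell, Allen},
journal = {Journal of Statistical Software},
volume = {76},
number = {1},
year = {2017},
doi = {10.18637/jss.v076.i01},
language = {en-US},
keywords = {Bayesian inference,algorithmic differentiation,probabilistic programming,Stan},
}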

@article{Phan_2019,
title={Composable Effects for Flexible and Accelerated Probabilistic Programming in NumPyro},
author={Phan, Du and Pradhan, Neeraj and Jankowiak, Martin},
journal={arXiv preprint arXiv:1912.11554},
year={2019}
}

@article{Bingham_2019,
author = {Eli Bingham and
Jonathan P. Chen and
Martin Jankowiak and
Fritz Obermeyer and
Neeraj Pradhan and
Theofanis Karaletsos and
Rohit Singh and
Paul A. Szerlip and
Paul Horsfall and
Noah D. Goodman},
title = {Pyro: Deep Universal Probabilistic Programming},
journal = {J. Mach. Learn. Res.},
volume = {20},
pages = {28:1--28:6},
year = {2019},
url = {http://jmlr.org/papers/v20/18-403.html}
}

@misc{Gelman_2020,
title={Bayesian Workflow},
author={Andrew Gelman and Aki Vehtari and Daniel Simpson and Charles C. Margossian and Bob Carpenter and Yuling Yao and Lauren Kennedy and Jonah Gabry and Paul-Christian Bürkner and Martin Modrák},
year={2020},
eprint={2011.01808},
archivePrefix={arXiv},
doi={10.48550/arXiv.2011.01808},
primaryClass={stat.ME}
}

@article{Sailynoja_2025,
title={Recommendations for visual predictive checks in Bayesian workflow},
author={S{\"a}ilynoja, Teemu and Johnson, Andrew R and Martin, Osvaldo A and Vehtari, Aki},
journal={arXiv:2503.01509},
doi={10.48550/arXiv.2503.01509},
year={2025},
}

@article{Vehtari_2017,
title={Practical {Bayesian} model evaluation using leave-one-out cross-validation and {WAIC}},
author={Vehtari, Aki and Gelman, Andrew and Gabry, Jonah},
journal={Stat Comp},
doi={10.1007/s11222-016-9696-4},
volume={27},
pages={1413--1432},
year={2017},
}

@article{Vehtari_2021,
title={Rank-normalization, folding, and localization: An improved $\widehat{R}$ for assessing convergence of {MCMC}},
author={Vehtari, Aki and Gelman, Andrew and Simpson, Daniel and Carpenter, Bob and B{\"u}rkner, Paul Christian},
journal={Bayes Anal},
year={2021},
volume={16},
doi={10.1214/20-BA1221},
pages={667--718}
}

@article{Sailynoja_2022,
title = {Graphical test for discrete uniformity and its applications in goodness-of-fit evaluation and multiple sample comparison},
volume = {32},
pages = {1573--1375},
journal = {Stat Comp},
author = {Säilynoja, Teemu and Bürkner, Paul Christian and Vehtari, Aki},
doi = {10.1007/s11222-022-10090-6},
year = {2022}
}

@article{Kallioinen_2023,
title = {Detecting and diagnosing prior and likelihood sensitivity with power-scaling},
author = {Noa Kallioinen and Topi Paananen and Paul Christian Bürkner and Aki Vehtari},
year = {2023},
journal = {Stat Comp},
volume = {34},
issue = {57},
doi = {10.1007/s11222-023-10366-5},
encoding = {UTF-8},
}

@article{Paananen_2021,
author = {Paananen, T. and Piironen, J. and Bürkner, P. C. and Vehtari, A.},
year = 2021,
title = {Implicitly adaptive importance sampling},
journal = {Stat Comp},
volume = 31,
number = 16
}

@article{Gelman_2019,
author = {Andrew Gelman and Ben Goodrich and Jonah Gabry and Aki Vehtari},
title = {R-squared for {Bayesian} regression models},
journal = {Am Stat},
doi={10.1080/00031305.2018.1549100},
volume = {73},
number = {3},
pages = {307--309},
year = {2019}
}

@article{Dimitriadis_2021,
title = {Stable reliability diagrams for probabilistic classifiers},
volume = {118},
issn = {0027-8424, 1091-6490},
url = {https://pnas.org/doi/full/10.1073/pnas.2016191118},
doi = {10.1073/pnas.2016191118},
language = {en},
number = {8},
urldate = {2023-04-12},
journal = {Proceedings of the National Academy of Sciences},
author = {Dimitriadis, Timo and Gneiting, Tilmann and Jordan, Alexander I.},
month = feb,
year = {2021},
pages = {e2016191118},
}

@article{Capretto_2022,
title={Bambi: A Simple Interface for Fitting Bayesian Linear Models in Python},
volume={103},
number={15},
journal={Journal of Statistical Software},
author={Capretto, Tomás and Piho, Camen and Kumar, Ravin and Westfall, Jacob and Yarkoni, Tal and Martin, Osvaldo A},
year={2022},
pages={1--29}
}

@article{Hoyer_2017,
title = {xarray: {N-D} labeled arrays and datasets in {Python}},
author = {Hoyer, S. and Hamman, J.},
journal = {Journal of Open Research Software},
volume = {5},
number = {1},
year = {2017},
publisher = {Ubiquity Press},
doi = {10.5334/jors.148},
url = {https://doi.org/10.5334/jors.148}
}

@article{Hunter_2007,
Author = {Hunter, J. D.},
Title = {Matplotlib: A 2D graphics environment},
Journal = {Computing in Science \& Engineering},
Volume = {9},
Number = {3},
Pages = {90--95},
abstract = {Matplotlib is a 2D graphics package used for Python for
application development, interactive scripting, and publication-quality
image generation across user interfaces and operating systems.},
publisher = {IEEE COMPUTER SOC},
doi = {10.1109/MCSE.2007.55},
year = 2007
}

@manual{Bokeh_2018,
title = {Bokeh: Python library for interactive visualization},
author = {{Bokeh Development Team}},
year = {2018},
url = {https://bokeh.pydata.org/en/latest/},
}

@online{plotly_2015,
author = {Plotly Technologies Inc.},
title = {Collaborative data science},
publisher = {Plotly Technologies Inc.},
address = {Montreal, QC},
year = {2015},
url = {https://plot.ly},
}

@misc{Guo_2024,
title={VMC: A Grammar for Visualizing Statistical Model Checks},
author={Ziyang Guo and Alex Kale and Matthew Kay and Jessica Hullman},
year={2024},
eprint={2408.16702},
archivePrefix={arXiv},
primaryClass={cs.HC},
url={https://arxiv.org/abs/2408.16702},
}

@book{Martin_2024,
title = {Bayesian {Analysis} with {Python}: {A} {Practical} {Guide} to probabilistic modeling, 3rd {Edition}},
isbn = {978-1-80512-716-1},
shorttitle = {Bayesian {Analysis} with {Python}},
language = {English},
publisher = {Packt Publishing},
author = {Martin, Osvaldo A},
month = feb,
year = {2024},
}


@article{emcee,
doi = {10.21105/joss.01864},
url = {https://doi.org/10.21105/joss.01864},
year = {2019},
publisher = {The Open Journal},
volume = {4},
number = {43},
pages = {1864},
author = {Daniel Foreman-Mackey and Will M. Farr and Manodeep Sinha and Anne M. Archibald and David W. Hogg and Jeremy S. Sanders and Joe Zuntz and Peter K. G. Williams and Andrew R. J. Nelson and Miguel de Val-Borro and Tobias Erhardt and Ilya Pashchenko and Oriol Abril Pla},
title = {emcee v3: A Python ensemble sampling toolkit for affine-invariant MCMC},
journal = {Journal of Open Source Software}
}


@book{eabm_2025,
author = {Osvaldo A Martin and Oriol Abril-Pla and Jordan Deklerk},
title = {Exploratory analysis of Bayesian models},
month = nov,
year = 2025,
publisher = {Zenodo},
version = {v0.3.0},
doi = {10.5281/zenodo.15127548},
url = {https://doi.org/10.5281/zenodo.15127548},
}



@article{icazatti_2023,
author = {Icazatti, Alejandro and Abril-Pla, Oriol and Klami, Arto and Martin, Osvaldo A},
doi = {10.21105/joss.05499},
journal = {Journal of Open Source Software},
month = sep,
number = {89},
pages = {5499},
title = {{PreliZ: A tool-box for prior elicitation}},
url = {https://joss.theoj.org/papers/10.21105/joss.05499},
volume = {8},
year = {2023}
}