paper/paper.md: 11 additions & 5 deletions
@@ -46,6 +46,8 @@ bibliography: references.bib
# Summary
+
When working with Bayesian models, a range of related tasks must be addressed beyond inference itself. These include tasks such as diagnosing the quality of MCMC samples, model criticism and model comparison. We collectively refer to these activities as exploratory analysis of Bayesian models.
+
In this work, we present a redesigned version of `ArviZ`, a Python package for exploratory analysis of Bayesian models. The redesign emphasizes greater user control and modularity. This redesign delivers a more flexible and efficient toolkit for exploratory analysis of Bayesian models. With its renewed focus on modularity and usability, `ArviZ` is well-positioned to remain an essential tool for Bayesian modelers in both research and applied settings.
# Statement of need
@@ -54,17 +56,21 @@ Probabilistic programming has emerged as a powerful paradigm for statistical mod
The methods implemented in `ArviZ` are grounded in well-established statistical principles and provide robust, interpretable diagnostics and visualizations [@Vehtari_2017; @Gelman_2019; @Paananen_2021; @Vehtari_2021; @Dimitriadis_2021; @Sailynoja_2022; @Kallioinen_2023; @Sailynoja_2025]. The redesigned version furthers these goals by introducing an easier-to-use interface for regular users and more powerful tooling for power users and developers of Bayesian tools. These updates align with recent developments in the probabilistic programming field. Additionally, the new design facilitates the use of components as modular building blocks for custom analyses. This frequent user request was difficult to accommodate under the old framework.
+
# State of the field
+
+
In the Python Bayesian ecosystem, ArviZ occupies a niche comparable to tools in the R/Stan community such as posterior [@gelman_2013;@Vehtari_2021], loo [@Vehtari_2017;@loo], bayesplot [@bayesplot0;@bayesplot1], and priorsense [@Kallioinen_2023], sharing similar goals while reflecting different language ecosystems and workflows.
+
# Research Impact Statement
`ArviZ` [@Kumar_2019] is a Python package for exploratory analysis of Bayesian models that has been widely used in academia and industry since its introduction in 2019, with over 700 citations and 75 million downloads. Its goal is to integrate seamlessly with established probabilistic programming languages and statistical interfaces, such as PyMC [@Abril-pla_2023], Stan (via the cmdstanpy interface) [@stan], Pyro, NumPyro [@Phan_2019; @Bingham_2019], emcee [@emcee], and Bambi [@Capretto_2022], among others.
-
`ArviZ` is part of the broader ArviZ project, which develops tools for exploratory analysis of Bayesian models. The organization also maintains other initiatives, including ArviZ.jl [@arvizjl_2025] (for Julia), PreliZ [@icazatti_2023], educational resources [@eabm_2025], and additional packages that are still in an experimental phase.
+
The maturity of `ArviZ` has also led to other initiatives, including ArviZ.jl [@arvizjl_2025] (for Julia), PreliZ [@icazatti_2023], and the development of educational resources [@eabm_2025].
# Software design
-
We present a redesigned version of `ArviZ` emphasizing greater user control and modularity. The new architecture enables users to customize the installation and use of specific components. The previous `ArviZ`design divided the package into three submodules, which are now available as three independent installable packages with improved design as described next.
+
The previous `ArviZ` design divided the package into three submodules, which are now available as three independently installable packages. This redesign emphasizes greater user control and modularity. The new architecture enables users to customize the installation and use of specific components. Key design changes include:
-
General functionality, data processing, and data input/output have been streamlined and enhanced for greater versatility. Previously, `ArviZ` used the custom `InferenceData` class to organize and store the high-dimensional outputs of Bayesian inference in a structured, labeled format, enabling efficient analysis, metadata persistence, and serialization. These have been replaced with the `DataTree` class from xarray [@Hoyer_2017]. Additionally, converters allow more flexibility in dimensionality, naming, and indexing of their generated outputs.
+
General functionality, data processing, and data input/output have been streamlined and enhanced for greater versatility. Previously, `ArviZ` used the custom `InferenceData` class to organize and store the high-dimensional outputs of Bayesian inference in a structured, labeled format, enabling efficient analysis, metadata persistence, and serialization. This class has been replaced with the `DataTree` class from xarray [@Hoyer_2017], which, like the original `InferenceData`, supports grouping but is more flexible, enabling richer nesting and automatic support for all xarray I/O formats. Additionally, converters allow more flexibility in dimensionality, naming, and indexing of their generated outputs.
Statistical functions are now accessible through two distinct interfaces:
@@ -81,7 +87,7 @@ Plotting functions have also been redesigned to support modularity at multiple l
## Examples
-
For the first example, we construct an array resembling data from MCMC sampling. We have 4 chains and 1000 draws for two posterior variables. We can compute the effective sample sizes for this array using the stats interface. For this, we need to specify which axes represent the chains and which the draws.
+
For the first example, we use the low-level array interface. We construct an array resembling data from MCMC sampling. We have 4 chains and 1000 draws for two posterior variables. We can compute the effective sample sizes for this array using the stats interface. For this, we need to specify which axes represent the chains and which the draws.
import numpy as np
from arviz import array_stats
@@ -90,7 +96,7 @@ For the first example, we construct an array resembling data from MCMC sampling.
-
We now contrast the array interface with the xarray interface. When converting the NumPy array to a `DataTree`, ArviZ assigns `chain` and `draw` as named dimensions based on the assumed dimension order, so this information is already encoded in the resulting object and does not need to be specified explicitly when calling other functions.
+
We now contrast the low-level array interface with the xarray interface. When converting the NumPy array to a `DataTree`, ArviZ assigns `chain` and `draw` as named dimensions based on the assumed dimension order, so this information is already encoded in the resulting object and does not need to be specified explicitly when calling other functions.
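
The difference between passing axes explicitly and carrying named dimensions can be sketched with plain NumPy. This is an illustration only and does not use the ArviZ API: `basic_rhat`, `basic_rhat_labeled`, and `LabeledArray` are hypothetical names, and the diagnostic is a simple (non-split, non-rank-normalized) R-hat chosen for brevity in place of the effective sample size discussed in the text.

```python
from dataclasses import dataclass

import numpy as np


def basic_rhat(ary, chain_axis, draw_axis):
    """Basic Gelman-Rubin R-hat; the caller states which axes are chains and draws."""
    # Reorder to (chain, draw, ...) so the rest of the code can assume a layout.
    ary = np.moveaxis(ary, (chain_axis, draw_axis), (0, 1))
    n_draws = ary.shape[1]
    # Between-chain variance: variance of per-chain means, scaled by draws.
    between = n_draws * ary.mean(axis=1).var(axis=0, ddof=1)
    # Within-chain variance: average of per-chain variances.
    within = ary.var(axis=1, ddof=1).mean(axis=0)
    var_hat = (n_draws - 1) / n_draws * within + between / n_draws
    return np.sqrt(var_hat / within)


@dataclass
class LabeledArray:
    """Hypothetical stand-in for an object that stores dimension names."""
    values: np.ndarray
    dims: tuple  # for example ("chain", "draw", "variable")


def basic_rhat_labeled(labeled):
    """Same diagnostic; the axes are looked up from the stored names."""
    return basic_rhat(
        labeled.values,
        chain_axis=labeled.dims.index("chain"),
        draw_axis=labeled.dims.index("draw"),
    )


rng = np.random.default_rng(0)
samples = rng.normal(size=(4, 1000, 2))  # 4 chains, 1000 draws, 2 variables

# Array-style call: the axis roles must be given explicitly.
rhat_axes = basic_rhat(samples, chain_axis=0, draw_axis=1)

# Labeled call: the layout can differ, because the names carry the axis roles.
labeled = LabeledArray(samples.transpose(1, 0, 2), ("draw", "chain", "variable"))
rhat_named = basic_rhat_labeled(labeled)
```

Both calls return one value per variable. The labeled version gives the same result even though its array is stored as `(draw, chain, variable)`, which is the kind of bookkeeping that named dimensions take off the caller.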