
Commit aaa934e

Merge branch 'main' into statistical_rethinking
2 parents: 0b56e8e + b239132

24 files changed, +1627 -7980 lines

examples/bart/bart_categorical_hawks.ipynb

Lines changed: 113 additions & 86 deletions
Large diffs are not rendered by default.

examples/bart/bart_categorical_hawks.myst.md

Lines changed: 11 additions & 7 deletions
@@ -14,7 +14,7 @@ myst:
   pip_dependencies: pymc-bart
 ---
 
-+++ {"editable": true, "slideshow": {"slide_type": ""}}
++++ {"slideshow": {"slide_type": ""}}
 
 (bart_categorical)=
 # Categorical regression
@@ -136,11 +136,11 @@ It may be that some of the input variables are not informative for classifying b
 
 ```{code-cell} ipython3
 ---
-editable: true
 slideshow:
   slide_type: ''
 ---
-pmb.plot_variable_importance(idata, μ, x_0, method="VI", random_seed=RANDOM_SEED);
+vi_results = pmb.compute_variable_importance(idata, μ, x_0, method="VI", random_seed=RANDOM_SEED)
+pmb.plot_variable_importance(vi_results);
 ```
 
 It can be observed that with the covariables `Hallux`, `Culmen`, and `Wing` we achieve the same R$^2$ value that we obtained with all the covariables; that is, the last two covariables contribute less than the other three to the classification. One thing to take into account is that the HDI is quite wide, which gives us less precision on the results; later we will see a way to reduce this.
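(Editorial aside: the R$^2$ referred to here and in the `bart_introduction` changes further down is the squared Pearson correlation between the predictions of the full model and those of a restricted model. A rough, self-contained sketch of that computation, using hypothetical prediction arrays rather than anything from these notebooks:)

```python
# Conceptual sketch of the R² shown by the variable-importance plot:
# the squared Pearson correlation between full-model and restricted-model predictions.
# `full_preds` and `subset_preds` are hypothetical arrays, not taken from the notebooks.
import numpy as np

rng = np.random.default_rng(0)
full_preds = rng.normal(size=200)                       # stand-in for full-model predicted means
subset_preds = full_preds + rng.normal(scale=0.3, size=200)  # stand-in for a restricted model

r = np.corrcoef(full_preds, subset_preds)[0, 1]
r2 = r**2  # values near 1 mean the restricted model predicts almost like the full model
print(r2)
```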
@@ -152,7 +152,7 @@ It can be observed that with the covariables `Hallux`, `Culmen`, and `Wing` we a
 Let's check the behavior of each covariable for each species with `pmb.plot_pdp()`, which shows the marginal effect a covariate has on the predicted variable, while we average over all the other covariates.
 
 ```{code-cell} ipython3
-pmb.plot_pdp(μ, X=x_0, Y=y_0, grid=(5, 3), figsize=(6, 9));
+pmb.plot_pdp(μ, X=x_0, Y=y_0, grid=(5, 3), figsize=(12, 7));
 ```
 
 The pdp plot, together with the Variable Importance plot, confirms that `Tail` is the covariable with the smallest effect on the predicted variable. In the Variable Importance plot `Tail` is the last covariable to be added and does not improve the result; in the pdp plot `Tail` has the flattest response. For the rest of the covariables in this plot it is hard to see which of them has a larger effect on the predicted variable, because they show great variability, reflected in the wide HDIs; as before, we will see a way to reduce this variability later. Finally, some of the variability depends on the amount of data for each species, which we can see in the `counts` of one of the covariables using Pandas `.describe()` after grouping the data with `.groupby("Species")`.
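(Editorial aside: the per-species `counts` mentioned above come from a standard Pandas chain; a hedged sketch follows — the `Hawks` DataFrame, the CSV path, and the `Hallux` column are assumptions for illustration, not part of this commit.)

```python
# Illustrative sketch only: inspect how many observations each species contributes.
# `Hawks`, "Hawks.csv", and the column names are assumed for the example.
import pandas as pd

Hawks = pd.read_csv("Hawks.csv")  # hypothetical data source
print(Hawks.groupby("Species")["Hallux"].describe())  # the `count` column gives the sample size per species
```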
@@ -215,11 +215,14 @@ with pm.Model(coords=coords) as model_t:
 Now we are going to reproduce the same analyses as before.
 
 ```{code-cell} ipython3
-pmb.plot_variable_importance(idata_t, μ_t, x_0, method="VI", random_seed=RANDOM_SEED);
+vi_results = pmb.compute_variable_importance(
+    idata_t, μ_t, x_0, method="VI", random_seed=RANDOM_SEED
+)
+pmb.plot_variable_importance(vi_results);
 ```
 
 ```{code-cell} ipython3
-pmb.plot_pdp(μ_t, X=x_0, Y=y_0, grid=(5, 3), figsize=(6, 9));
+pmb.plot_pdp(μ_t, X=x_0, Y=y_0, grid=(5, 3), figsize=(12, 7));
 ```
 
 Comparing these two plots with the previous ones shows a marked reduction in the variance of each one. In the case of `pmb.plot_variable_importance()` there are smaller error bands, with an R$^{2}$ value closer to 1; for `pmb.plot_pdp()` we can see thinner bands and a reduction in the limits of the y-axis. This reflects the reduction in uncertainty obtained by fitting the trees separately. Another benefit is that the behavior of each covariable for each species becomes more visible.
@@ -254,7 +257,8 @@ all
 ```
 
 ## Authors
-- Authored by [Pablo Garay](https://github.com/PabloGGaray) and [Osvaldo Martin](https://aloctavodia.github.io/) in May, 2024
+- Authored by [Pablo Garay](https://github.com/PabloGGaray) and [Osvaldo Martin](https://aloctavodia.github.io/) in May, 2024
+- Updated by Osvaldo Martin in Dec, 2024
 
 +++

examples/bart/bart_heteroscedasticity.ipynb

Lines changed: 52 additions & 65 deletions
Large diffs are not rendered by default.

examples/bart/bart_heteroscedasticity.myst.md

Lines changed: 1 addition & 0 deletions
@@ -147,6 +147,7 @@ The fit looks good! In fact, we see that the mean and variance increase as a fun
 - Authored by [Juan Orduz](https://juanitorduz.github.io/) in Feb, 2023
 - Rerun by Osvaldo Martin in Mar, 2023
 - Rerun by Osvaldo Martin in Nov, 2023
+- Rerun by Osvaldo Martin in Dec, 2024
 
 +++

examples/bart/bart_introduction.ipynb

Lines changed: 198 additions & 252 deletions
Large diffs are not rendered by default.

examples/bart/bart_introduction.myst.md

Lines changed: 4 additions & 2 deletions
@@ -199,14 +199,15 @@ Finally, like with other regression methods, we should be careful that the effec
 
 ### Variable importance
 
-As we saw in the previous section a partial dependence plot can visualize and give us an idea of how much each covariable contributes to the predicted outcome. Moreover, PyMC-BART provides a novel method to assess the importance of each variable in the model. You can see an example in the following figure.
+As we saw in the previous section a partial dependence plot can visualize give us an idea of how much each covariable contributes to the predicted outcome. Moreover, PyMC-BART provides a novel method to assess the importance of each variable in the model. You can see an example in the following figure.
 
 On the x-axis we have the number of covariables and on the y-axis R² (the square of the Pearson correlation coefficient) between the predictions made for the full model (all variables included) and the restricted models, those with only a subset of the variables.
 
 In this example, the most important variable is `hour`, then `temperature`, `humidity`, and finally `workingday`. Notice that the first value of R² is for a model that only includes the variable `hour`, the second R² is for a model with two variables, `hour` and `temperature`, and so on. Besides this ranking, we can see that even a model with a single component, `hour`, is very close to the full model. Moreover, the model with the two components `hour` and `temperature` is on average indistinguishable from the full model. The error bars represent the 94% HDI from the posterior predictive distribution. This means that we should expect a model with only `hour` and `temperature` to have a predictive performance similar to that of a model with the four variables `hour`, `temperature`, `humidity`, and `workingday`.
 
 ```{code-cell} ipython3
-pmb.plot_variable_importance(idata_bikes, μ, X);
+vi_results = pmb.compute_variable_importance(idata_bikes, μ, X)
+pmb.plot_variable_importance(vi_results);
 ```
 
 `plot_variable_importance` is fast because it makes two assumptions:
@@ -405,6 +406,7 @@ This plot helps us understand the reason behind the bad performance on the test
 * Juan Orduz added out-of-sample section in Jan, 2023
 * Updated by Osvaldo Martin in Mar, 2023
 * Updated by Osvaldo Martin in Nov, 2023
+* Updated by Osvaldo Martin in Dec, 2024
 
 +++

examples/bart/bart_quantile_regression.ipynb

Lines changed: 75 additions & 110 deletions
Large diffs are not rendered by default.

examples/bart/bart_quantile_regression.myst.md

Lines changed: 1 addition & 0 deletions
@@ -149,6 +149,7 @@ We can see that when we use a Normal likelihood, and from that fit we compute th
 * Authored by Osvaldo Martin in Jan, 2023
 * Rerun by Osvaldo Martin in Mar, 2023
 * Rerun by Osvaldo Martin in Nov, 2023
+* Rerun by Osvaldo Martin in Dec, 2024
 
 +++

examples/gaussian_processes/GP-TProcess.ipynb

Lines changed: 304 additions & 92 deletions
Large diffs are not rendered by default.

examples/gaussian_processes/GP-TProcess.myst.md

Lines changed: 58 additions & 35 deletions
@@ -10,9 +10,16 @@ kernelspec:
   name: python3
 ---
 
+(GP-TProcess)=
 # Student-t Process
 
-PyMC3 also includes T-process priors. They are a generalization of a Gaussian process prior to the multivariate Student's T distribution. The usage is identical to that of `gp.Latent`, except they require a degrees of freedom parameter when they are specified in the model. For more information, see chapter 9 of [Rasmussen+Williams](http://www.gaussianprocess.org/gpml/), and [Shah et al.](https://arxiv.org/abs/1402.4306).
+:::{post} August 2017
+:tags: t-process, gaussian process, bayesian non-parametrics
+:category: intermediate
+:author: Bill Engels
+:::
+
+PyMC also includes T-process priors. They are a generalization of a Gaussian process prior to the multivariate Student's T distribution. The usage is identical to that of `gp.Latent`, except they require a degrees of freedom parameter when they are specified in the model. For more information, see chapter 9 of [Rasmussen+Williams](http://www.gaussianprocess.org/gpml/), and [Shah et al.](https://arxiv.org/abs/1402.4306).
 
 Note that T processes aren't additive in the same way as GPs, so addition of `TP` objects is not supported.
 
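(Editorial aside: as the paragraph above says, the only difference in usage is the extra degrees-of-freedom parameter. A minimal, hedged sketch under assumed priors and data — none of it is taken from this notebook:)

```python
# Minimal sketch: a Student-t process prior is declared like a latent GP prior,
# plus a degrees-of-freedom parameter `nu`. All names and values are illustrative.
import numpy as np
import pymc as pm

X = np.linspace(0, 10, 50)[:, None]  # toy inputs as a column vector

with pm.Model():
    cov = pm.gp.cov.ExpQuad(1, ls=1.0)
    f_gp = pm.gp.Latent(cov_func=cov).prior("f_gp", X=X)      # Gaussian process prior
    f_tp = pm.gp.TP(scale_func=cov, nu=5).prior("f_tp", X=X)  # Student-t process prior (heavier tails)
```

From there, `prior` and `conditional` are used just as with `gp.Latent`, which is what the updated cells in the diff below do.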
@@ -23,10 +30,13 @@ Note that T processes aren't additive in the same way as `TP
 The following code draws samples from a T process prior with 3 degrees of freedom and a Gaussian process, both with the same covariance matrix.
 
 ```{code-cell} ipython3
+import arviz as az
 import matplotlib.pyplot as plt
 import numpy as np
-import pymc3 as pm
-import theano.tensor as tt
+import pymc as pm
+import pytensor.tensor as pt
+
+from pymc.gp.util import plot_gp_dist
 
 %matplotlib inline
 ```
@@ -39,33 +49,34 @@ n = 100  # The number of data points
 X = np.linspace(0, 10, n)[:, None]  # The inputs to the GP, they must be arranged as a column vector
 
 # Define the true covariance function and its parameters
-ℓ_true = 1.0
-η_true = 3.0
-cov_func = η_true**2 * pm.gp.cov.Matern52(1, ℓ_true)
+ell_true = 1.0
+eta_true = 3.0
+cov_func = eta_true**2 * pm.gp.cov.Matern52(1, ell_true)
 
 # A mean function that is zero everywhere
 mean_func = pm.gp.mean.Zero()
 
 # The latent function values are one sample from a multivariate normal
 # Note that we have to call `eval()` because PyMC3 is built on top of Theano
-tp_samples = pm.MvStudentT.dist(mu=mean_func(X).eval(), cov=cov_func(X).eval(), nu=3).random(size=8)
+tp_samples = pm.draw(pm.MvStudentT.dist(mu=mean_func(X).eval(), scale=cov_func(X).eval(), nu=3), 8)
 
 ## Plot samples from TP prior
 fig = plt.figure(figsize=(12, 5))
-ax = fig.gca()
-ax.plot(X.flatten(), tp_samples.T, lw=3, alpha=0.6)
-ax.set_xlabel("X")
-ax.set_ylabel("y")
-ax.set_title("Samples from TP with DoF=3")
+ax0 = fig.gca()
+ax0.plot(X.flatten(), tp_samples.T, lw=3, alpha=0.6)
+ax0.set_xlabel("X")
+ax0.set_ylabel("y")
+ax0.set_title("Samples from TP with DoF=3")
 
 
-gp_samples = pm.MvNormal.dist(mu=mean_func(X).eval(), cov=cov_func(X).eval()).random(size=8)
+gp_samples = pm.draw(pm.MvNormal.dist(mu=mean_func(X).eval(), cov=cov_func(X).eval()), 8)
 fig = plt.figure(figsize=(12, 5))
-ax = fig.gca()
-ax.plot(X.flatten(), gp_samples.T, lw=3, alpha=0.6)
-ax.set_xlabel("X")
-ax.set_ylabel("y")
-ax.set_title("Samples from GP");
+ax1 = fig.gca()
+ax1.plot(X.flatten(), gp_samples.T, lw=3, alpha=0.6)
+ax1.set_xlabel("X")
+ax1.set_ylabel("y")
+ax1.set_ylim(ax0.get_ylim())
+ax1.set_title("Samples from GP");
 ```
 
 ## Poisson data generated by a T process
@@ -79,16 +90,16 @@ n = 150  # The number of data points
 X = np.linspace(0, 10, n)[:, None]  # The inputs to the GP, they must be arranged as a column vector
 
 # Define the true covariance function and its parameters
-ℓ_true = 1.0
-η_true = 3.0
-cov_func = η_true**2 * pm.gp.cov.ExpQuad(1, ℓ_true)
+ell_true = 1.0
+eta_true = 3.0
+cov_func = eta_true**2 * pm.gp.cov.ExpQuad(1, ell_true)
 
 # A mean function that is zero everywhere
 mean_func = pm.gp.mean.Zero()
 
 # The latent function values are one sample from a multivariate normal
 # Note that we have to call `eval()` because PyMC3 is built on top of Theano
-f_true = pm.MvStudentT.dist(mu=mean_func(X).eval(), cov=cov_func(X).eval(), nu=3).random(size=1)
+f_true = pm.draw(pm.MvStudentT.dist(mu=mean_func(X).eval(), scale=cov_func(X).eval(), nu=3), 1)
 y = np.random.poisson(f_true**2)
 
 fig = plt.figure(figsize=(12, 5))
@@ -102,23 +113,22 @@ plt.legend();
 
 ```{code-cell} ipython3
 with pm.Model() as model:
-    ℓ = pm.Gamma("ℓ", alpha=2, beta=2)
-    η = pm.HalfCauchy("η", beta=3)
-    cov = η**2 * pm.gp.cov.ExpQuad(1, ℓ)
+    ell = pm.Gamma("ell", alpha=2, beta=2)
+    eta = pm.HalfCauchy("eta", beta=3)
+    cov = eta**2 * pm.gp.cov.ExpQuad(1, ell)
 
     # informative prior on degrees of freedom < 5
-    ν = pm.Gamma("ν", alpha=2, beta=1)
-    tp = pm.gp.TP(cov_func=cov, nu=ν)
+    nu = pm.Gamma("nu", alpha=2, beta=1)
+    tp = pm.gp.TP(scale_func=cov, nu=nu)
     f = tp.prior("f", X=X)
 
-    # adding a small constant seems to help with numerical stability here
-    y_ = pm.Poisson("y", mu=tt.square(f) + 1e-6, observed=y)
+    pm.Poisson("y", mu=pt.square(f), observed=y)
 
-    tr = pm.sample(1000)
+    tr = pm.sample(target_accept=0.9, nuts_sampler="nutpie", chains=2)
 ```
 
 ```{code-cell} ipython3
-pm.traceplot(tr, var_names=["ℓ", "ν", "η"]);
+az.plot_trace(tr, var_names=["ell", "nu", "eta"]);
 ```
 
 ```{code-cell} ipython3
@@ -131,24 +141,37 @@ with model:
 
 # Sample from the GP conditional distribution
 with model:
-    pred_samples = pm.sample_posterior_predictive(tr, vars=[f_pred], samples=1000)
+    pm.sample_posterior_predictive(tr, var_names=["f_pred"], extend_inferencedata=True)
 ```
 
 ```{code-cell} ipython3
 fig = plt.figure(figsize=(12, 5))
 ax = fig.gca()
-from pymc3.gp.util import plot_gp_dist
 
-plot_gp_dist(ax, np.square(pred_samples["f_pred"]), X_new)
+f_pred_samples = np.square(
+    az.extract(tr.posterior_predictive).astype(np.float32)["f_pred"].values
+).T
+plot_gp_dist(ax, f_pred_samples, X_new)
 plt.plot(X, np.square(f_true), "dodgerblue", lw=3, label="True f")
 plt.plot(X, y, "ok", ms=3, alpha=0.5, label="Observed data")
 plt.xlabel("X")
 plt.ylabel("True f(x)")
-plt.ylim([-2, 20])
 plt.title("Conditional distribution of f_*, given f")
 plt.legend();
 ```
 
+## Authors
+
+* Authored by Bill Engels
+* Updated by Chris Fonnesbeck to use PyMC v5
+
++++
+
+## References
+:::{bibliography}
+:filter: docname in docnames
+:::
+
 ```{code-cell} ipython3
 %load_ext watermark
 %watermark -n -u -v -iv -w