
Commit 5a56149

Merge remote-tracking branch 'upstream/main'
2 parents: dd4bf34 + 2861d6a

23 files changed: 1252 additions, 516 deletions

examples/bart/bart_heteroscedasticity.ipynb

Lines changed: 130 additions & 28 deletions
Large diffs are not rendered by default.

examples/bart/bart_heteroscedasticity.myst.md

Lines changed: 15 additions & 8 deletions
@@ -1,5 +1,6 @@
 ---
 jupytext:
+  formats: ipynb,md
   text_representation:
     extension: .md
     format_name: myst
@@ -81,7 +82,7 @@ Next, we specify the model. Note that we just need one BART distribution which c
 
 ```{code-cell} ipython3
 with pm.Model() as model_marketing_full:
-    w = pmb.BART("w", X=X, Y=np.log(Y), m=200, shape=(2, n_obs))
+    w = pmb.BART("w", X=X, Y=np.log(Y), m=100, shape=(2, n_obs))
     y = pm.Gamma("y", mu=pm.math.exp(w[0]), sigma=pm.math.exp(w[1]), observed=Y)
 
 pm.model_to_graphviz(model=model_marketing_full)
@@ -91,7 +92,7 @@ We now fit the model.
 
 ```{code-cell} ipython3
 with model_marketing_full:
-    idata_marketing_full = pm.sample(random_seed=rng)
+    idata_marketing_full = pm.sample(2000, random_seed=rng, compute_convergence_checks=False)
     posterior_predictive_marketing_full = pm.sample_posterior_predictive(
         trace=idata_marketing_full, random_seed=rng
     )
@@ -104,7 +105,7 @@ We can now visualize the posterior predictive distribution of the mean and the l
 ```{code-cell} ipython3
 posterior_mean = idata_marketing_full.posterior["w"].mean(dim=("chain", "draw"))[0]
 
-w_hdi = az.hdi(ary=idata_marketing_full, group="posterior", var_names=["w"])
+w_hdi = az.hdi(ary=idata_marketing_full, group="posterior", var_names=["w"], hdi_prob=0.5)
 
 pps = az.extract(
     posterior_predictive_marketing_full, group="posterior_predictive", var_names=["y"]
@@ -116,14 +117,19 @@ idx = np.argsort(X[:, 0])
 
 
 fig, ax = plt.subplots()
-az.plot_hdi(x=X[:, 0], y=pps, ax=ax, fill_kwargs={"alpha": 0.3, "label": r"Likelihood $94\%$ HDI"})
+az.plot_hdi(
+    x=X[:, 0],
+    y=pps,
+    ax=ax,
+    hdi_prob=0.90,
+    fill_kwargs={"alpha": 0.3, "label": r"Observations $90\%$ HDI"},
+)
 az.plot_hdi(
     x=X[:, 0],
     hdi_data=np.exp(w_hdi["w"].sel(w_dim_0=0)),
     ax=ax,
-    fill_kwargs={"alpha": 0.6, "label": r"Mean $94\%$ HDI"},
+    fill_kwargs={"alpha": 0.6, "label": r"Mean $50\%$ HDI"},
 )
-ax.plot(X[:, 0][idx], np.exp(posterior_mean[idx]), c="black", lw=3, label="Posterior Mean")
 ax.plot(df["youtube"], df["sales"], "o", c="C0", label="Raw Data")
 ax.legend(loc="upper left")
 ax.set(
@@ -138,8 +144,9 @@ The fit looks good! In fact, we see that the mean and variance increase as a fun
 +++
 
 ## Authors
-- Authored by [Juan Orduz](https://juanitorduz.github.io/) in February 2023
-- Rerun by Osvaldo Martin in March 2023
+- Authored by [Juan Orduz](https://juanitorduz.github.io/) in Feb, 2023
+- Rerun by Osvaldo Martin in Mar, 2023
+- Rerun by Osvaldo Martin in Nov, 2023
 
 +++
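
Read together, the `+` lines above give the post-merge model and sampling cells. Below is a minimal re-assembly for orientation, not a verbatim excerpt of the notebook; it assumes `X`, `Y`, `n_obs`, and `rng` are defined in the notebook's earlier data cells (not part of this diff) and that `pmb` is `pymc-bart`:

```python
import numpy as np
import pymc as pm
import pymc_bart as pmb

# X, Y, n_obs, rng come from the notebook's data-loading cells (not shown in this diff).
with pm.Model() as model_marketing_full:
    # One BART RV with two output rows: log-mean and log-sigma of the Gamma likelihood.
    w = pmb.BART("w", X=X, Y=np.log(Y), m=100, shape=(2, n_obs))
    y = pm.Gamma("y", mu=pm.math.exp(w[0]), sigma=pm.math.exp(w[1]), observed=Y)

with model_marketing_full:
    idata_marketing_full = pm.sample(2000, random_seed=rng, compute_convergence_checks=False)
    posterior_predictive_marketing_full = pm.sample_posterior_predictive(
        trace=idata_marketing_full, random_seed=rng
    )
```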

examples/bart/bart_introduction.ipynb

Lines changed: 381 additions & 129 deletions
Large diffs are not rendered by default.

examples/bart/bart_introduction.myst.md

Lines changed: 27 additions & 22 deletions
Large diffs are not rendered by default.

examples/bart/bart_quantile_regression.ipynb

Lines changed: 173 additions & 28 deletions
Large diffs are not rendered by default.

examples/bart/bart_quantile_regression.myst.md

Lines changed: 3 additions & 1 deletion
@@ -1,5 +1,6 @@
 ---
 jupytext:
+  formats: ipynb,md
   text_representation:
     extension: .md
     format_name: myst
@@ -146,7 +147,8 @@ We can see that when we use a Normal likelihood, and from that fit we compute th
 
 ## Authors
 * Authored by Osvaldo Martin in Jan, 2023
-* Rerun by Osvaldo Martin in March 2023
+* Rerun by Osvaldo Martin in Mar, 2023
+* Rerun by Osvaldo Martin in Nov, 2023
 
 +++

examples/gaussian_processes/GP-smoothing.ipynb

Lines changed: 316 additions & 188 deletions
Large diffs are not rendered by default.

examples/gaussian_processes/GP-smoothing.myst.md

Lines changed: 50 additions & 32 deletions
@@ -21,35 +21,45 @@ If we assume the functional dependency between $x$ and $y$ is **linear** then, b
 However, the **functional form** of $y=f(x)$ is **not always known in advance**, and it might be hard to choose which one to fit, given the data. For example, you wouldn't necessarily know which function to use, given the following observed data. Assume you haven't seen the formula that generated it:
 
 ```{code-cell} ipython3
-%pylab inline
-figsize(12, 6);
+import arviz as az
+import matplotlib.pyplot as plt
+import numpy as np
+import pymc as pm
+import scipy.stats as stats
+
+from pytensor import shared
+
+%config InlineBackend.figure_format = "retina"
 ```
 
 ```{code-cell} ipython3
-import numpy as np
-import scipy.stats as stats
+RANDOM_SEED = 8927
+rng = np.random.default_rng(RANDOM_SEED)
+az.style.use("arviz-darkgrid")
+plt.rcParams["figure.figsize"] = (10, 4)
+```
 
+```{code-cell} ipython3
 x = np.linspace(0, 50, 100)
-y = np.exp(1.0 + np.power(x, 0.5) - np.exp(x / 15.0)) + np.random.normal(scale=1.0, size=x.shape)
+y = np.exp(1.0 + np.power(x, 0.5) - np.exp(x / 15.0)) + rng.normal(scale=1.0, size=x.shape)
 
-plot(x, y)
-xlabel("x")
-ylabel("y")
-title("Observed Data");
+fig, ax = plt.subplots()
+ax.plot(x, y)
+ax.set(title="Observed Data", xlabel="x", ylabel="y");
 ```
 
 ### Let's try a linear regression first
 
 As humans, we see that there is a non-linear dependency with some noise, and we would like to capture that dependency. If we perform a linear regression, we see that the "smoothed" data is less than satisfactory:
 
 ```{code-cell} ipython3
-plot(x, y)
-xlabel("x")
-ylabel("y")
+lin = stats.linregress(x, y)
 
+fig, ax = plt.subplots()
+ax.plot(x, y)
 lin = stats.linregress(x, y)
-plot(x, lin.intercept + lin.slope * x)
-title("Linear Smoothing");
+ax.plot(x, lin.intercept + lin.slope * x)
+ax.set(title="Linear Smoothing", xlabel="x", ylabel="y");
 ```
 
 ### Linear regression model recap
@@ -90,15 +100,9 @@ When we estimate the maximum likelihood values of the hidden process $z_i$ at ea
 
 +++
 
-### Let's describe the above GP-smoothing model in PyMC3
+### Let's describe the above GP-smoothing model in PyMC
 
-```{code-cell} ipython3
-import pymc3 as pm
-
-from pymc3.distributions.timeseries import GaussianRandomWalk
-from scipy import optimize
-from theano import shared
-```
++++
 
 Let's create a model with a shared parameter for specifying different levels of smoothing. We use very wide priors for the "mu" and "tau" parameters of the hidden Brownian motion, which you can adjust according to your application.
 
@@ -110,7 +114,9 @@ with model:
     smoothing_param = shared(0.9)
     mu = pm.Normal("mu", sigma=LARGE_NUMBER)
    tau = pm.Exponential("tau", 1.0 / LARGE_NUMBER)
-    z = GaussianRandomWalk("z", mu=mu, tau=tau / (1.0 - smoothing_param), shape=y.shape)
+    z = pm.GaussianRandomWalk(
+        "z", mu=mu, sigma=pm.math.sqrt((1.0 - smoothing_param) / tau), shape=y.shape
+    )
     obs = pm.Normal("obs", mu=z, tau=tau / smoothing_param, observed=y)
 ```
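
The reparameterization in this hunk is the usual precision-to-standard-deviation conversion (an annotation, not part of the diff): the old code set the random-walk precision to $\tau_z = \tau / (1 - s)$, and the new code passes the equivalent standard deviation

$$\sigma_z = \frac{1}{\sqrt{\tau_z}} = \sqrt{\frac{1 - s}{\tau}},$$

where $s$ is the shared `smoothing_param`, which is exactly the `sigma=pm.math.sqrt((1.0 - smoothing_param) / tau)` argument above.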

@@ -134,9 +140,10 @@ Let's try to allocate 50% variance to the noise, and see if the result matches o
 smoothing = 0.5
 z_val = infer_z(smoothing)
 
-plot(x, y)
-plot(x, z_val)
-title(f"Smoothing={smoothing}");
+fig, ax = plt.subplots()
+ax.plot(x, y)
+ax.plot(x, z_val)
+ax.set(title=f"Smoothing={smoothing}");
 ```
 
 It appears that the variance is split evenly between the noise and the hidden process, as expected.
@@ -147,17 +154,18 @@ Let's try gradually increasing the smoothness parameter to see if we can obtain
 smoothing = 0.9
 z_val = infer_z(smoothing)
 
-plot(x, y)
-plot(x, z_val)
-title(f"Smoothing={smoothing}");
+fig, ax = plt.subplots()
+ax.plot(x, y)
+ax.plot(x, z_val)
+ax.set(title=f"Smoothing={smoothing}");
 ```
 
 ### Smoothing "to the limits"
 
 By increasing the smoothing parameter, we can gradually make the inferred values of the hidden Brownian motion approach the average value of the data. This is because as we increase the smoothing parameter, we allow less and less of the variance to be allocated to the Brownian motion, so eventually it approaches the process which almost doesn't change over the domain of $x$:
 
 ```{code-cell} ipython3
-fig, axes = subplots(2, 2)
+fig, axes = plt.subplots(nrows=2, ncols=2)
 
 for ax, smoothing in zip(axes.ravel(), [0.95, 0.99, 0.999, 0.9999]):
     z_val = infer_z(smoothing)
@@ -167,9 +175,19 @@ for ax, smoothing in zip(axes.ravel(), [0.95, 0.99, 0.999, 0.9999]):
     ax.set_title(f"Smoothing={smoothing:05.4f}")
 ```
 
+## References
+
+:::{bibliography}
+:filter: docname in docnames
+:::
+
++++
+
+## Authors
+* Authored by [Andrey Kuzmenko](http://github.com/akuz)
+* Updated to v5 by [Juan Orduz](https://juanitorduz.github.io/) in Nov 2023 ([pymc-examples#603](https://github.com/pymc-devs/pymc-examples/pull/603))
+
 ```{code-cell} ipython3
 %load_ext watermark
 %watermark -n -u -v -iv -w
 ```
-
-This example originally contributed by: Andrey Kuzmenko, http://github.com/akuz
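
Assembled from the `+` lines across the hunks above, the updated PyMC v5 version of the smoothing model reads roughly as follows. This is a sketch, not a verbatim excerpt: the value of `LARGE_NUMBER` and the body of `infer_z` are not shown in this diff, so they are filled in here as labeled assumptions; the data `x`, `y` are taken from the data-generation cell reconstructed above.

```python
import numpy as np
import pymc as pm

from pytensor import shared

# Data generation, as in the first hunk of this file.
rng = np.random.default_rng(8927)
x = np.linspace(0, 50, 100)
y = np.exp(1.0 + np.power(x, 0.5) - np.exp(x / 15.0)) + rng.normal(scale=1.0, size=x.shape)

LARGE_NUMBER = 1e5  # assumed wide-prior scale; the actual value is defined outside this diff

model = pm.Model()
with model:
    smoothing_param = shared(0.9)
    mu = pm.Normal("mu", sigma=LARGE_NUMBER)
    tau = pm.Exponential("tau", 1.0 / LARGE_NUMBER)
    z = pm.GaussianRandomWalk(
        "z", mu=mu, sigma=pm.math.sqrt((1.0 - smoothing_param) / tau), shape=y.shape
    )
    obs = pm.Normal("obs", mu=z, tau=tau / smoothing_param, observed=y)


def infer_z(smoothing):
    # Hypothetical helper: the notebook's infer_z is not part of this diff. It
    # presumably sets the shared smoothing value and returns a MAP estimate of z.
    smoothing_param.set_value(smoothing)
    with model:
        return pm.find_MAP(vars=[z], method="L-BFGS-B")["z"]
```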

examples/case_studies/LKJ.ipynb renamed to examples/howto/LKJ.ipynb

Lines changed: 2 additions & 2 deletions
@@ -161,7 +161,7 @@
 "\n",
 "The LKJ distribution provides a prior on the correlation matrix, $\\mathbf{C} = \\textrm{Corr}(x_i, x_j)$, which, combined with priors on the standard deviations of each component, [induces](http://www3.stat.sinica.edu.tw/statistica/oldpdf/A10n416.pdf) a prior on the covariance matrix, $\\Sigma$. Since inverting $\\Sigma$ is numerically unstable and inefficient, it is computationally advantageous to use the [Cholesky decompositon](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\\Sigma$, $\\Sigma = \\mathbf{L} \\mathbf{L}^{\\top}$, where $\\mathbf{L}$ is a lower-triangular matrix. This decompositon allows computation of the term $(\\mathbf{x} - \\mu)^{\\top} \\Sigma^{-1} (\\mathbf{x} - \\mu)$ using back-substitution, which is more numerically stable and efficient than direct matrix inversion.\n",
 "\n",
-"PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the [LKJCholeskyCov](https://docs.pymc.io/en/latest/api/distributions/generated/pymc.LKJCholeskyCov.html) distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparamter `eta`, which controls the amount of correlation between components of $\\mathbf{x}$. The LKJ distribution has the density $f(\\mathbf{C}\\ |\\ \\eta) \\propto |\\mathbf{C}|^{\\eta - 1}$, so $\\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\\eta \\to \\infty$.\n",
+"PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the {class}`pymc.LKJCholeskyCov` distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparamter `eta`, which controls the amount of correlation between components of $\\mathbf{x}$. The LKJ distribution has the density $f(\\mathbf{C}\\ |\\ \\eta) \\propto |\\mathbf{C}|^{\\eta - 1}$, so $\\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\\eta \\to \\infty$.\n",
 "\n",
 "In this example, we model the standard deviations with $\\textrm{Exponential}(1.0)$ priors, and the correlation matrix as $\\mathbf{C} \\sim \\textrm{LKJ}(\\eta = 2)$."
 ]
@@ -308,7 +308,7 @@
 "id": "QOCi1RKvr2Ph"
 },
 "source": [
-"We sample from this model using NUTS and give the trace to [ArviZ](https://arviz-devs.github.io/arviz/) for summarization:"
+"We sample from this model using NUTS and give the trace to {ref}`arviz` for summarization:"
 ]
 },
 {

examples/case_studies/LKJ.myst.md renamed to examples/howto/LKJ.myst.md

Lines changed: 2 additions & 2 deletions
@@ -101,7 +101,7 @@ $$f(\mathbf{x}\ |\ \mu, \Sigma^{-1}) = (2 \pi)^{-\frac{k}{2}} |\Sigma|^{-\frac{1
 
 The LKJ distribution provides a prior on the correlation matrix, $\mathbf{C} = \textrm{Corr}(x_i, x_j)$, which, combined with priors on the standard deviations of each component, [induces](http://www3.stat.sinica.edu.tw/statistica/oldpdf/A10n416.pdf) a prior on the covariance matrix, $\Sigma$. Since inverting $\Sigma$ is numerically unstable and inefficient, it is computationally advantageous to use the [Cholesky decompositon](https://en.wikipedia.org/wiki/Cholesky_decomposition) of $\Sigma$, $\Sigma = \mathbf{L} \mathbf{L}^{\top}$, where $\mathbf{L}$ is a lower-triangular matrix. This decompositon allows computation of the term $(\mathbf{x} - \mu)^{\top} \Sigma^{-1} (\mathbf{x} - \mu)$ using back-substitution, which is more numerically stable and efficient than direct matrix inversion.
 
-PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the [LKJCholeskyCov](https://docs.pymc.io/en/latest/api/distributions/generated/pymc.LKJCholeskyCov.html) distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparamter `eta`, which controls the amount of correlation between components of $\mathbf{x}$. The LKJ distribution has the density $f(\mathbf{C}\ |\ \eta) \propto |\mathbf{C}|^{\eta - 1}$, so $\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\eta \to \infty$.
+PyMC supports LKJ priors for the Cholesky decomposition of the covariance matrix via the {class}`pymc.LKJCholeskyCov` distribution. This distribution has parameters `n` and `sd_dist`, which are the dimension of the observations, $\mathbf{x}$, and the PyMC distribution of the component standard deviations, respectively. It also has a hyperparamter `eta`, which controls the amount of correlation between components of $\mathbf{x}$. The LKJ distribution has the density $f(\mathbf{C}\ |\ \eta) \propto |\mathbf{C}|^{\eta - 1}$, so $\eta = 1$ leads to a uniform distribution on correlation matrices, while the magnitude of correlations between components decreases as $\eta \to \infty$.
 
 In this example, we model the standard deviations with $\textrm{Exponential}(1.0)$ priors, and the correlation matrix as $\mathbf{C} \sim \textrm{LKJ}(\eta = 2)$.
 
@@ -175,7 +175,7 @@ with model:
 
 +++ {"id": "QOCi1RKvr2Ph"}
 
-We sample from this model using NUTS and give the trace to [ArviZ](https://arviz-devs.github.io/arviz/) for summarization:
+We sample from this model using NUTS and give the trace to {ref}`arviz` for summarization:
 
 ```{code-cell} ipython3
 ---
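
Since both renamed files now point readers at {class}`pymc.LKJCholeskyCov`, here is a minimal sketch of the pattern that paragraph describes, with Exponential(1.0) priors on the standard deviations and $\eta = 2$ as in the notebook. The data `x` below is a placeholder, not the notebook's dataset:

```python
import arviz as az
import numpy as np
import pymc as pm

rng = np.random.default_rng(8927)
x = rng.normal(size=(100, 3))  # placeholder 3-dimensional observations

with pm.Model():
    # Exponential(1.0) priors on the component standard deviations
    sd_dist = pm.Exponential.dist(1.0)
    # LKJ prior on the Cholesky factor of the covariance matrix (eta = 2)
    chol, corr, stds = pm.LKJCholeskyCov("chol", n=3, eta=2.0, sd_dist=sd_dist)
    mu = pm.Normal("mu", 0.0, 10.0, shape=3)
    pm.MvNormal("obs", mu=mu, chol=chol, observed=x)
    idata = pm.sample(random_seed=rng)

az.summary(idata, var_names=["mu"])
```

Passing the resulting trace to ArviZ for summarization, as the changed sentence in the second hunk says, is then a one-liner.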
