Commit bad197e

Address reviewer comments

1 parent 3897626 commit bad197e

2 files changed: +44 −18 lines

examples/variational_inference/bayesian_neural_network_advi.ipynb

Lines changed: 27 additions & 9 deletions
@@ -12,8 +12,8 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- ":::{post} Apr 25, 2022\n",
- ":tags: pymc.ADVI, pymc.Bernoulli, pymc.Data, pymc.Minibatch, pymc.Model, pymc.Normal, variational inference\n",
+ ":::{post} May 30, 2022\n",
+ ":tags: neural networks, perceptron, variational inference, minibatch\n",
  ":category: intermediate\n",
  ":author: Thomas Wiecki, updated by Chris Fonnesbeck\n",
  ":::"
@@ -28,7 +28,7 @@
  "**Probabilistic Programming**, **Deep Learning** and \"**Big Data**\" are among the biggest topics in machine learning. Inside of PP, a lot of innovation is focused on making things scale using **Variational Inference**. In this example, I will show how to use **Variational Inference** in PyMC to fit a simple Bayesian Neural Network. I will also discuss how bridging Probabilistic Programming and Deep Learning can open up very interesting avenues to explore in future research.\n",
  "\n",
  "### Probabilistic Programming at scale\n",
- "**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using [MCMC sampling algorithms](http://twiecki.github.io/blog/2015/11/10/mcmc-sampling/) we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in PyMC, NumPyro and Stan. \n",
+ "**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using {ref}`MCMC sampling algorithms <multilevel_modeling>` we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in several probabilistic programming packages including PyMC, NumPyro and Stan. \n",
  "\n",
  "Unfortunately, when it comes to traditional ML problems like classification or (non-linear) regression, Probabilistic Programming often plays second fiddle (in terms of accuracy and scalability) to more algorithmic approaches like [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) (e.g. [random forests](https://en.wikipedia.org/wiki/Random_forest) or [gradient boosted regression trees](https://en.wikipedia.org/wiki/Boosting_(machine_learning)).\n",
  "\n",
@@ -106,6 +106,7 @@
  "cell_type": "code",
  "execution_count": 3,
  "metadata": {
+ "collapsed": true,
  "jupyter": {
  "outputs_hidden": true
  }
@@ -162,6 +163,7 @@
  "cell_type": "code",
  "execution_count": 5,
  "metadata": {
+ "collapsed": true,
  "jupyter": {
  "outputs_hidden": true
  }
@@ -230,9 +232,9 @@
  "source": [
  "### Variational Inference: Scaling model complexity\n",
  "\n",
- "We could now just run a MCMC sampler like {class}`~pymc.step_methods.hmc.nuts.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.\n",
+ "We could now just run a MCMC sampler like {class}`pymc.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.\n",
  "\n",
- "Instead, we will use the {class}`~pymc.variational.inference.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior."
+ "Instead, we will use the {class}`pymc.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior."
  ]
  },
  {
@@ -351,13 +353,14 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
- "Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sampling.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation)."
+ "Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation)."
  ]
  },
  {
  "cell_type": "code",
  "execution_count": 9,
  "metadata": {
+ "collapsed": true,
  "jupyter": {
  "outputs_hidden": true
  }
@@ -425,7 +428,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "pred = ppc.posterior_predictive[\"out\"].squeeze().mean(axis=0) > 0.5"
+ "pred = ppc.posterior_predictive[\"out\"].mean((\"chain\", \"draw\")) > 0.5"
  ]
  },
  {
@@ -494,6 +497,7 @@
  "cell_type": "code",
  "execution_count": 13,
  "metadata": {
+ "collapsed": true,
  "jupyter": {
  "outputs_hidden": true
  }
@@ -509,6 +513,7 @@
  "cell_type": "code",
  "execution_count": 14,
  "metadata": {
+ "collapsed": true,
  "jupyter": {
  "outputs_hidden": true
  }
@@ -611,7 +616,7 @@
  "cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)\n",
  "fig, ax = plt.subplots(figsize=(16, 9))\n",
  "contour = ax.contourf(\n",
- "    grid[0], grid[1], y_pred.squeeze().values.mean(axis=0).reshape(100, 100), cmap=cmap\n",
+ "    grid[0], grid[1], y_pred.mean((\"chain\", \"draw\")).values.reshape(100, 100), cmap=cmap\n",
  ")\n",
  "ax.scatter(X_test[pred == 0, 0], X_test[pred == 0, 1], color=\"C0\")\n",
  "ax.scatter(X_test[pred == 1, 0], X_test[pred == 1, 1], color=\"C1\")\n",
@@ -838,6 +843,11 @@
  "cell_type": "markdown",
  "metadata": {},
  "source": [
+ "## Authors\n",
+ "\n",
+ "- This notebook was originally authored as a [blog post](https://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/) by Thomas Wiecki in 2016\n",
+ "- Updated by Chris Fonnesbeck for PyMC v4 in 2022\n",
+ "\n",
  "## Watermark"
  ]
  },
@@ -876,6 +886,14 @@
  "%load_ext watermark\n",
  "%watermark -n -u -v -iv -w -p xarray"
  ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ ":::{include} ../page_footer.md\n",
+ ":::"
+ ]
  }
  ],
  "metadata": {
@@ -884,7 +902,7 @@
  "hash": "5429d053af7e221df99a6f00514f0d50433afea7fb367ba3ad570571d9163dca"
  },
  "kernelspec": {
- "display_name": "Python 3.9.10 ('pymc-dev-py39')",
+ "display_name": "Python 3 (ipykernel)",
  "language": "python",
  "name": "python3"
  },

myst_nbs/variational_inference/bayesian_neural_network_advi.myst.md

Lines changed: 17 additions & 9 deletions
@@ -6,7 +6,7 @@ jupytext:
    format_version: 0.13
    jupytext_version: 1.13.7
  kernelspec:
-   display_name: Python 3.9.10 ('pymc-dev-py39')
+   display_name: Python 3 (ipykernel)
    language: python
    name: python3
  ---
@@ -16,8 +16,8 @@ kernelspec:

  +++

- :::{post} Apr 25, 2022
- :tags: pymc.ADVI, pymc.Bernoulli, pymc.Data, pymc.Minibatch, pymc.Model, pymc.Normal, variational inference
+ :::{post} May 30, 2022
+ :tags: neural networks, perceptron, variational inference, minibatch
  :category: intermediate
  :author: Thomas Wiecki, updated by Chris Fonnesbeck
  :::
@@ -29,7 +29,7 @@ kernelspec:
  **Probabilistic Programming**, **Deep Learning** and "**Big Data**" are among the biggest topics in machine learning. Inside of PP, a lot of innovation is focused on making things scale using **Variational Inference**. In this example, I will show how to use **Variational Inference** in PyMC to fit a simple Bayesian Neural Network. I will also discuss how bridging Probabilistic Programming and Deep Learning can open up very interesting avenues to explore in future research.

  ### Probabilistic Programming at scale
- **Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using [MCMC sampling algorithms](http://twiecki.github.io/blog/2015/11/10/mcmc-sampling/) we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in PyMC, NumPyro and Stan.
+ **Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using {ref}`MCMC sampling algorithms <multilevel_modeling>` we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in several probabilistic programming packages including PyMC, NumPyro and Stan.

  Unfortunately, when it comes to traditional ML problems like classification or (non-linear) regression, Probabilistic Programming often plays second fiddle (in terms of accuracy and scalability) to more algorithmic approaches like [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) (e.g. [random forests](https://en.wikipedia.org/wiki/Random_forest) or [gradient boosted regression trees](https://en.wikipedia.org/wiki/Boosting_(machine_learning)).

@@ -171,9 +171,9 @@ That's not so bad. The `Normal` priors help regularize the weights. Usually we w

  ### Variational Inference: Scaling model complexity

- We could now just run a MCMC sampler like {class}`~pymc.step_methods.hmc.nuts.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.
+ We could now just run a MCMC sampler like {class}`pymc.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.

- Instead, we will use the {class}`~pymc.variational.inference.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior.
+ Instead, we will use the {class}`pymc.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior.

  ```{code-cell} ipython3
  %%time
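[Editor's note: for readers of this hunk, here is a minimal, self-contained sketch of the ADVI workflow the text refers to. The toy data, model, variable names, and iteration count are illustrative assumptions, not the notebook's actual neural network.]

```python
import numpy as np
import pymc as pm

# Toy binary-classification data (a stand-in for the notebook's dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] > 0).astype("float64")

with pm.Model() as toy_model:
    w = pm.Normal("w", mu=0.0, sigma=1.0, shape=2)
    p = pm.math.sigmoid(pm.math.dot(X, w))
    pm.Bernoulli("out", p=p, observed=y)

    # pm.fit defaults to mean-field ADVI (equivalent to method=pm.ADVI()),
    # so posterior correlations are ignored, as the paragraph above notes.
    approx = pm.fit(n=20_000)

# Draw from the fitted approximation instead of running an MCMC sampler.
trace = approx.sample(draws=1000)
```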
@@ -194,7 +194,7 @@ plt.xlabel("iteration");
  trace = approx.sample(draws=5000)
  ```

- Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sampling.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation).
+ Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation).

  ```{code-cell} ipython3
  ---
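[Editor's note: continuing the hypothetical sketch given after the previous hunk (it assumes the `toy_model` and `trace` names defined there), a posterior predictive check against the variational posterior could look like this.]

```python
with toy_model:
    # Generate new "out" values for each posterior sample in `trace`.
    ppc = pm.sample_posterior_predictive(trace)

# The posterior_predictive group is an xarray Dataset whose variables carry
# ("chain", "draw", ...) dimensions plus the observation dimension.
post_pred = ppc.posterior_predictive["out"]
print(post_pred.dims)
```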
@@ -210,7 +210,7 @@ with neural_network:
  We can average the predictions for each observation to estimate the underlying probability of class 1.

  ```{code-cell} ipython3
- pred = ppc.posterior_predictive["out"].squeeze().mean(axis=0) > 0.5
+ pred = ppc.posterior_predictive["out"].mean(("chain", "draw")) > 0.5
  ```

  ```{code-cell} ipython3
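[Editor's note: this hunk (and the contour-plot hunk below) swaps a positional reduction for a reduction over named xarray dimensions. A toy illustration, not taken from the notebook, of why that is more robust.]

```python
import numpy as np
import xarray as xr

# Fake posterior-predictive draws: 4 chains, 500 draws, 100 observations.
samples = xr.DataArray(
    np.random.default_rng(0).normal(size=(4, 500, 100)),
    dims=("chain", "draw", "obs"),
)

# Named dimensions: averages over both sampling dimensions regardless of
# axis order or of any singleton dimensions in the array.
prob = samples.mean(("chain", "draw"))  # shape: (100,)

# Positional version: averages over the first axis only, which matches the
# named reduction only when a single chain is squeezed away beforehand.
prob_positional = samples.squeeze().values.mean(axis=0)  # shape: (500, 100)

print(prob.shape, prob_positional.shape)
```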
@@ -263,7 +263,7 @@ y_pred = ppc.posterior_predictive["out"]
  cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)
  fig, ax = plt.subplots(figsize=(16, 9))
  contour = ax.contourf(
-     grid[0], grid[1], y_pred.squeeze().values.mean(axis=0).reshape(100, 100), cmap=cmap
+     grid[0], grid[1], y_pred.mean(("chain", "draw")).values.reshape(100, 100), cmap=cmap
  )
  ax.scatter(X_test[pred == 0, 0], X_test[pred == 0, 1], color="C0")
  ax.scatter(X_test[pred == 1, 0], X_test[pred == 1, 1], color="C1")
@@ -337,9 +337,17 @@ You might argue that the above network isn't really deep, but note that we could

  +++

+ ## Authors
+
+ - This notebook was originally authored as a [blog post](https://twiecki.github.io/blog/2016/06/01/bayesian-deep-learning/) by Thomas Wiecki in 2016
+ - Updated by Chris Fonnesbeck for PyMC v4 in 2022
+
  ## Watermark

  ```{code-cell} ipython3
  %load_ext watermark
  %watermark -n -u -v -iv -w -p xarray
  ```
+
+ :::{include} ../page_footer.md
+ :::
