Commit cb577c1

Address reviewer comments
1 parent b5ae725 commit cb577c1

2 files changed: 19 additions and 18 deletions

examples/variational_inference/bayesian_neural_network_advi.ipynb

Lines changed: 10 additions & 9 deletions
@@ -12,8 +12,8 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-":::{post} Apr 25, 2022\n",
-":tags: pymc.ADVI, pymc.Bernoulli, pymc.Data, pymc.Minibatch, pymc.Model, pymc.Normal, variational inference\n",
+":::{post} May 30, 2022\n",
+":tags: neural networks, perceptron, variational inference, minibatch\n",
 ":category: intermediate\n",
 ":author: Thomas Wiecki, updated by Chris Fonnesbeck\n",
 ":::"
@@ -28,7 +28,7 @@
 "**Probabilistic Programming**, **Deep Learning** and \"**Big Data**\" are among the biggest topics in machine learning. Inside of PP, a lot of innovation is focused on making things scale using **Variational Inference**. In this example, I will show how to use **Variational Inference** in PyMC to fit a simple Bayesian Neural Network. I will also discuss how bridging Probabilistic Programming and Deep Learning can open up very interesting avenues to explore in future research.\n",
 "\n",
 "### Probabilistic Programming at scale\n",
-"**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using [MCMC sampling algorithms](http://twiecki.github.io/blog/2015/11/10/mcmc-sampling/) we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in PyMC, NumPyro and Stan. \n",
+"**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using {ref}`MCMC sampling algorithms <multilevel_modeling>` we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in several probabilistic programming packages including PyMC, NumPyro and Stan. \n",
 "\n",
 "Unfortunately, when it comes to traditional ML problems like classification or (non-linear) regression, Probabilistic Programming often plays second fiddle (in terms of accuracy and scalability) to more algorithmic approaches like [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) (e.g. [random forests](https://en.wikipedia.org/wiki/Random_forest) or [gradient boosted regression trees](https://en.wikipedia.org/wiki/Boosting_(machine_learning)).\n",
 "\n",
@@ -239,9 +239,9 @@
 "source": [
 "### Variational Inference: Scaling model complexity\n",
 "\n",
-"We could now just run a MCMC sampler like {class}`~pymc.step_methods.hmc.nuts.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.\n",
+"We could now just run a MCMC sampler like {class}`pymc.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.\n",
 "\n",
-"Instead, we will use the {class}`~pymc.variational.inference.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior."
+"Instead, we will use the {class}`pymc.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior."
 ]
 },
 {
@@ -360,13 +360,14 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sampling.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation)."
+"Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation)."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": 9,
 "metadata": {
+"collapsed": true,
 "jupyter": {
 "outputs_hidden": true
 }
@@ -434,7 +435,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"pred = ppc.posterior_predictive[\"out\"].squeeze().mean(axis=0) > 0.5"
+"pred = ppc.posterior_predictive[\"out\"].mean((\"chain\", \"draw\")) > 0.5"
 ]
 },
 {
@@ -623,7 +624,7 @@
 "cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)\n",
 "fig, ax = plt.subplots(figsize=(16, 9))\n",
 "contour = ax.contourf(\n",
-" grid[0], grid[1], y_pred.squeeze().values.mean(axis=0).reshape(100, 100), cmap=cmap\n",
+" grid[0], grid[1], y_pred.mean((\"chain\", \"draw\")).values.reshape(100, 100), cmap=cmap\n",
 ")\n",
 "ax.scatter(X_test[pred == 0, 0], X_test[pred == 0, 1], color=\"C0\")\n",
 "ax.scatter(X_test[pred == 1, 0], X_test[pred == 1, 1], color=\"C1\")\n",
@@ -908,7 +909,7 @@
 "hash": "5429d053af7e221df99a6f00514f0d50433afea7fb367ba3ad570571d9163dca"
 },
 "kernelspec": {
-"display_name": "Python 3.9.10 ('pymc-dev-py39')",
+"display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },

examples/variational_inference/bayesian_neural_network_advi.myst.md

Lines changed: 9 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ jupytext:
 format_name: myst
 format_version: 0.13
 kernelspec:
-display_name: Python 3.9.10 ('pymc-dev-py39')
+display_name: Python 3 (ipykernel)
 language: python
 name: python3
 ---
@@ -15,8 +15,8 @@ kernelspec:
 
 +++
 
-:::{post} Apr 25, 2022
-:tags: pymc.ADVI, pymc.Bernoulli, pymc.Data, pymc.Minibatch, pymc.Model, pymc.Normal, variational inference
+:::{post} May 30, 2022
+:tags: neural networks, perceptron, variational inference, minibatch
 :category: intermediate
 :author: Thomas Wiecki, updated by Chris Fonnesbeck
 :::
@@ -28,7 +28,7 @@ kernelspec:
 **Probabilistic Programming**, **Deep Learning** and "**Big Data**" are among the biggest topics in machine learning. Inside of PP, a lot of innovation is focused on making things scale using **Variational Inference**. In this example, I will show how to use **Variational Inference** in PyMC to fit a simple Bayesian Neural Network. I will also discuss how bridging Probabilistic Programming and Deep Learning can open up very interesting avenues to explore in future research.
 
 ### Probabilistic Programming at scale
-**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using [MCMC sampling algorithms](http://twiecki.github.io/blog/2015/11/10/mcmc-sampling/) we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in PyMC, NumPyro and Stan.
+**Probabilistic Programming** allows very flexible creation of custom probabilistic models and is mainly concerned with **inference** and learning from your data. The approach is inherently **Bayesian** so we can specify **priors** to inform and constrain our models and get uncertainty estimation in form of a **posterior** distribution. Using {ref}`MCMC sampling algorithms <multilevel_modeling>` we can draw samples from this posterior to very flexibly estimate these models. PyMC, [NumPyro](https://github.com/pyro-ppl/numpyro), and [Stan](http://mc-stan.org/) are the current state-of-the-art tools for consructing and estimating these models. One major drawback of sampling, however, is that it's often slow, especially for high-dimensional models and large datasets. That's why more recently, **variational inference** algorithms have been developed that are almost as flexible as MCMC but much faster. Instead of drawing samples from the posterior, these algorithms instead fit a distribution (*e.g.* normal) to the posterior turning a sampling problem into and optimization problem. Automatic Differentation Variational Inference {cite:p}`kucukelbir2015automatic` is implemented in several probabilistic programming packages including PyMC, NumPyro and Stan.
 
 Unfortunately, when it comes to traditional ML problems like classification or (non-linear) regression, Probabilistic Programming often plays second fiddle (in terms of accuracy and scalability) to more algorithmic approaches like [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) (e.g. [random forests](https://en.wikipedia.org/wiki/Random_forest) or [gradient boosted regression trees](https://en.wikipedia.org/wiki/Boosting_(machine_learning)).
 
@@ -177,9 +177,9 @@ That's not so bad. The `Normal` priors help regularize the weights. Usually we w
 
 ### Variational Inference: Scaling model complexity
 
-We could now just run a MCMC sampler like {class}`~pymc.step_methods.hmc.nuts.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.
+We could now just run a MCMC sampler like {class}`pymc.NUTS` which works pretty well in this case, but was already mentioned, this will become very slow as we scale our model up to deeper architectures with more layers.
 
-Instead, we will use the {class}`~pymc.variational.inference.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior.
+Instead, we will use the {class}`pymc.ADVI` variational inference algorithm. This is much faster and will scale better. Note, that this is a mean-field approximation so we ignore correlations in the posterior.
 
 ```{code-cell} ipython3
 %%time
@@ -200,7 +200,7 @@ plt.xlabel("iteration");
 trace = approx.sample(draws=5000)
 ```
 
-Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sampling.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation).
+Now that we trained our model, lets predict on the hold-out set using a posterior predictive check (PPC). We can use {func}`~pymc.sample_posterior_predictive` to generate new data (in this case class predictions) from the posterior (sampled from the variational estimation).
 
 ```{code-cell} ipython3
 ---
@@ -216,7 +216,7 @@ with neural_network:
 We can average the predictions for each observation to estimate the underlying probability of class 1.
 
 ```{code-cell} ipython3
-pred = ppc.posterior_predictive["out"].squeeze().mean(axis=0) > 0.5
+pred = ppc.posterior_predictive["out"].mean(("chain", "draw")) > 0.5
 ```
 
 ```{code-cell} ipython3
@@ -270,7 +270,7 @@ y_pred = ppc.posterior_predictive["out"]
 cmap = sns.diverging_palette(250, 12, s=85, l=25, as_cmap=True)
 fig, ax = plt.subplots(figsize=(16, 9))
 contour = ax.contourf(
-grid[0], grid[1], y_pred.squeeze().values.mean(axis=0).reshape(100, 100), cmap=cmap
+grid[0], grid[1], y_pred.mean(("chain", "draw")).values.reshape(100, 100), cmap=cmap
 )
 ax.scatter(X_test[pred == 0, 0], X_test[pred == 0, 1], color="C0")
 ax.scatter(X_test[pred == 1, 0], X_test[pred == 1, 1], color="C1")
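
Note on the averaging change: both files replace `.squeeze().mean(axis=0)` with `.mean(("chain", "draw"))`. The commit does not state the motivation, so the following is only a plausible reading, sketched on synthetic data. The helper `make_draws`, the array sizes, and the dimension name `obs` are hypothetical and do not come from the notebook; only the `chain` and `draw` names match the diff. The sketch suggests the positional form gives a per-observation mean only when there is a single chain, while the named form does not depend on the chain count.

```python
import numpy as np
import xarray as xr


def make_draws(n_chain, n_draw=4, n_obs=3, seed=0):
    """Build a fake posterior-predictive array of 0/1 class draws.

    Hypothetical stand-in for ppc.posterior_predictive["out"];
    the "obs" dimension name is an assumption for illustration.
    """
    rng = np.random.default_rng(seed)
    values = rng.integers(0, 2, size=(n_chain, n_draw, n_obs))
    return xr.DataArray(values, dims=("chain", "draw", "obs"))


one_chain = make_draws(n_chain=1)
two_chains = make_draws(n_chain=2)

# Old expression: squeeze() drops size-1 dimensions, then axis 0 is averaged.
# With one chain, axis 0 is "draw"; with two chains, axis 0 is "chain".
old_one = one_chain.squeeze().mean(axis=0)
old_two = two_chains.squeeze().mean(axis=0)

# New expression: average over the named sampling dimensions explicitly.
new_one = one_chain.mean(("chain", "draw"))
new_two = two_chains.mean(("chain", "draw"))

print("old:", old_one.dims, old_two.dims)
print("new:", new_one.dims, new_two.dims)
```

With a single chain the two expressions agree, so the change should not alter the notebook's results if ADVI sampling yields one chain, as the old code appears to assume.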
