
Commit facf167

modify style
- Adopt QuantEcon naming format
- Adopt one-line docstring format in accordance with PEP 8
- Change integral notation from $\int dx\, f(x)$ to $\int f(x)\, dx$
- Reorder the illustration of (`param`, `name_dist`) pairs
1 parent e8db23f commit facf167


lectures/bayes_nonconj.md

Lines changed: 41 additions & 55 deletions
@@ -4,7 +4,7 @@ jupytext:
     extension: .md
     format_name: myst
     format_version: 0.13
-    jupytext_version: 1.16.7
+    jupytext_version: 1.17.2
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
@@ -65,7 +65,7 @@ from numpyro.infer import Trace_ELBO as nTrace_ELBO
 from numpyro.optim import Adam as nAdam
 ```
 
-## Unleashing MCMC on a Binomial Likelihood
+## Unleashing MCMC on a binomial likelihood
 
 This lecture begins with the binomial example in the {doc}`prob_meaning`.
 
@@ -84,7 +84,7 @@ We use several alternative prior distributions.
 
 We compare computed posteriors with ones associated with a conjugate prior as described in {doc}`prob_meaning`.
 
-### Analytical Posterior
+### Analytical posterior
 
 Assume that the random variable $X\sim Binom\left(n,\theta\right)$.
 
@@ -131,9 +131,7 @@ The analytical posterior for a given conjugate beta prior is coded in the follow
 
 ```{code-cell} ipython3
 def simulate_draw(theta, n):
-    """
-    Draws a Bernoulli sample of size n with probability P(Y=1) = theta
-    """
+    """Draws a Bernoulli sample of size n with probability P(Y=1) = theta"""
     rand_draw = np.random.rand(n)
     draw = (rand_draw < theta).astype(int)
     return draw
@@ -161,7 +159,7 @@ def analytical_beta_posterior(data, alpha0, beta0):
     return st.beta(alpha0 + up_num, beta0 + down_num)
 ```
 
-### Two Ways to Approximate Posteriors
+### Two ways to approximate posteriors
 
 Suppose that we don't have a conjugate prior.
 
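Before leaving the conjugate case, note that the two helpers touched in the hunks above pair naturally: `simulate_draw` generates Bernoulli data and `analytical_beta_posterior` applies the conjugate Beta update. A minimal usage sketch, assuming the lecture's definitions of both functions (the true `theta`, sample size, and prior parameters below are illustrative only):

```python
import numpy as np

np.random.seed(0)

# simulate 50 Bernoulli(0.8) draws, then apply the conjugate update coded above
data = simulate_draw(theta=0.8, n=50)
posterior = analytical_beta_posterior(data, alpha0=5, beta0=5)

print(posterior.mean())            # should drift toward 0.8 as n grows
print(posterior.interval(0.95))    # 95% posterior credible interval
```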
@@ -193,7 +191,7 @@ a Kullback-Leibler (KL) divergence between true posterior and the putative poste
 
 - minimizing the KL divergence is equivalent to maximizing a criterion called the **Evidence Lower Bound** (ELBO), as we shall verify soon.
 
-## Prior Distributions
+## Prior distributions
 
 In order to be able to apply MCMC sampling or VI, `numpyro` requires that a prior distribution satisfy special properties:
 
@@ -230,29 +228,25 @@ def TruncatedLogNormal_trans(loc, scale):
     """
     base_dist = ndist.TruncatedNormal(
         low=jnp.log(0), high=jnp.log(1), loc=loc, scale=scale
-    )
+    ) #TODO:is it fine to use log(0)?
     return ndist.TransformedDistribution(base_dist, ndist.transforms.ExpTransform())
 
 
 def ShiftedVonMises(kappa):
-    """
-    Obtains the shifted von Mises distribution using AffineTransform
-    """
+    """Obtains the shifted von Mises distribution using AffineTransform"""
     base_dist = ndist.VonMises(0, kappa)
     return ndist.TransformedDistribution(
         base_dist, ndist.transforms.AffineTransform(loc=0.5, scale=1 / (2 * jnp.pi))
     )
 
 
 def TruncatedLaplace(loc, scale):
-    """
-    Obtains the truncated Laplace distribution on [0,1]
-    """
+    """Obtains the truncated Laplace distribution on [0,1]"""
     base_dist = ndist.Laplace(loc, scale)
     return ndist.TruncatedDistribution(base_dist, low=0.0, high=1.0)
 ```
 
-### Variational Inference
+### Variational inference
 
 Instead of directly sampling from the posterior, the **variational inference** method approximates an unknown posterior distribution with a family of tractable distributions/densities.
 
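Because these priors are built from `numpyro` base distributions plus transforms or truncation, a quick sanity check is to draw samples and confirm the support stays in $[0,1]$. A hedged sketch, assuming the lecture's `jnp`/`ndist` aliases and arbitrary parameter values:

```python
import jax
import jax.numpy as jnp
import numpyro.distributions as ndist

key = jax.random.PRNGKey(0)

# truncated Laplace on [0, 1], mirroring TruncatedLaplace above
trunc_laplace = ndist.TruncatedDistribution(ndist.Laplace(0.5, 0.1), low=0.0, high=1.0)

# shifted von Mises, mirroring ShiftedVonMises above
shifted_vm = ndist.TransformedDistribution(
    ndist.VonMises(0, 10),
    ndist.transforms.AffineTransform(loc=0.5, scale=1 / (2 * jnp.pi)),
)

for d in (trunc_laplace, shifted_vm):
    draws = d.sample(key, (10_000,))
    print(float(draws.min()), float(draws.max()))  # both should lie inside [0, 1]
```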
@@ -275,7 +269,7 @@ $$
 where
 
 $$
-p\left(Y\right)=\int d\theta p\left(Y\mid\theta\right)p\left(Y\right).
+p\left(Y\right)=\int p\left(Y\mid\theta\right)p\left(\theta\right) d\theta.
 $$ (eq:intchallenge)
 
 The integral on the right side of {eq}`eq:intchallenge` is typically difficult to compute.
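One case where the integral is tractable is the conjugate Beta-Binomial setup used earlier in this lecture: with a $Beta(\alpha_0,\beta_0)$ prior and $k$ successes in $n$ trials,

$$
p\left(Y\right)=\int_{0}^{1}\binom{n}{k}\theta^{k}\left(1-\theta\right)^{n-k}\frac{\theta^{\alpha_{0}-1}\left(1-\theta\right)^{\beta_{0}-1}}{B\left(\alpha_{0},\beta_{0}\right)} d\theta=\binom{n}{k}\frac{B\left(k+\alpha_{0},n-k+\beta_{0}\right)}{B\left(\alpha_{0},\beta_{0}\right)},
$$

which is why that prior admits a closed-form posterior; outside such conjugate pairs the integral generally has no closed form, which motivates MCMC and VI.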
@@ -298,19 +292,19 @@ Note that
 
 $$
 \begin{aligned}D_{KL}(q(\theta;\phi)\;\|\;p(\theta\mid Y)) & =-\int d\theta q(\theta;\phi)\log\frac{P(\theta\mid Y)}{q(\theta;\phi)}\\
-& =-\int d\theta q(\theta)\log\frac{\frac{p(\theta,Y)}{p(Y)}}{q(\theta)}\\
-& =-\int d\theta q(\theta)\log\frac{p(\theta,Y)}{p(\theta)q(Y)}\\
-& =-\int d\theta q(\theta)\left[\log\frac{p(\theta,Y)}{q(\theta)}-\log p(Y)\right]\\
-& =-\int d\theta q(\theta)\log\frac{p(\theta,Y)}{q(\theta)}+\int d\theta q(\theta)\log p(Y)\\
-& =-\int d\theta q(\theta)\log\frac{p(\theta,Y)}{q(\theta)}+\log p(Y)\\
-\log p(Y)&=D_{KL}(q(\theta;\phi)\;\|\;p(\theta\mid Y))+\int d\theta q_{\phi}(\theta)\log\frac{p(\theta,Y)}{q_{\phi}(\theta)}
+& =-\int q(\theta)\log\frac{\frac{p(\theta,Y)}{p(Y)}}{q(\theta)} d\theta\\
+& =-\int q(\theta)\log\frac{p(\theta,Y)}{p(Y)q(\theta)} d\theta\\
+& =-\int q(\theta)\left[\log\frac{p(\theta,Y)}{q(\theta)}-\log p(Y)\right] d\theta\\
+& =-\int q(\theta)\log\frac{p(\theta,Y)}{q(\theta)} d\theta+\int q(\theta)\log p(Y) d\theta\\
+& =-\int q(\theta)\log\frac{p(\theta,Y)}{q(\theta)} d\theta+\log p(Y)\\
+\log p(Y)&=D_{KL}(q(\theta;\phi)\;\|\;p(\theta\mid Y))+\int q_{\phi}(\theta)\log\frac{p(\theta,Y)}{q_{\phi}(\theta)} d\theta
 \end{aligned}
 $$
 
 For observed data $Y$, $p(\theta,Y)$ is a constant, so minimizing KL divergence is equivalent to maximizing
 
 $$
-ELBO\equiv\int d\theta q_{\phi}(\theta)\log\frac{p(\theta,Y)}{q_{\phi}(\theta)}=\mathbb{E}_{q_{\phi}(\theta)}\left[\log p(\theta,Y)-\log q_{\phi}(\theta)\right]
+ELBO\equiv\int q_{\phi}(\theta)\log\frac{p(\theta,Y)}{q_{\phi}(\theta)} d\theta=\mathbb{E}_{q_{\phi}(\theta)}\left[\log p(\theta,Y)-\log q_{\phi}(\theta)\right]
 $$ (eq:ELBO)
 
 Formula {eq}`eq:ELBO` is called the evidence lower bound (ELBO).
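The right-hand side of {eq}`eq:ELBO` is an expectation under the guide, so it can be estimated by plain Monte Carlo. A small sketch for the lecture's Beta-Binomial setting using only `scipy.stats` (the trial counts and guide parameters are made-up illustration values):

```python
import numpy as np
from scipy import stats as st

n, k = 20, 14                      # hypothetical data: k successes in n trials
alpha0, beta0 = 5, 5               # Beta prior
guide = st.beta(10, 6)             # candidate q_phi(theta)

theta = guide.rvs(size=100_000, random_state=0)
log_joint = st.binom(n, theta).logpmf(k) + st.beta(alpha0, beta0).logpdf(theta)
elbo = np.mean(log_joint - guide.logpdf(theta))
print(elbo)  # maximizing this over the guide's parameters approximates the posterior
```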
@@ -338,16 +332,16 @@ We have constructed a Python class `BayesianInference` that requires the followi
 - `name_dist`: a string that specifies distribution names
 
 The (`param`, `name_dist`) pair includes:
-- ('beta', alpha, beta)
+- (alpha, beta, 'beta')
 
-- ('uniform', lower_bound, upper_bound)
+- (lower_bound, upper_bound, 'uniform')
 
-- ('lognormal', loc, scale)
+- (loc, scale, 'lognormal')
 - Note: This is the truncated log normal.
 
-- ('vonMises', kappa), where kappa denotes concentration parameter, and center location is set to $0.5$. Using `numpyro`, this is the **shifted** distribution.
+- (kappa, 'vonMises'), where kappa denotes concentration parameter, and center location is set to $0.5$. Using `numpyro`, this is the **shifted** distribution.
 
-- ('laplace', loc, scale)
+- (loc, scale, 'laplace')
 - Note: This is the truncated Laplace
 
 The class `BayesianInference` has several key methods :
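With the reordered convention, the parameters always come first and the distribution name last, matching the constructor call `BayesianInference(param=..., name_dist=...)`. A few illustrative instantiations; only the `(5, 5)` beta example appears verbatim later in the lecture, the other values are arbitrary, and whether `vonMises` expects a bare scalar or a one-element tuple depends on how `sample_prior` unpacks `self.param`:

```python
BETA = BayesianInference(param=(5, 5), name_dist="beta")
UNIFORM = BayesianInference(param=(0, 1), name_dist="uniform")
LOGNORMAL = BayesianInference(param=(0, 2), name_dist="lognormal")   # truncated log normal
VONMISES = BayesianInference(param=10, name_dist="vonMises")         # kappa; shifted to center 0.5
LAPLACE = BayesianInference(param=(0.5, 0.05), name_dist="laplace")  # truncated Laplace
```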
@@ -384,9 +378,7 @@ class BayesianInference:
         self.rng_key = jax_random.PRNGKey(0)
 
     def sample_prior(self):
-        """
-        Define the prior distribution to sample from in numpyro models.
-        """
+        """Define the prior distribution to sample from in numpyro models."""
         if self.name_dist == "beta":
             # unpack parameters
             alpha0, beta0 = self.param
@@ -493,9 +485,7 @@ class BayesianInference:
         numpyro.sample("theta", ndist.TruncatedNormal(loc, scale, low=0.0, high=1.0))
 
     def SVI_init(self, guide_dist, lr=0.0005):
-        """
-        Initiate SVI training mode with Adam optimizer
-        """
+        """Initiate SVI training mode with Adam optimizer"""
         adam_params = {"lr": lr}
 
         if guide_dist == "beta":
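For readers who want to see the moving parts that `SVI_init` wires together, here is a free-standing `numpyro` sketch of stochastic variational inference on a Bernoulli model with a Beta prior and a Beta guide. It is an illustration under assumed names and values, not the lecture's exact code:

```python
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as ndist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.optim import Adam


def model(data):
    # Beta(5, 5) prior over the Bernoulli probability theta
    theta = numpyro.sample("theta", ndist.Beta(5, 5))
    with numpyro.plate("N", data.shape[0]):
        numpyro.sample("obs", ndist.Bernoulli(theta), obs=data)


def beta_guide(data):
    # variational family: Beta(alpha_q, beta_q) with learnable parameters
    alpha_q = numpyro.param("alpha_q", 10.0, constraint=ndist.constraints.positive)
    beta_q = numpyro.param("beta_q", 10.0, constraint=ndist.constraints.positive)
    numpyro.sample("theta", ndist.Beta(alpha_q, beta_q))


data = jnp.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])          # made-up draws
svi = SVI(model, beta_guide, Adam(step_size=0.0005), Trace_ELBO())
result = svi.run(jax.random.PRNGKey(0), 5000, data)
print(result.params)                                       # fitted alpha_q, beta_q
```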
@@ -533,7 +523,7 @@ class BayesianInference:
         return params, losses
 ```
 
-## Alternative Prior Distributions
+## Alternative prior distributions
 
 Let's see how well our sampling algorithm does in approximating
@@ -574,7 +564,7 @@ exampleLP.show_prior(size=100000, bins=40)
 
 Having assured ourselves that our sampler seems to do a good job, let's put it to work in using MCMC to compute posterior probabilities.
 
-## Posteriors Via MCMC and VI
+## Posteriors via MCMC and VI
 
 We construct a class `BayesianInferencePlot` to implement MCMC or VI algorithms and plot multiple posteriors for different updating data sizes and different possible priors.
 
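As a stand-alone counterpart to the SVI sketch above, the MCMC route in `numpyro` runs NUTS on the same kind of model. Again a hedged illustration with assumed data, not the lecture's implementation:

```python
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as ndist
from numpyro.infer import MCMC, NUTS


def model(data):
    theta = numpyro.sample("theta", ndist.Beta(5, 5))
    with numpyro.plate("N", data.shape[0]):
        numpyro.sample("obs", ndist.Bernoulli(theta), obs=data)


data = jnp.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])   # made-up draws
mcmc = MCMC(NUTS(model), num_warmup=1000, num_samples=2000, progress_bar=False)
mcmc.run(jax.random.PRNGKey(0), data)
print(mcmc.get_samples()["theta"].mean())           # posterior mean of theta
```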
@@ -604,9 +594,7 @@ class BayesianInferencePlot:
     """
 
    def __init__(self, theta, N_list, BayesianInferenceClass, binwidth=0.02):
-        """
-        Enter Parameters for data generation and plotting
-        """
+        """Enter Parameters for data generation and plotting"""
         self.theta = theta
         self.N_list = N_list
         self.BayesianInferenceClass = BayesianInferenceClass
@@ -634,7 +622,7 @@ class BayesianInferencePlot:
             linewidth=self.linewidth,
             alpha=0.1,
             ax=ax,
-            label="Prior Distribution",
+            label="Prior distribution",
         )
 
         # plot posteriors
@@ -653,7 +641,7 @@ class BayesianInferencePlot:
                 label=f"Posterior with $n={n}$",
             )
         ax.legend(loc="upper left")
-        ax.set_title("MCMC Sampling density of Posterior Distributions", fontsize=15)
+        ax.set_title("MCMC sampling density of posterior distributions", fontsize=15)
         plt.xlim(0, 1)
         plt.show()
 
@@ -667,7 +655,6 @@ class BayesianInferencePlot:
             y = st.beta.pdf(xaxis, a=params["alpha_q"], b=params["beta_q"])
 
         elif guide_dist == "normal":
-
             # rescale upper/lower bound. See Scipy's truncnorm doc
             lower, upper = (0, 1)
             loc, scale = params["loc"], params["scale"]
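The rescaling that the comment refers to is scipy's convention that `truncnorm` takes its bounds in standardized units, i.e. measured in standard deviations from `loc`. A short sketch with made-up parameter values:

```python
from scipy import stats as st

loc, scale = 0.8, 0.05            # hypothetical fitted guide parameters
lower, upper = 0.0, 1.0

# scipy.stats.truncnorm expects a, b = (bound - loc) / scale
a, b = (lower - loc) / scale, (upper - loc) / scale
print(st.truncnorm.pdf(0.75, a, b, loc=loc, scale=scale))
```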
@@ -692,7 +679,7 @@ class BayesianInferencePlot:
             linewidth=self.linewidth,
             alpha=0.1,
             ax=ax,
-            label="Prior Distribution",
+            label="Prior distribution",
         )
 
         # plot posteriors
@@ -710,7 +697,7 @@ class BayesianInferencePlot:
             )
         ax.legend(loc="upper left")
         ax.set_title(
-            f"SVI density of Posterior Distributions with {guide_dist} guide",
+            f"SVI density of posterior distributions with {guide_dist} guide",
             fontsize=15,
         )
         plt.xlim(0, 1)
@@ -732,7 +719,7 @@ SVI_num_steps = 5000
 true_theta = 0.8
 ```
 
-### Beta Prior and Posteriors:
+### Beta prior and posteriors:
 
 Let's compare outcomes when we use a Beta prior.
 
@@ -745,19 +732,18 @@ For the same Beta prior, we shall
 Let's start with the analytical method that we described in this {doc}`prob_meaning`
 
 ```{code-cell} ipython3
-# First examine Beta prior
+# first examine Beta prior
 BETA = BayesianInference(param=(5, 5), name_dist="beta")
 
 BETA_plot = BayesianInferencePlot(true_theta, num_list, BETA)
 
-
 # plot analytical Beta prior and posteriors
 xaxis = np.linspace(0, 1, 1000)
 y_prior = st.beta.pdf(xaxis, 5, 5)
 
 fig, ax = plt.subplots(figsize=(10, 6))
 # plot analytical beta prior
-ax.plot(xaxis, y_prior, label="Analytical Beta Prior", color="#4C4E52")
+ax.plot(xaxis, y_prior, label="Analytical Beta prior", color="#4C4E52")
 
 data, colorlist, N_list = BETA_plot.data, BETA_plot.colorlist, BETA_plot.N_list
 
@@ -769,10 +755,10 @@ for id, n in enumerate(N_list):
         xaxis,
         y_posterior,
         color=colorlist[id - 1],
-        label=f"Analytical Beta Posterior with $n={n}$",
+        label=f"Analytical Beta posterior with $n={n}$",
     )
 ax.legend(loc="upper left")
-ax.set_title("Analytical Beta Prior and Posterior", fontsize=15)
+ax.set_title("Analytical Beta prior and posterior", fontsize=15)
 plt.xlim(0, 1)
 plt.show()
 ```
@@ -809,7 +795,7 @@ BayesianInferencePlot(true_theta, num_list, BETA).SVI_plot(
 )
 ```
 
-## Non-conjugate Prior Distributions
+## Non-conjugate prior distributions
 
 Having assured ourselves that our MCMC and VI methods can work well when we have a conjugate prior and so can also compute analytically, we
 next proceed to situations in which our prior is not a beta distribution, so we don't have a conjugate prior.
@@ -903,7 +889,7 @@ To get more accuracy we will now increase the number of steps for Variational In
 SVI_num_steps = 50000
 ```
 
-#### VI with a Truncated Normal Guide
+#### VI with a truncated Normal guide
 
 ```{code-cell} ipython3
 # Uniform
@@ -938,7 +924,7 @@ BayesianInferencePlot(true_theta, num_list, example_CLASS).SVI_plot(
 )
 ```
 
-#### Variational Inference with a Beta Guide Distribution
+#### Variational inference with a Beta guide distribution
 
 ```{code-cell} ipython3
 # Uniform
@@ -952,7 +938,7 @@ BayesianInferencePlot(true_theta, num_list, example_CLASS).SVI_plot(
 )
 ```
 ```{code-cell} ipython3
-# Log Normal
+# log Normal
 example_CLASS = LOGNORMAL
 print(
     f"=======INFO=======\nParameters: {example_CLASS.param}\nPrior Dist: {example_CLASS.name_dist}"
