@@ -4,7 +4,7 @@ jupytext:
44 extension : .md
55 format_name : myst
66 format_version : 0.13
7- jupytext_version : 1.16.7
7+ jupytext_version : 1.17.2
88kernelspec :
99 display_name : Python 3 (ipykernel)
1010 language : python
@@ -65,7 +65,7 @@ from numpyro.infer import Trace_ELBO as nTrace_ELBO
6565from numpyro.optim import Adam as nAdam
6666```
6767
68- ## Unleashing MCMC on a Binomial Likelihood
68+ ## Unleashing MCMC on a binomial likelihood
6969
7070This lecture begins with the binomial example in {doc}`prob_meaning`.
7171
@@ -84,7 +84,7 @@ We use several alternative prior distributions.
8484
8585We compare computed posteriors with ones associated with a conjugate prior as described in {doc}`prob_meaning`.
8686
87- ### Analytical Posterior
87+ ### Analytical posterior
8888
8989Assume that the random variable $X\sim Binom\left(n,\theta\right)$.
9090
@@ -131,9 +131,7 @@ The analytical posterior for a given conjugate beta prior is coded in the follow
131131
132132``` {code-cell} ipython3
133133def simulate_draw(theta, n):
134- """
135- Draws a Bernoulli sample of size n with probability P(Y=1) = theta
136- """
134+ """Draws a Bernoulli sample of size n with probability P(Y=1) = theta"""
137135 rand_draw = np.random.rand(n)
138136 draw = (rand_draw < theta).astype(int)
139137 return draw
@@ -161,7 +159,7 @@ def analytical_beta_posterior(data, alpha0, beta0):
161159 return st.beta(alpha0 + up_num, beta0 + down_num)
162160```
163161
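
As a quick sanity check, the two helpers above can be combined as follows. This is an illustrative sketch rather than part of the lecture: it assumes `simulate_draw` and `analytical_beta_posterior` (defined above) are in scope along with the `np` and `st` imports, and the seed, sample size, and Beta(5, 5) prior parameters are placeholder choices.

```python
# illustrative sketch: simulate data, then form the exact conjugate posterior
np.random.seed(0)                        # placeholder seed, for reproducibility
data = simulate_draw(theta=0.8, n=100)   # 100 Bernoulli draws with P(Y=1) = 0.8
posterior = analytical_beta_posterior(data, alpha0=5, beta0=5)

print(posterior.mean())                  # posterior mean of theta
print(posterior.interval(0.95))          # central 95% posterior credible interval
```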
164- ### Two Ways to Approximate Posteriors
162+ ### Two ways to approximate posteriors
165163
166164Suppose that we don't have a conjugate prior.
167165
@@ -193,7 +191,7 @@ a Kullback-Leibler (KL) divergence between true posterior and the putative poste
193191
194192- minimizing the KL divergence is equivalent to maximizing a criterion called the **Evidence Lower Bound** (ELBO), as we shall verify soon.
195193
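
As a concrete illustration of the MCMC route, the sketch below (not part of the original lecture) samples the posterior of $\theta$ with `numpyro`'s NUTS kernel on the Bernoulli/Beta setup; the fake data, the Beta(5, 5) prior, and the chain lengths are placeholder choices.

```python
# illustrative sketch: sample the posterior of theta with NUTS (placeholder data and prior)
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model(data):
    theta = numpyro.sample("theta", dist.Beta(5, 5))          # placeholder prior
    with numpyro.plate("N", data.shape[0]):
        numpyro.sample("obs", dist.Bernoulli(theta), obs=data)

data = jnp.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])               # fake Bernoulli draws
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=2000, progress_bar=False)
mcmc.run(random.PRNGKey(0), data)
theta_draws = mcmc.get_samples()["theta"]
print(theta_draws.mean())                                       # posterior mean of theta
```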
196- ## Prior Distributions
194+ ## Prior distributions
197195
198196In order to be able to apply MCMC sampling or VI, `numpyro` requires that a prior distribution satisfy special properties:
199197
@@ -230,29 +228,25 @@ def TruncatedLogNormal_trans(loc, scale):
230228 """
231229 base_dist = ndist.TruncatedNormal(
232230 low=jnp.log(0), high=jnp.log(1), loc=loc, scale=scale
233- )
231+ )  # TODO: is jnp.log(0) (= -inf) acceptable as the lower bound here?
234232 return ndist.TransformedDistribution(base_dist, ndist.transforms.ExpTransform())
235233
236234
237235def ShiftedVonMises(kappa):
238- """
239- Obtains the shifted von Mises distribution using AffineTransform
240- """
236+ """Obtains the shifted von Mises distribution using AffineTransform"""
241237 base_dist = ndist.VonMises(0, kappa)
242238 return ndist.TransformedDistribution(
243239 base_dist, ndist.transforms.AffineTransform(loc=0.5, scale=1 / (2 * jnp.pi))
244240 )
245241
246242
247243def TruncatedLaplace(loc, scale):
248- """
249- Obtains the truncated Laplace distribution on [0,1]
250- """
244+ """Obtains the truncated Laplace distribution on [0,1]"""
251245 base_dist = ndist.Laplace(loc, scale)
252246 return ndist.TruncatedDistribution(base_dist, low=0.0, high=1.0)
253247```
254248
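As a hedged sanity check (not in the original lecture), one can draw a few samples from each of the transformed priors defined above and confirm that the draws lie inside $[0, 1]$; the parameter values passed below are placeholders.

```python
# illustrative only: draws from each constructed prior should lie in [0, 1]
from jax import random as jax_random

key = jax_random.PRNGKey(0)
for prior in [TruncatedLogNormal_trans(0.0, 1.0),   # placeholder loc, scale
              ShiftedVonMises(5),                   # placeholder kappa
              TruncatedLaplace(0.5, 0.1)]:          # placeholder loc, scale
    print(prior.sample(key, (5,)))
```
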
255- ### Variational Inference
249+ ### Variational inference
256250
257251Instead of directly sampling from the posterior, the ** variational inference** method approximates an unknown posterior distribution with a family of tractable distributions/densities.
258252
275269where
276270
277271$$
278- p\left(Y\right)=\int d\theta p\left(Y\mid\theta\right)p\left(Y\right).
272+ p\left(Y\right)=\int p\left(Y\mid\theta\right)p\left(\theta\right) d\theta.
279273$$ (eq:intchallenge)
280274
281275The integral on the right side of {eq}`eq:intchallenge` is typically difficult to compute.
@@ -298,19 +292,19 @@ Note that
298292
299293$$
300294\begin{aligned}D_{KL}(q(\theta;\phi)\;\|\; p(\theta\mid Y)) & =-\int q(\theta;\phi)\log\frac{p(\theta\mid Y)}{q(\theta;\phi)} d\theta \\
301- & =-\int d\theta q(\theta)\log\frac{\frac{p(\theta,Y)}{p(Y)}}{q(\theta)}\\
302- & =-\int d\theta q(\theta)\log\frac{p(\theta,Y)}{p(\theta)q(Y)}\\
303- & =-\int d\theta q(\theta)\left[ \log\frac{p(\theta,Y)}{q(\theta)}-\log p(Y)\right] \\
304- & =-\int d\theta q(\theta)\log\frac{p(\theta,Y)}{q(\theta)}+\int d\theta q(\theta)\log p(Y)\\
305- & =-\int d\theta q(\theta)\log\frac{p(\theta,Y)}{q(\theta)}+\log p(Y)\\
306- \log p(Y)&=D_ {KL}(q(\theta;\phi)\;\|\; p(\theta\mid Y))+\int d\theta q_ {\phi}(\theta)\log\frac{p(\theta,Y)}{q_ {\phi}(\theta)}
295+ & =-\int q(\theta)\log\frac{\frac{p(\theta,Y)}{p(Y)}}{q(\theta)} d\theta \\
296+ & =-\int q(\theta)\log\frac{p(\theta,Y)}{p(Y)q(\theta)} d\theta \\
297+ & =-\int q(\theta)\left[ \log\frac{p(\theta,Y)}{q(\theta)}-\log p(Y)\right] d\theta \\
298+ & =-\int q(\theta)\log\frac{p(\theta,Y)}{q(\theta)} d\theta +\int q(\theta)\log p(Y) d\theta \\
299+ & =-\int q(\theta)\log\frac{p(\theta,Y)}{q(\theta)} d\theta +\log p(Y) \\
300+ \log p(Y)&=D_{KL}(q(\theta;\phi)\;\|\; p(\theta\mid Y))+\int q_{\phi}(\theta)\log\frac{p(\theta,Y)}{q_{\phi}(\theta)} d\theta
307301\end{aligned}
308302$$
309303
310304For observed data $Y$, $\log p(Y)$ is a constant, so minimizing KL divergence is equivalent to maximizing
311305
312306$$
313- ELBO\equiv\int d\theta q_ {\phi}(\theta)\log\frac{p(\theta,Y)}{q_ {\phi}(\theta)}=\mathbb{E}_ {q_ {\phi}(\theta)}\left[ \log p(\theta,Y)-\log q_ {\phi}(\theta)\right]
307+ ELBO\equiv\int q_{\phi}(\theta)\log\frac{p(\theta,Y)}{q_{\phi}(\theta)} d\theta =\mathbb{E}_{q_{\phi}(\theta)}\left[ \log p(\theta,Y)-\log q_{\phi}(\theta)\right]
314308$$ (eq:ELBO)
315309
316310Formula {eq}`eq:ELBO` is called the evidence lower bound (ELBO).
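
To make {eq}`eq:ELBO` concrete, here is a small illustrative sketch (not part of the original lecture) that maximizes the ELBO with `numpyro`'s stochastic variational inference on the Bernoulli/Beta setup. The guide is a Beta distribution whose learnable parameters play the role of $\phi$; the data, prior, learning rate, and step count are placeholder choices.

```python
# illustrative sketch: maximize the ELBO (SVI minimizes -ELBO as its loss)
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.optim import Adam

def model(data):
    theta = numpyro.sample("theta", dist.Beta(5, 5))           # placeholder prior
    with numpyro.plate("N", data.shape[0]):
        numpyro.sample("obs", dist.Bernoulli(theta), obs=data)

def guide(data):
    # q_phi(theta): a Beta distribution with learnable parameters phi = (alpha_q, beta_q)
    alpha_q = numpyro.param("alpha_q", 5.0, constraint=dist.constraints.positive)
    beta_q = numpyro.param("beta_q", 5.0, constraint=dist.constraints.positive)
    numpyro.sample("theta", dist.Beta(alpha_q, beta_q))

data = jnp.array([1, 1, 0, 1, 1, 1, 0, 1, 1, 1])                # fake Bernoulli draws
svi = SVI(model, guide, Adam(0.01), loss=Trace_ELBO())
result = svi.run(random.PRNGKey(0), 2000, data)
print(result.params)   # learned (alpha_q, beta_q); loss history is in result.losses
```

Because this guide family contains the true conjugate posterior, the learned `alpha_q` and `beta_q` should approach $\alpha_0 + k$ and $\beta_0 + n - k$, where $k$ is the number of successes among the $n$ placeholder observations.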
@@ -338,16 +332,16 @@ We have constructed a Python class `BayesianInference` that requires the followi
338332- `name_dist`: a string that specifies distribution names
339333
340334The (`param`, `name_dist`) pair includes:
341- - ('beta', alpha, beta)
335+ - (alpha, beta, 'beta')
342336
343- - ('uniform', lower_bound, upper_bound)
337+ - (lower_bound, upper_bound, 'uniform')
344338
345- - ('lognormal', loc, scale)
339+ - (loc, scale, 'lognormal')
346340 - Note: This is the truncated log normal.
347341
348- - ('vonMises', kappa ), where kappa denotes concentration parameter, and center location is set to $0.5$. Using `numpyro`, this is the **shifted** distribution.
342+ - (kappa, 'vonMises'), where kappa denotes the concentration parameter, and the center location is set to $0.5$. Using `numpyro`, this is the **shifted** distribution.
349343
350- - ('laplace', loc, scale)
344+ - (loc, scale, 'laplace')
351345 - Note: This is the truncated Laplace
352346
353347The class `BayesianInference` has several key methods:
@@ -384,9 +378,7 @@ class BayesianInference:
384378 self.rng_key = jax_random.PRNGKey(0)
385379
386380 def sample_prior(self):
387- """
388- Define the prior distribution to sample from in numpyro models.
389- """
381+ """Define the prior distribution to sample from in numpyro models."""
390382 if self.name_dist == "beta":
391383 # unpack parameters
392384 alpha0, beta0 = self.param
@@ -493,9 +485,7 @@ class BayesianInference:
493485 numpyro.sample("theta", ndist.TruncatedNormal(loc, scale, low=0.0, high=1.0))
494486
495487 def SVI_init(self, guide_dist, lr=0.0005):
496- """
497- Initiate SVI training mode with Adam optimizer
498- """
488+ """Initiate SVI training mode with Adam optimizer"""
499489 adam_params = {"lr": lr}
500490
501491 if guide_dist == "beta":
@@ -533,7 +523,7 @@ class BayesianInference:
533523 return params, losses
534524```
535525
536- ## Alternative Prior Distributions
526+ ## Alternative prior distributions
537527
538528Let's see how well our sampling algorithm does in approximating
539529
@@ -574,7 +564,7 @@ exampleLP.show_prior(size=100000, bins=40)
574564
575565Having assured ourselves that our sampler seems to do a good job, let's put it to work in using MCMC to compute posterior probabilities.
576566
577- ## Posteriors Via MCMC and VI
567+ ## Posteriors via MCMC and VI
578568
579569We construct a class `BayesianInferencePlot` to implement MCMC or VI algorithms and plot multiple posteriors for different updating data sizes and different possible priors.
580570
@@ -604,9 +594,7 @@ class BayesianInferencePlot:
604594 """
605595
606596 def __init__(self, theta, N_list, BayesianInferenceClass, binwidth=0.02):
607- """
608- Enter Parameters for data generation and plotting
609- """
597+ """Enter Parameters for data generation and plotting"""
610598 self.theta = theta
611599 self.N_list = N_list
612600 self.BayesianInferenceClass = BayesianInferenceClass
@@ -634,7 +622,7 @@ class BayesianInferencePlot:
634622 linewidth=self.linewidth,
635623 alpha=0.1,
636624 ax=ax,
637- label="Prior Distribution ",
625+ label="Prior distribution ",
638626 )
639627
640628 # plot posteriors
@@ -653,7 +641,7 @@ class BayesianInferencePlot:
653641 label=f"Posterior with $n={n}$",
654642 )
655643 ax.legend(loc="upper left")
656- ax.set_title("MCMC Sampling density of Posterior Distributions ", fontsize=15)
644+ ax.set_title("MCMC sampling density of posterior distributions ", fontsize=15)
657645 plt.xlim(0, 1)
658646 plt.show()
659647
@@ -667,7 +655,6 @@ class BayesianInferencePlot:
667655 y = st.beta.pdf(xaxis, a=params["alpha_q"], b=params["beta_q"])
668656
669657 elif guide_dist == "normal":
670-
671658 # rescale upper/lower bound. See Scipy's truncnorm doc
672659 lower, upper = (0, 1)
673660 loc, scale = params["loc"], params["scale"]
@@ -692,7 +679,7 @@ class BayesianInferencePlot:
692679 linewidth=self.linewidth,
693680 alpha=0.1,
694681 ax=ax,
695- label="Prior Distribution ",
682+ label="Prior distribution ",
696683 )
697684
698685 # plot posteriors
@@ -710,7 +697,7 @@ class BayesianInferencePlot:
710697 )
711698 ax.legend(loc="upper left")
712699 ax.set_title(
713- f"SVI density of Posterior Distributions with {guide_dist} guide",
700+ f"SVI density of posterior distributions with {guide_dist} guide",
714701 fontsize=15,
715702 )
716703 plt.xlim(0, 1)
@@ -732,7 +719,7 @@ SVI_num_steps = 5000
732719true_theta = 0.8
733720```
734721
735- ### Beta Prior and Posteriors :
722+ ### Beta prior and posteriors:
736723
737724Let's compare outcomes when we use a Beta prior.
738725
@@ -745,19 +732,18 @@ For the same Beta prior, we shall
745732Let's start with the analytical method that we described in {doc}`prob_meaning`.
746733
747734```{code-cell} ipython3
748- # First examine Beta prior
735+ # first examine Beta prior
749736BETA = BayesianInference(param=(5, 5), name_dist="beta")
750737
751738BETA_plot = BayesianInferencePlot(true_theta, num_list, BETA)
752739
753-
754740# plot analytical Beta prior and posteriors
755741xaxis = np.linspace(0, 1, 1000)
756742y_prior = st.beta.pdf(xaxis, 5, 5)
757743
758744fig, ax = plt.subplots(figsize=(10, 6))
759745# plot analytical beta prior
760- ax.plot(xaxis, y_prior, label="Analytical Beta Prior ", color="#4C4E52")
746+ ax.plot(xaxis, y_prior, label="Analytical Beta prior", color="#4C4E52")
761747
762748data, colorlist, N_list = BETA_plot.data, BETA_plot.colorlist, BETA_plot.N_list
763749
@@ -769,10 +755,10 @@ for id, n in enumerate(N_list):
769755 xaxis,
770756 y_posterior,
771757 color=colorlist[id - 1],
772- label=f"Analytical Beta Posterior with $n={n}$",
758+ label=f"Analytical Beta posterior with $n={n}$",
773759 )
774760ax.legend(loc="upper left")
775- ax.set_title("Analytical Beta Prior and Posterior ", fontsize=15)
761+ ax.set_title("Analytical Beta prior and posterior ", fontsize=15)
776762plt.xlim(0, 1)
777763plt.show()
778764```
@@ -809,7 +795,7 @@ BayesianInferencePlot(true_theta, num_list, BETA).SVI_plot(
809795)
810796```
811797
812- ## Non-conjugate Prior Distributions
798+ ## Non-conjugate prior distributions
813799
814800Having assured ourselves that our MCMC and VI methods can work well when we have a conjugate prior and so can also compute analytically, we
815801next proceed to situations in which our prior is not a beta distribution, so we don't have a conjugate prior.
@@ -903,7 +889,7 @@ To get more accuracy we will now increase the number of steps for Variational In
903889SVI_num_steps = 50000
904890```
905891
906- #### VI with a Truncated Normal Guide
892+ #### VI with a truncated Normal guide
907893
908894```{code-cell} ipython3
909895# Uniform
@@ -938,7 +924,7 @@ BayesianInferencePlot(true_theta, num_list, example_CLASS).SVI_plot(
938924)
939925```
940926
941- #### Variational Inference with a Beta Guide Distribution
927+ #### Variational inference with a Beta guide distribution
942928
943929```{code-cell} ipython3
944930# Uniform
@@ -952,7 +938,7 @@ BayesianInferencePlot(true_theta, num_list, example_CLASS).SVI_plot(
952938```
953939
954940```{code-cell} ipython3
955- # Log Normal
941+ # Lognormal
956942example_CLASS = LOGNORMAL
957943print(
958944 f"=======INFO=======\nParameters: {example_CLASS.param}\nPrior Dist: {example_CLASS.name_dist}"