
Commit fb8fb92

update ar1_bayes
1 parent 677d94b commit fb8fb92

File tree: 1 file changed (+65, -175 lines)


lectures/ar1_bayes.md

Lines changed: 65 additions & 175 deletions
@@ -16,42 +16,27 @@ kernelspec:
 ```{include} _admonition/gpu.md
 ```
 
-```{code-cell} ipython3
-:tags: [hide-output]
-
-!pip install numpyro jax
-```
-
 In addition to what's included in base Anaconda, we need to install the following packages
 
 ```{code-cell} ipython3
 :tags: [hide-output]
 
-!pip install arviz pymc
+!pip install numpyro
 ```
 
 We'll begin with some Python imports.
 
 ```{code-cell} ipython3
-
-import arviz as az
-import pymc as pmc
 import numpyro
 from numpyro import distributions as dist
 
 import numpy as np
 import jax.numpy as jnp
 from jax import random
 import matplotlib.pyplot as plt
-
-import logging
-logging.basicConfig()
-logger = logging.getLogger('pymc')
-logger.setLevel(logging.CRITICAL)
-
 ```
 
-This lecture uses Bayesian methods offered by [pymc](https://www.pymc.io/projects/docs/en/stable/) and [numpyro](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.
+This lecture uses Bayesian methods offered by [numpyro](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.
 
 
 The model is a good laboratory for illustrating
@@ -74,15 +59,14 @@ $\{\epsilon_{t+1}\}$ is a sequence of i.i.d. normal random variables with mean $
 The second component of the statistical model is
 
 $$
-y_0 \sim {\cal N}(\mu_0, \sigma_0^2)
+y_0 \sim {\mathcal{N}}(\mu_0, \sigma_0^2)
 $$ (eq:themodel_2)
 
 
 
 Consider a sample $\{y_t\}_{t=0}^T$ governed by this statistical model.
 
-The model
-implies that the likelihood function of $\{y_t\}_{t=0}^T$ can be **factored**:
+The model implies that the likelihood function of $\{y_t\}_{t=0}^T$ can be **factored**:
 
 $$
 f(y_T, y_{T-1}, \ldots, y_0) = f(y_T| y_{T-1}) f(y_{T-1}| y_{T-2}) \cdots f(y_1 | y_0 ) f(y_0)
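
For reference, each conditional factor on the right side follows directly from {eq}`eq:themodel` and the normality of $\epsilon_{t+1}$, while $f(y_0)$ is the ${\mathcal{N}}(\mu_0, \sigma_0^2)$ density in {eq}`eq:themodel_2`; this restatement is added here only for clarity and is not a line of the committed file:

$$
f(y_{t+1} \mid y_t) = \frac{1}{\sqrt{2 \pi \sigma_x^2}} \exp \left( - \frac{(y_{t+1} - \rho y_t)^2}{2 \sigma_x^2} \right), \qquad t = 0, 1, \ldots, T-1
$$
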
@@ -115,7 +99,7 @@ Unknown parameters are $\rho, \sigma_x$.
 
 We have independent **prior probability distributions** for $\rho, \sigma_x$ and want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.
 
-The notebook uses `pymc4` and `numpyro` to compute a posterior distribution of $\rho, \sigma_x$. We will use NUTS samplers to generate samples from the posterior in a chain. Both of these libraries support NUTS samplers.
+The notebook uses `numpyro` to compute a posterior distribution of $\rho, \sigma_x$. We will use NUTS samplers to generate samples from the posterior in a chain.
 
 NUTS is a form of Markov chain Monte Carlo (MCMC) algorithm that bypasses random walk behaviour and allows for convergence to a target distribution more quickly. This not only has the advantage of speed, but allows for complex models to be fitted without having to employ specialised knowledge regarding the theory underlying those fitting methods.
 
@@ -124,7 +108,7 @@ Thus, we explore consequences of making these alternative assumptions about the
 - A first procedure is to condition on whatever value of $y_0$ is observed. This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.
 
 - A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel`
-so that $y_0 \sim {\cal N} \left(0, {\sigma_x^2\over (1-\rho)^2} \right) $
+so that $y_0 \sim {\mathcal{N}} \left(0, \frac{\sigma_x^2}{1 - \rho^2} \right)$
 
 When the initial value $y_0$ is far out in a tail of the stationary distribution, conditioning on an initial value gives a posterior that is **more accurate** in a sense that we'll explain.
 
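
Written out, the two procedures amount to basing inference on two different likelihood functions for the same sample; the symbols $L_1$ and $L_2$ are introduced here purely for exposition and do not appear in the committed file:

$$
L_1(\rho, \sigma_x) = \prod_{t=1}^{T} f(y_t \mid y_{t-1}), \qquad
L_2(\rho, \sigma_x) = f(y_0) \prod_{t=1}^{T} f(y_t \mid y_{t-1}),
$$

where $L_1$ conditions on the observed $y_0$ and $L_2$ multiplies in the stationary density of $y_0$ evaluated at the observed value.
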
@@ -145,26 +129,25 @@ How we select the initial value $y_0$ matters.
 To illustrate the issue, we'll begin by choosing an initial $y_0$ that is far out in a tail of the stationary distribution.
 
 ```{code-cell} ipython3
-
-def ar1_simulate(rho, sigma, y0, T):
+def ar1_simulate(ρ, σ, y0, T):
 
     # Allocate space and draw epsilons
     y = np.empty(T)
-    eps = np.random.normal(0.,sigma,T)
+    eps = np.random.normal(0., σ, T)
 
     # Initial condition and step forward
     y[0] = y0
     for t in range(1, T):
-        y[t] = rho*y[t-1] + eps[t]
+        y[t] = ρ * y[t-1] + eps[t]
 
     return y
 
-sigma = 1.
-rho = 0.5
+σ = 1.0
+ρ = 0.5
 T = 50
 
 np.random.seed(145353452)
-y = ar1_simulate(rho, sigma, 10, T)
+y = ar1_simulate(ρ, σ, 10, T)
 ```
 
 ```{code-cell} ipython3
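
With the values just used ($\rho = 0.5$, $\sigma_x = 1$, $y_0 = 10$), a short side calculation, added here only as an illustration and not part of the committed file, shows how far out in the tail of the stationary distribution this initial condition lies:

```python
import numpy as np

ρ, σ = 0.5, 1.0

# Standard deviation of the stationary distribution of y
stationary_sd = σ / np.sqrt(1 - ρ**2)
print(stationary_sd)       # ≈ 1.15

# y0 = 10 lies roughly 8.7 stationary standard deviations above the mean of 0
print(10 / stationary_sd)
```
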
@@ -176,174 +159,60 @@ Now we shall use Bayes' law to construct a posterior distribution, conditioning
 
 (Later we'll assume that $y_0$ is drawn from the stationary distribution, but not now.)
 
-First we'll use **pymc4**.
-
-## PyMC Implementation
-
-For a normal distribution in `pymc`,
-$var = 1/\tau = \sigma^{2}$.
-
-```{code-cell} ipython3
-
-AR1_model = pmc.Model()
-
-with AR1_model:
-
-    # Start with priors
-    rho = pmc.Uniform('rho', lower=-1., upper=1.) # Assume stable rho
-    sigma = pmc.HalfNormal('sigma', sigma = np.sqrt(10))
-
-    # Expected value of y at the next period (rho * y)
-    yhat = rho * y[:-1]
-
-    # Likelihood of the actual realization
-    y_like = pmc.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])
-```
-
-[pmc.sample](https://www.pymc.io/projects/docs/en/v5.10.0/api/generated/pymc.sample.html#pymc-sample) by default uses the NUTS samplers to generate samples as shown in the below cell:
-
-```{code-cell} ipython3
-:tag: [hide-output]
-
-with AR1_model:
-    trace = pmc.sample(50000, tune=10000, return_inferencedata=True)
-```
-
-```{code-cell} ipython3
-with AR1_model:
-    az.plot_trace(trace, figsize=(17,6))
-```
-
-Evidently, the posteriors aren't centered on the true values of $.5, 1$ that we used to generate the data.
-
-This is a symptom of the classic **Hurwicz bias** for first order autoregressive processes (see Leonid Hurwicz {cite}`hurwicz1950least`.)
-
-The Hurwicz bias is worse the smaller is the sample (see {cite}`Orcutt_Winokur_69`).
-
-
-Be that as it may, here is more information about the posterior.
-
-```{code-cell} ipython3
-with AR1_model:
-    summary = az.summary(trace, round_to=4)
-
-summary
-```
-
-Now we shall compute a posterior distribution after seeing the same data but instead assuming that $y_0$ is drawn from the stationary distribution.
-
-This means that
-
-$$
-y_0 \sim N \left(0, \frac{\sigma_x^{2}}{1 - \rho^{2}} \right)
-$$
-
-We alter the code as follows:
-
-```{code-cell} ipython3
-AR1_model_y0 = pmc.Model()
-
-with AR1_model_y0:
+## Implementation
 
-    # Start with priors
-    rho = pmc.Uniform('rho', lower=-1., upper=1.) # Assume stable rho
-    sigma = pmc.HalfNormal('sigma', sigma=np.sqrt(10))
-
-    # Standard deviation of ergodic y
-    y_sd = sigma / np.sqrt(1 - rho**2)
-
-    # yhat
-    yhat = rho * y[:-1]
-    y_data = pmc.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])
-    y0_data = pmc.Normal('y0_obs', mu=0., sigma=y_sd, observed=y[0])
-```
-
-```{code-cell} ipython3
-:tag: [hide-output]
-
-with AR1_model_y0:
-    trace_y0 = pmc.sample(50000, tune=10000, return_inferencedata=True)
-
-    # Grey vertical lines are the cases of divergence
-```
-
-```{code-cell} ipython3
-with AR1_model_y0:
-    az.plot_trace(trace_y0, figsize=(17,6))
-```
-
-```{code-cell} ipython3
-with AR1_model:
-    summary_y0 = az.summary(trace_y0, round_to=4)
-
-summary_y0
-```
-
-Please note how the posterior for $\rho$ has shifted to the right relative to when we conditioned on $y_0$ instead of assuming that $y_0$ is drawn from the stationary distribution.
-
-Think about why this happens.
-
-```{hint}
-It is connected to how Bayes Law (conditional probability) solves an **inverse problem** by putting high probability on parameter values
-that make observations more likely.
-```
-
-We'll return to this issue after we use `numpyro` to compute posteriors under our two alternative assumptions about the distribution of $y_0$.
-
-We'll now repeat the calculations using `numpyro`.
-
-## Numpyro Implementation
+First, we'll implement the AR(1) model conditioning on the initial value using NumPyro. The NUTS sampler is used to generate samples from the posterior distribution.
 
 ```{code-cell} ipython3
-
-
 def plot_posterior(sample):
     """
     Plot trace and histogram
     """
     # To np array
-    rhos = sample['rho']
-    sigmas = sample['sigma']
-    rhos, sigmas, = np.array(rhos), np.array(sigmas)
+    ρs = sample['ρ']
+    σs = sample['σ']
+    ρs, σs = np.array(ρs), np.array(σs)
 
     fig, axs = plt.subplots(2, 2, figsize=(17, 6))
     # Plot trace
-    axs[0, 0].plot(rhos) # rho
-    axs[1, 0].plot(sigmas) # sigma
+    axs[0, 0].plot(ρs) # ρ
+    axs[1, 0].plot(σs) # σ
 
     # Plot posterior
-    axs[0, 1].hist(rhos, bins=50, density=True, alpha=0.7)
+    axs[0, 1].hist(ρs, bins=50, density=True, alpha=0.7)
     axs[0, 1].set_xlim([0, 1])
-    axs[1, 1].hist(sigmas, bins=50, density=True, alpha=0.7)
+    axs[1, 1].hist(σs, bins=50, density=True, alpha=0.7)
 
-    axs[0, 0].set_title("rho")
-    axs[0, 1].set_title("rho")
-    axs[1, 0].set_title("sigma")
-    axs[1, 1].set_title("sigma")
+    # Set axis labels for clarity
+    axs[0, 0].set_ylabel("ρ")
+    axs[0, 1].set_xlabel("ρ")
+    axs[1, 0].set_ylabel("σ")
+    axs[1, 1].set_xlabel("σ")
     plt.show()
 ```
 
 ```{code-cell} ipython3
 def AR1_model(data):
     # set prior
-    rho = numpyro.sample('rho', dist.Uniform(low=-1., high=1.))
-    sigma = numpyro.sample('sigma', dist.HalfNormal(scale=np.sqrt(10)))
+    ρ = numpyro.sample('ρ', dist.Uniform(low=-1., high=1.))
+    σ = numpyro.sample('σ', dist.HalfNormal(scale=np.sqrt(10)))
 
-    # Expected value of y at the next period (rho * y)
-    yhat = rho * data[:-1]
+    # Expected value of y at the next period (ρ * y)
+    yhat = ρ * data[:-1]
 
     # Likelihood of the actual realization.
-    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
+    y_data = numpyro.sample('y_obs',
+                            dist.Normal(loc=yhat, scale=σ), obs=data[1:])
 ```
 
 ```{code-cell} ipython3
-:tag: [hide-output]
+:tags: [hide-output]
 
 # Make jnp array
 y = jnp.array(y)
 
-# Set NUTS kernal
+# Set NUTS kernel
 NUTS_kernel = numpyro.infer.NUTS(AR1_model)
 
 # Run MCMC
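
Before sampling, it can help to see what the priors in `AR1_model` look like. Here is a small sketch, added for illustration only and not part of the committed file, that draws directly from the same prior distributions (a flat prior on $\rho$ over $(-1, 1)$ and a HalfNormal prior with scale $\sqrt{10}$ on $\sigma$):

```python
import numpy as np
import numpyro.distributions as dist
from jax import random

key_ρ, key_σ = random.split(random.PRNGKey(0))

# Draw samples from the priors used in AR1_model
ρ_prior = dist.Uniform(low=-1., high=1.).sample(key_ρ, (5000,))
σ_prior = dist.HalfNormal(scale=np.sqrt(10)).sample(key_σ, (5000,))

# The flat prior on ρ averages to about 0; the HalfNormal prior on σ
# has mean scale * sqrt(2/π) ≈ 2.5
print(ρ_prior.mean(), σ_prior.mean())
```
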
@@ -355,11 +224,21 @@ mcmc.run(rng_key=random.PRNGKey(1), data=y)
 plot_posterior(mcmc.get_samples())
 ```
 
+Evidently, the posteriors aren't centered on the true values of $.5, 1$ that we used to generate the data.
+
+This is a symptom of the classic **Hurwicz bias** for first order autoregressive processes (see Leonid Hurwicz {cite}`hurwicz1950least`.)
+
+The Hurwicz bias is worse the smaller is the sample (see {cite}`Orcutt_Winokur_69`).
+
+Be that as it may, here is more information about the posterior.
+
 ```{code-cell} ipython3
 mcmc.print_summary()
 ```
 
-Next, we again compute the posterior under the assumption that $y_0$ is drawn from the stationary distribution, so that
+Now we shall compute a posterior distribution after seeing the same data but instead assuming that $y_0$ is drawn from the stationary distribution.
+
+This means that
 
 $$
 y_0 \sim N \left(0, \frac{\sigma_x^{2}}{1 - \rho^{2}} \right)
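
As a quick check of the Hurwicz bias discussed above, here is a minimal Monte Carlo sketch, added for illustration only and not part of the committed lecture, that repeatedly simulates samples of length $T = 50$ with $\rho = 0.5$, $\sigma_x = 1$, draws $y_0$ from the stationary distribution, and estimates $\rho$ by least squares:

```python
import numpy as np

np.random.seed(0)
ρ_true, σ_true, T, n_reps = 0.5, 1.0, 50, 5000

estimates = np.empty(n_reps)
for i in range(n_reps):
    y = np.empty(T)
    # Draw y0 from the stationary distribution
    y[0] = np.random.normal(0., σ_true / np.sqrt(1 - ρ_true**2))
    eps = np.random.normal(0., σ_true, T)
    for t in range(1, T):
        y[t] = ρ_true * y[t-1] + eps[t]
    # Least-squares estimate of ρ (no intercept)
    estimates[i] = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

# The average estimate falls below the true ρ = 0.5, illustrating the
# downward small-sample bias discussed above
print(estimates.mean())
```
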
@@ -370,27 +249,29 @@ Here's the new code to achieve this.
 ```{code-cell} ipython3
 def AR1_model_y0(data):
     # Set prior
-    rho = numpyro.sample('rho', dist.Uniform(low=-1., high=1.))
-    sigma = numpyro.sample('sigma', dist.HalfNormal(scale=np.sqrt(10)))
+    ρ = numpyro.sample('ρ', dist.Uniform(low=-1., high=1.))
+    σ = numpyro.sample('σ', dist.HalfNormal(scale=np.sqrt(10)))
 
     # Standard deviation of ergodic y
-    y_sd = sigma / jnp.sqrt(1 - rho**2)
+    y_sd = σ / jnp.sqrt(1 - ρ**2)
 
-    # Expected value of y at the next period (rho * y)
-    yhat = rho * data[:-1]
+    # Expected value of y at the next period (ρ * y)
+    yhat = ρ * data[:-1]
 
     # Likelihood of the actual realization.
-    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
-    y0_data = numpyro.sample('y0_obs', dist.Normal(loc=0., scale=y_sd), obs=data[0])
+    y_data = numpyro.sample('y_obs',
+                            dist.Normal(loc=yhat, scale=σ), obs=data[1:])
+    y0_data = numpyro.sample('y0_obs',
+                             dist.Normal(loc=0., scale=y_sd), obs=data[0])
 ```
 
 ```{code-cell} ipython3
-:tag: [hide-output]
+:tags: [hide-output]
 
 # Make jnp array
 y = jnp.array(y)
 
-# Set NUTS kernal
+# Set NUTS kernel
 NUTS_kernel = numpyro.infer.NUTS(AR1_model_y0)
 
 # Run MCMC
@@ -406,6 +287,15 @@ plot_posterior(mcmc2.get_samples())
 mcmc2.print_summary()
 ```
 
+Please note how the posterior for $\rho$ has shifted to the right relative to when we conditioned on $y_0$ instead of assuming that $y_0$ is drawn from the stationary distribution.
+
+Think about why this happens.
+
+```{hint}
+It is connected to how Bayes Law (conditional probability) solves an **inverse problem** by putting high probability on parameter values
+that make observations more likely.
+```
+
 Look what happened to the posterior!
 
 It has moved far from the true values of the parameters used to generate the data because of how Bayes' Law (i.e., conditional probability)
