
Commit fb8fb92

update ar1_bayes
1 parent 677d94b commit fb8fb92

File tree: 1 file changed (+65, -175 lines)


lectures/ar1_bayes.md

Lines changed: 65 additions & 175 deletions
@@ -16,42 +16,27 @@ kernelspec:
 ```{include} _admonition/gpu.md
 ```
 
-```{code-cell} ipython3
-:tags: [hide-output]
-
-!pip install numpyro jax
-```
-
 In addition to what's included in base Anaconda, we need to install the following packages
 
 ```{code-cell} ipython3
 :tags: [hide-output]
 
-!pip install arviz pymc
+!pip install numpyro
 ```
 
 We'll begin with some Python imports.
 
 ```{code-cell} ipython3
-
-import arviz as az
-import pymc as pmc
 import numpyro
 from numpyro import distributions as dist
 
 import numpy as np
 import jax.numpy as jnp
 from jax import random
 import matplotlib.pyplot as plt
-
-import logging
-logging.basicConfig()
-logger = logging.getLogger('pymc')
-logger.setLevel(logging.CRITICAL)
-
 ```
 
-This lecture uses Bayesian methods offered by [pymc](https://www.pymc.io/projects/docs/en/stable/) and [numpyro](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.
+This lecture uses Bayesian methods offered by [numpyro](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.
 
 
 The model is a good laboratory for illustrating
@@ -74,15 +59,14 @@ $\{\epsilon_{t+1}\}$ is a sequence of i.i.d. normal random variables with mean $
 The second component of the statistical model is
 
 $$
-y_0 \sim {\cal N}(\mu_0, \sigma_0^2)
+y_0 \sim {\mathcal{N}}(\mu_0, \sigma_0^2)
 $$ (eq:themodel_2)
 
 
 
 Consider a sample $\{y_t\}_{t=0}^T$ governed by this statistical model.
 
-The model
-implies that the likelihood function of $\{y_t\}_{t=0}^T$ can be **factored**:
+The model implies that the likelihood function of $\{y_t\}_{t=0}^T$ can be **factored**:
 
 $$
 f(y_T, y_{T-1}, \ldots, y_0) = f(y_T| y_{T-1}) f(y_{T-1}| y_{T-2}) \cdots f(y_1 | y_0 ) f(y_0)
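
For reference, each conditional factor on the right side follows directly from {eq}`eq:themodel` and the normality of $\epsilon_{t+1}$, while $f(y_0)$ is the ${\mathcal{N}}(\mu_0, \sigma_0^2)$ density in {eq}`eq:themodel_2`; this restatement is added here only for clarity and is not a line of the committed file:

$$
f(y_{t+1} \mid y_t) = \frac{1}{\sqrt{2 \pi \sigma_x^2}} \exp \left( - \frac{(y_{t+1} - \rho y_t)^2}{2 \sigma_x^2} \right), \qquad t = 0, 1, \ldots, T-1
$$
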
@@ -115,7 +99,7 @@ Unknown parameters are $\rho, \sigma_x$.
 
 We have independent **prior probability distributions** for $\rho, \sigma_x$ and want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.
 
-The notebook uses `pymc4` and `numpyro` to compute a posterior distribution of $\rho, \sigma_x$. We will use NUTS samplers to generate samples from the posterior in a chain. Both of these libraries support NUTS samplers.
+The notebook uses `numpyro` to compute a posterior distribution of $\rho, \sigma_x$. We will use NUTS samplers to generate samples from the posterior in a chain.
 
 NUTS is a form of Markov chain Monte Carlo (MCMC) algorithm that bypasses random walk behaviour and allows for convergence to a target distribution more quickly. This not only has the advantage of speed, but allows for complex models to be fitted without having to employ specialised knowledge regarding the theory underlying those fitting methods.
 
@@ -124,7 +108,7 @@ Thus, we explore consequences of making these alternative assumptions about the
 - A first procedure is to condition on whatever value of $y_0$ is observed. This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.
 
 - A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel`
-so that $y_0 \sim {\cal N} \left(0, {\sigma_x^2\over (1-\rho)^2} \right) $
+so that $y_0 \sim {\mathcal{N}} \left(0, \frac{\sigma_x^2}{1 - \rho^2} \right)$
 
 When the initial value $y_0$ is far out in a tail of the stationary distribution, conditioning on an initial value gives a posterior that is **more accurate** in a sense that we'll explain.
 
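
Written out, the two procedures amount to basing inference on two different likelihood functions for the same sample; the symbols $L_1$ and $L_2$ are introduced here purely for exposition and do not appear in the committed file:

$$
L_1(\rho, \sigma_x) = \prod_{t=1}^{T} f(y_t \mid y_{t-1}), \qquad
L_2(\rho, \sigma_x) = f(y_0) \prod_{t=1}^{T} f(y_t \mid y_{t-1}),
$$

where $L_1$ conditions on the observed $y_0$ and $L_2$ multiplies in the stationary density of $y_0$ evaluated at the observed value.
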
@@ -145,26 +129,25 @@ How we select the initial value $y_0$ matters.
 To illustrate the issue, we'll begin by choosing an initial $y_0$ that is far out in a tail of the stationary distribution.
 
 ```{code-cell} ipython3
-
-def ar1_simulate(rho, sigma, y0, T):
+def ar1_simulate(ρ, σ, y0, T):
 
     # Allocate space and draw epsilons
     y = np.empty(T)
-    eps = np.random.normal(0.,sigma,T)
+    eps = np.random.normal(0., σ, T)
 
     # Initial condition and step forward
     y[0] = y0
     for t in range(1, T):
-        y[t] = rho*y[t-1] + eps[t]
+        y[t] = ρ * y[t-1] + eps[t]
 
     return y
 
-sigma = 1.
-rho = 0.5
+σ = 1.0
+ρ = 0.5
 T = 50
 
 np.random.seed(145353452)
-y = ar1_simulate(rho, sigma, 10, T)
+y = ar1_simulate(ρ, σ, 10, T)
 ```
 
 ```{code-cell} ipython3
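
With the values just used ($\rho = 0.5$, $\sigma_x = 1$, $y_0 = 10$), a short side calculation, added here only as an illustration and not part of the committed file, shows how far out in the tail of the stationary distribution this initial condition lies:

```python
import numpy as np

ρ, σ = 0.5, 1.0

# Standard deviation of the stationary distribution of y
stationary_sd = σ / np.sqrt(1 - ρ**2)
print(stationary_sd)       # ≈ 1.15

# y0 = 10 lies roughly 8.7 stationary standard deviations above the mean of 0
print(10 / stationary_sd)
```
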
@@ -176,174 +159,60 @@ Now we shall use Bayes' law to construct a posterior distribution, conditioning
 
 (Later we'll assume that $y_0$ is drawn from the stationary distribution, but not now.)
 
-First we'll use **pymc4**.
-
-## PyMC Implementation
-
-For a normal distribution in `pymc`,
-$var = 1/\tau = \sigma^{2}$.
-
-```{code-cell} ipython3
-
-AR1_model = pmc.Model()
-
-with AR1_model:
-
-    # Start with priors
-    rho = pmc.Uniform('rho', lower=-1., upper=1.) # Assume stable rho
-    sigma = pmc.HalfNormal('sigma', sigma = np.sqrt(10))
-
-    # Expected value of y at the next period (rho * y)
-    yhat = rho * y[:-1]
-
-    # Likelihood of the actual realization
-    y_like = pmc.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])
-```
-
-[pmc.sample](https://www.pymc.io/projects/docs/en/v5.10.0/api/generated/pymc.sample.html#pymc-sample) by default uses the NUTS samplers to generate samples as shown in the below cell:
-
-```{code-cell} ipython3
-:tag: [hide-output]
-
-with AR1_model:
-    trace = pmc.sample(50000, tune=10000, return_inferencedata=True)
-```
-
-```{code-cell} ipython3
-with AR1_model:
-    az.plot_trace(trace, figsize=(17,6))
-```
-
-Evidently, the posteriors aren't centered on the true values of $.5, 1$ that we used to generate the data.
-
-This is a symptom of the classic **Hurwicz bias** for first order autoregressive processes (see Leonid Hurwicz {cite}`hurwicz1950least`.)
-
-The Hurwicz bias is worse the smaller is the sample (see {cite}`Orcutt_Winokur_69`).
-
-
-Be that as it may, here is more information about the posterior.
-
-```{code-cell} ipython3
-with AR1_model:
-    summary = az.summary(trace, round_to=4)
-
-summary
-```
-
-Now we shall compute a posterior distribution after seeing the same data but instead assuming that $y_0$ is drawn from the stationary distribution.
-
-This means that
-
-$$
-y_0 \sim N \left(0, \frac{\sigma_x^{2}}{1 - \rho^{2}} \right)
-$$
-
-We alter the code as follows:
-
-```{code-cell} ipython3
-AR1_model_y0 = pmc.Model()
-
-with AR1_model_y0:
+## Implementation
 
-    # Start with priors
-    rho = pmc.Uniform('rho', lower=-1., upper=1.) # Assume stable rho
-    sigma = pmc.HalfNormal('sigma', sigma=np.sqrt(10))
-
-    # Standard deviation of ergodic y
-    y_sd = sigma / np.sqrt(1 - rho**2)
-
-    # yhat
-    yhat = rho * y[:-1]
-    y_data = pmc.Normal('y_obs', mu=yhat, sigma=sigma, observed=y[1:])
-    y0_data = pmc.Normal('y0_obs', mu=0., sigma=y_sd, observed=y[0])
-```
-
-```{code-cell} ipython3
-:tag: [hide-output]
-
-with AR1_model_y0:
-    trace_y0 = pmc.sample(50000, tune=10000, return_inferencedata=True)
-
-    # Grey vertical lines are the cases of divergence
-```
-
-```{code-cell} ipython3
-with AR1_model_y0:
-    az.plot_trace(trace_y0, figsize=(17,6))
-```
-
-```{code-cell} ipython3
-with AR1_model:
-    summary_y0 = az.summary(trace_y0, round_to=4)
-
-summary_y0
-```
-
-Please note how the posterior for $\rho$ has shifted to the right relative to when we conditioned on $y_0$ instead of assuming that $y_0$ is drawn from the stationary distribution.
-
-Think about why this happens.
-
-```{hint}
-It is connected to how Bayes Law (conditional probability) solves an **inverse problem** by putting high probability on parameter values
-that make observations more likely.
-```
-
-We'll return to this issue after we use `numpyro` to compute posteriors under our two alternative assumptions about the distribution of $y_0$.
-
-We'll now repeat the calculations using `numpyro`.
-
-## Numpyro Implementation
+First, we'll implement the AR(1) model conditioning on the initial value using NumPyro. The NUTS sampler is used to generate samples from the posterior distribution.
 
 ```{code-cell} ipython3
-
-
 def plot_posterior(sample):
     """
     Plot trace and histogram
     """
     # To np array
-    rhos = sample['rho']
-    sigmas = sample['sigma']
-    rhos, sigmas, = np.array(rhos), np.array(sigmas)
+    ρs = sample['ρ']
+    σs = sample['σ']
+    ρs, σs = np.array(ρs), np.array(σs)
 
     fig, axs = plt.subplots(2, 2, figsize=(17, 6))
     # Plot trace
-    axs[0, 0].plot(rhos) # rho
-    axs[1, 0].plot(sigmas) # sigma
+    axs[0, 0].plot(ρs) # ρ
+    axs[1, 0].plot(σs) # σ
 
     # Plot posterior
-    axs[0, 1].hist(rhos, bins=50, density=True, alpha=0.7)
+    axs[0, 1].hist(ρs, bins=50, density=True, alpha=0.7)
     axs[0, 1].set_xlim([0, 1])
-    axs[1, 1].hist(sigmas, bins=50, density=True, alpha=0.7)
+    axs[1, 1].hist(σs, bins=50, density=True, alpha=0.7)
 
-    axs[0, 0].set_title("rho")
-    axs[0, 1].set_title("rho")
-    axs[1, 0].set_title("sigma")
-    axs[1, 1].set_title("sigma")
+    # Set axis labels for clarity
+    axs[0, 0].set_ylabel("ρ")
+    axs[0, 1].set_xlabel("ρ")
+    axs[1, 0].set_ylabel("σ")
+    axs[1, 1].set_xlabel("σ")
     plt.show()
 ```
 
 ```{code-cell} ipython3
 def AR1_model(data):
     # set prior
-    rho = numpyro.sample('rho', dist.Uniform(low=-1., high=1.))
-    sigma = numpyro.sample('sigma', dist.HalfNormal(scale=np.sqrt(10)))
+    ρ = numpyro.sample('ρ', dist.Uniform(low=-1., high=1.))
+    σ = numpyro.sample('σ', dist.HalfNormal(scale=np.sqrt(10)))
 
-    # Expected value of y at the next period (rho * y)
-    yhat = rho * data[:-1]
+    # Expected value of y at the next period (ρ * y)
+    yhat = ρ * data[:-1]
 
     # Likelihood of the actual realization.
-    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
+    y_data = numpyro.sample('y_obs',
+                            dist.Normal(loc=yhat, scale=σ), obs=data[1:])
 ```
 
 ```{code-cell} ipython3
-:tag: [hide-output]
+:tags: [hide-output]
 
 # Make jnp array
 y = jnp.array(y)
 
-# Set NUTS kernal
+# Set NUTS kernel
 NUTS_kernel = numpyro.infer.NUTS(AR1_model)
 
 # Run MCMC
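
Before sampling, it can help to see what the priors in `AR1_model` look like. Here is a small sketch, added for illustration only and not part of the committed file, that draws directly from the same prior distributions (a flat prior on $\rho$ over $(-1, 1)$ and a HalfNormal prior with scale $\sqrt{10}$ on $\sigma$):

```python
import numpy as np
import numpyro.distributions as dist
from jax import random

key_ρ, key_σ = random.split(random.PRNGKey(0))

# Draw samples from the priors used in AR1_model
ρ_prior = dist.Uniform(low=-1., high=1.).sample(key_ρ, (5000,))
σ_prior = dist.HalfNormal(scale=np.sqrt(10)).sample(key_σ, (5000,))

# The flat prior on ρ averages to about 0; the HalfNormal prior on σ
# has mean scale * sqrt(2/π) ≈ 2.5
print(ρ_prior.mean(), σ_prior.mean())
```
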
@@ -355,11 +224,21 @@ mcmc.run(rng_key=random.PRNGKey(1), data=y)
 plot_posterior(mcmc.get_samples())
 ```
 
+Evidently, the posteriors aren't centered on the true values of $.5, 1$ that we used to generate the data.
+
+This is a symptom of the classic **Hurwicz bias** for first order autoregressive processes (see Leonid Hurwicz {cite}`hurwicz1950least`.)
+
+The Hurwicz bias is worse the smaller is the sample (see {cite}`Orcutt_Winokur_69`).
+
+Be that as it may, here is more information about the posterior.
+
 ```{code-cell} ipython3
 mcmc.print_summary()
 ```
 
-Next, we again compute the posterior under the assumption that $y_0$ is drawn from the stationary distribution, so that
+Now we shall compute a posterior distribution after seeing the same data but instead assuming that $y_0$ is drawn from the stationary distribution.
+
+This means that
 
 $$
 y_0 \sim N \left(0, \frac{\sigma_x^{2}}{1 - \rho^{2}} \right)
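
As a quick check of the Hurwicz bias discussed above, here is a minimal Monte Carlo sketch, added for illustration only and not part of the committed lecture, that repeatedly simulates samples of length $T = 50$ with $\rho = 0.5$, $\sigma_x = 1$, draws $y_0$ from the stationary distribution, and estimates $\rho$ by least squares:

```python
import numpy as np

np.random.seed(0)
ρ_true, σ_true, T, n_reps = 0.5, 1.0, 50, 5000

estimates = np.empty(n_reps)
for i in range(n_reps):
    y = np.empty(T)
    # Draw y0 from the stationary distribution
    y[0] = np.random.normal(0., σ_true / np.sqrt(1 - ρ_true**2))
    eps = np.random.normal(0., σ_true, T)
    for t in range(1, T):
        y[t] = ρ_true * y[t-1] + eps[t]
    # Least-squares estimate of ρ (no intercept)
    estimates[i] = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

# The average estimate falls below the true ρ = 0.5, illustrating the
# downward small-sample bias discussed above
print(estimates.mean())
```
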
@@ -370,27 +249,29 @@ Here's the new code to achieve this.
 ```{code-cell} ipython3
 def AR1_model_y0(data):
     # Set prior
-    rho = numpyro.sample('rho', dist.Uniform(low=-1., high=1.))
-    sigma = numpyro.sample('sigma', dist.HalfNormal(scale=np.sqrt(10)))
+    ρ = numpyro.sample('ρ', dist.Uniform(low=-1., high=1.))
+    σ = numpyro.sample('σ', dist.HalfNormal(scale=np.sqrt(10)))
 
     # Standard deviation of ergodic y
-    y_sd = sigma / jnp.sqrt(1 - rho**2)
+    y_sd = σ / jnp.sqrt(1 - ρ**2)
 
-    # Expected value of y at the next period (rho * y)
-    yhat = rho * data[:-1]
+    # Expected value of y at the next period (ρ * y)
+    yhat = ρ * data[:-1]
 
     # Likelihood of the actual realization.
-    y_data = numpyro.sample('y_obs', dist.Normal(loc=yhat, scale=sigma), obs=data[1:])
-    y0_data = numpyro.sample('y0_obs', dist.Normal(loc=0., scale=y_sd), obs=data[0])
+    y_data = numpyro.sample('y_obs',
+                            dist.Normal(loc=yhat, scale=σ), obs=data[1:])
+    y0_data = numpyro.sample('y0_obs',
+                             dist.Normal(loc=0., scale=y_sd), obs=data[0])
 ```
 
 ```{code-cell} ipython3
-:tag: [hide-output]
+:tags: [hide-output]
 
 # Make jnp array
 y = jnp.array(y)
 
-# Set NUTS kernal
+# Set NUTS kernel
 NUTS_kernel = numpyro.infer.NUTS(AR1_model_y0)
 
 # Run MCMC
@@ -406,6 +287,15 @@ plot_posterior(mcmc2.get_samples())
 mcmc2.print_summary()
 ```
 
+Please note how the posterior for $\rho$ has shifted to the right relative to when we conditioned on $y_0$ instead of assuming that $y_0$ is drawn from the stationary distribution.
+
+Think about why this happens.
+
+```{hint}
+It is connected to how Bayes Law (conditional probability) solves an **inverse problem** by putting high probability on parameter values
+that make observations more likely.
+```
+
 Look what happened to the posterior!
 
 It has moved far from the true values of the parameters used to generate the data because of how Bayes' Law (i.e., conditional probability)
