
Commit 0a55356

update for rebuild

1 parent 547121a commit 0a55356
1 file changed: +44 −28 lines changed


lectures/ar1_bayes.md

Lines changed: 44 additions & 28 deletions
@@ -11,12 +11,12 @@ kernelspec:
   name: python3
 ---

-# Posterior Distributions for AR(1) Parameters
+# Posterior Distributions for AR(1) Parameters

 ```{include} _admonition/gpu.md
 ```

-In addition to what's included in base Anaconda, we need to install the following packages
+In addition to what's included in base Anaconda, we need to install the following package:

 ```{code-cell} ipython3
 :tags: [hide-output]
@@ -35,10 +35,10 @@ from jax import random, lax
 import matplotlib.pyplot as plt
 ```

-This lecture uses Bayesian methods offered by [numpyro](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.
+This lecture uses Bayesian methods offered by [`numpyro`](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.


-The model is a good laboratory for illustrating
+The model is a good laboratory for illustrating the
 consequences of alternative ways of modeling the distribution of the initial $y_0$:

 - As a fixed number
@@ -52,7 +52,8 @@ $$
 y_{t+1} = \rho y_t + \sigma_x \epsilon_{t+1}, \quad t \geq 0
 $$ (eq:themodel)

-where the scalars $\rho$ and $\sigma_x$ satisfy $|\rho| < 1$ and $\sigma_x > 0$;
+where the scalars $\rho$ and $\sigma_x$ satisfy $|\rho| < 1$ and $\sigma_x > 0$.
+
 $\{\epsilon_{t+1}\}$ is a sequence of i.i.d. normal random variables with mean $0$ and variance $1$.

 The second component of the statistical model is
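For reference, the form of the stationary distribution used later in the lecture follows from taking variances on both sides of {eq}`eq:themodel` under stationarity, where $\mathrm{Var}(y_{t+1}) = \mathrm{Var}(y_t) = \sigma_y^2$:

$$
\sigma_y^2 = \rho^2 \sigma_y^2 + \sigma_x^2
\quad \Longrightarrow \quad
\sigma_y^2 = \frac{\sigma_x^2}{1 - \rho^2},
$$

so the stationary distribution is ${\mathcal N}\left(0, \frac{\sigma_x^2}{1-\rho^2}\right)$, matching the `y_sd = σ / jnp.sqrt(1 - ρ**2)` line in the code further down.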
@@ -65,15 +66,15 @@ $$ (eq:themodel_2)

 Consider a sample $\{y_t\}_{t=0}^T$ governed by this statistical model.

-The model implies that the likelihood function of $\{y_t\}_{t=0}^T$ can be **factored**:
+The model implies that the likelihood function of $\{y_t\}_{t=0}^T$ can be *factored*:

 $$
 f(y_T, y_{T-1}, \ldots, y_0) = f(y_T| y_{T-1}) f(y_{T-1}| y_{T-2}) \cdots f(y_1 | y_0 ) f(y_0)
 $$

 where we use $f$ to denote a generic probability density.

-The statistical model {eq}`eq:themodel`-{eq}`eq:themodel_2` implies
+The statistical model {eq}`eq:themodel`-{eq}`eq:themodel_2` implies

 $$
 \begin{aligned}
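To make the factorization concrete, here is a minimal sketch of the log likelihood under the two treatments of $y_0$ discussed in this lecture; the function name `log_likelihood` and the `condition_on_y0` flag are illustrative, not part of the committed code:

```python
import jax.numpy as jnp
from jax.scipy.stats import norm

def log_likelihood(y, ρ, σx, condition_on_y0=True):
    # Factored likelihood: f(y_T, ..., y_0) = f(y_T|y_{T-1}) ... f(y_1|y_0) f(y_0),
    # where each conditional f(y_{t+1}|y_t) is N(ρ y_t, σx**2).
    ll = jnp.sum(norm.logpdf(y[1:], loc=ρ * y[:-1], scale=σx))
    if condition_on_y0:
        # Conditioning on the observed initial value: f(y_0) = 1.
        return ll
    # Otherwise y_0 is a draw from the stationary distribution N(0, σx**2 / (1 - ρ**2)).
    return ll + norm.logpdf(y[0], loc=0.0, scale=σx / jnp.sqrt(1 - ρ**2))
```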
@@ -86,46 +87,57 @@ We want to study how inferences about the unknown parameters $(\rho, \sigma_x)$

 Below, we study two widely used alternative assumptions:

-- $(\mu_0,\sigma_0) = (y_0, 0)$ which means that $y_0$ is drawn from the distribution ${\mathcal N}(y_0, 0)$; in effect, we are **conditioning on an observed initial value**.
+- $(\mu_0,\sigma_0) = (y_0, 0)$ which means that $y_0$ is drawn from the distribution ${\mathcal N}(y_0, 0)$; in effect, we are *conditioning on an observed initial value*.

 - $\mu_0,\sigma_0$ are functions of $\rho, \sigma_x$ because $y_0$ is drawn from the stationary distribution implied by $\rho, \sigma_x$.

-
-**Note:** We do **not** treat a third possible case in which $\mu_0,\sigma_0$ are free parameters to be estimated.
+```{note}
+We do *not* treat a third possible case in which $\mu_0,\sigma_0$ are free parameters to be estimated.
+```

 Unknown parameters are $\rho, \sigma_x$.

-We have independent **prior probability distributions** for $\rho, \sigma_x$ and want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.
+We have independent **prior probability distributions** for $\rho, \sigma_x$.
+
+We want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.
+
+The notebook uses `numpyro` to compute a posterior distribution of $\rho, \sigma_x$.

-The notebook uses `numpyro` to compute a posterior distribution of $\rho, \sigma_x$. We will use NUTS samplers to generate samples from the posterior in a chain.
+We will use NUTS samplers to generate samples from the posterior in a chain.

-NUTS is a form of Monte Carlo Markov Chain (MCMC) algorithm that bypasses random walk behaviour and allows for convergence to a target distribution more quickly. This not only has the advantage of speed, but allows for complex models to be fitted without having to employ specialised knowledge regarding the theory underlying those fitting methods.
+NUTS is a form of Markov Chain Monte Carlo (MCMC) algorithm that bypasses random walk behavior and allows for faster convergence to a target distribution.
+
+This not only has the advantage of speed, but also allows complex models to be fitted without having to employ specialized knowledge regarding the theory underlying those fitting methods.

 Thus, we explore consequences of making these alternative assumptions about the distribution of $y_0$:

-- A first procedure is to condition on whatever value of $y_0$ is observed. This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.
+* A first procedure is to condition on whatever value of $y_0$ is observed.
+
+  - This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.

-- A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel`
-so that $y_0 \sim {\mathcal{N}} \left(0, \frac{\sigma_x^2}{(1-\rho)^2} \right)$
+* A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel`
+  so that $y_0 \sim {\mathcal{N}} \left(0, \frac{\sigma_x^2}{1-\rho^2} \right)$

-When the initial value $y_0$ is far out in a tail of the stationary distribution, conditioning on an initial value gives a posterior that is **more accurate** in a sense that we'll explain.
+When the initial value $y_0$ is far out in the tail of the stationary distribution, conditioning on an initial value gives a posterior that is *more accurate* in a sense that we'll explain.

-Basically, when $y_0$ happens to be in a tail of the stationary distribution and we **don't condition on $y_0$**, the likelihood function for $\{y_t\}_{t=0}^T$ adjusts the posterior distribution of the parameter pair $\rho, \sigma_x $ to make the observed value of $y_0$ more likely than it really is under the stationary distribution, thereby adversely twisting the posterior in short samples.
+Basically, when $y_0$ happens to be in the tail of the stationary distribution and we *don't condition on $y_0$*, the likelihood function for $\{y_t\}_{t=0}^T$ adjusts the posterior distribution of the parameter pair $\rho, \sigma_x$ to make the observed value of $y_0$ more likely than it really is under the stationary distribution, thereby adversely twisting the posterior in short samples.

 An example below shows how not conditioning on $y_0$ adversely shifts the posterior probability distribution of $\rho$ toward larger values.

-We begin by solving a **direct problem** that simulates an AR(1) process.
+We begin by solving a *direct problem* that simulates an AR(1) process.

-How we select the initial value $y_0$ matters.
+How we select the initial value $y_0$ matters:

-* If we think $y_0$ is drawn from the stationary distribution ${\mathcal N}(0, \frac{\sigma_x^{2}}{1-\rho^2})$, then it is a good idea to use this distribution as $f(y_0)$. Why? Because $y_0$ contains information about $\rho, \sigma_x$.
+* If we think $y_0$ is drawn from the stationary distribution ${\mathcal N}(0, \frac{\sigma_x^{2}}{1-\rho^2})$, then it is a good idea to use this distribution as $f(y_0)$.
+
+  - Why? Because $y_0$ contains information about $\rho, \sigma_x$.

-* If we suspect that $y_0$ is far in the tails of the stationary distribution -- so that variation in early observations in the sample have a significant **transient component** -- it is better to condition on $y_0$ by setting $f(y_0) = 1$.
+* If we suspect that $y_0$ is far in the tail of the stationary distribution -- so that variation in early observations in the sample has a significant *transient component* -- it is better to condition on $y_0$ by setting $f(y_0) = 1$.

-To illustrate the issue, we'll begin by choosing an initial $y_0$ that is far out in a tail of the stationary distribution.
+To illustrate the issue, we'll begin by choosing an initial $y_0$ that is far out in the tail of the stationary distribution.

 ```{code-cell} ipython3
 def ar1_simulate(ρ, σ, y0, T, key):
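The body of `ar1_simulate` is elided by the hunk; one plausible implementation consistent with the `random, lax` imports at the top of the file (the `_sketch` suffix marks this as a reconstruction, and the committed code may differ) is:

```python
import jax.numpy as jnp
from jax import random, lax

def ar1_simulate_sketch(ρ, σ, y0, T, key):
    # Draw all T shocks at once, then run the AR(1) recursion with lax.scan.
    ε = random.normal(key, (T,))

    def step(y, e):
        y_next = ρ * y + σ * e
        return y_next, y_next

    _, ys = lax.scan(step, y0, ε)
    # Prepend the initial condition so the sample is {y_t} for t = 0, ..., T.
    return jnp.concatenate([jnp.array([y0]), ys])
```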
@@ -158,7 +170,9 @@ Now we shall use Bayes' law to construct a posterior distribution, conditioning

 ## Implementation

-First, we'll implement the AR(1) model conditioning on the initial value using NumPyro. The NUTS sampler is used to generate samples from the posterior distribution.
+First, we'll implement the AR(1) model conditioning on the initial value using `numpyro`.
+
+The NUTS sampler is used to generate samples from the posterior distribution.

 ```{code-cell} ipython3
 def plot_posterior(sample):
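For readers following along, posterior sampling with `numpyro`'s NUTS kernel typically looks like the sketch below; it assumes `y` is a simulated AR(1) sample and `AR1_model` is the model defined in the next code cell, and the warmup/sample counts are placeholders rather than values taken from this commit:

```python
from jax import random
from numpyro.infer import MCMC, NUTS

# Build a NUTS kernel around the model and run one chain.
mcmc = MCMC(NUTS(AR1_model), num_warmup=1000, num_samples=2000)
mcmc.run(random.PRNGKey(0), data=y)
mcmc.print_summary()            # posterior means, std devs, r_hat diagnostics
sample = mcmc.get_samples()     # dict of posterior draws, e.g. sample['ρ']
```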
@@ -184,16 +198,18 @@ def plot_posterior(sample):
     axs[0, 1].set_xlabel("ρ")
     axs[1, 0].set_ylabel("σ")
     axs[1, 1].set_xlabel("σ")
+
+    plt.tight_layout()
     plt.show()
 ```

 ```{code-cell} ipython3
 def AR1_model(data):
-    # set prior
+    # Set prior
     ρ = numpyro.sample('ρ', dist.Uniform(low=-1., high=1.))
     σ = numpyro.sample('σ', dist.HalfNormal(scale=jnp.sqrt(10)))

-    # Expected value of y at the next period (ρ * y)
+    # Expected value of y in the next period (ρ * y)
     yhat = ρ * data[:-1]

     # Likelihood of the actual realization.
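The hunk cuts off at the likelihood comment; here is a self-contained sketch of how the conditional-on-$y_0$ model plausibly continues, with the `_sketch` suffix marking it as a reconstruction rather than the committed code:

```python
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

def AR1_model_sketch(data):
    # Set prior
    ρ = numpyro.sample('ρ', dist.Uniform(low=-1., high=1.))
    σ = numpyro.sample('σ', dist.HalfNormal(scale=jnp.sqrt(10)))

    # Expected value of y in the next period (ρ * y)
    yhat = ρ * data[:-1]

    # Likelihood of the actual realization: y[t+1] ~ N(ρ y[t], σ),
    # implicitly conditioning on the observed data[0].
    return numpyro.sample('y', dist.Normal(loc=yhat, scale=σ), obs=data[1:])
```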
@@ -251,7 +267,7 @@ def AR1_model_y0(data):
     # Standard deviation of ergodic y
     y_sd = σ / jnp.sqrt(1 - ρ**2)

-    # Expected value of y at the next period (ρ * y)
+    # Expected value of y in the next period (ρ * y)
     yhat = ρ * data[:-1]

     # Likelihood of the actual realization.
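For comparison with `AR1_model`, the stationary-initial-condition variant presumably adds a likelihood term for $y_0$ itself; a hedged reconstruction (again, the committed code may differ in details):

```python
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

def AR1_model_y0_sketch(data):
    # Set prior
    ρ = numpyro.sample('ρ', dist.Uniform(low=-1., high=1.))
    σ = numpyro.sample('σ', dist.HalfNormal(scale=jnp.sqrt(10)))

    # Standard deviation of ergodic y
    y_sd = σ / jnp.sqrt(1 - ρ**2)

    # Treat y[0] as a draw from the stationary distribution N(0, σ²/(1-ρ²))
    # instead of conditioning on it.
    numpyro.sample('y0', dist.Normal(loc=0., scale=y_sd), obs=data[0])

    # Expected value of y in the next period (ρ * y)
    yhat = ρ * data[:-1]

    # Likelihood of the actual realization.
    return numpyro.sample('y', dist.Normal(loc=yhat, scale=σ), obs=data[1:])
```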
@@ -295,7 +311,7 @@ that make observations more likely.
 Look what happened to the posterior!

 It has moved far from the true values of the parameters used to generate the data because of how Bayes' Law (i.e., conditional probability)
-is telling `numpyro` to explain what it interprets as "explosive" observations early in the sample.
+is telling `numpyro` to explain what it interprets as "explosive" observations early in the sample.

 Bayes' Law is able to generate a plausible likelihood for the first observation by driving $\rho \rightarrow 1$ and $\sigma \uparrow$ in order to raise the variance of the stationary distribution.
