lectures/ar1_bayes.md (44 additions, 28 deletions)
@@ -11,12 +11,12 @@ kernelspec:

  name: python3
---

# Posterior Distributions for AR(1) Parameters

```{include} _admonition/gpu.md
```

In addition to what's included in base Anaconda, we need to install the following package:

```{code-cell} ipython3
:tags: [hide-output]
@@ -35,10 +35,10 @@ from jax import random, lax

import matplotlib.pyplot as plt
```

This lecture uses Bayesian methods offered by [`numpyro`](https://num.pyro.ai/en/stable/) to make statistical inferences about two parameters of a univariate first-order autoregression.

The model is a good laboratory for illustrating the consequences of alternative ways of modeling the distribution of the initial $y_0$:

@@ ... @@

where we use $f$ to denote a generic probability density.

The statistical model {eq}`eq:themodel`-{eq}`eq:themodel_2` implies

$$
\begin{aligned}
@@ -86,46 +87,57 @@ We want to study how inferences about the unknown parameters $(\rho, \sigma_x)$

Below, we study two widely used alternative assumptions:

- $(\mu_0,\sigma_0) = (y_0, 0)$, which means that $y_0$ is drawn from the distribution ${\mathcal N}(y_0, 0)$; in effect, we are *conditioning on an observed initial value*.

- $\mu_0,\sigma_0$ are functions of $\rho, \sigma_x$ because $y_0$ is drawn from the stationary distribution implied by $\rho, \sigma_x$.

```{note}
We do *not* treat a third possible case in which $\mu_0,\sigma_0$ are free parameters to be estimated.
```

The unknown parameters are $\rho, \sigma_x$.

We have independent **prior probability distributions** for $\rho, \sigma_x$.

We want to compute a posterior probability distribution after observing a sample $\{y_{t}\}_{t=0}^T$.

The notebook uses `numpyro` to compute a posterior distribution of $\rho, \sigma_x$.

We will use NUTS samplers to generate samples from the posterior in a chain.

NUTS is a form of Markov chain Monte Carlo (MCMC) algorithm that bypasses random walk behavior and allows for faster convergence to a target distribution.

This not only has the advantage of speed, but also allows complex models to be fitted without having to employ specialized knowledge regarding the theory underlying those fitting methods.
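
To make the workflow concrete, here is a small self-contained sketch of how a model is handed to NUTS in `numpyro`; the toy model, prior, data, and draw counts are illustrative stand-ins rather than the lecture's actual choices.

```python
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

# Toy data: five hypothetical observations of a normal random variable.
data = jnp.array([1.2, 0.8, 1.1, 0.9, 1.3])

def toy_model(data):
    # Prior on the unknown mean (an arbitrary choice for this sketch).
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
    # Likelihood of the observed data given mu.
    numpyro.sample("obs", dist.Normal(mu, 1.0), obs=data)

kernel = NUTS(toy_model)                               # step size is adapted during warmup
mcmc = MCMC(kernel, num_warmup=500, num_samples=1000)  # one chain of posterior draws
mcmc.run(random.PRNGKey(0), data)
mcmc.print_summary()                                   # posterior mean, std dev, n_eff, r_hat
```

The same three steps -- define a model, wrap it in `NUTS`, hand the kernel to `MCMC` and call `run` -- are what we apply to the AR(1) model below.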

Thus, we explore consequences of making these alternative assumptions about the distribution of $y_0$:

* A first procedure is to condition on whatever value of $y_0$ is observed.

  - This amounts to assuming that the probability distribution of the random variable $y_0$ is a Dirac delta function that puts probability one on the observed value of $y_0$.

* A second procedure assumes that $y_0$ is drawn from the stationary distribution of a process described by {eq}`eq:themodel`, so that $y_0 \sim {\mathcal{N}} \left(0, \frac{\sigma_x^2}{1-\rho^2} \right)$.
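
To see how each procedure enters the statistics, it helps to write out the likelihood of a sample. Assuming the Gaussian AR(1) law of motion in {eq}`eq:themodel`, so that $y_t \mid y_{t-1} \sim {\mathcal N}(\rho y_{t-1}, \sigma_x^2)$, the joint density factors as

$$
f(y_0, y_1, \ldots, y_T \mid \rho, \sigma_x) = f(y_0) \prod_{t=1}^{T} f(y_t \mid y_{t-1}; \rho, \sigma_x)
$$

The first procedure sets $f(y_0) = 1$, so the initial value enters only by conditioning the remaining terms; the second sets $f(y_0)$ equal to the stationary density ${\mathcal N}\left(0, \frac{\sigma_x^2}{1-\rho^2}\right)$, so $y_0$ itself carries information about $\rho, \sigma_x$.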

When the initial value $y_0$ is far out in the tail of the stationary distribution, conditioning on an initial value gives a posterior that is *more accurate* in a sense that we'll explain.

Basically, when $y_0$ happens to be in the tail of the stationary distribution and we *don't condition on $y_0$*, the likelihood function for $\{y_t\}_{t=0}^T$ adjusts the posterior distribution of the parameter pair $\rho, \sigma_x$ to make the observed value of $y_0$ more likely than it really is under the stationary distribution, thereby adversely twisting the posterior in short samples.

An example below shows how not conditioning on $y_0$ adversely shifts the posterior probability distribution of $\rho$ toward larger values.

We begin by solving a *direct problem* that simulates an AR(1) process.

How we select the initial value $y_0$ matters:

* If we think $y_0$ is drawn from the stationary distribution ${\mathcal N}(0, \frac{\sigma_x^{2}}{1-\rho^2})$, then it is a good idea to use this distribution as $f(y_0)$.

  - Why? Because $y_0$ contains information about $\rho, \sigma_x$.

* If we suspect that $y_0$ is far in the tail of the stationary distribution -- so that variation in early observations in the sample has a significant *transient component* -- it is better to condition on $y_0$ by setting $f(y_0) = 1$.
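
As a back-of-the-envelope check of what "far in the tail" means, here is a quick calculation with made-up values (the values actually used in the lecture are set when we call the simulator below):

```python
import jax.numpy as jnp

# Illustrative values only: ρ = 0.5, σ_x = 1, and an initial condition y0 = 10.
ρ, σ_x, y0 = 0.5, 1.0, 10.0

σ_stationary = σ_x / jnp.sqrt(1 - ρ**2)   # std dev of the stationary distribution
print(σ_stationary)                        # ≈ 1.15
print(y0 / σ_stationary)                   # y0 sits ≈ 8.7 stationary std devs from the mean
```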

To illustrate the issue, we'll begin by choosing an initial $y_0$ that is far out in the tail of the stationary distribution.

```{code-cell} ipython3
def ar1_simulate(ρ, σ, y0, T, key):
@@ -158,7 +170,9 @@ Now we shall use Bayes' law to construct a posterior distribution, conditioning

## Implementation

First, we'll implement the AR(1) model conditioning on the initial value using `numpyro`.

The NUTS sampler is used to generate samples from the posterior distribution.
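
The implementation itself is cut from this excerpt, but as a minimal sketch, a `numpyro` model that conditions on the observed initial value might look like the following; the priors and names are placeholders, not necessarily the lecture's choices.

```python
import numpyro
import numpyro.distributions as dist

def ar1_conditional(y):
    # Placeholder priors, for illustration only.
    ρ = numpyro.sample("ρ", dist.Uniform(-1.0, 1.0))
    σ = numpyro.sample("σ", dist.HalfNormal(1.0))
    # Conditioning on y[0]: the initial value appears only on the right-hand side,
    # through the conditional likelihood of y[1], ..., y[T].
    numpyro.sample("obs", dist.Normal(ρ * y[:-1], σ), obs=y[1:])
    # The alternative procedure would add a likelihood term for y[0] drawn from the
    # stationary distribution, e.g.
    #   numpyro.sample("y0", dist.Normal(0.0, σ * (1 - ρ**2) ** -0.5), obs=y[0])
```

Passing this function to `NUTS` and `MCMC`, as in the sketch earlier, then produces posterior draws for $\rho$ and $\sigma$.
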
@@ -295,7 +311,7 @@ that make observations more likely.

Look what happened to the posterior!

It has moved far from the true values of the parameters used to generate the data because of how Bayes' Law (i.e., conditional probability) is telling `numpyro` to explain what it interprets as "explosive" observations early in the sample.

Bayes' Law is able to generate a plausible likelihood for the first observation by driving $\rho \rightarrow 1$ and $\sigma \uparrow$ in order to raise the variance of the stationary distribution.
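
A quick numeric illustration of that mechanism, holding $\sigma_x = 1$ for simplicity:

```python
# The stationary variance σ_x²/(1 - ρ²) explodes as ρ → 1, which is how the
# no-conditioning posterior rationalizes an extreme initial observation.
for ρ in (0.5, 0.9, 0.99):
    print(ρ, 1.0 / (1 - ρ**2))
```

Pushing $\rho$ toward 1 (and raising $\sigma$) is therefore the cheapest way for the likelihood to make an extreme $y_0$ look plausible.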