@@ -6,16 +6,18 @@ jupytext:
     format_version: 0.13
     jupytext_version: 1.13.7
 kernelspec:
-  display_name: Python 3
+  display_name: Python 3 (ipykernel)
   language: python
   name: python3
 ---
 
+(GLM-rolling-regression)=
 # Rolling Regression
 
-:::{post} Sept 15, 2021
-:tags: generalized linear model, pymc3.Exponential, pymc3.GaussianRandomWalk, pymc3.HalfCauchy, pymc3.HalfNormal, pymc3.Model, pymc3.Normal, regression
+:::{post} June, 2022
+:tags: generalized linear model, regression
 :category: intermediate
+:author: Thomas Wiecki, Benjamin T. Vincent
 :::
 
 +++
@@ -25,28 +27,25 @@ kernelspec:
 * One common example is the price of gold (GLD) and the price of gold mining operations (GFI).
 
 ```{code-cell} ipython3
-%matplotlib inline
-
 import os
+import warnings
 
 import arviz as az
-import bambi as bmb
 import matplotlib.pyplot as plt
 import matplotlib.ticker as mticker
 import numpy as np
 import pandas as pd
-import pymc3 as pm
+import pymc as pm
 import xarray as xr
 
-from matplotlib import cm
+from matplotlib import MatplotlibDeprecationWarning
 
-print(f"Running on PyMC3 v{pm.__version__}")
+warnings.filterwarnings(action="ignore", category=MatplotlibDeprecationWarning)
 ```
 
 ```{code-cell} ipython3
 RANDOM_SEED = 8927
 rng = np.random.default_rng(RANDOM_SEED)
-
 %config InlineBackend.figure_format = 'retina'
 az.style.use("arviz-darkgrid")
 ```
@@ -84,19 +83,19 @@ cb.ax.set_yticklabels(ticklabels);
 A naive approach would be to estimate a linear model and ignore the time domain.
 
 ```{code-cell} ipython3
-with pm.Model() as model:  # model specifications in PyMC3 are wrapped in a with-statement
+with pm.Model() as model:  # model specifications in PyMC are wrapped in a with-statement
     # Define priors
-    sigma = pm.HalfCauchy("sigma", beta=10, testval=1.0)
+    sigma = pm.HalfCauchy("sigma", beta=10)
     alpha = pm.Normal("alpha", mu=0, sigma=20)
     beta = pm.Normal("beta", mu=0, sigma=20)
 
+    mu = pm.Deterministic("mu", alpha + beta * prices_zscored.GFI.to_numpy())
+
     # Define likelihood
-    likelihood = pm.Normal(
-        "y", mu=alpha + beta * prices_zscored.GFI, sigma=sigma, observed=prices_zscored.GLD
-    )
+    likelihood = pm.Normal("y", mu=mu, sigma=sigma, observed=prices_zscored.GLD.to_numpy())
 
     # Inference
-    trace_reg = pm.sample(tune=2000, return_inferencedata=True)
+    trace_reg = pm.sample(tune=2000)
 ```
 
 The posterior predictive plot shows how bad the fit is.
@@ -110,16 +109,29 @@ ax = fig.add_subplot(
     title="Posterior predictive regression lines",
 )
 sc = ax.scatter(prices_zscored.GFI, prices_zscored.GLD, c=colors, cmap=mymap, lw=0)
-pm.plot_posterior_predictive_glm(
-    trace_reg,
-    samples=100,
-    label="posterior predictive regression lines",
-    lm=lambda x, sample: sample["alpha"] + sample["beta"] * x,
-    eval=np.linspace(prices_zscored.GFI.min(), prices_zscored.GFI.max(), 100),
+
+xi = xr.DataArray(prices_zscored.GFI.values)
+az.plot_hdi(
+    xi,
+    trace_reg.posterior.mu,
+    color="k",
+    hdi_prob=0.95,
+    ax=ax,
+    fill_kwargs={"alpha": 0.25},
+    smooth=False,
 )
+az.plot_hdi(
+    xi,
+    trace_reg.posterior.mu,
+    color="k",
+    hdi_prob=0.5,
+    ax=ax,
+    fill_kwargs={"alpha": 0.25},
+    smooth=False,
+)
+
 cb = plt.colorbar(sc, ticks=ticks)
-cb.ax.set_yticklabels(ticklabels)
-ax.legend(loc=0);
+cb.ax.set_yticklabels(ticklabels);
 ```
 
 ## Rolling regression
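The random-walk coefficients `alpha` and `beta` used in the next hunk are defined in a part of the notebook that falls outside this diff. As a reading aid, here is a minimal sketch of that step, assuming Exponential step-size priors and `GaussianRandomWalk` coefficients (both named in the tags removed above); the `init_dist` argument and the shapes are assumptions about the PyMC v4 API, not the notebook's exact code:

```python
# Editor's sketch (not part of the diff): the coefficient priors that
# model_randomwalk is assumed to define before the regression below.
with pm.Model() as model_randomwalk:
    # How much the regression coefficients may drift per time step
    sigma_alpha = pm.Exponential("sigma_alpha", 50.0)
    sigma_beta = pm.Exponential("sigma_beta", 50.0)

    # One intercept and one slope per observation, each following a
    # Gaussian random walk through time
    alpha = pm.GaussianRandomWalk(
        "alpha", sigma=sigma_alpha, init_dist=pm.Normal.dist(0, 10), shape=len(prices_zscored)
    )
    beta = pm.GaussianRandomWalk(
        "beta", sigma=sigma_beta, init_dist=pm.Normal.dist(0, 10), shape=len(prices_zscored)
    )
```

The large Exponential rates keep the walk standard deviations near zero, so the coefficients drift slowly between time steps rather than jumping.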
@@ -148,18 +160,18 @@ Perform the regression given coefficients and data and link to the data via the
 ```{code-cell} ipython3
 with model_randomwalk:
     # Define regression
-    regression = alpha + beta * prices_zscored.GFI
+    regression = alpha + beta * prices_zscored.GFI.values
 
     # Assume prices are Normally distributed; the mean comes from the regression.
     sd = pm.HalfNormal("sd", sigma=0.1)
-    likelihood = pm.Normal("y", mu=regression, sigma=sd, observed=prices_zscored.GLD)
+    likelihood = pm.Normal("y", mu=regression, sigma=sd, observed=prices_zscored.GLD.to_numpy())
 ```
 
 Inference. Despite this being quite a complex model, NUTS handles it well.
 
 ```{code-cell} ipython3
 with model_randomwalk:
-    trace_rw = pm.sample(tune=2000, cores=4, target_accept=0.9, return_inferencedata=True)
+    trace_rw = pm.sample(tune=2000, target_accept=0.9)
 ```
 
 Increasing the tree depth does indeed help, but it makes sampling very slow. The results look identical with this run, however.
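For reference, a minimal sketch of one way to raise the NUTS tree depth (an editor's illustration, not part of the diff; `trace_deep` is a hypothetical name, and 10 is the sampler's default `max_treedepth`):

```python
# Editor's illustration: a deeper NUTS tree via an explicit step object.
# Each extra unit of depth can double the leapfrog steps per draw,
# which is why sampling slows down so much.
with model_randomwalk:
    step = pm.NUTS(target_accept=0.9, max_treedepth=12)
    trace_deep = pm.sample(tune=2000, step=step)
```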
@@ -231,13 +243,16 @@ cb = plt.colorbar(sc, ticks=ticks)
 cb.ax.set_yticklabels(ticklabels);
 ```
 
-Author: Thomas Wiecki
+## Authors
+
+- Created by [Thomas Wiecki](https://github.com/twiecki/)
+- Updated by [Benjamin T. Vincent](https://github.com/drbenvincent) in June 2022
 
 +++
 
 ## Watermark
 
 ```{code-cell} ipython3
 %load_ext watermark
-%watermark -n -u -v -iv -w -p theano
+%watermark -n -u -v -iv -w -p aesara,aeppl,xarray
 ```