
Commit e021c47

Merge pull request #26 from QCBSRworkshops/acf-pacf-pres-en
Adds detailed explanation of autocorrelation, ACF and pACF to the English version of the presentation and the book.
2 parents b735c5c + 57fa090 commit e021c47

2 files changed: 72 additions and 5 deletions

book-en/08-GAMMs.Rmd

Lines changed: 38 additions & 3 deletions
@@ -2,16 +2,51 @@
22

33
When observations are not independent, GAMs can be used to either incorporate:
44

5-
- a serial correlation structure to model residual autocorrelation
6-
(autoregressive: AR; moving average: MA; or a combination of the two:
7-
ARMA),
5+
- a correlation structure to model autocorrelated residuals, such as:
6+
- the autoregressive (AR) model
7+
- the moving average model (MA); or,
8+
- a combination of both models (ARMA).
89
- random effects that model non-independence among observations from the
910
same site.
1011

1112
That is, in addition to changing the basis as with the `nottem` example, we can also add complexity to the model by incorporating an autocorrelation structure or mixed effects using the `gamm()` function in the `mgcv` package. Although we will not be using it here, the [`gamm4`](https://cran.r-project.org/web/packages/gamm4/gamm4.pdf) package can also be used to estimate GAMMs in R.
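As a minimal sketch of what such a model can look like, the block below fits a GAMM with an AR(1) error structure nested within year. The data frame `nottem_df`, its column names, and the object name `ar1_gamm` are assumptions made for illustration; the workshop's own objects may be named differently.

```r
library(mgcv)  # attaches nlme as well, which provides corARMA()

# Rebuild the Nottingham temperature data as a data frame
# (column names here are illustrative assumptions)
n_years <- length(nottem) / 12
nottem_df <- data.frame(
  temp  = as.numeric(nottem),
  month = rep(1:12, times = n_years),
  year  = rep(1920:(1920 + n_years - 1), each = 12)
)

# GAMM with a smooth trend, a cyclic seasonal smooth, and
# AR(1) residuals within each year (p = 1 sets the AR order)
ar1_gamm <- gamm(
  temp ~ s(year) + s(month, bs = "cc"),
  correlation = corARMA(form = ~ 1 | year, p = 1),
  data = nottem_df
)

summary(ar1_gamm$gam)
```

`gamm()` returns a list with a `gam` component (the smooths) and an `lme` component (the correlation structure), so the two parts can be inspected separately.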
1213

1314
## Residual autocorrelation
1415

16+
**Autocorrelation of residuals** refers to the degree of correlation between the residuals (the differences between the actual and predicted values) in a time series model.
17+
18+
In other words, when the residuals of a time series model are autocorrelated, the residual at one point in time is related to the residuals at other points in time.
19+
20+
Autocorrelation of residuals is usually measured using the **ACF (autocorrelation function)** and **pACF (partial autocorrelation function)** graphs, which show the correlation between residuals at different lags.
21+
22+
#### The autocorrelation function
23+
24+
The autocorrelation function (ACF) of a stationary time series can be defined using the following equation:
25+
26+
$$ACF(k) = Corr(Y_t, Y_{t-k})$$
27+
where $Y_t$ is the value of the time series at time $t$, $Y_{t-k}$ is the value of the time series at time $t-k$, and $Corr()$ is the correlation coefficient between two random variables.
28+
29+
In other words, the ACF($k$) is the correlation between the values of the time series $Y_t$ and $Y_{t-k}$, where $k$ is the lag between the two points in time. The ACF is a measure of the strength of the correlation between each value in the time series and its lagged values at different times.
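To make the definition concrete, here is a small sketch that computes the sample autocorrelation by hand and compares it with `stats::acf()`. The simulated series and the helper name `acf_manual` are illustrative assumptions, not part of the workshop material.

```r
# Simulated AR(1) series (illustrative data, not the nottem model)
set.seed(42)
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))

# Sample autocorrelation at lag k: the lag-k autocovariance
# divided by the variance, both computed around the overall mean
acf_manual <- function(y, k) {
  n <- length(y)
  y_c <- y - mean(y)
  sum(y_c[(k + 1):n] * y_c[1:(n - k)]) / sum(y_c^2)
}

manual  <- sapply(1:5, function(k) acf_manual(y, k))
builtin <- acf(y, lag.max = 5, plot = FALSE)$acf[2:6]  # drop lag 0

all.equal(manual, builtin)
```

The first element returned by `acf()` is always lag 0 (correlation of the series with itself), which is why it is dropped before the comparison.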
30+
31+
#### The partial autocorrelation function
32+
33+
The partial autocorrelation function (pACF) of a stationary time series can be defined using the following recursive formula:
34+
35+
$$pACF(1) = Corr(Y_1, Y_2)$$
36+
37+
$$pACF(k) = [ Corr(Y_k, Y_{k+1} - \hat{\phi}_{k,1}Y_{k}) ] / [ Corr(Y_1, Y_2 - \hat{\phi}_{1,1}Y_1) ]$$
38+
39+
for $k > 1$
40+
41+
where $Y_t$ is the value of the time series at time $t$, $\hat{\phi}_{k,1}$, $\hat{\phi}_{1,1}$, $...$ $\hat{\phi}_{k-1,k-1}$ are the coefficients of the autoregressive model of order $k-1$ fitted to the time series, and $Corr()$ is the coefficient of correlation between two random variables.
42+
43+
In other words, the pACF($k$) is the correlation between the values of the time series $Y_t$ and $Y_{t-k}$ after removing the influence of the intermediate values $Y_{t-1}, Y_{t-2}, ..., Y_{t-k+1}$ using an autoregressive model of order $k-1$.
44+
Unlike the ACF, the pACF isolates the direct correlation at lag $k$, excluding any correlation that is carried through the shorter lags.
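One common way to read the pACF at lag $k$ is as the last coefficient of an AR($k$) model fitted to the series. The sketch below checks this against `stats::pacf()` using Yule-Walker fits; the simulated series and the helper name `pacf_via_ar` are assumptions for illustration only.

```r
# Simulated AR(2) series (illustrative data)
set.seed(42)
y <- as.numeric(arima.sim(model = list(ar = c(0.6, 0.2)), n = 500))

# pACF at lag k taken as the k-th (last) coefficient of a
# Yule-Walker AR(k) fit with the order fixed at k
pacf_via_ar <- function(y, k) {
  fit <- ar(y, aic = FALSE, order.max = k, method = "yule-walker")
  fit$ar[k]
}

manual  <- sapply(1:4, function(k) pacf_via_ar(y, k))
builtin <- as.numeric(pacf(y, lag.max = 4, plot = FALSE)$acf)

round(cbind(lag = 1:4, manual, builtin), 3)
```

For an AR(2) process like this one, the partial autocorrelations beyond lag 2 should be close to zero, which is the pattern used to choose the AR order from a pACF plot.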
45+
46+
If the **ACF** or **pACF** graphs show significant correlations at non-zero lags, there is evidence of autocorrelation in the residuals and the model may need to be modified or improved to better capture the underlying patterns in the data.
47+
48+
Let's see how this works with our `year_gam` model!
49+
1550
To start, let's have a look at a model with temporal autocorrelation in the residuals. We will revisit the Nottingham temperature model and test for correlated errors using the (partial) autocorrelation function.
1651

1752
```{r, echo = TRUE, eval = FALSE}

pres-en/workshop08-pres-en.Rmd

Lines changed: 34 additions & 2 deletions
@@ -1843,17 +1843,45 @@ class: inverse, center, middle
18431843

18441844
When observations are not independent, GAMs can be used to either incorporate:
18451845

1846-
- a serial correlation structure to model residual autocorrelation (autoregressive AR, moving average MA, or a combination of the two ARMA),
1847-
- random effects that model independence among observations from the same site.
1846+
- a correlation structure to model autocorrelated residuals, such as:
1847+
- the autoregressive (AR) model
1848+
- the moving average model (MA); or,
1849+
- a combination of both models (ARMA).
1850+
- random effects that model non-independence among observations from the same site.
18481851

18491852
---
1853+
1854+
# Autocorrelation of residuals
1855+
1856+
**Autocorrelation of residuals** refers to the degree of correlation between the residuals (the differences between actual and predicted values) in a time series model.
1857+
1858+
In other words, if there is autocorrelation of residuals in a time series model, it means that there is a pattern or relationship between the residuals at one time and the residuals at other times.
1859+
1860+
--
1861+
1862+
<br>
1863+
1864+
Autocorrelation of residuals is usually measured using the **ACF (autocorrelation function)** and **pACF (partial autocorrelation function)** graphs, which show the correlation between residuals at different lags.
1865+
1866+
If the **ACF** or **pACF** graphs show significant correlations at non-zero lags, there is evidence of autocorrelation in the residuals and the model may need to be modified or improved to better capture the underlying patterns in the data.
1867+
1868+
--
1869+
1870+
<br>
1871+
1872+
Let's see how this works with our `year_gam` model!
1873+
1874+
---
1875+
18501876
# Model with correlated errors
18511877

18521878
Let's have a look at a model with temporal autocorrelation in the residuals. We will revisit the Nottingham temperature model and test for correlated errors using the (partial) autocorrelation function.
18531879

18541880
```{r, eval = FALSE, fig.width=9, fig.height=4.5}
18551881
par(mfrow = c(1,2))
1882+
18561883
acf(resid(year_gam), lag.max = 36, main = "ACF")
1884+
18571885
pacf(resid(year_gam), lag.max = 36, main = "pACF")
18581886
```
18591887

@@ -1862,7 +1890,9 @@ pacf(resid(year_gam), lag.max = 36, main = "pACF")
18621890

18631891
```{r, echo = F, fig.width=9, fig.height=4.5}
18641892
par(mfrow = c(1,2))
1893+
18651894
acf(resid(year_gam), lag.max = 36, main = "ACF")
1895+
18661896
pacf(resid(year_gam), lag.max = 36, main = "pACF")
18671897
```
18681898

@@ -1882,6 +1912,8 @@ In contrast, the __partial autocorrelation function__ (PACF: second panel above)
18821912
gives the partial correlation of a time series with its own lagged values,
18831913
after controlling for the values of the time series at all shorter lags.
18841914

1915+
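In the ACF graph produced by base R's `acf()`, the dashed blue lines mark the approximate 95% confidence limits expected when there is no autocorrelation; bars that extend beyond these lines indicate statistically significant autocorrelation at that lag.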
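The limits drawn on a default base R `acf()` plot can be reproduced by hand. The sketch below assumes the default 95% level and uses simulated white noise (`res`) as a stand-in for `resid(year_gam)`.

```r
# Stand-in residuals (white noise); replace with resid(year_gam)
set.seed(1)
res <- rnorm(240)
n <- length(res)

# Approximate 95% limits under the null of no autocorrelation,
# as drawn by plot.acf() with ci = 0.95
ci_limit <- qnorm((1 + 0.95) / 2) / sqrt(n)

# Autocorrelations at lags 1 to 36 (lag 0 dropped)
r <- acf(res, lag.max = 36, plot = FALSE)$acf[-1]

# Lags whose autocorrelation falls outside the limits
which(abs(r) > ci_limit)
```

With a 95% level, roughly one lag in twenty can exceed the limits by chance alone, so an isolated bar just past the line is weaker evidence than a run of large bars or a clear seasonal pattern.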
1916+
18851917
---
18861918
# Model with correlated errors
18871919
