# A mixed-effects model for chopstick efficiency
# Goals:
- Practice fitting, interpreting, criticizing, and plotting the output from mixed-effect models
- Practice model selection with mixed-effect models
- Practice comparing levels of factor predictor coefficients
# Data
Hsu, S.-H., and Wu, S.-P. (1991). An investigation for determining the optimum length of chopsticks. Appl. Ergon. 22, 395–400.
Let's read in the data and rescale the main predictor so that the effects are per 10 cm of chopstick length. We'll also center the predictor so that the intercept corresponds to the mean chopstick length (centering also helps some of the more complicated models converge).
```{r}
library(tidyverse)
d <- read_csv("data/raw/chopstick-effectiveness.csv") %>%
  mutate(Individual = as.factor(Individual))
names(d) <- tolower(names(d))
names(d) <- gsub("\\.", "_", names(d))
d <- mutate(d,
  chopstick_length_10cm = chopstick_length / 100,
  chopstick_length_10cm = chopstick_length_10cm - mean(chopstick_length_10cm))
```
Take a moment to plot the data and wrap your head around it.
```{r}
ggplot(d, aes(chopstick_length, food_pinching_efficiency, colour = individual)) + # exercise
  geom_line() # exercise

ggplot(d, aes(chopstick_length_10cm, food_pinching_efficiency)) + # exercise
  geom_line() + # exercise
  facet_wrap(~individual) # exercise
```
# Starting models
Given what we know about the experiment and the process generating the response data, what would be a reasonable form of a model to choose? A GLM? A linear regression? A GLMM?
Let's start with a linear model with a quadratic predictor:
```{r}
m1 <- lm(food_pinching_efficiency ~ poly(chopstick_length_10cm, 2),
  data = d)
arm::display(m1)

d$resids <- residuals(m1)
ggplot(d, aes(chopstick_length_10cm, resids)) +
  geom_hline(yintercept = 0, lty = 2) +
  geom_line(position = position_jitter(width = 0.02)) +
  facet_wrap(~individual)
```
What do you notice about those residuals? How can we deal with that?
Let's fit a model with a random intercept for individual.
Before we do that, let's try fitting individual models for each individual so we have a feeling what to expect:
```{r}
ggplot(d, aes(chopstick_length_10cm, food_pinching_efficiency)) +
  geom_line() +
  facet_wrap(~individual) +
  geom_smooth(method = "lm", formula = y ~ poly(x, 2), se = FALSE)
```
For now we will fit the models with maximum likelihood (as opposed to `REML`) so that we can compare models with different fixed effect structures with AIC.
```{r}
library(lme4)
m2 <- lmer(food_pinching_efficiency ~ poly(chopstick_length_10cm, 2) +
  (1 | individual), data = d, REML = FALSE) # exercise
arm::display(m2)
```
Let's look at the residuals again:
```{r}
d$resids <- residuals(m2)
ggplot(d, aes(chopstick_length_10cm, resids)) +
  geom_hline(yintercept = 0, lty = 2) +
  geom_line(position = position_jitter(width = 0.02)) +
  facet_wrap(~individual)

ggplot(d, aes(chopstick_length_10cm, resids)) +
  geom_hline(yintercept = 0, lty = 2) +
  geom_point(position = position_jitter(width = 0.02)) +
  geom_smooth(se = FALSE, method = "loess")
```
The first residual plots look considerably better.
Did you notice anything in the last residual plot?
# Considering random slopes
We could consider different random effect structures. Here we can also let the slope and quadratic effect vary by individual. Following Zuur's books, we can compare these models with AIC as long as the fixed effects are the same and we set `REML = TRUE`:
```{r}
m2.1 <- lmer(food_pinching_efficiency ~ poly(chopstick_length_10cm, 2) +
  (1 | individual), data = d, REML = TRUE)
m2.2 <- lmer(food_pinching_efficiency ~ poly(chopstick_length_10cm, 2) +
  (1 + poly(chopstick_length_10cm, 2) | individual), # exercise
  data = d, REML = TRUE)
bbmle::AICctab(m2.1, m2.2)
```
In the above table, the `df` column represents (one definition of) the number of parameters. We only added 2 random slopes, so why are there 5 extra parameters in the calculation? In any case, according to this procedure we should prefer the model with only the random intercept.
Although I won't show it here, the random slopes don't substantially improve the appearance of the residuals either.
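To see where those extra parameters come from: an unstructured covariance matrix for k correlated random effects has k(k + 1)/2 unique variance and covariance terms. A quick check (the helper `n_vcov()` below is just for illustration, not part of any package):

```{r}
# Unique (co)variance parameters for k correlated random effects per group:
n_vcov <- function(k) k * (k + 1) / 2
n_vcov(1) # random intercept only: 1 variance
n_vcov(3) # intercept + 2 polynomial terms: 3 variances + 3 covariances = 6
n_vcov(3) - n_vcov(1) # the 5 extra parameters in the AIC table
```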
# Plotting the model predictions
Let's show the model predictions across a range of chopstick lengths. We will make one set of predictions at the overall "population" level and another set of predictions at the levels of the random intercepts (the level of the individual).
```{r}
nd <- tibble(chopstick_length_10cm = seq(min(d$chopstick_length_10cm),
  max(d$chopstick_length_10cm), length.out = 100))
nd$p <- predict(m2, newdata = nd, re.form = NA)

nd_re <- expand.grid(
  chopstick_length_10cm = seq(min(d$chopstick_length_10cm),
    max(d$chopstick_length_10cm), length.out = 100),
  individual = unique(d$individual))
nd_re$p <- predict(m2, newdata = nd_re)

ggplot(nd_re, aes(chopstick_length_10cm, p, group = individual)) +
  geom_line(alpha = 0.3) +
  geom_point(data = d, aes(y = food_pinching_efficiency)) +
  geom_line(data = nd, aes(chopstick_length_10cm, p),
    colour = "red", lwd = 1, inherit.aes = FALSE)
```
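These predictions have no uncertainty attached. One way to add approximate confidence intervals on the population-level curve is a parametric bootstrap with `lme4::bootMer()` (a sketch, assuming `m2` and `nd` from above; `nsim` is kept small here for speed):

```{r}
# Refit to simulated data and collect the population-level predictions:
pred_fun <- function(fit) predict(fit, newdata = nd, re.form = NA)
bb <- lme4::bootMer(m2, pred_fun, nsim = 200)
# Quantiles across bootstrap replicates give pointwise intervals:
nd$lwr <- apply(bb$t, 2, quantile, probs = 0.025)
nd$upr <- apply(bb$t, 2, quantile, probs = 0.975)
```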
# Comparing alternative (fixed-effect) models with AIC
We might consider whether a 3rd-order polynomial would provide a better fit:
```{r}
m3 <- lmer(food_pinching_efficiency ~
  poly(chopstick_length_10cm, 3) + # exercise
  (1 | individual), data = d, REML = FALSE)
arm::display(m3)
bbmle::AICctab(m2, m3)
```
What does the AIC table tell us?
The last residual plot didn't look ideal. What are some ways we might go about fixing that?
One method would be to treat the different chopstick lengths as levels of a factor. <!-- exercise -->
```{r}
d$chp_fct <- as.factor(d$chopstick_length)
m4 <- lmer(food_pinching_efficiency ~ 0 + chp_fct +
  (1 | individual), data = d, REML = FALSE)
```
Now how do the residuals look?
```{r}
d$resids <- residuals(m4)
ggplot(d, aes(chopstick_length_10cm, resids)) +
  geom_point() +
  geom_smooth(se = FALSE, method = "loess")
```
And what does AIC tell us when comparing these models?
```{r}
bbmle::AICctab(m2, m3, m4)
```
```{r}
arm::display(m4)
```
If we wanted to use this model for final inference, we should refit it with `REML = TRUE`:
```{r}
m4_reml <- lmer(food_pinching_efficiency ~ 0 + chp_fct + (1 | individual),
  data = d, REML = TRUE) # exercise
arm::display(m4)
arm::display(m4_reml)
```
What do you notice changes the most between the 2 models?
# Interpreting the factor coefficients
What do the `chp_fct` coefficients represent in the model?
How would we go about comparing the coefficients between 2 levels? Can we simply check whether their confidence intervals overlap? Why or why not?
This is a great reference:
Schielzeth, H. (2010). Simple means to improve the interpretability of regression coefficients. Methods Ecol. Evol. 1, 103–113. <https://doi.org/10.1111/j.2041-210X.2010.00012.x>
We can (1) relevel our factor and refit, (2) do it manually with the variance-covariance matrix, or (3) have the emmeans package do it for us with the `pairs()` or `contrast()` functions.
E.g. <https://gist.github.com/seananderson/9ec42a6ee9fb84a61737b631dd57e7a1>
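A minimal sketch of option (3), assuming the emmeans package is installed and `m4` is the factor model fit above:

```{r}
library(emmeans)
em <- emmeans(m4, ~ chp_fct) # estimated marginal mean for each chopstick length
pairs(em)                    # all pairwise differences, with adjusted p-values
confint(pairs(em))           # or confidence intervals on those differences
```

Note that `pairs()` accounts for the covariance between coefficients, which is exactly what eyeballing overlapping confidence intervals fails to do.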
# Addendum
How large are the effect sizes in these models? Are they meaningful?
How else could you model these data?
How would you go about determining the statistical power of this study?
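For the power question, one approach (a sketch, assuming we keep the observed design and treat the fitted quadratic model `m2` as the truth) is a parametric simulation: simulate new responses, refit competing models, and count how often the quadratic effect is favoured:

```{r}
# Simulate 200 response vectors from the fitted model:
sims <- simulate(m2, nsim = 200, seed = 42)
hits <- vapply(sims, function(y) {
  d$y_sim <- y
  full <- lmer(y_sim ~ poly(chopstick_length_10cm, 2) + (1 | individual),
    data = d, REML = FALSE)
  reduced <- lmer(y_sim ~ chopstick_length_10cm + (1 | individual),
    data = d, REML = FALSE)
  AIC(full) < AIC(reduced) # does AIC favour the quadratic model?
}, logical(1))
mean(hits) # approximate power to detect the quadratic effect
```

To explore power for smaller effects or fewer individuals, you could shrink the fitted coefficients or subset the data before simulating.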