meta-analysis_workshop/metaregression.qmd at main · daniel1noble/meta-analysis_workshop · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
# Multi-level Meta-regression

```{r}
#| label: setup-packages-metareg
#| echo: false
#| message: false
#| warning: false

# Load packages
pacman::p_load(metafor, flextable, tidyverse, orchaRd, pander, mathjaxr, equatags, magick)
options(digits=3)
```

## **Introduction to Multi-level Meta-Regression**

Meta-analysis often reveals substantial heterogeneity among effect estimates included in the analysis [@Senior2016]. While many meta-analysts are interested in 'overall mean effects' we need to couch these effects in context by reporting on their variability and work hard to understand what factors drive effect variability and why [@Noble2022; @Lag2010]. For meta-analyses in comparative physiology, explaining variation in effects should probably be the main goal of every analysis. Sampling variance often explains only a little amount of the total heterogeneity, so it's important that we think hard, *a priori*, about what factors are likely to explain effect size variation.

As already indicated, some variation in effects could be explained by [nuisance heterogeneity](nuis-het.qmd), which may or may not be of interest. Other sources of variability could be methodological [@Noble2022]. Do different chemicals that result in oxidative stress induce stronger or weaker responses? Understanding how methodological choices impact upon effect sizes can change how research is done in a field.

While methodological moderators are important biological variables are usually our main interest. We might be interested in understanding how much variation among species exist or how different life-histories, body sizes, ecological niches, feeding habits or thermal strategies (e.g., ectotherms vs endotherms) differ in their response to some treatment or the association between two variables.

Exploring population-level / average changes in effect size magnitude (and direction) as a function of 'moderators' (what are typically called predictors or fixed effects in other literature) is a critical aspects of meta-analysis in ecology and evolution. The models used to test how average effects change as some function of a moderator is called meta-regression.

## **Extending our Multi-level Meta-analytic Model to a Multi-level Meta-regression Model**

We can extend our multi-level meta-analytic model, which only has an intercept, to include moderators quite easily:

$$
\begin{aligned}
y_{i} &= \mu + \sum_{i = 1}^{N_{m}}\beta_{m}x_{m} +  s_{j[i]} + spp_{k[i]} + e_{i} + m_{i} \\
m_{i} &\sim N(0, v_{i}) \\
s_{j} &\sim N(0, \tau^2) \\
s_{k} &\sim N(0, \sigma_{k}^2) \\
e_{i} &\sim N(0, \sigma_{e}^2)
\end{aligned}
$$

Here, $\beta_{m}$ is the slope (or contrast) for the *m th* moderator (where m = 1, ..., $N_{moderator}$ or $N_{m}$). We can include different types of continuous and categorical moderators in our model. Some of these moderators might control for [nuisance hetereogenity](nuis-het.qmd), others might be to understand how different experimental designs affect the magnitude and direction of mean effect size and some may simply be biological in nature -- they might compare ectotherms to endotherms, for example.

## **Phylogenetic Multi-level Meta-regression**

### Introduction
![](./figs/WuSeebacher.png)

We'll turn again to the study by @WuSeebacher2022.  We can run the same phylogenetic meta-analysis that we did in the [Phylogenetic meta-analytic models tutorial](phylo.qmd), but this time we can include moderators or predictors that we think will explain some variation in effects. From the original study aim, the first thing @WuSeebacher2022 were interested in is the relationship of physiology with activity, exploration and dispersal (Zr). In addition to the random effects  --  `es_ID` (effect size identity), `study_ID`(unique study identity), `species`(unique species identity), and `species_OTL`(phylogenetic relatedness), they were also interested in comparing whether the correlation (or Zr) differences depending on whether the behavioural trait of interest was activity, exploration or dispersal. As such, a phylogenetic meta-regression model was fit with a moderator `disp_trait` (activity, exploration, or dispersal) to explicitly test whether Zr varied across these effect types.

### Load data and phylogeny

We'll start again by loading the raw data and reconstructing the phylogenetic tree, which will be used to control for shared evolutionary history. The other random effects are already in the dataset.

```{r}
#| label: raw-data
#| message: false
#| warning: false
#| output: false

# Load packages
pacman::p_load(rotl, ape, tidyverse, devtools, R.rsp)

# Download the orchaRd package
devtools::install_github("daniel1noble/orchaRd", force = TRUE, build_vignettes = TRUE)
library(orchaRd)

zr_data <- read.csv("https://raw.githubusercontent.com/daniel1noble/meta-workshop/gh-pages/data/ind_disp_raw_data.csv") %>%
  dplyr::select(-dispersal_def) %>% # remove irrelevant columns
  tibble::rowid_to_column("es_ID") %>% # add effect size ID
  dplyr::mutate(species_OTL = stringr::str_replace_all(species_OTL, "_", " ")) # remove underscore for some species with missing underscore. Some inconsistency when curating the database.

# Calculate Zr and sampling variance
zr_data <- metafor::escalc(measure = "ZCOR", ri = corr_coeff, ni = sample_size, data = zr_data, var.names = c("Zr", "V_Zr"))

# Create species list based on OTL names and match with OTL database
species <- sort(unique(as.character(zr_data$species_OTL))) # generate list of species (as character format)
taxa    <- rotl::tnrs_match_names(names = species) # match taxonomic names to the OTL

# Retrieving phylogenetic relationships among taxa in the form of a trimmed sub-tree
tree <- rotl::tol_induced_subtree(ott_ids = ott_id(taxa), label_format = "name")

set.seed(1)
tree <- ape::compute.brlen(tree, method = "Grafen", power = 1) # compute branch lengths
tree <- ape::multi2di(tree, random = TRUE) # use a randomization approach to deal with polytomies

# Create correlation matrix for analysis
phylo_cor <- vcv(tree, cor = T)
```

### Fit the Phylogenetic Meta-regression Model

We can now fit our multi-level meta-regression model including the `disp_trait` moderator:

**Note**: Remember to add an underscore to the species OTL because the tree tip names have underscores and these need to match exactly.

```{r}
#| label: analysis
#| echo: true

overall_model <- metafor::rma.mv(yi = Zr, V = V_Zr,
                                        mod    = ~ -1 + disp_trait,
                                        random = list(~1 | study_ID, ~1 | es_ID, ~1 | species, ~1 | species_OTL),
                                        R      = list(species_OTL = phylo_cor),
                                        method = "REML", test = "t", dfs = "contain",
                                        data   = zr_data %>%
                                   dplyr::mutate(species_OTL = stringr::str_replace_all(species_OTL, " ", "_")))
summary(overall_model)
```

#### **What does the model output tell us?**

::: {.panel-tabset}

## Task

>**Is there a relationship between physiology with either activity, exploration, and dispersal?**

## Answer

>We can see from above that the model estimates **a positive relationship** between physiology and activity with **an overall mean of `r overall_model$b[1,]` with a 95% confidence interval (95% CI) of `r overall_model$ci.lb[1]` to `r overall_model$ci.ub[1]`**. Exploration and dispersal was also, on average, positively associated with physiology, but exploration had higher level of uncertainty (Exploration = `r overall_model$b[3,]` [95% CI `r overall_model$ci.lb[3]`, `r overall_model$ci.ub[3]`], Dispersal = `r overall_model$b[2,]`, [95% CI `r overall_model$ci.lb[2]`, `r overall_model$ci.ub[2]`]). Heterogeneity remained high in this analysis based on the Q-test.

:::

<br>

### Contrasts between Trait Levels

You might have noticed that we suppressed the intercept in this model (i.e., `-1`). This is very typical in meta-analysis because we are interested in the overall mean effect size in each of the three levels of the moderator. However, you may instead want to explicitly compare whether there are significant differences in *Zr* between categories. To estimate these contrasts you can add the intercept back into the model and the resulting parameters are now contrasts between the intercept (in this case `Activity`) and `Dispersal` and `Exploration`.

```{r}
#| label: analysis2
#| echo: true

overall_model <- metafor::rma.mv(yi = Zr, V = V_Zr,
                                        mod    = ~ disp_trait,
                                        random = list(~1 | study_ID, ~1 | es_ID, ~1 | species, ~1 | species_OTL),
                                        R      = list(species_OTL = phylo_cor),
                                        method = "REML", test = "t", dfs = "contain",
                                        data   = zr_data %>%
                                   dplyr::mutate(species_OTL = stringr::str_replace_all(species_OTL, " ", "_")))
summary(overall_model)
```

#### **What does the model output tell us?**

We can see that the model coefficients have changed. The `intrcpt` now takes on the mean *Zr* for `Activity`. In other words, this is the mean *Zr* for the correlations between physiology and activity. As such, this is the same value as the previous model. For the `intrcpt`, the null hypothesis being tested is whether the mean differs significantly from zero, which it does. In contrast, `disp_traitDispersal` and `disp_traitExploration` are no longer meta-analytic means for `Dispersal` and `Exploration` as in the previous model. Rather, they are now contrasts, or differences, between the mean *Zr* for `Activity` and the mean *Zr* for `Dispersal` and `Exploration`. They are now testing the null hypothesis that the difference between *Activity and Dispersal* (i.e., `disp_traitDispersal`) and *Activity and Exploration* (i.e., `disp_traitExploration`) are different from zero.

::: {.panel-tabset}

## Task!

>**Does the mean *Zr* for Exploration and Dispersal differ from Activity?**

## Answer!

>We can see that the mean Z-transformed correlation (*Zr*) between physiology and activity **do not differ significantly** from *Zr* between *Activity and Dispersal* and *Activity and Exploration*. The model also explains only `r orchaRd::r2_ml(overall_model)[1]`% ($R_{marginal}^2$) of the variation, so we can be confident that there is little predictive power from this moderator.

<br>

>We can also visually depict the differences in *r* between these groups using an orchard plot [@Nakagawa2021c] after back-transforming from *Zr*:

:::

<br>

```{r}
#| label: orchard
#| echo: true
#| fig-align: center
#| fig-cap: "Orchard plot of mean correlation coefficient between physiology and activity, dispersal and exploration. k indicates the number of effect sizes (studies in brackets). Thick black bars are 95% confidence intervals and the thin (`twigs`) are 95% prediction intervals"

orchaRd::orchard_plot(overall_model, group = "study_ID", mod = "disp_trait", xlab = "Correlation Coefficient (r)", transfm = "tanh", angle = 45)
```


<br>

## **References**

<div id="refs"></div>

<br>

## **Session Information**

```{r}
#| label: sessioninfo-metareg
#| echo: false

pander(sessionInfo(), locale = FALSE)
```