
Commit db69625

Merge branch 'master' into code-examples
2 parents b6cce66 + da93969 commit db69625

20 files changed, with 2598 additions and 219 deletions
Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+---
+title: "Causal Inference with `group_by()` and `summarize()`"
+format: html
+---
+
+```{r}
+#| label: setup
+library(tidyverse)
+set.seed(1)
+```
+
+## Your Turn 1
+
+Run this code to generate the simulated data set
+
+```{r}
+n <- 1000
+sim <- tibble(
+  confounder = rbinom(n, 1, 0.5),
+  p_exposure = case_when(
+    confounder == 1 ~ 0.75,
+    confounder == 0 ~ 0.25
+  ),
+  exposure = rbinom(n, 1, p_exposure),
+  outcome = confounder + rnorm(n)
+)
+```
+
+1. Group the dataset by `confounder` and `exposure`
+2. Calculate the mean of the `outcome` for the groups
+
+```{r}
+sim |>
+  group_by(______, ______) |>
+  summarise(avg_y = mean(______)) |>
+  # pivot the data so we can get the difference
+  # between the exposure groups
+  pivot_wider(
+    names_from = exposure,
+    values_from = avg_y,
+    names_prefix = "x_"
+  ) |>
+  summarise(estimate = x_1 - x_0) |>
+  summarise(estimate = mean(estimate)) # note, we would need to weight this if the confounder groups were not equal sized
+```
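
The closing comment in the block above notes that a plain mean of the stratum estimates only works when the confounder groups are the same size. As a hedged sketch of the weighted version, the example below simulates unequal groups and weights each stratum's estimate by its row count. The 30/70 split, the object names (`sim_unequal`, `stratum_estimates`, `n_stratum`) and the use of `weighted.mean()` are assumptions made for this illustration, not part of the exercise.

```r
library(dplyr)

set.seed(1)
n <- 1000

# Same structure as `sim`, but the confounder groups are unequal
# (roughly 30% / 70%); the 0.3 is an assumption for this illustration.
sim_unequal <- tibble(
  confounder = rbinom(n, 1, 0.3),
  p_exposure = if_else(confounder == 1, 0.75, 0.25),
  exposure = rbinom(n, 1, p_exposure),
  outcome = confounder + rnorm(n)
)

# Difference in mean outcome between exposure groups within each
# confounder stratum, keeping the stratum size for weighting
stratum_estimates <- sim_unequal |>
  group_by(confounder) |>
  summarise(
    x_1 = mean(outcome[exposure == 1]),
    x_0 = mean(outcome[exposure == 0]),
    n_stratum = n()
  ) |>
  mutate(estimate = x_1 - x_0)

# Average the stratum estimates, weighting by stratum size
weighted_estimate <- stratum_estimates |>
  summarise(estimate = weighted.mean(estimate, w = n_stratum))

weighted_estimate
```

Because `outcome` does not depend on `exposure` in this simulation, the weighted estimate should land near zero.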
+
+## Your Turn 2
+
+Run the following code to generate `sim2`
+
+```{r}
+n <- 1000
+sim2 <- tibble(
+  confounder_1 = rbinom(n, 1, 0.5),
+  confounder_2 = rbinom(n, 1, 0.5),
+
+  p_exposure = case_when(
+    confounder_1 == 1 & confounder_2 == 1 ~ 0.75,
+    confounder_1 == 0 & confounder_2 == 1 ~ 0.9,
+    confounder_1 == 1 & confounder_2 == 0 ~ 0.2,
+    confounder_1 == 0 & confounder_2 == 0 ~ 0.1,
+  ),
+  exposure = rbinom(n, 1, p_exposure),
+  outcome = confounder_1 + confounder_2 + rnorm(n)
+)
+```
+
+1. Group the dataset by the confounders and exposure
+2. Calculate the mean of the outcome for the groups
+
+```{r}
+sim2 |>
+  group_by(_____, _____, _____) |>
+  summarise(avg_y = mean(_____)) |>
+  pivot_wider(names_from = exposure,
+              values_from = avg_y,
+              names_prefix = "x_") |>
+  summarise(estimate = x_1 - x_0, .groups = "drop") |>
+  summarise(estimate = mean(estimate))
+
+```
+
+## Your Turn 3
+
+Run the following code to generate `sim3`
+
+```{r}
+n <- 10000
+sim3 <- tibble(
+  confounder = rnorm(n),
+  p_exposure = exp(confounder) / (1 + exp(confounder)),
+  exposure = rbinom(n, 1, p_exposure),
+  outcome = confounder + rnorm(n)
+)
+```
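
The `p_exposure` line in the block above applies the inverse logit (logistic) function to `confounder`, so larger confounder values mean a higher chance of exposure. As a small check, assuming base R's `plogis()` is an acceptable stand-in, the hand-written form and the built-in agree on a few made-up values:

```r
# The inverse logit written out by hand, as in the `sim3` code
x <- c(-2, 0, 2)
by_hand <- exp(x) / (1 + exp(x))

# `plogis()` from base R (the stats package) computes the same quantity
built_in <- plogis(x)

all.equal(by_hand, built_in)
```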
+
+1. Use `ntile()` from dplyr to calculate a binned version of `confounder` called `confounder_q`. We'll create a variable with 5 bins.
+2. Group the dataset by the binned variable you just created and exposure
+3. Calculate the mean of the outcome for the groups
+
+```{r}
+sim3 |>
+  mutate(confounder_q = _____(_____, 5)) |>
+  group_by(_____, _____) |>
+  summarise(avg_y = mean(_____)) |>
+  pivot_wider(
+    names_from = exposure,
+    values_from = avg_y,
+    names_prefix = "x_"
+  ) |>
+  summarise(estimate = x_1 - x_0)
+
+```
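
`ntile()` ranks the values and splits them into buckets of roughly equal size, so with 5 bins each bucket holds about a fifth of the rows. A minimal illustration on a toy vector (the vector `x` is made up for this sketch):

```r
library(dplyr)

# Ten made-up values, deliberately out of order
x <- c(0.3, -1.2, 2.5, 0.0, 1.1, -0.4, 0.8, -2.0, 1.9, 0.5)

# Assign each value to one of 5 rank-based bins (1 = smallest values)
bins <- ntile(x, 5)

# Count how many values fall in each bin
table(bins)
```

With ten values and five bins, each bin receives two values; the smallest value lands in bin 1 and the largest in bin 5.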
+
+# Take aways
+
+* Sometimes correlation *is* causation!
+* In simple cases, grouping by confounding variables can get us the right answer without a statistical model
+* Propensity scores generalize the idea of summarizing exposure effects to any number of confounders. Although we'll use models for this process, the foundations are the same.
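
As a hedged sketch of the last takeaway, one common way to use a propensity score is inverse probability weighting: model the probability of exposure given the confounder, then compare weighted outcome means. The data below repeats the `sim3` setup in base R. The object names (`ps_model`, `ps`, `ipw`) and the choice of average-treatment-effect weights are assumptions for this illustration, not the workshop's implementation.

```r
set.seed(1)
n <- 10000
confounder <- rnorm(n)
p_exposure <- exp(confounder) / (1 + exp(confounder))
exposure <- rbinom(n, 1, p_exposure)
outcome <- confounder + rnorm(n)

# Unadjusted difference in means, which is confounded
naive <- mean(outcome[exposure == 1]) - mean(outcome[exposure == 0])

# Propensity score: modelled probability of exposure given the confounder
ps_model <- glm(exposure ~ confounder, family = binomial())
ps <- fitted(ps_model)

# Inverse probability weights: 1 / ps for the exposed,
# 1 / (1 - ps) for the unexposed
ipw <- ifelse(exposure == 1, 1 / ps, 1 / (1 - ps))

# Weighted difference in mean outcome between exposure groups
weighted_diff <-
  weighted.mean(outcome[exposure == 1], ipw[exposure == 1]) -
  weighted.mean(outcome[exposure == 0], ipw[exposure == 0])

c(naive = naive, weighted = weighted_diff)
```

Since `outcome` does not depend on `exposure` here, the weighted difference should sit much closer to zero than the naive one.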

exercises/05-quartets-exercises.qmd

Lines changed: 1 addition & 11 deletions
@@ -8,9 +8,7 @@ format: html
 library(quartets)
 ```
 
-If you don't already have it, install the `quartets` package: `install.packages("quartets")`
-
-## Your turn
+## Your turn 1
 
 For each of the following 4 datasets, look at the correlation between `exposure` and `covariate`:
 
@@ -45,11 +43,3 @@ For each of the following 4 datasets, fit a linear linear model examining the re
 * `causal_mediator_time`
 * `causal_m_bias_time`
 
-## Stretch goal
-
-Use the "g-computation" method to examine the causal effect of a change in exposure from 0 to 1 for each of the datasets. How does this compare to the results above?
-
-```{r}
-
-```
-
146 KB
Binary file not shown.

slides/pdf/05-quartets.pdf

734 KB
Binary file not shown.

slides/raw/03-causal-inference-with-group-by-and-summarise.html

Lines changed: 65 additions & 53 deletions
Large diffs are not rendered by default.

slides/raw/03-causal-inference-with-group-by-and-summarise.qmd

Lines changed: 38 additions & 14 deletions
@@ -124,7 +124,14 @@ sim |>
   summarise(estimate = x_1 - x_0)
 ```
 
-## Simulation
+## *Your Turn 1* (`03-ci-with-group-by-and-summarise-exercises.qmd`)
+
+### Group the dataset by `confounder` and `exposure`
+### Calculate the mean of the `outcome` for the groups
+
+`r countdown::countdown(minutes = 3)`
+
+## *Your Turn 1*
 
 ```{r}
 #| code-line-numbers: "|2"
@@ -134,7 +141,7 @@ sim |>
   summarise(avg_y = mean(outcome))
 ```
 
-## Simulation
+## *Your Turn 1*
 
 ```{r}
 #| code-line-numbers: "|2"
@@ -147,7 +154,8 @@ sim |>
     values_from = avg_y,
     names_prefix = "x_"
   ) |>
-  summarise(estimate = x_1 - x_0)
+  summarise(estimate = x_1 - x_0) |>
+  summarise(estimate = mean(estimate)) # note, we would need to weight this if the confounder groups were not equal sized
 ```
 
 . . .
@@ -196,7 +204,12 @@ sim2 |>
 lm(outcome ~ exposure, data = sim2)
 ```
 
-## Simulation
+## *Your Turn 2*
+
+### Group the dataset by the confounders and exposure
+### Calculate the mean of the outcome for the groups
+
+## *Your Turn 2*
 
 ```{r}
 #| code-line-numbers: "|2"
@@ -209,10 +222,11 @@ sim2 |>
     values_from = avg_y,
     names_prefix = "x_"
   ) |>
-  summarise(estimate = x_1 - x_0)
+  summarise(estimate = x_1 - x_0, .groups = "drop") |>
+  summarise(estimate = mean(estimate))
 ```
 
----
+`r countdown::countdown(minutes = 2)`
 
 ## Simulation
 
@@ -222,7 +236,7 @@ sim2 |>
 ```{r}
 #| code-line-numbers: "|1"
 n <- 100000
-sim2 <- tibble(
+big_sim2 <- tibble(
   confounder_1 = rbinom(n, 1, 0.5),
   confounder_2 = rbinom(n, 1, 0.5),
 
@@ -241,7 +255,7 @@ sim2 <- tibble(
 ::: {.column width="50%"}
 ```{r}
 #| echo: false
-sim2 |>
+big_sim2 |>
   select(confounder_1, confounder_2, exposure, outcome)
 ```
 :::
@@ -251,21 +265,22 @@ sim2 |>
 ## Simulation
 
 ```{r}
-lm(outcome ~ exposure, data = sim2)
+lm(outcome ~ exposure, data = big_sim2)
 ```
 
 ## Simulation
 
 ```{r}
 #| code-line-numbers: "|2"
 #| output-location: fragment
-sim2 |>
+big_sim2 |>
   group_by(confounder_1, confounder_2, exposure) |>
   summarise(avg_y = mean(outcome)) |>
   pivot_wider(names_from = exposure,
               values_from = avg_y,
               names_prefix = "x_") |>
-  summarise(estimate = x_1 - x_0)
+  summarise(estimate = x_1 - x_0, .groups = "drop") |>
+  summarise(estimate = mean(estimate))
 ```
 
 
@@ -305,10 +320,18 @@ sim3 |>
 lm(outcome ~ exposure, data = sim3)
 ```
 
-## Simulation
+## *Your Turn 3*
+
+### Use `ntile()` from dplyr to calculate a binned version of `confounder` called `confounder_q`. We'll create a variable with 5 bins.
+### Group the dataset by the binned variable you just created and exposure
+### Calculate the mean of the outcome for the groups
+
+`r countdown::countdown(minutes = 3)`
+
+## *Your Turn 3*
 
 ```{r}
-#| code-line-numbers: "|2"
+#| code-line-numbers: "|2|3-4"
 #| output-location: fragment
 sim3 |>
   mutate(confounder_q = ntile(confounder, 5)) |>
@@ -319,7 +342,8 @@ sim3 |>
     values_from = avg_y,
     names_prefix = "x_"
   ) |>
-  summarise(estimate = x_1 - x_0)
+  summarise(estimate = x_1 - x_0) |>
+  summarise(estimate = mean(estimate))
 ```
 
 ## {background-color="#23373B" .center .huge}
