
Commit 8670b76

Merge pull request #74 from r-causal/clean_up
Spring cleaning
2 parents e6bbc97 + ae71362 commit 8670b76

143 files changed: +2776 additions, -2272 deletions


exercises/05-quartets-exercises.qmd
Lines changed: 7 additions & 3 deletions

@@ -11,19 +11,21 @@ library(quartets)
 
 ## Your turn 1
 
-For each of the following 4 datasets, look at the correlation between `exposure` and `covariate`:
+For each of the following 4 datasets, create a scatterplot looking at the relationship between `exposure` and `outcome`:
 
 * `causal_collider`
 * `causal_confounding`
 * `causal_mediator`
 * `causal_m_bias`
 
+(Alternatively, you can work with `causal_quartet`, which has all four datasets stacked on top of one another.)
 
 ```{r}
 
 ```
 
-For each of the above 4 datasets, create a scatterplot looking at the relationship between `exposure` and `outcome`
+For each of the above 4 datasets, look at the correlation between `exposure` and `covariate`
+
 
 ```{r}
@@ -37,13 +39,15 @@ For each of the above 4 datasets, fit a linear model to examine the relationship
 
 ## Your turn 2
 
-For each of the following 4 datasets, fit a linear linear model examining the relationship between `outcome_followup` and `exposure_baseline` adjusting for `covariate_baseline`:
+For each of the following 4 datasets, fit a linear model examining the relationship between `outcome_followup` and `exposure_baseline` adjusting for `covariate_baseline`:
 
 * `causal_collider_time`
 * `causal_confounding_time`
 * `causal_mediator_time`
 * `causal_m_bias_time`
 
+(Alternatively, you can work with `causal_quartet_time`, which has all four datasets stacked on top of one another.)
+
 ```{r}
 
 ```
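For context on the reordered tasks above, a possible completion might look like the following sketch (not part of the commit; it assumes the `quartets` package's `causal_quartet` data with `dataset`, `exposure`, `outcome`, and `covariate` columns):

```r
# Hypothetical completion of the exercise, not part of this commit
library(quartets)
library(dplyr)
library(ggplot2)

# Scatterplots of outcome vs. exposure, one panel per dataset
causal_quartet |>
  ggplot(aes(exposure, outcome)) +
  geom_point() +
  facet_wrap(~ dataset)

# Correlations between exposure and covariate in each dataset
causal_quartet |>
  group_by(dataset) |>
  summarise(correlation = cor(exposure, covariate))
```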

exercises/06-intro-pscores-exercises.qmd
Lines changed: 10 additions & 14 deletions

@@ -28,32 +28,32 @@ Below is a proposed DAG for this question.
 set.seed(1234)
 
 coord_dag <- list(
-  x = c(Season = 0, close = 0, weather = -1, x = 1, y = 2),
-  y = c(Season = -1, close = 1, weather = 0, x = 0, y = 0)
+  x = c(season = 0, close = 0, weather = -1, emm = 1, wait_posted = 2),
+  y = c(season = -1, close = 1, weather = 0, emm = 0, wait_posted = 0)
 )
 
 labels <- c(
-  x = "Extra Magic Morning",
-  y = "Average wait",
-  Season = "Ticket Season",
+  emm = "Extra Magic Morning",
+  wait_posted = "Average wait",
+  season = "Ticket Season",
   weather = "Historic high temperature",
   close = "Time park closed"
 )
 
 dagify(
-  y ~ x + close + Season + weather,
-  x ~ weather + close + Season,
+  wait_posted ~ emm + close + season + weather,
+  emm ~ weather + close + season,
   coords = coord_dag,
   labels = labels,
-  exposure = "x",
-  outcome = "y"
+  exposure = "emm",
+  outcome = "wait_posted"
 ) |>
   tidy_dagitty() |>
   node_status() |>
   ggplot(
     aes(x, y, xend = xend, yend = yend, color = status)
   ) +
-  geom_dag_edges_arc(curvature = c(rep(0, 5), .3, 0)) +
+  geom_dag_edges_arc(curvature = c(rep(0, 6), .3)) +
   geom_dag_point() +
   geom_dag_label_repel(
     aes(x, y, label = label),
@@ -108,12 +108,9 @@ Here's a data dictionary of the variables we need in the `seven_dwarfs` data set
 
 ## Your Turn
 
-*After updating the code chunks below, change `eval: true` before rendering*
-
 Now, fit a propensity score model for `park_extra_magic_morning` using the above proposed confounders.
 
 ```{r}
-#| eval: false
 propensity_model <- ___(
   ___ ~ ___,
   data = seven_dwarfs,
@@ -124,7 +121,6 @@ propensity_model <- ___(
 Add the propensity scores to the `seven_dwarfs` data set, call this new dataset `df`.
 
 ```{r}
-#| eval: false
 df <- propensity_model |>
   ____(type.predict = ____, data = ____)
 ```
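One way the blanks in this exercise might be filled in, shown only as a hedged sketch: the confounders follow the DAG above, and `glm()`/`broom::augment()` are assumptions about the intended tools, not confirmed by the diff.

```r
# Hypothetical completion, not part of this commit
library(broom)        # augment()
library(touringplans) # seven_dwarfs_train_2018
library(dplyr)

seven_dwarfs <- seven_dwarfs_train_2018 |>
  filter(wait_hour == 9)

# Logistic regression propensity score model for the exposure,
# using the three DAG confounders
propensity_model <- glm(
  park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,
  data = seven_dwarfs,
  family = binomial()
)

# Attach the propensity scores (in `.fitted`) to the data
df <- propensity_model |>
  augment(type.predict = "response", data = seven_dwarfs)
```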

exercises/07-pscores-using-exercises.qmd
Lines changed: 20 additions & 16 deletions

@@ -10,6 +10,7 @@ library(tidyverse)
 library(broom)
 library(touringplans)
 library(propensity)
+library(halfmoon)
 ```
 
 We are interested in examining the relationship between whether there were "Extra Magic Hours" in the morning (the **exposure**) and the average wait time for the Seven Dwarfs Mine Train the same day between 9am and 10am (the **outcome**).
@@ -32,12 +33,13 @@ seven_dwarfs_prop <- propensity_model |>
 
 ## Your Turn 1 (Matching)
 
-*After updating the code chunks below, change `eval: true` before rendering*
+Create a "matched" data set using the same propensity score model as above and a caliper of 0.2.
 
-Create a "matched" data set using the same propensity score model as above and a caliper of 0.2.
+1. Provide `matchit()` the formula for the propensity score.
+2. Using the `link` and `caliper` arguments, create a caliper of 0.2 SDs on the linear logit scale.
+3. Extract the matched datasets into a new data frame called `matched_df`.
 
 ```{r}
-#| eval: false
 library(MatchIt)
 matched_dwarfs <- matchit(
   ___,
@@ -51,12 +53,9 @@ matched_df <- ___(matched_dwarfs)
 
 ## Your Turn 2 (Weighting)
 
-*After updating the code chunks below, change `eval: true` before rendering*
-
 Add the ATE weights to the data frame, `seven_dwarfs_prop`
 
 ```{r}
-#| eval: false
 seven_dwarfs_prop <- seven_dwarfs_prop |>
   mutate(w_ate = ___)
 ```
@@ -67,7 +66,6 @@ Stretch Goal 1:
 Add ATM weights to the data frame, `seven_dwarfs_prop`
 
 ```{r}
-#| eval: false
 seven_dwarfs_prop <- seven_dwarfs_prop |>
   mutate(w_atm = ___)
 ```
@@ -78,18 +76,24 @@ Update the code below to examine the distribution of the weighted sample. **HINT
 
 
 ```{r}
-#| eval: false
 #| warning: false
-ggplot(
-  seven_dwarfs_prop,
-  aes(.fitted, fill = factor(park_extra_magic_morning))
-) +
-  geom_mirror_histogram(bins = 50, alpha = .5) +
-  geom_mirror_histogram(aes(weight = ____), alpha = .5, bins = 50) +
+seven_dwarfs_prop |>
+  mutate(
+    park_extra_magic_morning = factor(park_extra_magic_morning)
+  ) |>
+  ggplot(aes(.fitted)) +
+  geom_mirror_histogram(
+    aes(group = park_extra_magic_morning),
+    bins = 30
+  ) +
+  geom_mirror_histogram(
+    aes(fill = park_extra_magic_morning, weight = ______),
+    alpha = .5,
+    bins = 30
+  ) +
   geom_hline(yintercept = 0, lwd = 0.5) +
   theme_minimal() +
   scale_y_continuous(labels = abs) +
   scale_fill_manual(values = c("blue", "green")) +
-  labs(x = "p", fill = "Extra Magic Morning") +
-  xlim(0, 1)
+  labs(x = "p", fill = "Extra Magic Morning")
 ```
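The matching and weighting steps above might be completed as follows. This is a sketch under the assumption that `seven_dwarfs_prop` carries the propensity scores in `.fitted` (as in the earlier chunk) and that `propensity::wt_ate()` is the intended weighting helper; it is not a definitive solution.

```r
# Hypothetical completion, not part of this commit
library(MatchIt)
library(propensity)
library(dplyr)

# Your Turn 1: nearest-neighbor matching with a 0.2-SD caliper
# on the linear logit scale (assumed confounder set)
matched_dwarfs <- matchit(
  park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,
  data = seven_dwarfs_prop,
  link = "linear.logit",
  caliper = 0.2
)
matched_df <- get_matches(matched_dwarfs)

# Your Turn 2: ATE weights computed from the propensity scores
seven_dwarfs_prop <- seven_dwarfs_prop |>
  mutate(w_ate = wt_ate(.fitted, park_extra_magic_morning))
```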

exercises/08-pscores-diagnostics-exercises.qmd
Lines changed: 30 additions & 7 deletions

@@ -34,12 +34,9 @@ seven_dwarfs_ps <- propensity_model |>
 
 ## Your Turn 1
 
-*After updating the code chunks below, change `eval: true` before rendering*
-
 Calculate the standardized mean differences with and without weights
 
 ```{r}
-#| eval: false
 smds <- seven_dwarfs_ps |>
   mutate(park_close = as.numeric(park_close)) |>
   tidy_smd(
@@ -52,20 +49,20 @@ smds <- seven_dwarfs_ps |>
 Create the Love Plot using ggplot and halfmoon
 
 ```{r}
-#| eval: false
 ggplot(
   data = ____,
   aes(x = abs(____), y = ____, group = ____, color = ____)
 ) +
   geom_love()
 ```
 
+Stretch goal: Create a Love Plot using `make_dummy_vars = TRUE` and sorted by `abs(smd)` for the SMD values for the rows representing the observed data.
+
 ## Your Turn 2
 
 Create an unweighted ECDF for `park_temperature_high` by whether or not the day had Extra Magic Hours.
 
 ```{r}
-#| eval: false
 ggplot(seven_dwarfs_ps, aes(x = ____, group = ____, color = factor(____))) +
   ____() +
   scale_color_manual(
@@ -80,15 +77,41 @@ ggplot(seven_dwarfs_ps, aes(x = ____, group = ____, color = factor(____))) +
 Create a weighted ECDF for `park_temperature_high` by whether or not the day had Extra Magic Hours.
 
 ```{r}
-#| eval: false
 ggplot(seven_dwarfs_ps, aes(x = ____, color = factor(____))) +
   ____(aes(weights = ____)) +
   scale_color_manual(
     "Extra Magic Hours",
     values = c("#5154B8", "#5DB854"),
     labels = c("Yes", "No")
   ) +
-  xlab(____) +
+  xlab("Historic Temperature") +
   ylab("Proportion <= x (Weighted)")
 ```
 
+## Bonus Your Turn: Weighted Tables
+
+Create a weighted table for the seven dwarfs dataset given your weights
+
+1. Create a survey design object using `svydesign()` that specifies the weights as `w_ate`
+2. Use `tbl_svysummary()` by `park_extra_magic_morning` to specify a weighted table
+3. Use `add_difference()` to add `"smd"` differences for every variable
+
+
+```{r}
+library(survey)
+library(gtsummary)
+seven_dwarfs_ps |>
+  select(park_extra_magic_morning, park_ticket_season, park_close, park_temperature_high, w_ate) |>
+  ______(
+    ids = ~ 1,
+    data = _,
+    weights = ~ ____
+  ) |>
+  ______(
+    by = ____,
+    include = -w_ate
+  ) |>
+  add_difference(everything() ~ "____")
+```
+
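Following the three numbered steps, the bonus chunk's blanks might be filled in like this (a sketch: `svydesign()`, `tbl_svysummary()`, and `"smd"` are named in the exercise's instructions; everything else is an assumption):

```r
# Hypothetical completion, not part of this commit
library(survey)
library(gtsummary)
library(dplyr)

seven_dwarfs_ps |>
  select(park_extra_magic_morning, park_ticket_season, park_close, park_temperature_high, w_ate) |>
  svydesign(
    ids = ~ 1,
    data = _,        # base-pipe placeholder (requires R >= 4.2)
    weights = ~ w_ate
  ) |>
  tbl_svysummary(
    by = park_extra_magic_morning,
    include = -w_ate
  ) |>
  add_difference(everything() ~ "smd")
```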
exercises/09-outcome-model-exercises.qmd
Lines changed: 0 additions & 6 deletions

@@ -20,12 +20,9 @@ We are interested in examining the relationship between whether there were "Extr
 
 ## Your turn
 
-*After updating the code chunks below, change `eval: true` before rendering*
-
 Create a function called `fit_ipw` that fits the propensity score model from Exercise 03, incorporates the ATE weights calculated in Exercise 04, and fits a weighted outcome model.
 
 ```{r}
-#| eval: false
 fit_ipw <- function(split, ...) {
   .df <- ____
 
@@ -42,7 +39,6 @@ fit_ipw <- function(split, ...) {
 Bootstrap this result 1000 times.
 
 ```{r}
-#| eval: false
 set.seed(1234)
 
 ipw_results <- ____(___, 1000, apparent = TRUE) |>
@@ -52,7 +48,6 @@ ipw_results <- ____(___, 1000, apparent = TRUE) |>
 Check out the distribution of estimates (**no need to change this code**)
 
 ```{r}
-#| eval: false
 ipw_results |>
   mutate(
     estimate = map_dbl(
@@ -71,7 +66,6 @@ ipw_results |>
 Calculate the confidence interval
 
 ```{r}
-#| eval: false
 boot_estimate <- ____(____, ____) |>
   filter(term == ____)
 
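The bootstrap workflow in this exercise might be sketched as follows, assuming `fit_ipw()` has been completed as described; `rsample::bootstraps()` and `rsample::int_t()` fit the shapes of the blanks, but the exercise leaves the function names open.

```r
# Hypothetical completion, not part of this commit
library(rsample)
library(purrr)
library(dplyr)

set.seed(1234)

# Resample the data 1000 times and fit the IPW model on each split
ipw_results <- bootstraps(seven_dwarfs, 1000, apparent = TRUE) |>
  mutate(results = map(splits, fit_ipw))

# Bootstrap t confidence interval for the exposure term
boot_estimate <- int_t(ipw_results, results) |>
  filter(term == "park_extra_magic_morning")
```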
exercises/10-continuous-g-computation-exercises.qmd
Lines changed: 7 additions & 10 deletions

@@ -9,9 +9,6 @@ library(tidyverse)
 library(broom)
 library(touringplans)
 library(splines)
-
-seven_dwarfs <- seven_dwarfs_train_2018 |>
-  filter(wait_hour == 9)
 ```
 
 For this set of exercises, we'll use g-computation to calculate a causal effect for continuous exposures.
@@ -117,8 +114,8 @@ First, let's fit the model.
 
 ```{r}
 _______ ___ _______(
-  wait_minutes_actual_avg ~ ns(_______, df = 5)*park_extra_magic_morning + _______ + _______ + _______,
-  data = seven_dwarfs
+  wait_minutes_actual_avg ~ ns(_______, df = 2)*park_extra_magic_morning + _______ + _______ + _______,
+  data = wait_times
 )
 ```
 
@@ -131,10 +128,10 @@ Now that we've fit a model, we need to clone our data set. To do this, we'll sim
 3. Save the predicted data sets as `predicted_thirty` and `predicted_sixty`.
 
 ```{r}
-_______ <- seven_dwarfs |>
+_______ <- wait_times |>
   _______
 
-_______ <- seven_dwarfs |>
+_______ <- wait_times |>
   _______
 
 predicted_thirty <- standardized_model |>
@@ -177,10 +174,10 @@ library(rsample)
 fit_gcomp <- function(split, ...) {
   .df <- analysis(split)
 
-  # fit outcome model. remember to model using `.df` instead of `seven_dwarfs`
+  # fit outcome model. remember to model using `.df` instead of `wait_times`
 
 
-  # clone datasets. remember to clone `.df` instead of `seven_dwarfs`
+  # clone datasets. remember to clone `.df` instead of `wait_times`
 
 
   # predict actual wait time for each cloned dataset
@@ -197,7 +194,7 @@ fit_gcomp <- function(split, ...) {
     pivot_longer(everything(), names_to = "term", values_to = "estimate")
 }
 
-gcomp_results <- bootstraps(seven_dwarfs, 1000, apparent = TRUE) |>
+gcomp_results <- bootstraps(wait_times, 1000, apparent = TRUE) |>
   mutate(results = map(splits, ______))
 
 # using bias-corrected confidence intervals
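The clone-and-predict step this file describes might look like the sketch below; the exposure column name (`wait_minutes_posted_avg`) and the model object name (`standardized_model`) are assumptions drawn from the surrounding blanks, not confirmed by the diff.

```r
# Hypothetical completion, not part of this commit
library(dplyr)
library(broom)

# Clone the data, setting the (assumed) exposure column to 30 and 60 minutes
thirty <- wait_times |>
  mutate(wait_minutes_posted_avg = 30)

sixty <- wait_times |>
  mutate(wait_minutes_posted_avg = 60)

# Predict the actual wait time under each intervention
predicted_thirty <- standardized_model |>
  augment(newdata = thirty)

predicted_sixty <- standardized_model |>
  augment(newdata = sixty)

# G-computation estimate: difference in mean predicted outcomes
mean(predicted_sixty$.fitted) - mean(predicted_thirty$.fitted)
```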

exercises/12-whole-game-2-exercises.qmd
Lines changed: 2 additions & 1 deletion

@@ -119,7 +119,8 @@ mosquito_dag |>
   theme_dag(base_size = 14) +
   theme(legend.position = "none") +
   labs(caption = "Thanks to Andrew Heiss for the data!") +
-  coord_cartesian(clip = "off")
+  coord_cartesian(clip = "off") +
+  ggokabeito::scale_color_okabe_ito(na.value = "grey90")
 ```
 
 # Your Turn

slides/pdf/00-intro.pdf

3.14 MB (binary file not shown)
51.5 KB (binary file not shown)
-6.26 KB (binary file not shown)