---
title: "Bonus: Selection bias and correcting for loss to follow-up"
format: html
---

```{r}
#| label: setup
library(tidyverse)
library(broom)
library(touringplans)
library(propensity)

seven_dwarfs <- seven_dwarfs_train_2018 |>
  filter(hour == 9)
```

# Your Turn 1

For the parametric G-formula, we'll use a single model to fit a causal model of Extra Magic Hours (`extra_magic_morning`) on Posted Waiting Times (`avg_spostmin`), including all covariates, much as we normally fit regression models. However, instead of interpreting the coefficients, we'll calculate the estimate by predicting on cloned data sets.

First, let's fit the model.

1. Use `lm()` to create a model with the outcome, exposure, and confounders.
2. Save the model as `standardized_model`.

```{r}
_______ ___ _______(
  avg_spostmin ~ _______ + wdw_ticket_season + close + weather_wdwhigh,
  data = seven_dwarfs
)
```
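
For reference, a filled-in version of this model might look like the following sketch, using the exposure and confounders named above:

```r
standardized_model <- lm(
  avg_spostmin ~ extra_magic_morning + wdw_ticket_season + close + weather_wdwhigh,
  data = seven_dwarfs
)
```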

# Your Turn 2

Now that we've fit a model, we need to clone our data set. To do this, we'll simply mutate it so that in one copy, all participants have `extra_magic_morning` set to 0 and in the other, all participants have `extra_magic_morning` set to 1.

1. Create the cloned data sets, called `yes` and `no`.
2. For both data sets, use `standardized_model` and `augment()` to get the predicted values. Use the `newdata` argument in `augment()` with the relevant cloned data set. Then, select only the fitted value. Rename `.fitted` to either `yes_extra_hours` or `no_extra_hours` (use the pattern `select(new_name = old_name)`).
3. Save the predicted data sets as `predicted_yes` and `predicted_no`.

```{r}
_______ <- seven_dwarfs |>
  _______

_______ <- seven_dwarfs |>
  _______

predicted_yes <- standardized_model |>
  _______(newdata = _______) |>
  _______

predicted_no <- standardized_model |>
  _______(newdata = _______) |>
  _______
```
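
One way to fill in the blanks above, sketched here for reference (`yes` and `no` are the clones described in step 1):

```r
yes <- seven_dwarfs |>
  mutate(extra_magic_morning = 1)

no <- seven_dwarfs |>
  mutate(extra_magic_morning = 0)

predicted_yes <- standardized_model |>
  augment(newdata = yes) |>
  select(yes_extra_hours = .fitted)

predicted_no <- standardized_model |>
  augment(newdata = no) |>
  select(no_extra_hours = .fitted)
```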

# Your Turn 3

Finally, we'll get the mean difference between the predicted values.

1. Bind `predicted_yes` and `predicted_no` using `bind_cols()`.
2. Summarize the predicted values to create three new variables: `mean_yes`, `mean_no`, and `difference`. The first two should be the means of `yes_extra_hours` and `no_extra_hours`. `difference` should be `mean_yes` minus `mean_no`.

```{r}
_______ |>
  _______(
    mean_yes = _______,
    mean_no = _______,
    difference = _______
  )
```
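
Filled in, the summary might look like this sketch:

```r
bind_cols(predicted_yes, predicted_no) |>
  summarize(
    mean_yes = mean(yes_extra_hours),
    mean_no = mean(no_extra_hours),
    difference = mean_yes - mean_no
  )
```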

That's it! `difference` is our effect estimate. To get confidence intervals, however, we need to use the bootstrap. See the link below for a full example.

## Stretch goal: Bootstrapped intervals

As with propensity-based models, we need to do a little more work to get correct standard errors and confidence intervals. In this stretch goal, use rsample to bootstrap the estimates we got from the G-computation model.

Remember, you need to bootstrap the entire modeling process, including fitting the regression model, cloning the data sets, and calculating the effects.

```{r}
set.seed(1234)
library(rsample)

fit_gcomp <- function(split, ...) {
  .df <- analysis(split)

  # fit outcome model. remember to model using `.df` instead of `seven_dwarfs`


  # clone data sets. remember to clone `.df` instead of `seven_dwarfs`


  # predict values for each cloned data set and calculate the difference

}
```
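
Once `fit_gcomp()` is complete, one common rsample pattern looks like this sketch (assuming `fit_gcomp()` returns a tidy data frame with `term` and `estimate` columns, which is what `int_pctl()` expects):

```r
boots <- bootstraps(seven_dwarfs, times = 1000, apparent = TRUE) |>
  mutate(results = map(splits, fit_gcomp))

# percentile bootstrap confidence intervals
boot_estimate <- int_pctl(boots, results)
```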
In this example, we'll consider loss to follow-up in the NHEFS study. We'll use the binary exposure we used earlier in the workshop: does quitting smoking (`qsmk`) increase weight gain (`wt82_71`)? This time, however, we'll adjust for loss to follow-up (people who dropped out of the study between observation periods) using inverse probability of censoring weights.

# Your Turn 4

1. Take a look at how many participants were lost to follow-up in `nhefs`, called `censored` in this data set. You don't need to change anything in this code.

```{r}
cens_model <- ___(
  _______
)
```

# Your Turn 5

1. Use the logistic model you just fit to create inverse probability of censoring weights.
2. Calculate the weights using `.fitted`.

```{r}
cens <- _______ |>
  augment(type.predict = "response", data = nhefs_censored) |>
  _______
```
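
As a sketch of the weight calculation (assuming `cens_model` is a logistic regression for remaining *uncensored*, so `.fitted` is each participant's probability of being observed; the name `cens_wts` is ours):

```r
cens <- cens_model |>
  augment(type.predict = "response", data = nhefs_censored) |>
  mutate(cens_wts = 1 / .fitted)
```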

# Your Turn 6

1. Create the cloned data sets, called `kept_smoking` and `quit_smoking`, where one data set has `qsmk` set to 1 (quit smoking) and the other has it set to 0 (kept smoking).
2. Use the outcome model, `cens_model`, to make predictions for `kept_smoking` and `quit_smoking`.
3. Calculate the differences between the mean values of `kept_smoking` and `quit_smoking`.

```{r}
kept_smoking <- ____
quit_smoking <- ____

predicted_kept_smoking <- _______ |>
  augment(newdata = _______) |>
  select(kept_smoking = .fitted)

predicted_quit_smoking <- _______ |>
  augment(newdata = _______) |>
  select(quit_smoking = .fitted)
```
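
After the predictions, the effect can be summarized much like in Your Turn 3; a sketch (the `mean_*` names are ours):

```r
bind_cols(predicted_kept_smoking, predicted_quit_smoking) |>
  summarize(
    mean_kept = mean(kept_smoking),
    mean_quit = mean(quit_smoking),
    difference = mean_quit - mean_kept
  )
```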

# Takeaways

* To fit the parametric G-formula, fit a standardized model with all covariates. Then, use cloned data sets with values set to each level of the exposure you want to study.
* Use the model to predict the values for that level of the exposure and compute the effect estimate you want.
* If loss to follow-up is potentially related to your study question, inverse probability of censoring weights can help mitigate the bias.
* You can use censoring weights in many types of models. If you're also using propensity score weights, simply multiply the weights together, then include the result as the weights for your outcome model.