
Commit ec58878

merge continuous example with g-comp
1 parent bf6dd01 commit ec58878


exercises/10-continuous-g-computation-exercises.qmd

Lines changed: 62 additions & 86 deletions
@@ -8,10 +8,10 @@ format: html
library(tidyverse)
library(broom)
library(touringplans)
-library(propensity)
+library(splines)
```

-For this set of exercises, we'll use propensity scores for continuous exposures.
+For this set of exercises, we'll use g-computation to calculate a causal effect for continuous exposures.

In the touringplans data set, we have information about the posted waiting times for rides. We also have a limited amount of data on the observed, actual times. The question that we will consider is this: Do posted wait times (`avg_spostmin`) for the Seven Dwarfs Mine Train at 8 am affect actual wait times (`avg_sactmin`) at 9 am? Here's our DAG:

@@ -73,124 +73,102 @@ wait_times <- eight |>

# Your Turn 1

-First, let's calculate the propensity score model, which will be the denominator in our stabilized weights (more to come on that soon). We'll fit a model using `lm()` for `avg_spostmin` with our covariates, then use the fitted predictions of `avg_spostmin` (`.fitted`, `.sigma`) to calculate the density using `dnorm()`.
+For the parametric G-formula, we'll fit a single causal model of Posted Waiting Times (`avg_spostmin`) on Actual Waiting Times (`avg_sactmin`), including all covariates, much as we normally fit regression models. However, instead of interpreting the coefficients, we'll calculate the estimate by predicting on cloned data sets.

-1. Fit a model using `lm()` with `avg_spostmin` as the outcome and the confounders identified in the DAG.
-2. Use `augment()` to add model predictions to the data frame.
-3. In `dnorm()`, use `.fitted` as the mean and the mean of `.sigma` as the SD to calculate the propensity score for the denominator.
+Two additional differences in our model: we'll use a natural cubic spline on the exposure, `avg_spostmin`, using `ns()` from the splines package, and we'll include an interaction term between `avg_spostmin` and `extra_magic_morning`. These complicate the interpretation of the model's coefficients in normal regression but have virtually no downside (as long as we have a reasonable sample size) in g-computation, because we still get an easily interpretable result.
+
+First, let's fit the model.
+
+1. Use `lm()` to create a model with the outcome, exposure, and confounders identified in the DAG.
+2. Save the model as `standardized_model`.

```{r}
-denominator_model <- lm(
-  __________,
-  data = wait_times
+_______ ___ _______(
+  avg_sactmin ~ ns(_______, df = 5) * extra_magic_morning + _______ + _______ + _______,
+  data = seven_dwarfs
)
-denominators <- denominator_model |>
-  ______(data = wait_times) |>
-  mutate(
-    denominator = dnorm(
-      avg_spostmin,
-      mean = ______,
-      sd = mean(______, na.rm = TRUE)
-    )
-  ) |>
-  select(date, denominator)
```

10092
# Your Turn 2
10193

102-
As with the example in the slides, we have a lot of extreme values for our weights
103-
104-
```{r}
105-
denominators |>
106-
mutate(wts = 1 / denominator) |>
107-
ggplot(aes(wts)) +
108-
geom_density(col = "#E69F00", fill = "#E69F0095") +
109-
scale_x_log10() +
110-
theme_minimal(base_size = 20) +
111-
xlab("Weights")
112-
```
94+
Now that we've fit a model, we need to clone our data set. To do this, we'll simply mutate it so that in one set, all participants have `avg_spostmin` set to 30 minutes and in another, all participants have `avg_spostmin` set to 60 minutes.
11395

114-
Let’s now fit the marginal density to use for stabilized weights:
115-
116-
1. Fit an intercept-only model of posted weight times to use as the numerator model
117-
2. Calculate the numerator weights using `dnorm()` as above.
118-
3. Finally, calculate the stabilized weights, `swts`, using the `numerator` and `denominator` weights
96+
1. Create the cloned data sets, called `thirty` and `sixty`.
97+
2. For both data sets, use `standardized_model` and `augment()` to get the predicted values. Use the `newdata` argument in `augment()` with the relevant cloned data set. Then, select only the fitted value. Rename `.fitted` to either `thirty_posted_minutes` or `sixty_posted_minutes` (use the pattern `select(new_name = old_name)`).
98+
3. Save the predicted data sets as`predicted_thirty` and `predicted_sixty`.
11999

```{r}
-numerator_model <- lm(
-  ___ ~ ___,
-  data = wait_times
-)

-numerators <- numerator_model |>
-  augment(data = wait_times) |>
-  mutate(
-    numerator = dnorm(
-      avg_spostmin,
-      ___,
-      mean(___, na.rm = TRUE)
-    )
-  ) |>
-  select(date, numerator)

-wait_times_wts <- wait_times |>
-  left_join(numerators, by = "date") |>
-  left_join(denominators, by = "date") |>
-  mutate(swts = ___)
-```

-Take a look at the weights now that we've stabilized them:
+_______ <- seven_dwarfs |>
+  _______

-```{r}
-ggplot(wait_times_wts, aes(swts)) +
-  geom_density(col = "#E69F00", fill = "#E69F0095") +
-  scale_x_log10() +
-  theme_minimal(base_size = 20) +
-  xlab("Stabilized Weights")
-```
+_______ <- seven_dwarfs |>
+  _______

-Note that their mean is now about 1! That means the pseudo-population created by the weights is the same size as the observed population (the number of days we have wait time data, in this case).
+predicted_thirty <- standardized_model |>
+  _______(newdata = _______) |>
+  _______

-```{r}
-round(mean(wait_times_wts$swts), digits = 2)
+predicted_sixty <- standardized_model |>
+  _______(newdata = _______) |>
+  _______
```
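A hedged sketch of how those blanks might be filled in, assuming the blanked-out verbs are `mutate()`, `augment()`, and `select()` as the instructions suggest:

```r
# Possible completion (sketch, not the answer key): clone the data at each
# exposure level, then predict from the standardized model.
thirty <- seven_dwarfs |>
  mutate(avg_spostmin = 30)

sixty <- seven_dwarfs |>
  mutate(avg_spostmin = 60)

predicted_thirty <- standardized_model |>
  augment(newdata = thirty) |>
  select(thirty_posted_minutes = .fitted)

predicted_sixty <- standardized_model |>
  augment(newdata = sixty) |>
  select(sixty_posted_minutes = .fitted)
```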

# Your Turn 3

-Now, let's fit the outcome model!
+Finally, we'll get the mean differences between the values.

-1. Estimate the relationship between posted wait times and actual wait times using the stabilized weights we just created.
+1. Bind `predicted_thirty` and `predicted_sixty` using `bind_cols()`.
+2. Summarize the predicted values to create three new variables: `mean_thirty`, `mean_sixty`, and `difference`. The first two should be the means of `thirty_posted_minutes` and `sixty_posted_minutes`. `difference` should be `mean_sixty` minus `mean_thirty`.

```{r}
-lm(___ ~ ___, weights = ___, data = wait_times_wts) |>
-  tidy() |>
-  filter(term == "avg_spostmin") |>
-  mutate(estimate = estimate * 10)
+_______ |>
+  _______(
+    mean_thirty = _______,
+    mean_sixty = _______,
+    difference = _______
+  )
```

+That's it! `difference` is our effect estimate, marginalized over the spline terms, interaction effects, and confounders.
+
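One way the blanks above could plausibly be filled in (a sketch that follows the instructions and mirrors the summary used in the bootstrap skeleton below):

```r
# Possible completion (sketch): contrast the mean predicted wait times.
bind_cols(predicted_thirty, predicted_sixty) |>
  summarize(
    mean_thirty = mean(thirty_posted_minutes),
    mean_sixty = mean(sixty_posted_minutes),
    difference = mean_sixty - mean_thirty
  )
```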
## Stretch goal: Bootstrapped intervals

-Bootstrap confidence intervals for our estimate.
+As with propensity-based models, we need to do a little more work to get correct standard errors and confidence intervals. In this stretch goal, use rsample to bootstrap the estimates we got from the G-computation model.

-There's nothing new here. Just remember, you need to bootstrap the entire modeling process!
+Remember, you need to bootstrap the entire modeling process, including the regression model, cloning the data sets, and calculating the effects.

```{r}
set.seed(1234)
library(rsample)

-fit_model <- function(split, ...) {
+fit_gcomp <- function(split, ...) {
  .df <- analysis(split)

-  # fill in the rest!
+  # fit outcome model. remember to model using `.df` instead of `seven_dwarfs`

+  # clone datasets. remember to clone `.df` instead of `seven_dwarfs`

+  # predict actual wait time for each cloned dataset

+  # calculate ATE
+  bind_cols(predicted_thirty, predicted_sixty) |>
+    summarize(
+      mean_thirty = mean(thirty_posted_minutes),
+      mean_sixty = mean(sixty_posted_minutes),
+      difference = mean_sixty - mean_thirty
+    ) |>
+    # rsample expects a `term` and `estimate` column
+    pivot_longer(everything(), names_to = "term", values_to = "estimate")
}

-model_estimate <- bootstraps(wait_times, 1000, apparent = TRUE) |>
+gcomp_results <- bootstraps(seven_dwarfs, 1000, apparent = TRUE) |>
  mutate(results = map(splits, ______))

# using bias-corrected confidence intervals
-boot_estimate <- int_bca(_______, results, .fn = fit_model)
+boot_estimate <- int_bca(_______, results, .fn = fit_gcomp)

boot_estimate
```
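For reference, a hedged sketch of one way `fit_gcomp()` and the remaining blanks could be completed. As above, `confounder_1` and `confounder_2` are hypothetical stand-ins for the DAG's confounders, not the exercise's actual variables.

```r
# Sketch of a possible completion of the bootstrap workflow.
library(tidyverse)
library(broom)
library(splines)
library(rsample)

fit_gcomp <- function(split, ...) {
  .df <- analysis(split)

  # refit the outcome model on the bootstrap sample
  standardized_model <- lm(
    avg_sactmin ~ ns(avg_spostmin, df = 5) * extra_magic_morning +
      confounder_1 + confounder_2,
    data = .df
  )

  # clone the bootstrap sample at each exposure level
  thirty <- .df |> mutate(avg_spostmin = 30)
  sixty <- .df |> mutate(avg_spostmin = 60)

  # predict actual wait times for each clone
  predicted_thirty <- standardized_model |>
    augment(newdata = thirty) |>
    select(thirty_posted_minutes = .fitted)
  predicted_sixty <- standardized_model |>
    augment(newdata = sixty) |>
    select(sixty_posted_minutes = .fitted)

  # calculate the contrast and reshape for rsample
  bind_cols(predicted_thirty, predicted_sixty) |>
    summarize(
      mean_thirty = mean(thirty_posted_minutes),
      mean_sixty = mean(sixty_posted_minutes),
      difference = mean_sixty - mean_thirty
    ) |>
    pivot_longer(everything(), names_to = "term", values_to = "estimate")
}

gcomp_results <- bootstraps(seven_dwarfs, 1000, apparent = TRUE) |>
  mutate(results = map(splits, fit_gcomp))

boot_estimate <- int_bca(gcomp_results, results, .fn = fit_gcomp)
```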
@@ -199,7 +177,5 @@ boot_estimate

# Take aways

-* We can calculate propensity scores for continuous exposures. Here, we use `dnorm(true_value, predicted_value, mean(estimated_sigma, na.rm = TRUE))` to use the normal density to transform predictions to a propensity-like scale. We can also use other approaches, like quantile binning of the exposure or calculating probability-based propensity scores with categorical regression models.
-* Continuous exposures are prone to misspecification and usually need to be stabilized. A simple stabilization is to weight the inverted propensity score by a numerator fit with an intercept-only model such as `lm(exposure ~ 1)`.
-* Stabilization is useful for any type of exposure where the weights are unbounded. Weights like the ATO are bounded, making them less susceptible to extreme weights.
-* Using propensity scores for continuous exposures in outcome models is identical to using them with binary exposures.
+* To fit the parametric G-formula, fit a standardized model with all covariates. Then, use cloned data sets with values set to each level of the exposure you want to study.
+* Use the model to predict the values for that level of the exposure and compute the effect estimate you want.
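To make the contrast with the removed approach concrete, here is a minimal sketch of the `dnorm()` density trick described in the deleted takeaways, mirroring the removed code above; the confounder names are again placeholders.

```r
# Sketch of the removed continuous-exposure propensity score approach:
# model the exposure, turn fitted values into a density, then stabilize.
denominator_model <- lm(
  avg_spostmin ~ confounder_1 + confounder_2,  # hypothetical confounders
  data = wait_times
)

denominators <- denominator_model |>
  augment(data = wait_times) |>
  mutate(
    denominator = dnorm(avg_spostmin, mean = .fitted, sd = mean(.sigma, na.rm = TRUE))
  ) |>
  select(date, denominator)

# stabilize with a marginal (intercept-only) density as the numerator
numerator_model <- lm(avg_spostmin ~ 1, data = wait_times)

numerators <- numerator_model |>
  augment(data = wait_times) |>
  mutate(
    numerator = dnorm(avg_spostmin, mean = .fitted, sd = mean(.sigma, na.rm = TRUE))
  ) |>
  select(date, numerator)

wait_times_wts <- wait_times |>
  left_join(numerators, by = "date") |>
  left_join(denominators, by = "date") |>
  mutate(swts = numerator / denominator)
```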
