Commit e79cb1b: placeholders for both sets of exercises
Parent: bd180d6

2 files changed: +205 −0. One file was renamed without changes; the new file below adds 205 lines.
---
title: "Continuous exposures"
format: html
---

```{r}
#| label: setup
library(tidyverse)
library(broom)
library(touringplans)
library(propensity)
```

For this set of exercises, we'll use propensity scores for continuous exposures.

In the touringplans data set, we have information about the posted waiting times for rides. We also have a limited amount of data on the observed, actual times. The question that we will consider is this: Do posted wait times (`avg_spostmin`) for the Seven Dwarfs Mine Train at 8 am affect actual wait times (`avg_sactmin`) at 9 am? Here's our DAG:

```{r}
#| echo: false
#| message: false
#| warning: false
library(ggdag)

coord_dag <- list(
  x = c(wdw_ticket_season = -1, close = -1, weather_wdwhigh = -2, extra_magic_morning = 0, avg_spostmin = 1, avg_sactmin = 2),
  y = c(wdw_ticket_season = -1, close = 1, weather_wdwhigh = 0.25, extra_magic_morning = 0, avg_spostmin = 0, avg_sactmin = 0)
)

labels <- c(
  avg_sactmin = "Average actual wait",
  avg_spostmin = "Average posted wait",
  extra_magic_morning = "Extra Magic Morning",
  wdw_ticket_season = "Ticket Season",
  weather_wdwhigh = "Historic high temperature",
  close = "Time park closed"
)

wait_time_dag <- dagify(
  avg_sactmin ~ avg_spostmin + close + wdw_ticket_season + weather_wdwhigh + extra_magic_morning,
  avg_spostmin ~ weather_wdwhigh + close + wdw_ticket_season + extra_magic_morning,
  coords = coord_dag,
  labels = labels
)

wait_time_dag |>
  ggdag(use_labels = "label", text = FALSE) +
  theme_void() +
  scale_x_continuous(
    limits = c(-2.25, 2.25),
    breaks = c(-2, -1, 0, 1, 2),
    labels = c("\n(one year ago)", "\n(6 months ago)", "\n(3 months ago)", "8am-9am\n(Today)", "9am-10am\n(Today)")
  ) +
  theme(axis.text.x = element_text())
```

First, let's wrangle our data to address our question: do posted wait times at 8 am affect actual wait times at 9 am? We'll join the baseline data (all covariates and posted wait time at 8) with the outcome (average actual time). We also have a lot of missingness for `avg_sactmin`, so we'll drop unobserved values for now.

You don't need to update any code here, so just run this.

```{r}
eight <- seven_dwarfs_train_2018 |>
  filter(hour == 8) |>
  select(-avg_sactmin)

nine <- seven_dwarfs_train_2018 |>
  filter(hour == 9) |>
  select(date, avg_sactmin)

wait_times <- eight |>
  left_join(nine, by = "date") |>
  drop_na(avg_sactmin)
```

# Your Turn 1

First, let's calculate the propensity score model, which will be the denominator in our stabilized weights (more to come on that soon). We'll fit a model using `lm()` for `avg_spostmin` with our covariates, then use the fitted predictions of `avg_spostmin` (`.fitted`, `.sigma`) to calculate the density using `dnorm()`.

1. Fit a model using `lm()` with `avg_spostmin` as the outcome and the confounders identified in the DAG.
2. Use `augment()` to add model predictions to the data frame.
3. In `dnorm()`, use `.fitted` as the mean and the mean of `.sigma` as the SD to calculate the propensity score for the denominator.

```{r}
denominator_model <- lm(
  __________,
  data = wait_times
)

denominators <- denominator_model |>
  ______(data = wait_times) |>
  mutate(
    denominator = dnorm(
      avg_spostmin,
      mean = ______,
      sd = mean(______, na.rm = TRUE)
    )
  ) |>
  select(date, denominator)
```
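
If you want to check your thinking, here's one way the blanks could be filled in. This is a sketch, not the only valid answer; it uses the confounders identified in the DAG above (`close`, `extra_magic_morning`, `weather_wdwhigh`, and `wdw_ticket_season`) as the adjustment set.

```{r}
# a possible completion: model the exposure given the DAG's confounders,
# then convert the fitted values to a density with dnorm()
denominator_model <- lm(
  avg_spostmin ~ close + extra_magic_morning + weather_wdwhigh + wdw_ticket_season,
  data = wait_times
)

denominators <- denominator_model |>
  augment(data = wait_times) |>
  mutate(
    denominator = dnorm(
      avg_spostmin,
      mean = .fitted,
      sd = mean(.sigma, na.rm = TRUE)
    )
  ) |>
  select(date, denominator)
```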

# Your Turn 2

As with the example in the slides, we have a lot of extreme values for our weights.

```{r}
denominators |>
  mutate(wts = 1 / denominator) |>
  ggplot(aes(wts)) +
  geom_density(col = "#E69F00", fill = "#E69F0095") +
  scale_x_log10() +
  theme_minimal(base_size = 20) +
  xlab("Weights")
```

Let's now fit the marginal density to use for stabilized weights:

1. Fit an intercept-only model of posted wait times to use as the numerator model.
2. Calculate the numerator weights using `dnorm()` as above.
3. Finally, calculate the stabilized weights, `swts`, using the `numerator` and `denominator` weights.

```{r}
numerator_model <- lm(
  ___ ~ ___,
  data = wait_times
)

numerators <- numerator_model |>
  augment(data = wait_times) |>
  mutate(
    numerator = dnorm(
      avg_spostmin,
      ___,
      mean(___, na.rm = TRUE)
    )
  ) |>
  select(date, numerator)

wait_times_wts <- wait_times |>
  left_join(numerators, by = "date") |>
  left_join(denominators, by = "date") |>
  mutate(swts = ___)
```
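
Again, here's one way this could look once it's filled in (a sketch: the numerator is the marginal density of `avg_spostmin` from an intercept-only model, and the stabilized weight is the numerator divided by the denominator):

```{r}
# a possible completion: marginal (intercept-only) model for the numerator,
# then stabilized weights = numerator / denominator
numerator_model <- lm(
  avg_spostmin ~ 1,
  data = wait_times
)

numerators <- numerator_model |>
  augment(data = wait_times) |>
  mutate(
    numerator = dnorm(
      avg_spostmin,
      .fitted,
      mean(.sigma, na.rm = TRUE)
    )
  ) |>
  select(date, numerator)

wait_times_wts <- wait_times |>
  left_join(numerators, by = "date") |>
  left_join(denominators, by = "date") |>
  mutate(swts = numerator / denominator)
```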

Take a look at the weights now that we've stabilized them:

```{r}
ggplot(wait_times_wts, aes(swts)) +
  geom_density(col = "#E69F00", fill = "#E69F0095") +
  scale_x_log10() +
  theme_minimal(base_size = 20) +
  xlab("Stabilized Weights")
```

Note that their mean is now about 1! That means the pseudo-population created by the weights is the same size as the observed population (the number of days we have wait time data, in this case).

```{r}
round(mean(wait_times_wts$swts), digits = 2)
```

# Your Turn 3

Now, let's fit the outcome model!

1. Estimate the relationship between posted wait times and actual wait times using the stabilized weights we just created.

```{r}
lm(___ ~ ___, weights = ___, data = wait_times_wts) |>
  tidy() |>
  filter(term == "avg_spostmin") |>
  mutate(estimate = estimate * 10)
```
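
One way the blanks might be filled in (a sketch; multiplying by 10 reports the effect of a 10-minute increase in posted wait time):

```{r}
# a possible completion: regress actual wait on posted wait,
# weighting each day by its stabilized weight
lm(avg_sactmin ~ avg_spostmin, weights = swts, data = wait_times_wts) |>
  tidy() |>
  filter(term == "avg_spostmin") |>
  mutate(estimate = estimate * 10)
```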

## Stretch goal: Bootstrapped intervals

Bootstrap confidence intervals for our estimate.

There's nothing new here. Just remember, you need to bootstrap the entire modeling process!

```{r}
set.seed(1234)
library(rsample)

fit_model <- function(split, ...) {
  .df <- analysis(split)

  # fill in the rest!
}

model_estimate <- bootstraps(wait_times, 1000, apparent = TRUE) |>
  mutate(results = map(splits, ______))

# using bias-corrected confidence intervals
boot_estimate <- int_bca(_______, results, .fn = fit_model)

boot_estimate
```
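
If you get stuck, here's a sketch of one possible completion. It refits the entire weighting pipeline (denominator model, numerator model, stabilized weights, outcome model) inside each resample, and it builds the weights with `mutate()` rather than joins so duplicated dates in a bootstrap resample don't cause problems. The chunk is set to `eval: false` so it doesn't rerun the full bootstrap when you render.

```{r}
#| eval: false
# a sketch, not the canonical solution: refit everything on each resample
fit_model <- function(split, ...) {
  .df <- analysis(split)

  # denominator: exposure given the DAG's confounders
  denominator_model <- lm(
    avg_spostmin ~ close + extra_magic_morning + weather_wdwhigh + wdw_ticket_season,
    data = .df
  )
  den_aug <- augment(denominator_model, data = .df)

  # numerator: marginal (intercept-only) model of the exposure
  numerator_model <- lm(avg_spostmin ~ 1, data = .df)
  num_aug <- augment(numerator_model, data = .df)

  .df_wts <- .df |>
    mutate(
      denominator = dnorm(avg_spostmin, den_aug$.fitted, mean(den_aug$.sigma, na.rm = TRUE)),
      numerator = dnorm(avg_spostmin, num_aug$.fitted, mean(num_aug$.sigma, na.rm = TRUE)),
      swts = numerator / denominator
    )

  # outcome model, weighted by the stabilized weights
  lm(avg_sactmin ~ avg_spostmin, weights = swts, data = .df_wts) |>
    tidy() |>
    filter(term == "avg_spostmin") |>
    mutate(estimate = estimate * 10)
}

model_estimate <- bootstraps(wait_times, 1000, apparent = TRUE) |>
  mutate(results = map(splits, fit_model))

# bias-corrected and accelerated (BCa) intervals from rsample
boot_estimate <- int_bca(model_estimate, results, .fn = fit_model)

boot_estimate
```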

***

# Takeaways

* We can calculate propensity scores for continuous exposures. Here, we use `dnorm(true_value, predicted_value, mean(estimated_sigma, na.rm = TRUE))` to use the normal density to transform predictions to a propensity-like scale. We can also use other approaches, like quantile binning of the exposure or calculating probability-based propensity scores with categorical regression models.
* Continuous exposures are prone to misspecification and usually need to be stabilized. A simple stabilization is to multiply the inverse weights by the marginal density of the exposure, fit with an intercept-only model such as `lm(exposure ~ 1)`.
* Stabilization is useful for any type of exposure where the weights are unbounded. Weights like the ATO weights are bounded, making them less susceptible to extreme values.
* Using propensity scores for continuous exposures in outcome models is identical to using them with binary exposures.
