exercises/14-bonus-continuous-pscores-exercises.qmd
First, let's calculate the propensity score model, which will be the denominator in our stabilized weights (more to come on that soon). We'll fit a model using `lm()` for `avg_spostmin` with our covariates, then use the fitted predictions of `avg_spostmin` (`.fitted`, `.sigma`) to calculate the weights with `wt_ate()`.

1. Fit a model using `lm()` with `avg_spostmin` as the outcome and the confounders identified in the DAG.
2. Use `augment()` to add model predictions to the data frame.
3. In `wt_ate()`, calculate the weights using `avg_spostmin`, `.fitted`, and `.sigma`.

```{r}
post_time_model <- lm(
  __________,
  data = wait_times
)

wait_times_wts <- post_time_model |>
  ______(data = wait_times) |>
  mutate(wts = ______(
    ______, ______, .sigma = ______
  ))
```
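
To see roughly what `wt_ate()` does for a continuous exposure, here's a base R sketch on simulated data. Everything here (the variable names, the simulated model) is invented for illustration; the real function handles more details than this.

```{r}
# Toy sketch of inverse-density weights for a continuous exposure.
# All names here are made up for illustration.
set.seed(1)
confounder <- rnorm(100)
exposure <- 0.5 * confounder + rnorm(100)

# The propensity score model: exposure conditional on the confounder
ps_model <- lm(exposure ~ confounder)

# dnorm() evaluates the normal density of each observed exposure value,
# centered at the model's prediction with SD equal to the residual SD;
# the weight is the inverse of that density
density <- dnorm(exposure, mean = fitted(ps_model), sd = sigma(ps_model))
wts <- 1 / density
summary(wts)
```

Observations whose exposure is far from what their confounders predict get small densities, and therefore large weights.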

# Your Turn 2

As with the example in the slides, we have a lot of extreme values for our weights.

```{r}
wait_times_wts |>
  ggplot(aes(wts)) +
  geom_density(col = "#E69F00", fill = "#E69F0095") +
  scale_x_log10() +
  theme_minimal(base_size = 20) +
  xlab("Weights")
```

Let’s now fit the marginal density to use for stabilized weights:

1. Re-fit the above using stabilized weights.

```{r}
wait_times_swts <- post_time_model |>
  augment(data = wait_times) |>
  mutate(swts = _____(
    _____,
    _____,
    .sigma = .sigma,
    _____ = _____
  ))
```
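
Conceptually, the stabilized weight is a ratio of two densities: the marginal density of the exposure (from an intercept-only model) over its conditional density given the confounders. Here's a base R sketch with simulated data (all names are invented for illustration; this only mirrors what stabilization does conceptually):

```{r}
# Stabilized weights = marginal density / conditional density.
# Simulated data and names are illustrative only.
set.seed(1)
confounder <- rnorm(100)
exposure <- 0.5 * confounder + rnorm(100)

denominator_model <- lm(exposure ~ confounder)  # conditional on confounders
numerator_model <- lm(exposure ~ 1)             # intercept-only: marginal

denominator <- dnorm(exposure, fitted(denominator_model), sigma(denominator_model))
numerator <- dnorm(exposure, fitted(numerator_model), sigma(numerator_model))

swts <- numerator / denominator
mean(swts)  # should be roughly 1 when the models fit well
```

The numerator shrinks the extreme weights that a bare inverse density produces, which is why the stabilized distribution looks so much better behaved.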

Take a look at the weights now that we've stabilized them:

```{r}
ggplot(wait_times_swts, aes(swts)) +
  geom_density(col = "#E69F00", fill = "#E69F0095") +
  scale_x_log10() +
  theme_minimal(base_size = 20) +
  xlab("Stabilized Weights")
```
# Your Turn 3

Now, let's fit the outcome model!

1. Estimate the relationship between posted wait times and actual wait times using the stabilized weights we just created.

```{r}
lm(___ ~ ___, weights = ___, data = wait_times_swts) |>
  tidy() |>
  filter(term == "avg_spostmin") |>
  mutate(estimate = estimate * 10)
```

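
As a reminder, the `weights` argument to `lm()` performs weighted least squares: each squared residual is multiplied by its weight before summing. A self-contained toy example (data and names invented):

```{r}
# Weighted least squares with lm(): minimizes sum(w * residuals^2).
# Toy data for illustration only.
set.seed(1)
x <- rnorm(50)
y <- 2 * x + rnorm(50)
w <- runif(50, min = 0.5, max = 2)

weighted_fit <- lm(y ~ x, weights = w)
coef(weighted_fit)  # slope should be near the true value of 2
```

With inverse-probability weights, this reweighting mimics a pseudo-population in which the exposure is unrelated to the confounders, so the weighted regression targets the marginal effect.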
# Take aways

* We can calculate propensity scores for continuous exposures. `wt_ate()` uses `dnorm()` to transform predictions to a propensity-like scale via the normal density; we need to give `wt_ate()` the `.sigma` argument to do this. We can also use other approaches, like quantile binning of the exposure or calculating probability-based propensity scores with categorical regression models.
* Continuous exposures are prone to misspecification, and their weights usually need to be stabilized. A simple stabilization is to multiply the inverted propensity score by the marginal density from an intercept-only model such as `lm(exposure ~ 1)`. `wt_ate()` can do this for you automatically with `stabilize = TRUE`. This also applies to other types of exposures.
* Stabilization is useful for any type of exposure where the weights are unbounded. Bounded weights, like the ATO weights, are less susceptible to extreme values.
* Using propensity scores for continuous exposures in outcome models is identical to using them with binary exposures.
* Because propensity scores for continuous exposures are prone to positivity violations, check the bootstrap distribution of your estimate for skew and to see whether the mean estimate differs from your regression model's. If these problems are present, you may need to use another approach, like g-computation.