-| Historic Temperature |`weather_wdwhigh`|
-| Posted Wait Time (outcome) |`avg_spostmin`|
-| Extra Magic Morning (exposure) |`extra_magic_morning`|
-| Ticket Season |`wdw_ticket_season`|
-| Closing Time |`close`|
+| Historic Temperature |`park_temperature_high`|
+| Posted Wait Time (outcome) |`wait_minutes_posted_avg`|
+| Extra Magic Morning (exposure) |`park_extra_magic_morning`|
+| Ticket Season |`park_ticket_season`|
+| Closing Time |`park_close`|
 ## Your Turn

 *After updating the code chunks below, change `eval: true` before rendering*

-Now, fit a propensity score model for `extra_magic_morning` using the above proposed confounders.
+Now, fit a propensity score model for `park_extra_magic_morning` using the above proposed confounders.

 ```{r}
 #| eval: false
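For reference, one way the exercise chunk might be completed. This is a sketch only, not the official solution: the confounder set is taken from the table above, and the data frame name `seven_dwarfs` is assumed from the later exercises in this diff.

```r
# Sketch: logistic propensity score model for the exposure.
# Confounders (park_ticket_season, park_close, park_temperature_high)
# are assumed from the table above; `seven_dwarfs` is assumed to be
# the filtered data frame used in these exercises.
propensity_model <- glm(
  park_extra_magic_morning ~ park_ticket_season + park_close + park_temperature_high,
  data = seven_dwarfs,
  family = binomial()
)
```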
@@ -131,7 +131,7 @@ df <- propensity_model |>

 Stretch Goal 1:

-Examine two histograms of the propensity scores, one days with Extra Magic Morning (`extra_magic_morning == 1`) and one for days without it (`extra_magic_morning == 0`).
+Examine two histograms of the propensity scores, one for days with Extra Magic Morning (`park_extra_magic_morning == 1`) and one for days without it (`park_extra_magic_morning == 0`).
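A sketch of how the stretch goal might be done with ggplot2. It assumes the augmented data frame `df` (from the hunk header above) carries the propensity score in broom's `.fitted` column; the column name is an assumption, not confirmed by the diff.

```r
library(ggplot2)

# Sketch: one histogram of the propensity score per exposure group.
# `.fitted` as the propensity-score column is an assumption.
df |>
  ggplot(aes(x = .fitted)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ park_extra_magic_morning)
```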
exercises/09-outcome-model-exercises.qmd (+3 −3)
@@ -13,7 +13,7 @@ library(rsample)
 library(propensity)

 seven_dwarfs <- seven_dwarfs_train_2018 |>
-  filter(hour == 9)
+  filter(wait_hour == 9)
 ```

 We are interested in examining the relationship between whether there were "Extra Magic Hours" in the morning (the **exposure**) and the average wait time for the Seven Dwarfs Mine Train the same day between 9am and 10am (the **outcome**).
@@ -57,9 +57,9 @@ ipw_results |>
   mutate(
     estimate = map_dbl(
       boot_fits,
-      # pull the `estimate` for `extra_magic_morning` for each fit
+      # pull the `estimate` for `park_extra_magic_morning` for each fit
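One way the commented step might be filled in, as a sketch. It assumes each element of `boot_fits` is already a tidied coefficient data frame (e.g. the output of `broom::tidy()`); if the fits are raw model objects, a `tidy()` call would come first.

```r
# Sketch of the body of the mutate() shown above; assumes each fit in
# `boot_fits` is a tidy() coefficient table with `term` and `estimate`.
estimate = map_dbl(
  boot_fits,
  \(fit) fit |>
    dplyr::filter(term == "park_extra_magic_morning") |>
    dplyr::pull(estimate)
)
```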
exercises/10-continuous-g-computation-exercises.qmd (+12 −12)
@@ -13,7 +13,7 @@ library(splines)

 For this set of exercises, we'll use g-computation to calculate a causal effect for continuous exposures.

-In the touringplans data set, we have information about the posted waiting times for rides. We also have a limited amount of data on the observed, actual times. The question that we will consider is this: Do posted wait times (`avg_spostmin`) for the Seven Dwarves Mine Train at 8 am affect actual wait times (`avg_sactmin`) at 9 am? Here’s our DAG:
+In the touringplans data set, we have information about the posted waiting times for rides. We also have a limited amount of data on the observed, actual times. The question that we will consider is this: Do posted wait times (`wait_minutes_posted_avg`) for the Seven Dwarfs Mine Train at 8 am affect actual wait times (`wait_minutes_actual_avg`) at 9 am? Here’s our DAG:

 ```{r}
 #| echo: false
@@ -83,29 +83,29 @@ dagify(
 )
 ```

-First, let’s wrangle our data to address our question: do posted wait times at 8 affect actual weight times at 9? We’ll join the baseline data (all covariates and posted wait time at 8) with the outcome (average actual time). We also have a lot of missingness for `avg_sactmin`, so we’ll drop unobserved values for now.
+First, let’s wrangle our data to address our question: do posted wait times at 8 affect actual wait times at 9? We’ll join the baseline data (all covariates and posted wait time at 8) with the outcome (average actual time). We also have a lot of missingness for `wait_minutes_actual_avg`, so we’ll drop unobserved values for now.

 You don't need to update any code here, so just run this.

 ```{r}
 eight <- seven_dwarfs_train_2018 |>
-  filter(hour == 8) |>
-  select(-avg_sactmin)
+  filter(wait_hour == 8) |>
+  select(-wait_minutes_actual_avg)

 nine <- seven_dwarfs_train_2018 |>
-  filter(hour == 9) |>
-  select(date, avg_sactmin)
+  filter(wait_hour == 9) |>
+  select(park_date, wait_minutes_actual_avg)

 wait_times <- eight |>
-  left_join(nine, by = "date") |>
-  drop_na(avg_sactmin)
+  left_join(nine, by = "park_date") |>
+  drop_na(wait_minutes_actual_avg)
 ```

 # Your Turn 1

-For the parametric G-formula, we'll use a single model to fit a causal model of Posted Waiting Times (`avg_spostmin`) on Actual Waiting Times (`avg_sactmin`) where we include all covariates, much as we normally fit regression models. However, instead of interpreting the coefficients, we'll calculate the estimate by predicting on cloned data sets.
+For the parametric G-formula, we'll use a single model to fit a causal model of Posted Waiting Times (`wait_minutes_posted_avg`) on Actual Waiting Times (`wait_minutes_actual_avg`) where we include all covariates, much as we normally fit regression models. However, instead of interpreting the coefficients, we'll calculate the estimate by predicting on cloned data sets.

-Two additional differences in our model: we'll use a natural cubic spline on the exposure, `avg_spostmin`, using `ns()` from the splines package, and we'll include an interaction term between `avg_spostmin` and `extra_magic_morning`. These complicate the interpretation of the coefficients of the model in normal regression but have virtually no downside (as long as we have a reasonable sample size) in g-computation, because we still get an easily interpretable result.
+Two additional differences in our model: we'll use a natural cubic spline on the exposure, `wait_minutes_posted_avg`, using `ns()` from the splines package, and we'll include an interaction term between `wait_minutes_posted_avg` and `park_extra_magic_morning`. These complicate the interpretation of the coefficients of the model in normal regression but have virtually no downside (as long as we have a reasonable sample size) in g-computation, because we still get an easily interpretable result.
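The model described above might be sketched as follows. The spline's degrees of freedom and the exact confounder set are assumptions (the confounders are borrowed from the table earlier in this diff); the object name `standardized_model` is taken from the instructions below.

```r
library(splines)

# Sketch: outcome model for the parametric g-formula, with a natural
# cubic spline on the exposure and an exposure-by-EMM interaction.
# df = 3 and the confounder set are assumptions, not the official answer.
standardized_model <- lm(
  wait_minutes_actual_avg ~
    ns(wait_minutes_posted_avg, df = 3) * park_extra_magic_morning +
    park_close + park_temperature_high + park_ticket_season,
  data = wait_times
)
```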
-Now that we've fit a model, we need to clone our data set. To do this, we'll simply mutate it so that in one set, all participants have `avg_spostmin` set to 30 minutes and in another, all participants have `avg_spostmin` set to 60 minutes.
+Now that we've fit a model, we need to clone our data set. To do this, we'll simply mutate it so that in one set, all participants have `wait_minutes_posted_avg` set to 30 minutes and in another, all participants have `wait_minutes_posted_avg` set to 60 minutes.

 1. Create the cloned data sets, called `thirty` and `sixty`.
 2. For both data sets, use `standardized_model` and `augment()` to get the predicted values. Use the `newdata` argument in `augment()` with the relevant cloned data set. Then, select only the fitted value. Rename `.fitted` to either `thirty_posted_minutes` or `sixty_posted_minutes` (use the pattern `select(new_name = old_name)`).
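The two steps above might be sketched like this, using broom's `augment()` with its `newdata` argument. This is an illustrative sketch; it assumes `standardized_model` was fit on `wait_times` as described earlier.

```r
library(broom)
library(dplyr)

# Step 1 sketch: cloned data sets with the exposure pinned at 30 and 60.
thirty <- wait_times |>
  mutate(wait_minutes_posted_avg = 30)
sixty <- wait_times |>
  mutate(wait_minutes_posted_avg = 60)

# Step 2 sketch: predict on each clone, keep only the fitted values,
# renaming `.fitted` via select(new_name = old_name).
predicted_thirty <- standardized_model |>
  augment(newdata = thirty) |>
  select(thirty_posted_minutes = .fitted)
predicted_sixty <- standardized_model |>
  augment(newdata = sixty) |>
  select(sixty_posted_minutes = .fitted)
```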
exercises/14-bonus-continuous-pscores-exercises.qmd (+12 −12)
@@ -13,7 +13,7 @@ library(propensity)

 For this set of exercises, we'll use propensity scores for continuous exposures.

-In the touringplans data set, we have information about the posted waiting times for rides. We also have a limited amount of data on the observed, actual times. The question that we will consider is this: Do posted wait times (`avg_spostmin`) for the Seven Dwarves Mine Train at 8 am affect actual wait times (`avg_sactmin`) at 9 am? Here’s our DAG:
+In the touringplans data set, we have information about the posted waiting times for rides. We also have a limited amount of data on the observed, actual times. The question that we will consider is this: Do posted wait times (`wait_minutes_posted_avg`) for the Seven Dwarfs Mine Train at 8 am affect actual wait times (`wait_minutes_actual_avg`) at 9 am? Here’s our DAG:

 ```{r}
 #| echo: false
@@ -83,31 +83,31 @@ dagify(
 )
 ```

-First, let’s wrangle our data to address our question: do posted wait times at 8 affect actual weight times at 9? We’ll join the baseline data (all covariates and posted wait time at 8) with the outcome (average actual time). We also have a lot of missingness for `avg_sactmin`, so we’ll drop unobserved values for now.
+First, let’s wrangle our data to address our question: do posted wait times at 8 affect actual wait times at 9? We’ll join the baseline data (all covariates and posted wait time at 8) with the outcome (average actual time). We also have a lot of missingness for `wait_minutes_actual_avg`, so we’ll drop unobserved values for now.

 You don't need to update any code here, so just run this.

 ```{r}
 eight <- seven_dwarfs_train_2018 |>
-  filter(hour == 8) |>
-  select(-avg_sactmin)
+  filter(wait_hour == 8) |>
+  select(-wait_minutes_actual_avg)

 nine <- seven_dwarfs_train_2018 |>
-  filter(hour == 9) |>
-  select(date, avg_sactmin)
+  filter(wait_hour == 9) |>
+  select(park_date, wait_minutes_actual_avg)

 wait_times <- eight |>
-  left_join(nine, by = "date") |>
-  drop_na(avg_sactmin)
+  left_join(nine, by = "park_date") |>
+  drop_na(wait_minutes_actual_avg)
 ```

 # Your Turn 1

-First, let’s calculate the propensity score model, which will be the denominator in our stabilized weights (more to come on that soon). We’ll fit a model using `lm()` for `avg_spostmin` with our covariates, then use the fitted predictions of `avg_spostmin` (`.fitted`, `.sigma`) to calculate the density using `dnorm()`.
+First, let’s calculate the propensity score model, which will be the denominator in our stabilized weights (more to come on that soon). We’ll fit a model using `lm()` for `wait_minutes_posted_avg` with our covariates, then use the fitted predictions of `wait_minutes_posted_avg` (`.fitted`, `.sigma`) to calculate the density using `dnorm()`.

-1. Fit a model using `lm()` with `avg_spostmin` as the outcome and the confounders identified in the DAG.
+1. Fit a model using `lm()` with `wait_minutes_posted_avg` as the outcome and the confounders identified in the DAG.
 2. Use `augment()` to add model predictions to the data frame.
-3. In `wt_ate()`, calculate the weights using `avg_postmin`, `.fitted`, and `.sigma`.
+3. In `wt_ate()`, calculate the weights using `wait_minutes_posted_avg`, `.fitted`, and `.sigma`.
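To make the density idea concrete, here is the calculation written out by hand with `dnorm()`. This sketch illustrates the math behind stabilized weights for a continuous exposure (marginal density over conditional density); it is not necessarily how `wt_ate()` implements it internally, and it assumes `post_time_model` was fit on `wait_times`.

```r
library(broom)
library(dplyr)

# Sketch: stabilized weights for a continuous exposure, by hand.
wait_times_swts <- post_time_model |>
  augment(data = wait_times) |>
  mutate(
    # conditional density of the observed exposure given covariates
    denominator = dnorm(
      wait_minutes_posted_avg,
      mean = .fitted,
      sd = mean(.sigma, na.rm = TRUE)
    ),
    # marginal density of the exposure, used to stabilize the weights
    numerator = dnorm(
      wait_minutes_posted_avg,
      mean = mean(wait_minutes_posted_avg),
      sd = sd(wait_minutes_posted_avg)
    ),
    swts = numerator / denominator
  )
```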
 ```{r}
 post_time_model <- lm(
@@ -169,7 +169,7 @@ Now, let's fit the outcome model!
 ```{r}
 lm(___ ~ ___, weights = ___, data = wait_times_swts) |>
-### Create a function called `ipw_fit` that fits the propensity score model and the weighted outcome model for the effect between `extra_magic_morning` and `avg_spostmin`
+### Create a function called `ipw_fit` that fits the propensity score model and the weighted outcome model for the effect between `park_extra_magic_morning` and `wait_minutes_posted_avg`

 ### Using the `bootstraps()` and `int_t()` functions to estimate the final effect.