
Commit 8b8b640

Clarify diagnostics vignette (#1502)
1 parent a7861b4 commit 8b8b640


r-package/grf/vignettes/diagnostics.Rmd

Lines changed: 5 additions & 16 deletions
@@ -64,27 +64,16 @@ The forest summary function [test_calibration](https://grf-labs.github.io/grf/re
 test_calibration(cf)
 ```
 
-Another heuristic for testing for heterogeneity involves grouping observations into a high and low CATE group, then estimating average treatment effects in each subgroup. The function [average_treatment_effect](https://grf-labs.github.io/grf/reference/average_treatment_effect.html) estimates ATEs using a double robust approach:
+This exercise and function are motivated by earlier developments in the econometrics literature. A more intuitive exercise is to look at subgroup ATEs where the subgroups are formed according to low or high CATE predictions (Athey & Wager, 2019).
+While this approach may give some qualitative insight into heterogeneity, the grouping is naive, because the doubly robust scores used to determine subgroups are not independent of the scores used to estimate those group ATEs.
 
-```{r}
-tau.hat <- predict(cf)$predictions
-high.effect <- tau.hat > median(tau.hat)
-ate.high <- average_treatment_effect(cf, subset = high.effect)
-ate.low <- average_treatment_effect(cf, subset = !high.effect)
-```
-
-Which gives the following 95% confidence interval for the difference in ATE
-
-```{r}
-ate.high[["estimate"]] - ate.low[["estimate"]] +
-c(-1, 1) * qnorm(0.975) * sqrt(ate.high[["std.err"]]^2 + ate.low[["std.err"]]^2)
-```
-
-For another way to assess heterogeneity, see the function [rank_average_treatment_effect](https://grf-labs.github.io/grf/reference/rank_average_treatment_effect.html) and the accompanying [vignette](https://grf-labs.github.io/grf/articles/rate.html).
+The [RATE](https://grf-labs.github.io/grf/reference/rank_average_treatment_effect.html) function automates this exercise over all possible subgroups using the quantiles of the CATE predictions. If we use separate data to fit CATE models and estimate RATE metrics, we obtain a test statistic with expectation zero under no heterogeneity, which can be used to construct confidence intervals for the presence of treatment effect heterogeneity. For more details on this preferred approach, please see [this vignette](https://grf-labs.github.io/grf/articles/rate.html).
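For concreteness, here is a minimal sketch of such a train/evaluate split, in the spirit of the linked RATE vignette; it assumes the `X`, `Y`, and `W` objects used to fit `cf`, and the split and variable names are illustrative rather than part of the vignette:

```{r}
# Split the data: one half to fit a CATE model, one half for evaluation.
n <- nrow(X)
train <- sample(n, floor(n / 2))

# Fit a causal forest on the training half to produce CATE-based priorities.
cf.train <- causal_forest(X[train, ], Y[train], W[train])
priorities <- predict(cf.train, X[-train, ])$predictions

# Fit a separate forest on the held-out half and estimate the RATE (AUTOC by default).
cf.eval <- causal_forest(X[-train, ], Y[-train], W[-train])
rate <- rank_average_treatment_effect(cf.eval, priorities)

# Approximate 95% confidence interval for the RATE estimate.
rate$estimate + c(-1, 1) * qnorm(0.975) * rate$std.err
```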
 
 Athey et al. (2017) suggests a bias measure to gauge how much work the propensity and outcome models have to do to get an unbiased estimate, relative to looking at a simple difference-in-means: $bias(x) = (e(x) - p) \times (p(\mu(0, x) - \mu_0) + (1 - p) (\mu(1, x) - \mu_1))$.
 
 ```{r}
+tau.hat <- predict(cf)$predictions
+
 p <- mean(W)
 Y.hat.0 <- cf$Y.hat - e.hat * tau.hat
 Y.hat.1 <- cf$Y.hat + (1 - e.hat) * tau.hat
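Below is a minimal sketch of evaluating the stated $bias(x)$ formula from these quantities; it assumes `e.hat` holds the forest's estimated propensity scores (for example `cf$W.hat`), as used in the lines above:

```{r}
# Plug-in evaluation of the bias(x) formula stated above.
# Assumes e.hat contains the forest's estimated propensity scores (e.g. cf$W.hat).
bias <- (e.hat - p) * (p * (Y.hat.0 - mean(Y.hat.0)) + (1 - p) * (Y.hat.1 - mean(Y.hat.1)))

# Express the bias on the scale of the outcome's standard deviation.
hist(bias / sd(Y))
```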
