Hello GRF Lab team,
Thank you for developing such a fantastic package. I have a question regarding sample splitting when estimating a policy tree using k-fold cross-validation and conducting welfare analysis.
I am following the tutorial in section "6.2 Parametric policies" at the link below to conduct welfare analysis using the following steps:
(Reference: https://bookdown.org/stanfordgsbsilab/ml-ci-tutorial/policy-learning-i---binary-treatment.html)
Step 1: Estimate the CATE using 10-fold CV on 100% of the data, and compute DR scores.
Step 2: Use the DR scores from Step 1 to estimate a policy tree using 10-fold CV on 100% of the data, and output the optimal policy.
Step 3: Calculate the welfare of the optimal policy from Step 2 using 100% of the data.
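For concreteness, the DR scores in Step 1 and the welfare calculation in Step 3 can be sketched with the AIPW formula on simulated data. This is a minimal NumPy illustration of the formulas only, not the grf API; every name here (`gamma`, `mu1_hat`, `pi`, ...) is a hypothetical stand-in, and oracle nuisance values replace the cross-fitted estimates to keep the formula readable:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulated randomized trial (all names are hypothetical stand-ins)
X = rng.normal(size=(n, 2))
e = np.full(n, 0.5)                         # known propensity score
W = rng.binomial(1, e)
tau = X[:, 0]                               # true CATE (oracle, for illustration)
Y = X[:, 1] + W * tau + rng.normal(size=n)

# Step 1 analog: plug in outcome-model estimates. In practice these would
# be cross-fitted (e.g. out-of-bag) predictions; oracle values are used here.
mu0_hat = X[:, 1]
mu1_hat = X[:, 1] + tau
mu_w_hat = np.where(W == 1, mu1_hat, mu0_hat)

# Doubly robust (AIPW) scores for the treatment effect
gamma = mu1_hat - mu0_hat + (W - e) / (e * (1 - e)) * (Y - mu_w_hat)

# Step 3 analog: estimated welfare gain of a candidate policy pi(x),
# relative to treating no one
pi = (X[:, 0] > 0).astype(float)            # hypothetical learned policy
welfare_gain = np.mean(pi * gamma)
print(f"estimated welfare gain: {welfare_gain:.3f}")
```

Here the welfare of a policy is estimated as the average DR score over the units the policy treats, which is the quantity being computed in Step 3.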
My questions are as follows:
- In this setup, even though I use 10-fold CV to estimate the policy tree, should I still split the data between policy tree estimation and welfare calculation? (For example: estimate the CATE using 10-fold CV on 100% of the data, estimate the policy tree using 10-fold CV on one half of the data, and calculate welfare on the other half.)
- Should I also split the data between CATE estimation and policy tree estimation? (For instance: estimate the CATE using 10-fold CV on 50% of the data, estimate the policy tree using 10-fold CV on 30%, and compute welfare on the remaining 20%.)
If such modifications are necessary, I would appreciate it if you could briefly explain your reasoning.
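To make the overfitting concern behind these questions concrete: when welfare is computed on the same observations used to select the tree, the estimate is biased upward, because the tree was chosen to maximize exactly that in-sample quantity. A toy sketch (pure-noise DR scores, so the true welfare gain of any policy is zero; the threshold search is a hypothetical stand-in for a depth-1 policy tree, not the policytree API):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# DR scores with ZERO true treatment effect: no policy can truly help,
# so any positive estimated "welfare gain" is selection/overfitting bias.
X = rng.normal(size=n)
gamma = rng.normal(size=n)

train = np.arange(n // 2)       # half for choosing the policy
test = np.arange(n // 2, n)     # held-out half for honest evaluation

# Toy policy search (stand-in for a policy tree): pick the threshold rule
# "treat if X > t" that maximizes in-sample welfare on the training half.
grid = np.quantile(X[train], np.linspace(0.05, 0.95, 19))
scores = [np.mean((X[train] > t) * gamma[train]) for t in grid]
t_best = grid[int(np.argmax(scores))]

in_sample = np.mean((X[train] > t_best) * gamma[train])
held_out = np.mean((X[test] > t_best) * gamma[test])
print(f"in-sample: {in_sample:.3f}, held-out: {held_out:.3f}")
```

Across seeds, the in-sample value tends to be positive even though the true gain is zero, while the held-out value is centered at zero; this is the usual motivation for evaluating welfare on data not used to fit the policy.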