On Sample Splitting for Welfare Analysis Following PolicyTree Estimation with K-Fold CV #176

@j-kawamu

Description

Hello GRF Lab team,
Thank you for developing such a fantastic package. I have a question regarding sample splitting when estimating a policy tree using k-fold cross-validation and conducting welfare analysis.

I am following section "6.2 Parametric policies" of the tutorial linked below and conducting welfare analysis using the following steps:
(Reference: https://bookdown.org/stanfordgsbsilab/ml-ci-tutorial/policy-learning-i---binary-treatment.html)

Step 1: Estimate the CATE using 10-fold CV on 100% of the data, and compute doubly robust (DR) scores.
Step 2: Use the DR scores from Step 1 to estimate a policy tree using 10-fold CV on 100% of the data, and output the optimal policy.
Step 3: Calculate the welfare of the optimal policy from Step 2 using 100% of the data.
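To make the three steps concrete, here is a minimal numpy sketch of the underlying arithmetic. This is not the grf/policytree API (those are R packages): the nuisance estimates `mu0`, `mu1`, `e` stand in for cross-fitted forest predictions, and a single-threshold rule stands in for a fitted policy tree. The AIPW construction of the DR scores and the welfare estimate (the mean DR score of the assigned arm) follow the tutorial's logic.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic RCT data: covariate X, randomized treatment W, outcome Y.
X = rng.normal(size=n)
e = np.full(n, 0.5)                       # known propensity (RCT)
W = rng.binomial(1, e)
tau = np.where(X > 0, 1.0, -0.5)          # true CATE, for simulation only
Y = X + W * tau + rng.normal(size=n)

# Step 1 (stand-in): cross-fitted nuisance estimates. In practice these
# would come from 10-fold cross-fitted forests; here we plug in oracles.
mu0 = X                                   # E[Y | X, W=0]
mu1 = X + tau                             # E[Y | X, W=1]

# AIPW / doubly robust scores for each arm (column 0: control, 1: treat).
g0 = mu0 + (1 - W) * (Y - mu0) / (1 - e)
g1 = mu1 + W * (Y - mu1) / e
Gamma = np.column_stack([g0, g1])         # n x 2 DR score matrix

# Step 2 (stand-in): a depth-1 "policy tree" = best single threshold on X,
# chosen to maximize the average DR score of the assigned arm.
def fit_threshold_policy(X, Gamma):
    best_t, best_v = None, -np.inf
    for t in np.quantile(X, np.linspace(0.05, 0.95, 19)):
        pi = (X > t).astype(int)          # treat if X > threshold
        v = Gamma[np.arange(len(X)), pi].mean()
        if v > best_v:
            best_t, best_v = t, v
    return best_t

t_hat = fit_threshold_policy(X, Gamma)

# Step 3: welfare of the learned policy, evaluated on the SAME data --
# this is the in-sample estimate the question is about.
pi_hat = (X > t_hat).astype(int)
welfare = Gamma[np.arange(n), pi_hat].mean()
print(t_hat, welfare)
```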

My questions are as follows:

  • In this setup, even though I use 10-fold CV to estimate the policy tree, should I still split the data between policy tree estimation and welfare evaluation? (For example: estimate the CATE using 10-fold CV on 100% of the data, estimate the policy tree using 10-fold CV on one half of the data, and evaluate welfare on the other half.)

  • Should I also split the data between CATE estimation and policy tree estimation? (For instance, estimate CATE using 10-fold CV on 50% of the data, estimate the policy tree using 10-fold CV on 30%, and compute welfare on the remaining 20%.)
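The concern behind both questions is that evaluating a policy on the same DR scores used to select it tends to bias the welfare estimate upward (a winner's-curse effect), which a held-out split removes. The following hypothetical numpy sketch illustrates this in a setting with zero true treatment effect, where any policy's honest welfare is zero: the in-sample value of the selected threshold is contrasted with its value on a held-out half. This is an illustration of the splitting logic, not grf/policytree code.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000

# Null setting: no treatment effect, so every policy has true welfare 0.
X = rng.normal(size=n)
e = np.full(n, 0.5)
W = rng.binomial(1, e)
Y = rng.normal(size=n)

# DR (AIPW) scores with correctly specified nuisances (both arms mean 0).
mu0 = np.zeros(n)
mu1 = np.zeros(n)
g0 = mu0 + (1 - W) * (Y - mu0) / (1 - e)
g1 = mu1 + W * (Y - mu1) / e
Gamma = np.column_stack([g0, g1])

def fit_threshold_policy(X, Gamma):
    # Pick the treat-if-X-above-threshold rule with the best DR value.
    grid = np.quantile(X, np.linspace(0.05, 0.95, 19))
    values = [Gamma[np.arange(len(X)), (X > t).astype(int)].mean()
              for t in grid]
    return grid[int(np.argmax(values))]

half = n // 2

# In-sample: fit and evaluate the policy on the same observations.
t_in = fit_threshold_policy(X, Gamma)
in_sample = Gamma[np.arange(n), (X > t_in).astype(int)].mean()

# Split-sample: fit on the first half, evaluate on the held-out half.
t_split = fit_threshold_policy(X[:half], Gamma[:half])
pi_test = (X[half:] > t_split).astype(int)
held_out = Gamma[half:][np.arange(n - half), pi_test].mean()

print(in_sample, held_out)
```

Because the in-sample value is a maximum over candidate rules evaluated on the same scores, it is optimistically biased under the null, whereas the held-out estimate is unbiased for the policy's true (zero) welfare.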

If such modifications are necessary, I would appreciate it if you could briefly explain your reasoning.

Labels: question