Results from this package do not match the causal forest results on the same test data. How can this be explained? #57

@zhangkaicr

First of all, thank you for providing this package for analyzing treatment effect heterogeneity, a topic I am currently studying.
I have been reading about causal forests. According to the tutorial at the following URL, the indo_rct dataset should exhibit treatment effect heterogeneity.
https://www.r-bloggers.com/2024/09/explaining-a-causal-forest/

Since the test data come from an RCT, I set up the code following your study, but the ABC values I get are all negative, which seems to contradict the causal forest results. Could you help explain this? Thank you.
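
For context on the RCT setting: the script in this report uses an intercept-only propensity model (`ps.model = rx ~ 1`). Under randomization, treatment assignment does not depend on covariates, so such a model gives every subject the same estimated propensity, equal to the observed proportion treated. A minimal sketch of that property follows, using simulated toy data (the data frame `toy` is hypothetical and is not `indo_rct`):

```r
# Toy data standing in for a 1:1 randomized trial (hypothetical, not indo_rct).
set.seed(1)
toy <- data.frame(rx = rbinom(200, size = 1, prob = 0.5))

# Intercept-only logistic model: every subject gets the same fitted probability.
ps_fit <- glm(rx ~ 1, family = binomial, data = toy)
ps <- unique(round(fitted(ps_fit), 8))

ps            # a single value shared by all subjects
mean(toy$rx)  # the observed proportion treated
```

The two printed numbers should agree, which is why a constant propensity model is a common choice for randomized data.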

```r
library(medicaldata)
library(tidyverse)
library(precmed)

data("indo_rct")
head(indo_rct)

# Recode treatment and outcome from factors to 0/1 integers
indo_rct$rx <- as.integer(indo_rct$rx) - 1
indo_rct$outcome <- as.integer(indo_rct$outcome) - 1

# Recode the remaining factors to 0-based integers and rename gender to male
indo_rct <-
  indo_rct |>
  mutate_if(is.factor, function(v) as.integer(v) - 1L) |>
  rename(male = gender)

# Regress the outcome on treatment (rx) with a logistic model (binomial glm)
output_lm <- glm(outcome ~ rx, family = "binomial", data = indo_rct)

# Print the regression results
output_lm

output_atefit <- atefit(
  response = "count",
  data = indo_rct,
  cate.model = outcome ~ age + male + pep + recpanc + type + difcan + psphinc + bsphinc + pdstent + train,
  ps.model = rx ~ 1,
  n.boot = 500,
  seed = 999,
  verbose = 1
)

# Print the ATE estimates
output_atefit

# Compute the rate ratio
rate.ratio <- exp(output_atefit$log.rate.ratio$estimate)
rate.ratio

# Compute the 95% confidence interval of the rate ratio
CI.rate.ratio <- exp(output_atefit$log.rate.ratio$estimate + c(-1, 1) * qnorm(0.975) * sqrt(output_atefit$log.rate.ratio$SE))
CI.rate.ratio

# Plot the ATE results
plot(output_atefit)

# Cross-validate with catecv()
output_catecv <- catecv(
  response = "count",                    # response type is count
  data = indo_rct,                       # dataset
  # CATE scoring methods
  score.method = c("poisson", "contrastReg", "boosting"),
  # CATE model
  cate.model = outcome ~ age + male + pep + recpanc + type + difcan + psphinc + bsphinc + pdstent + train,
  ps.model = rx ~ 1,                     # propensity score model
  initial.predictor.method = "poisson",  # initial predictor method
  higher.y = FALSE,                      # whether higher y values are preferred
  cv.n = 5,                              # number of cross-validation folds
  seed = 999,                            # random seed
  plot.gbmperf = FALSE,                  # whether to plot GBM performance
  verbose = 1                            # show progress messages
)

# Compute the area between curves (ABC)
output_abc <- abc(x = output_catecv)
output_abc

# Compute the mean ABC for each method
average_abc <- apply(output_abc, 1, mean)
average_abc
```
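
A side note on the confidence interval step in the script above: a Wald-type interval for a ratio is usually built on the log scale as estimate ± z × standard error and then exponentiated. The sketch below illustrates this with made-up numbers. The helper `wald_ci_ratio()` and the values `log_rr` and `se_log_rr` are hypothetical and not part of precmed. If the `SE` column returned by `atefit()` is already a standard error, which is an assumption here, it would enter the formula directly, and wrapping it in `sqrt()` would only be appropriate if the column held a variance.

```r
# Wald-type CI for a ratio: build it on the log scale, then exponentiate.
# `se` is assumed to be the standard error of the log estimate.
wald_ci_ratio <- function(log_estimate, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)
  exp(log_estimate + c(lower = -1, upper = 1) * z * se)
}

# Hypothetical inputs, for illustration only
log_rr <- log(0.6)
se_log_rr <- 0.2

ci <- wald_ci_ratio(log_rr, se_log_rr)
ci
```

Because the interval is symmetric on the log scale, the product of its two limits equals the squared point estimate, which gives a quick consistency check.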
