-
Notifications
You must be signed in to change notification settings - Fork 24
Description
Describe the issue
For survival outcomes, there seems to be a bug in lmtp_curve() that results in a missing data error. I believe this is because the predict_long() step does not restrict survival data to those who are "at risk".
reprex
A simulated longitudinal survival dataset is available here.
devtools::load_all("lmtp")
# simulated data
data <- read.csv("sim.csv")
out <- lmtp_curve(
data = data,
trt = c("A_1", "A_2"),
outcome = c("Y_1", "Y_2"),
baseline = c("W1", "W2"),
time_vary = list(c("L1_1", "L2_1"), c("L1_2", "L2_2")),
cens = c("C_1", "C_2"),
compete = c("D_1", "D_2"),
shift = static_binary_on,
shifted = NULL,
k = Inf,
mtp = FALSE,
outcome_type = "survival",
folds = 1,
control = lmtp_control()
)
Error: Task 'mlr3superlearner_training_task' has missing values in column(s) '..i..A_1', '..i..L_1', '..i..L_2', but learner 'classif.log_reg' does not support this
Below is the original code for predict_long
predict_long <- function(fit, newdata, t, tau) {
# Create indicators for the subset of rows to use based on time
time <- as.numeric(newdata$time) <= rev(1:tau)[t]
# Empty matrix to store predictions
predictions <- matrix(nrow = nrow(newdata[time, ]), ncol = 1)
# Indicator for not having been censored at the previous time point
is_observed <- newdata$..i..C_1_lag == 1
if (isTRUE(task$survival)) {
# Indicator for not experiencing the outcome already
outcome_free <- newdata$..i..N[time] == 1
# Indicator for not experiencing competing risk already
competing_risk_free <- newdata$..i..D_1[time] == 0
} else {
outcome_free <- rep(TRUE, nrow(newdata[time, ]))
competing_risk_free <- rep(TRUE, nrow(newdata[time, ]))
}
predictions[is_observed[time], 1] <- predict(fit, newdata[time & is_observed, ], NULL)
predictions[!outcome_free, 1] <- 0
predictions[!competing_risk_free, 1] <- 1
predictions[, 1]
}
Updated code for predict_long
Changing this as such removes this missing data error (note, I have also added "task" as an argument - I am not sure how scoping works for S6 objects but this was necessary for it to work for me):
predict_long <- function(task, fit, newdata, t, tau) {
# Create indicators for the subset of rows to use based on time
time <- as.numeric(newdata$time) <= rev(1:tau)[t]
# Empty matrix to store predictions
predictions <- matrix(nrow = nrow(newdata[time, ]), ncol = 1)
if (isTRUE(task$survival)) {
# Indicator for not experiencing the outcome already
outcome_free <- newdata$..i..N[time] == 1
# Indicator for not experiencing competing risk already
competing_risk_free <- newdata$..i..D_1[time] == 0
# Indicator for not having been censored at the previous time point or had event or competing event
is_observed <- newdata$..i..C_1_lag == 1 & outcome_free & competing_risk_free
is_observed[is.na(is_observed)] <- FALSE
} else {
outcome_free <- rep(TRUE, nrow(newdata[time, ]))
competing_risk_free <- rep(TRUE, nrow(newdata[time, ]))
# Indicator for not having been censored at the previous time point
is_observed <- newdata$..i..C_1_lag == 1
}
predictions[is_observed[time], 1] <- predict(fit, newdata[time & is_observed, ], NULL)
predictions[!outcome_free, 1] <- 0
predictions[!competing_risk_free, 1] <- 1
predictions[, 1]
}
But this results in another error downstream that I haven't been able to trace.
Error: Task 'mlr3superlearner_training_task' has missing values in column(s) '..i..Y_1', but learner 'regr.lm' does not support this
It might be related to eif() but I'm not sure. It seems that the output of eif() is not aligned with the prediction matrices (there are NA values where there shouldn't be).
Tagging @herbps10 who I have discussed this with by email.
Thanks!