Skip to content

Missing data error for survival outcomes in lmtp_curve #186

@Lachlan-Cribb

Description

@Lachlan-Cribb

Describe the issue
For survival outcomes, there seems to be a bug in lmtp_curve() that results in a missing data error. I believe this is because the predict_long() step does not restrict survival data to those who are "at risk".

reprex
A simulated longitudinal survival dataset is available here.

devtools::load_all("lmtp")

# simulated data
data <- read.csv("sim.csv")

out <- lmtp_curve(
  data = data,
  trt = c("A_1", "A_2"),
  outcome = c("Y_1", "Y_2"),
  baseline = c("W1", "W2"),
  time_vary = list(c("L1_1", "L2_1"), c("L1_2", "L2_2")),
  cens = c("C_1", "C_2"),
  compete = c("D_1", "D_2"),
  shift = static_binary_on,
  shifted = NULL,
  k = Inf,
  mtp = FALSE,
  outcome_type = "survival",
  folds = 1,
  control = lmtp_control()
)

Error: Task 'mlr3superlearner_training_task' has missing values in column(s) '..i..A_1', '..i..L_1', '..i..L_2', but learner 'classif.log_reg' does not support this

Below is the original code for predict_long

predict_long <- function(fit, newdata, t, tau) {
  # Create indicators for the subset of rows to use based on time
  time <- as.numeric(newdata$time) <= rev(1:tau)[t]
  # Empty matrix to store predictions
  predictions <- matrix(nrow = nrow(newdata[time, ]), ncol = 1)
  # Indicator for not having been censored at the previous time point
  is_observed <- newdata$..i..C_1_lag == 1

  if (isTRUE(task$survival)) {
    # Indicator for not experiencing the outcome already
    outcome_free <- newdata$..i..N[time] == 1
    # Indicator for not experiencing competing risk already
    competing_risk_free <- newdata$..i..D_1[time] == 0
  } else {
    outcome_free <- rep(TRUE, nrow(newdata[time, ]))
    competing_risk_free <- rep(TRUE, nrow(newdata[time, ]))
  }

  predictions[is_observed[time], 1] <- predict(fit, newdata[time & is_observed, ], NULL)
  predictions[!outcome_free, 1] <- 0
  predictions[!competing_risk_free, 1] <- 1
  predictions[, 1]
}

Updated code for predict_long
Changing this as such removes this missing data error (note, I have also added "task" as an argument - I am not sure how scoping works for S6 objects but this was necessary for it to work for me):

predict_long <- function(task, fit, newdata, t, tau) {
  # Create indicators for the subset of rows to use based on time
  time <- as.numeric(newdata$time) <= rev(1:tau)[t]
  # Empty matrix to store predictions
  predictions <- matrix(nrow = nrow(newdata[time, ]), ncol = 1)

  if (isTRUE(task$survival)) {
    # Indicator for not experiencing the outcome already
    outcome_free <- newdata$..i..N[time] == 1
    # Indicator for not experiencing competing risk already
    competing_risk_free <- newdata$..i..D_1[time] == 0
    # Indicator for not having been censored at the previous time point or had event or competing event 
    is_observed <- newdata$..i..C_1_lag == 1 & outcome_free & competing_risk_free
    is_observed[is.na(is_observed)] <- FALSE
  } else {
    outcome_free <- rep(TRUE, nrow(newdata[time, ]))
    competing_risk_free <- rep(TRUE, nrow(newdata[time, ]))
    # Indicator for not having been censored at the previous time point
    is_observed <- newdata$..i..C_1_lag == 1
  }

  predictions[is_observed[time], 1] <- predict(fit, newdata[time & is_observed, ], NULL)
  predictions[!outcome_free, 1] <- 0
  predictions[!competing_risk_free, 1] <- 1
  predictions[, 1]
}

But this results in another error downstream that I haven't been able to trace.

Error: Task 'mlr3superlearner_training_task' has missing values in column(s) '..i..Y_1', but learner 'regr.lm' does not support this

It might be related to eif() but I'm not sure. It seems that the output of eif() is not aligned with the prediction matrices (there are NA values where there shouldn't be).

Tagging @herbps10 who I have discussed this with by email.

Thanks!

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions