Missing data error for survival outcomes in lmtp_curve

**Describe the issue**
For survival outcomes, there seems to be a bug in _lmtp_curve_() that results in a missing data error. I believe this is because the predict_long() step does not restrict survival data to those who are "at risk". 

**reprex**
A simulated longitudinal survival dataset is available [here](https://drive.google.com/drive/folders/11eQkC3dYEvD8MahcSvaR4EEeLQnd-iWB?usp=sharing).
```
devtools::load_all("lmtp")

# simulated data
data <- read.csv("sim.csv")

out <- lmtp_curve(
  data = data,
  trt = c("A_1", "A_2"),
  outcome = c("Y_1", "Y_2"),
  baseline = c("W1", "W2"),
  time_vary = list(c("L1_1", "L2_1"), c("L1_2", "L2_2")),
  cens = c("C_1", "C_2"),
  compete = c("D_1", "D_2"),
  shift = static_binary_on,
  shifted = NULL,
  k = Inf,
  mtp = FALSE,
  outcome_type = "survival",
  folds = 1,
  control = lmtp_control()
)

Error: Task 'mlr3superlearner_training_task' has missing values in column(s) '..i..A_1', '..i..L_1', '..i..L_2', but learner 'classif.log_reg' does not support this
```


**Below is the original code for predict_long**
```
predict_long <- function(fit, newdata, t, tau) {
  # Create indicators for the subset of rows to use based on time
  time <- as.numeric(newdata$time) <= rev(1:tau)[t]
  # Empty matrix to store predictions
  predictions <- matrix(nrow = nrow(newdata[time, ]), ncol = 1)
  # Indicator for not having been censored at the previous time point
  is_observed <- newdata$..i..C_1_lag == 1

  if (isTRUE(task$survival)) {
    # Indicator for not experiencing the outcome already
    outcome_free <- newdata$..i..N[time] == 1
    # Indicator for not experiencing competing risk already
    competing_risk_free <- newdata$..i..D_1[time] == 0
  } else {
    outcome_free <- rep(TRUE, nrow(newdata[time, ]))
    competing_risk_free <- rep(TRUE, nrow(newdata[time, ]))
  }

  predictions[is_observed[time], 1] <- predict(fit, newdata[time & is_observed, ], NULL)
  predictions[!outcome_free, 1] <- 0
  predictions[!competing_risk_free, 1] <- 1
  predictions[, 1]
}

```

**Updated code for predict_long**
Changing this as such removes this missing data error (note, I have also added "task" as an argument - I am not sure how scoping works for S6 objects but this was necessary for it to work for me):
```
predict_long <- function(task, fit, newdata, t, tau) {
  # Create indicators for the subset of rows to use based on time
  time <- as.numeric(newdata$time) <= rev(1:tau)[t]
  # Empty matrix to store predictions
  predictions <- matrix(nrow = nrow(newdata[time, ]), ncol = 1)

  if (isTRUE(task$survival)) {
    # Indicator for not experiencing the outcome already
    outcome_free <- newdata$..i..N[time] == 1
    # Indicator for not experiencing competing risk already
    competing_risk_free <- newdata$..i..D_1[time] == 0
    # Indicator for not having been censored at the previous time point or had event or competing event 
    is_observed <- newdata$..i..C_1_lag == 1 & outcome_free & competing_risk_free
    is_observed[is.na(is_observed)] <- FALSE
  } else {
    outcome_free <- rep(TRUE, nrow(newdata[time, ]))
    competing_risk_free <- rep(TRUE, nrow(newdata[time, ]))
    # Indicator for not having been censored at the previous time point
    is_observed <- newdata$..i..C_1_lag == 1
  }

  predictions[is_observed[time], 1] <- predict(fit, newdata[time & is_observed, ], NULL)
  predictions[!outcome_free, 1] <- 0
  predictions[!competing_risk_free, 1] <- 1
  predictions[, 1]
}
```
But this results in another error downstream that I haven't been able to trace. 

```
Error: Task 'mlr3superlearner_training_task' has missing values in column(s) '..i..Y_1', but learner 'regr.lm' does not support this
```

It might be related to eif() but I'm not sure. It seems that the output of eif() is not aligned with the prediction matrices (there are NA values where there shouldn't be). 

Tagging @herbps10 who I have discussed this with by email. 

Thanks!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Missing data error for survival outcomes in lmtp_curve #186

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Missing data error for survival outcomes in lmtp_curve #186

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions