predict(type = "survival") output lacks .weight_censored column required by roc_auc_survival() #350

@abichat

Hello,

yardstick::roc_auc_survival() requires survival-probability predictions that include a column named .weight_censored. augment() on a parsnip model fit adds this column automatically, but predict(..., type = "survival") returns a .pred list column whose tibbles contain only .eval_time and .pred_survival, which causes roc_auc_survival() to error.

library(dplyr)
library(parsnip)
library(yardstick)
library(survival)
library(censored)

lung$surv <- Surv(lung$time, lung$status)

# pipeline works well for concordance_survival
proportional_hazards() %>% 
  fit(surv ~ ph.ecog + age, data = lung) %>% 
  predict(lung, type = "time") %>% 
  bind_cols(lung) %>% 
  concordance_survival(surv, .pred_time)
#> # A tibble: 1 × 3
#>   .metric              .estimator .estimate
#>   <chr>                <chr>          <dbl>
#> 1 concordance_survival standard       0.610

# error for roc_auc_survival
proportional_hazards() %>% 
  fit(surv ~ ph.ecog + age, data = lung) %>% 
  predict(lung, type = "survival", eval_time = c(100, 200)) %>% 
  bind_cols(lung) %>% 
  roc_auc_survival(surv, .pred)
#> Error in `roc_auc_survival()`:
#> ! All data.frames of `estimate` should include column names:
#>   (.eval_time), (.pred_survival), and (.weight_censored).

# workaround with augment
proportional_hazards() %>% 
  fit(surv ~ ph.ecog + age, data = lung) %>% 
  augment(lung, type = "survival", eval_time = c(100, 200)) %>% 
  roc_auc_survival(surv, .pred)
#> # A tibble: 2 × 4
#>   .metric          .estimator .eval_time .estimate
#>   <chr>            <chr>           <dbl>     <dbl>
#> 1 roc_auc_survival standard          100     0.626
#> 2 roc_auc_survival standard          200     0.644

lapply(c("parsnip", "yardstick", "survival"), packageVersion)
#> [[1]]
#> [1] '1.3.3'
#> 
#> [[2]]
#> [1] '1.3.2'
#> 
#> [[3]]
#> [1] '3.8.3'

Created on 2025-09-03 with reprex v2.1.1

I am not sure whether this behavior is intended, but it makes the pipeline difficult to reuse consistently across different metrics. If it is intentional, would it be possible to introduce an argument that, when set to TRUE, automatically adds the relevant column?
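
For anyone hitting the same problem, here is a minimal sketch of a second workaround that keeps predict() in the pipeline. It assumes that parsnip's exported helper .censoring_weights_graf() is what augment() uses internally to add .weight_censored, and that it accepts the fitted model plus a data frame holding both the .pred list column and the Surv outcome. The object names mod_fit, preds, and res are only illustrative.

```r
library(dplyr)
library(parsnip)
library(yardstick)
library(survival)
library(censored)

lung$surv <- Surv(lung$time, lung$status)

# Fit once and keep the model object: the weights helper needs it
mod_fit <- proportional_hazards() %>%
  fit(surv ~ ph.ecog + age, data = lung)

# predict() gives .pred with only .eval_time and .pred_survival
preds <- predict(mod_fit, lung, type = "survival", eval_time = c(100, 200)) %>%
  bind_cols(lung)

# Assumption: this adds .weight_censored (and related columns) inside
# each tibble of .pred, using the Surv column found in `preds`
preds <- .censoring_weights_graf(mod_fit, preds)

res <- roc_auc_survival(preds, surv, .pred)
res
```

If the assumption holds, res should have one row per evaluation time, like the augment() result above. This is less convenient than a dedicated argument to predict(), since the fitted model has to be stored separately instead of being piped through.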

Thanks!

Labels: tidy-dev-day (Tidyverse Developer Day, rstd.io/tidy-dev-day)