Commit 371fda2

Release v.1.2.0
SurvSHAP(t) calculation using {treeshap} and fixes for SurvLIME
2 parents d6ee933 + 06bf634 commit 371fda2

29 files changed, +286 -114 lines changed

DESCRIPTION

Lines changed: 3 additions & 1 deletion
@@ -1,12 +1,13 @@
 Package: survex
 Title: Explainable Machine Learning in Survival Analysis
-Version: 1.1.3.9000
+Version: 1.2.0
 Authors@R:
 c(
 person("Mikołaj", "Spytek", email = "[email protected]", role = c("aut", "cre"), comment = c(ORCID = "0000-0001-7111-2286")),
 person("Mateusz", "Krzyziński", role = c("aut"), comment = c(ORCID = "0000-0001-6143-488X")),
 person("Sophie", "Langbein", role = c("aut")),
 person("Hubert", "Baniecki", role = c("aut"), comment = c(ORCID = "0000-0001-6661-5364")),
+person("Lorenz A.", "Kapsner", role = c("ctb"), comment = c(ORCID = "0000-0003-1866-860X")),
 person("Przemyslaw", "Biecek", role = c("aut"), comment = c(ORCID = "0000-0001-8423-1823"))
 )
 Description: Survival analysis models are commonly used in medicine and other areas. Many of them
@@ -46,6 +47,7 @@ Suggests:
 rmarkdown,
 rms,
 testthat (>= 3.0.0),
+treeshap (>= 0.3.0),
 withr,
 xgboost
 Config/testthat/edition: 3

NEWS.md

Lines changed: 5 additions & 1 deletion
@@ -1,4 +1,8 @@
-# survex (development version)
+# survex 1.2.0
+* added new `calculation_method` for `surv_shap()` called `"treeshap"` that uses the `treeshap` package ([#75](https://github.com/ModelOriented/survex/issues/75))
+* enable to calculate SurvSHAP(t) explanations based on subsample of the explainer's data
+* changed default kernel width in SurvLIME from sqrt(p * 0.75) to sqrt(p) * 0.75
+* fixed error in SurvLIME when non-factor `categorical_variables` were provided

 # survex 1.1.3

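The kernel-width entry above moves the 0.75 factor outside the square root. The following is a minimal base-R sketch of that arithmetic only, assuming `p` denotes the number of explanatory variables; the variable names are illustrative and are not taken from the survex source.

```r
# Compare the two default kernel-width formulas named in the NEWS entry.
# Assumption: p is the number of explanatory variables (illustrative values).
p <- c(1, 4, 9, 16)

old_width <- sqrt(p * 0.75)  # previous default: factor inside the root
new_width <- sqrt(p) * 0.75  # 1.2.0 default: factor outside the root

# The ratio new / old does not depend on p: 0.75 / sqrt(0.75) = sqrt(0.75)
ratio <- new_width / old_width

print(data.frame(p, old_width, new_width, ratio))
```

Under these formulas the new default is always narrower than the old one by the same constant factor, so locally weighted neighbours are concentrated somewhat closer to the explained observation.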
R/metrics.R

Lines changed: 18 additions & 18 deletions
@@ -12,7 +12,7 @@ utils::globalVariables(c("PredictionSurv"))
 #' @return a function that can be used to calculate metrics (with parameters `y_true`, `risk`, `surv`, and `times`)
 #'
 #' @section References:
-#' - \[1\] Graf, Erika, et al. ["Assessment and comparison of prognostic classification schemes for survival data."](https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291097-0258%2819990915/30%2918%3A17/18%3C2529%3A%3AAID-SIM274%3E3.0.CO%3B2-5) Statistics in Medicine 18.17‐18 (1999): 2529-2545.
+#' - \[1\] Graf, Erika, et al. "Assessment and comparison of prognostic classification schemes for survival data." Statistics in Medicine 18.17‐18 (1999): 2529-2545.
 #'
 #' @export
 loss_integrate <- function(loss_function, ..., normalization = NULL, max_quantile = 1) {
@@ -57,7 +57,7 @@ loss_integrate <- function(loss_function, ..., normalization = NULL, max_quantil
 #' @return numeric from 0 to 1, higher values indicate better performance
 #'
 #' @section References:
-#' - \[1\] Harrell, F.E., Jr., et al. ["Regression modelling strategies for improved prognostic prediction."](https://onlinelibrary.wiley.com/doi/10.1002/sim.4780030207) Statistics in Medicine 3.2 (1984): 143-152.
+#' - \[1\] Harrell, F.E., Jr., et al. "Regression modelling strategies for improved prognostic prediction." Statistics in Medicine 3.2 (1984): 143-152.
 #'
 #' @rdname c_index
 #' @seealso [loss_one_minus_c_index()]
@@ -109,7 +109,7 @@ attr(c_index, "loss_type") <- "risk-based"
 #' @return numeric from 0 to 1, lower values indicate better performance
 #'
 #' @section References:
-#' - \[1\] Harrell, F.E., Jr., et al. ["Regression modelling strategies for improved prognostic prediction."](https://onlinelibrary.wiley.com/doi/10.1002/sim.4780030207) Statistics in Medicine 3.2 (1984): 143-152.
+#' - \[1\] Harrell, F.E., Jr., et al. "Regression modelling strategies for improved prognostic prediction." Statistics in Medicine 3.2 (1984): 143-152.
 #'
 #' @rdname loss_one_minus_c_index
 #' @seealso [c_index()]
@@ -152,8 +152,8 @@ attr(loss_one_minus_c_index, "loss_type") <- "risk-based"
 #' @return numeric from 0 to 1, lower scores are better (Brier score of 0.25 represents a model which returns always returns 0.5 as the predicted survival function)
 #'
 #' @section References:
-#' - \[1\] Brier, Glenn W. ["Verification of forecasts expressed in terms of probability."](https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml) Monthly Weather Review 78.1 (1950): 1-3.
-#' - \[2\] Graf, Erika, et al. ["Assessment and comparison of prognostic classification schemes for survival data."](https://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-0258(19990915/30)18:17/18%3C2529::AID-SIM274%3E3.0.CO;2-5) Statistics in Medicine 18.17‐18 (1999): 2529-2545.
+#' - \[1\] Brier, Glenn W. "Verification of forecasts expressed in terms of probability." Monthly Weather Review 78.1 (1950): 1-3.
+#' - \[2\] Graf, Erika, et al. "Assessment and comparison of prognostic classification schemes for survival data." Statistics in Medicine 18.17‐18 (1999): 2529-2545.
 #'
 #' @rdname brier_score
 #' @seealso [cd_auc()]
@@ -217,8 +217,8 @@ attr(loss_brier_score, "loss_type") <- "time-dependent"
 #' Calculate Cumulative/Dynamic AUC
 #'
 #' This function calculates the Cumulative/Dynamic AUC metric for a survival model. It is done using the
-#' estimator proposed proposed by Uno et al. \[[1](https://www.jstor.org/stable/27639883)\],
-#' and Hung and Chang \[[2](https://www.jstor.org/stable/41000414)\].
+#' estimator proposed proposed by Uno et al. \[1\],
+#' and Hung and Chang \[2\].
 #'
 #' C/D AUC is an extension of the AUC metric known from classification models.
 #' Its values represent the model's performance at specific time points.
@@ -232,8 +232,8 @@ attr(loss_brier_score, "loss_type") <- "time-dependent"
 #' @return a numeric vector of length equal to the length of the times vector, each value (from the range from 0 to 1) represents the AUC metric at a specific time point, with higher values indicating better performance.
 #'
 #' @section References:
-#' - \[1\] Uno, Hajime, et al. ["Evaluating prediction rules for t-year survivors with censored regression models."](https://www.jstor.org/stable/27639883) Journal of the American Statistical Association 102.478 (2007): 527-537.
-#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. ["Optimal composite markers for time dependent receiver operating characteristic curves with censored survival data."](https://www.jstor.org/stable/41000414) Scandinavian Journal of Statistics 37.4 (2010): 664-679.
+#' - \[1\] Uno, Hajime, et al. "Evaluating prediction rules for t-year survivors with censored regression models." Journal of the American Statistical Association 102.478 (2007): 527-537.
+#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. "Optimal composite markers for time dependent receiver operating characteristic curves with censored survival data." Scandinavian Journal of Statistics 37.4 (2010): 664-679.
 #'
 #' @rdname cd_auc
 #' @seealso [loss_one_minus_cd_auc()] [integrated_cd_auc()] [brier_score()]
@@ -297,8 +297,8 @@ attr(cd_auc, "loss_type") <- "time-dependent"
 #' @return a numeric vector of length equal to the length of the times vector, each value (from the range from 0 to 1) represents 1 - AUC metric at a specific time point, with lower values indicating better performance.
 #'
 #' #' @section References:
-#' - \[1\] Uno, Hajime, et al. ["Evaluating prediction rules for t-year survivors with censored regression models."](https://www.jstor.org/stable/27639883) Journal of the American Statistical Association 102.478 (2007): 527-537.
-#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. ["Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data."](https://www.jstor.org/stable/41000414) Scandinavian Journal of Statistics 37.4 (2010): 664-679.
+#' - \[1\] Uno, Hajime, et al. "Evaluating prediction rules for t-year survivors with censored regression models." Journal of the American Statistical Association 102.478 (2007): 527-537.
+#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. "Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data." Scandinavian Journal of Statistics 37.4 (2010): 664-679.
 #'
 #' @rdname loss_one_minus_cd_auc
 #' @seealso [cd_auc()]
@@ -337,8 +337,8 @@ attr(loss_one_minus_cd_auc, "loss_type") <- "time-dependent"
 #' @return numeric from 0 to 1, higher values indicate better performance
 #'
 #' #' @section References:
-#' - \[1\] Uno, Hajime, et al. ["Evaluating prediction rules for t-year survivors with censored regression models."](https://www.jstor.org/stable/27639883) Journal of the American Statistical Association 102.478 (2007): 527-537.
-#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. ["Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data."](https://www.jstor.org/stable/41000414) Scandinavian Journal of Statistics 37.4 (2010): 664-679.
+#' - \[1\] Uno, Hajime, et al. "Evaluating prediction rules for t-year survivors with censored regression models." Journal of the American Statistical Association 102.478 (2007): 527-537.
+#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. "Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data." Scandinavian Journal of Statistics 37.4 (2010): 664-679.
 #'
 #' @rdname integrated_cd_auc
 #' @seealso [cd_auc()] [loss_one_minus_cd_auc()]
@@ -373,8 +373,8 @@ attr(integrated_cd_auc, "loss_type") <- "integrated"
 #' @return numeric from 0 to 1, lower values indicate better performance
 #'
 #' #' @section References:
-#' - \[1\] Uno, Hajime, et al. ["Evaluating prediction rules for t-year survivors with censored regression models."](https://www.jstor.org/stable/27639883) Journal of the American Statistical Association 102.478 (2007): 527-537.
-#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. ["Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data."](https://www.jstor.org/stable/41000414) Scandinavian Journal of Statistics 37.4 (2010): 664-679.
+#' - \[1\] Uno, Hajime, et al. "Evaluating prediction rules for t-year survivors with censored regression models." Journal of the American Statistical Association 102.478 (2007): 527-537.
+#' - \[2\] Hung, Hung, and Chin‐Tsang Chiang. "Optimal composite markers for time‐dependent receiver operating characteristic curves with censored survival data." Scandinavian Journal of Statistics 37.4 (2010): 664-679.
 #'
 #' @rdname loss_one_minus_integrated_cd_auc
 #' @seealso [integrated_cd_auc()] [cd_auc()] [loss_one_minus_cd_auc()]
@@ -417,8 +417,8 @@ attr(loss_one_minus_integrated_cd_auc, "loss_type") <- "integrated"
 #' @return numeric from 0 to 1, lower values indicate better performance
 #'
 #' @section References:
-#' - \[1\] Brier, Glenn W. ["Verification of forecasts expressed in terms of probability."](https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml) Monthly Weather Review 78.1 (1950): 1-3.
-#' - \[2\] Graf, Erika, et al. ["Assessment and comparison of prognostic classification schemes for survival data."](https://onlinelibrary.wiley.com/doi/10.1002/(SICI)1097-0258(19990915/30)18:17/18%3C2529::AID-SIM274%3E3.0.CO;2-5) Statistics in Medicine 18.17‐18 (1999): 2529-2545.
+#' - \[1\] Brier, Glenn W. "Verification of forecasts expressed in terms of probability." Monthly Weather Review 78.1 (1950): 1-3.
+#' - \[2\] Graf, Erika, et al. "Assessment and comparison of prognostic classification schemes for survival data." Statistics in Medicine 18.17‐18 (1999): 2529-2545.
 #'
 #' @rdname integrated_brier_score
 #' @seealso [brier_score()] [integrated_cd_auc()] [loss_one_minus_integrated_cd_auc()]
@@ -458,6 +458,7 @@ attr(loss_integrated_brier_score, "loss_type") <- "integrated"
 #'
 #' @return a function with standardized parameters (`y_true`, `risk`, `surv`, `times`) that can be used to calculate loss
 #'
+#' @examples
 #' if(FALSE){
 #' measure <- msr("surv.calib_beta")
 #' mlr_measure <- loss_adapt_mlr3proba(measure)
@@ -483,7 +484,6 @@ loss_adapt_mlr3proba <- function(measure, reverse = FALSE, ...) {

 return(output)
 }
-
 if (reverse) {
 attr(loss_function, "loss_name") <- paste("one minus", measure$id)
 } else {
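
The `@return` note in the `brier_score` hunk above says that a model always predicting a survival probability of 0.5 scores 0.25. A short worked check of that number follows. It is a simplified sketch that ignores censoring, so `simple_brier()` is a hypothetical helper and not the estimator implemented in the package.

```r
# Simplified (uncensored) Brier score at a single time point:
# mean squared difference between observed status and predicted survival.
# simple_brier() is a hypothetical helper written for this illustration.
simple_brier <- function(alive, surv_prob) {
  mean((alive - surv_prob)^2)
}

alive <- c(1, 0, 0, 1, 1, 0)              # made-up status at time t (1 = still alive)
constant_pred <- rep(0.5, length(alive))  # model that always predicts 0.5

# Every term is (1 - 0.5)^2 or (0 - 0.5)^2, both equal to 0.25,
# so the mean is 0.25 regardless of the observed outcomes.
print(simple_brier(alive, constant_pred))
```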

R/model_performance.R

Lines changed: 7 additions & 7 deletions
@@ -9,17 +9,17 @@
 #' @param times a numeric vector of times. If `type == "metrics"` then the survival function is evaluated at these times, if `type == "roc"` then the ROC curves are calculated at these times.
 #'
 #' @return An object of class `"model_performance_survival"`. It's a list of metric values calculated for the model. It contains:
-#' - Harrell's concordance index \[[1](https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.4780030207)\]
-#' - Brier score \[[2](https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml), [3](https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291097-0258%2819990915/30%2918%3A17/18%3C2529%3A%3AAID-SIM274%3E3.0.CO%3B2-5)\]
-#' - C/D AUC using the estimator proposed by Uno et. al \[[4](https://www.jstor.org/stable/27639883#metadata_info_tab_contents)\]
+#' - Harrell's concordance index \[1\]
+#' - Brier score \[2, 3\]
+#' - C/D AUC using the estimator proposed by Uno et. al \[4\]
 #' - integral of the Brier score
 #' - integral of the C/D AUC
 #'
 #' @section References:
-#' - \[1\] Harrell, F.E., Jr., et al. ["Regression modelling strategies for improved prognostic prediction."](https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.4780030207) Statistics in Medicine 3.2 (1984): 143-152.
-#' - \[2\] Brier, Glenn W. ["Verification of forecasts expressed in terms of probability."](https://journals.ametsoc.org/view/journals/mwre/78/1/1520-0493_1950_078_0001_vofeit_2_0_co_2.xml) Monthly Weather Review 78.1 (1950): 1-3.
-#' - \[3\] Graf, Erika, et al. ["Assessment and comparison of prognostic classification schemes for survival data."](https://onlinelibrary.wiley.com/doi/abs/10.1002/%28SICI%291097-0258%2819990915/30%2918%3A17/18%3C2529%3A%3AAID-SIM274%3E3.0.CO%3B2-5) Statistics in Medicine 18.17‐18 (1999): 2529-2545.
-#' - \[4\] Uno, Hajime, et al. ["Evaluating prediction rules for t-year survivors with censored regression models."](https://www.jstor.org/stable/27639883#metadata_info_tab_contents) Journal of the American Statistical Association 102.478 (2007): 527-537.
+#' - \[1\] Harrell, F.E., Jr., et al. "Regression modelling strategies for improved prognostic prediction." Statistics in Medicine 3.2 (1984): 143-152.
+#' - \[2\] Brier, Glenn W. "Verification of forecasts expressed in terms of probability." Monthly Weather Review 78.1 (1950): 1-3.
+#' - \[3\] Graf, Erika, et al. "Assessment and comparison of prognostic classification schemes for survival data." Statistics in Medicine 18.17‐18 (1999): 2529-2545.
+#' - \[4\] Uno, Hajime, et al. "Evaluating prediction rules for t-year survivors with censored regression models." Journal of the American Statistical Association 102.478 (2007): 527-537.
 #'
 #' @examples
 #' \donttest{

R/model_survshap.R

Lines changed: 4 additions & 1 deletion
@@ -59,6 +59,7 @@ model_survshap <- function(explainer, ...) {
 model_survshap.surv_explainer <- function(explainer,
 new_observation = NULL,
 y_true = NULL,
+N = NULL,
 calculation_method = "kernelshap",
 aggregation_method = "integral",
 output_type = "survival",
@@ -98,9 +99,11 @@ model_survshap.surv_explainer <- function(explainer,
 explainer = explainer,
 new_observation = observations,
 output_type = output_type,
+N = N,
 y_true = y_true,
 calculation_method = calculation_method,
-aggregation_method = aggregation_method
+aggregation_method = aggregation_method,
+...
 )

 attr(shap_values, "label") <- explainer$label

R/plot_model_profile_survival.R

Lines changed: 1 addition & 1 deletion
@@ -230,7 +230,7 @@ plot2_mp <- function(x,
 if (!is.null(subtitle) && subtitle == "default") {
 subtitle <- paste0("created for the ", unique(variable), " variable")
 if (single_timepoint && !marginalize_over_time) {
-subtitle <- paste0(subtitle, " and time =", times)
+subtitle <- paste0(subtitle, " and time = ", times)
 }
 }

R/plot_predict_profile_survival.R

Lines changed: 1 addition & 1 deletion
@@ -192,7 +192,7 @@ plot2_cp <- function(x,
 if (!is.null(subtitle) && subtitle == "default") {
 subtitle <- paste0("created for the ", unique(variable), " variable")
 if (single_timepoint && !marginalize_over_time) {
-subtitle <- paste0(subtitle, " and time =", times)
+subtitle <- paste0(subtitle, " and time = ", times)
 }
 }

R/plot_surv_shap.R

Lines changed: 4 additions & 3 deletions
@@ -121,11 +121,11 @@ plot.surv_shap <- function(x,
 #' * `color_variable` - variable used to denote the color, by default equal to `variable`
 #'
 #'
-#'#' ## `plot.aggregated_surv_shap(geom = "curves")`
+#' ## `plot.aggregated_surv_shap(geom = "curves")`
 #'
 #' * `variable` - variable for which SurvSHAP(t) curves are to be plotted, by default first from result data
 #' * `boxplot` - whether to plot functional boxplot with marked outliers or all curves colored by variable value
-#'
+#' * `coef` - length of the functional boxplot's whiskers as multiple of IQR, by default 1.5
 #'
 #' @examples
 #' \donttest{
@@ -293,7 +293,7 @@ plot_shap_global_beeswarm <- function(x,
 max_vars = 7,
 colors = NULL) {
 df <- as.data.frame(do.call(rbind, x$aggregate))
-cols <- names(sort(colMeans(abs(df))))[1:min(max_vars, length(df))]
+cols <- names(sort(colMeans(abs(df)), decreasing = TRUE))[1:min(max_vars, length(df))]
 df <- df[, cols]
 df <- stack(df)
 colnames(df) <- c("shap_value", "variable")
@@ -325,6 +325,7 @@ plot_shap_global_beeswarm <- function(x,
 ggplot(data = df, aes(x = shap_value, y = variable, color = var_value)) +
 geom_vline(xintercept = 0, color = "#ceced9", linetype = "solid") +
 geom_jitter(width = 0, height = 0.15) +
+scale_y_discrete(limits=rev) +
 scale_color_gradient2(
 name = "Variable value",
 low = colors[1],
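
The `plot_shap_global_beeswarm` hunk above changes which columns survive the `max_vars` cut. The base-R sketch below reproduces only that selection step on made-up data, to show why the ascending sort kept the wrong variables. The data frame and its column names are hypothetical.

```r
# Hypothetical aggregated SHAP values: rows are observations, columns are variables.
df <- data.frame(
  age   = c(0.40, -0.50, 0.45),
  trt   = c(0.01, -0.02, 0.01),
  karno = c(-0.20, 0.25, -0.15)
)
max_vars <- 2

# Mean absolute SHAP value per variable, used as the importance ranking.
importance <- colMeans(abs(df))

# Ascending sort (the removed line): the cut keeps the least important variables.
old_cols <- names(sort(importance))[1:min(max_vars, length(df))]

# Descending sort (the added line): the cut keeps the most important variables.
new_cols <- names(sort(importance, decreasing = TRUE))[1:min(max_vars, length(df))]

print(old_cols)
print(new_cols)
```

With these values the ascending version drops `age`, the variable with the largest mean absolute contribution, while the descending version retains it.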
