How to do decision curve analysis on multiple imputation datasets? #17

YiJuChou · 2024-02-26T15:50:15Z

Hi,
I am currently working on research on the validation of a prediction model.
I have successfully used R to perform a decision curve analysis (DCA).
However, I have another dataset that includes some missing values.
To perform DCA on that dataset, I will have to do multiple imputation first.
I know this will generate several imputed datasets.
There are established ways to pool the results of other analyses, such as multivariate logistic regression.
But when it comes to DCA, I couldn't find any solution.
It would be great if anybody has a hint.
Thanks a lot.

shaunporwal · 2024-02-27T02:41:43Z

Maintainer

Hi Yi-Ju,

Thank you for your question. You can do it in this order:

1. Calculate the DCA dataframe object for each of the imputed datasets.
2. Combine (average) the net benefit columns of the DCA objects from all of the imputed datasets, threshold by threshold. This should give you a dataframe with a net benefit estimate at each threshold.
3. Plot the results.
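
The three steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the simulated data frame `sim`, the object names `imp`, `nb_by_imputation`, and `pooled_nb`, and the choice of 5 imputations are all assumptions made for the example. It relies on `as_tibble()` returning one row per model and threshold with a `net_benefit` column for a `dca` object.

```r
library(mice)     # multiple imputation
library(dcurves)  # decision curve analysis
library(dplyr)    # data manipulation
library(ggplot2)  # plotting

# Simulated data for illustration only; famhistory contains missing values
set.seed(123)
n <- 200
sim <- data.frame(
  cancer           = rbinom(n, 1, 0.3),
  age              = rnorm(n, mean = 65, sd = 10),
  famhistory       = sample(c(NA, 1, 0), n, replace = TRUE),
  cancerpredmarker = runif(n, 0, 0.6)
)

thresholds <- seq(0, 0.35, by = 0.01)
imp <- mice(sim, m = 5, method = "pmm", maxit = 5, printFlag = FALSE)

# Step 1: one DCA table per imputed dataset
nb_by_imputation <- lapply(seq_len(imp$m), function(i) {
  dca(cancer ~ cancerpredmarker + famhistory,
      data = mice::complete(imp, action = i),
      thresholds = thresholds) %>%
    as_tibble() %>%
    mutate(imputation = i)
})

# Step 2: average net benefit across imputations at each threshold
pooled_nb <- bind_rows(nb_by_imputation) %>%
  group_by(variable, label, threshold) %>%
  summarise(net_benefit = mean(net_benefit), .groups = "drop")

# Step 3: plot the averaged curves
p <- ggplot(pooled_nb, aes(x = threshold, y = net_benefit, colour = label)) +
  geom_line() +
  labs(x = "Threshold probability", y = "Net benefit", colour = NULL)
print(p)
```

`pooled_nb` holds one averaged net benefit value per model and threshold, so it can also be exported or tabulated instead of plotted.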

YiJuChou Feb 27, 2024
Author

First, thank you for your kind reply.
But I am still a little bit confused about how to do the above three steps.
A more technical explanation would be helpful.

VickersA · 2024-02-27T16:43:39Z

Maintainer

A simple alternative is just to create one large data set from the imputed data sets and run the decision curve on that. Because we aren't interested in 95% CIs, you don't have to worry about Rubin's rules for combining data, etc.

shaunporwal Feb 27, 2024
Maintainer

@YiJuChou

Given Dr. Vickers' answer, you can approach it this way code-wise:

```r
# Load necessary libraries
library(mice)     # For multiple imputation
library(dcurves)  # For decision curve analysis
library(dplyr)    # For data manipulation
library(ggplot2)  # For plotting

# Simulate a dataset with missing values
set.seed(123)  # For reproducibility
data <- tibble(
  patientid = 1:100,
  cancer = rbinom(100, 1, 0.3),
  risk_group = sample(c("low", "intermediate", "high"), 100, replace = TRUE),
  age = rnorm(100, mean = 65, sd = 10),
  famhistory = sample(c(NA, 1, 0), 100, replace = TRUE),
  marker = runif(100),
  cancerpredmarker = runif(100, 0, 0.6)
)

# Adjust as needed for the number of imputations (the 'm' argument of mice)
num_imputations <- 5

# E.g., impute missing values using the 'mice' package
# (mice = 'Multivariate Imputation by Chained Equations')
mice_data <- mice(data, m = num_imputations, method = "pmm", maxit = 5, printFlag = FALSE)

# Create one large dataset by stacking all 'm' imputed datasets
all_imputed_data <- do.call(
  rbind,
  lapply(seq_len(num_imputations), function(i) mice::complete(mice_data, action = i))
)

# Now perform DCA on this combined dataset
dca(cancer ~ cancerpredmarker + famhistory,
    data = all_imputed_data,
    thresholds = seq(0, 0.35, by = 0.01)) %>%
  plot(smooth = TRUE)
```
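
As a possible shorthand for the stacking step, `mice::complete()` can return all imputed datasets in one long data frame via `action = "long"`. The sketch below assumes a small simulated data frame (`sim`) and illustrative object names (`imp`, `stacked`, `dca_stacked`); none of them come from the thread.

```r
library(mice)     # multiple imputation
library(dcurves)  # decision curve analysis

# Simulated data for illustration only; famhistory contains missing values
set.seed(123)
n <- 100
sim <- data.frame(
  cancer           = rbinom(n, 1, 0.3),
  age              = rnorm(n, mean = 65, sd = 10),
  famhistory       = sample(c(NA, 1, 0), n, replace = TRUE),
  cancerpredmarker = runif(n, 0, 0.6)
)

imp <- mice(sim, m = 5, method = "pmm", maxit = 5, printFlag = FALSE)

# action = "long" stacks all m completed datasets row-wise and adds
# the bookkeeping columns .imp (imputation number) and .id (row id)
stacked <- mice::complete(imp, action = "long")

# Run the decision curve on the stacked data
dca_stacked <- dca(cancer ~ cancerpredmarker + famhistory,
                   data = stacked,
                   thresholds = seq(0, 0.35, by = 0.01))
plot(dca_stacked, smooth = TRUE)
```

Since every imputed dataset has the same number of rows, the unsmoothed net benefit from the stacked data should match the average of the per-imputation net benefits at each threshold, so the two suggestions in this thread are expected to give the same curve.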

Answer selected by YiJuChou
