This repository reproduces the study modeling routine blood test trajectories in spinal cord injury (SCI) from EHR data, performed by the Health.data DRIVEN lab at the School of Public Health Sciences, University of Waterloo. The code was created by Drs. Marzieh Mussavi Rizi and Abel Torres Espin.
For details, please see our publications:
Peer-reviewed publication
Mussavi Rizi, M., Fernández, D., Kramer, J.L.K. et al. Modeling trajectories of routine blood tests as dynamic biomarkers for outcome in spinal cord injury. npj Digit. Med. 8, 470 (2025). https://doi.org/10.1038/s41746-025-01782-0
Pre-Print
Mussavi Rizi, M., Fernandez, D., Kramer, J.L.K., Saigal, R., DiGiorgio, A.M., Beattie, M.S., Ferguson, A.R., Kyritsis, N., Torres-Espin, A., TRACK-SCI investigators. Modeling trajectories of routine blood tests as dynamic biomarkers for outcome in spinal cord injury. medRxiv 2025.01.20.25320728 (2025). https://doi.org/10.1101/2025.01.20.25320728
blood_trajectory_analysis.Rmd: contains all the main code to reproduce our modeling and analysis
functions.R: script with the main custom functions. It is loaded into the environment by blood_trajectory_analysis.Rmd
models_GMM: set of GMM models fitted using the lcmm package. These are .Rds files with the model objects
models_GMM_selected: set of final selected GMM models fitted using the lcmm package. These are .Rds files with the model objects
prediction_experiments: set of .Rds files with R objects (lists and dataframes) containing all information on the prediction experiments (see publications)
figures: output folder for figures
tables: output folder for tables
We do not provide the datasets directly, and users of this code will need to download the data. Three datasets are used in this work: MIMIC-III version 1.4, MIMIC-IV version 1.0, and a subset of the TRACK-SCI cohort study. The .Rmd script contains the necessary code to prepare the data for analysis.
The MIMIC data were downloaded from PhysioNet. Both MIMIC databases (DB) are relational DBs structured in tables. Documentation about the DB schema can be found here.
Note that data access requires a Data Use Agreement with PhysioNet. No data is provided in this document or repository. The code will not run without the data!
The necessary TRACK-SCI data can be downloaded from the Open Data Commons for Spinal Cord Injury (SCI) here. If you use the data, please cite:
Mussavi Rizi, M., Saigal, R., DiGiorgio, A. M., Ferguson, A. R., Beattie, M. S., Kyritsis, N., Torres Espin, A. 2025. Blood laboratory values from 137 de-identified TRACK-SCI participants from routine collected real-world data. Open Data Commons for Spinal Cord Injury. ODC-SCI:1345. doi: 10.34945/F5PK6X
Part of this work uses SAPS II values for both MIMIC datasets. If you want to reproduce our work using this code, you will need to calculate them first and save them in a mimic_SC_saps.csv file that contains four columns: subject_id = subject identifier; hadm_id = hospital admission identifier; icustay_id = ICU stay identifier; sapsii = calculated SAPS II.
For MIMIC-III, we computed SAPS II scores for the selected cohort using SQL code publicly available on GitHub (https://github.com/MIT-LCP/mimic-code/blob/main/mimic-iii/concepts/severityscores/sapsii.sql). For MIMIC-IV, we used the equivalent script (https://github.com/MIT-LCP/mimic-code/blob/main/mimic-iv/concepts/score/sapsii.sql).
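As a rough illustration of the expected file layout, the sketch below builds a small table with the four required columns, checks that none are missing, and writes it out as mimic_SC_saps.csv. The identifiers and scores are made-up placeholders, and writing to a temporary directory is an assumption made for this example only; the location where the .Rmd script expects the file is not stated here.

```r
# Placeholder values only: real SAPS II scores come from the SQL output
# computed on your own copy of the MIMIC databases.
saps <- data.frame(
  subject_id = c(10001L, 10002L),    # subject identifier
  hadm_id    = c(200001L, 200002L),  # hospital admission identifier
  icustay_id = c(300001L, 300002L),  # ICU stay identifier
  sapsii     = c(34L, 51L)           # calculated SAPS II
)

# Check that the four expected columns are present before writing.
required_cols <- c("subject_id", "hadm_id", "icustay_id", "sapsii")
missing_cols <- setdiff(required_cols, names(saps))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Assumption: a temporary directory is used here so nothing in the repo is touched.
out_file <- file.path(tempdir(), "mimic_SC_saps.csv")
write.csv(saps[, required_cols], out_file, row.names = FALSE)

# Read the file back to confirm the column layout survived the round trip.
check <- read.csv(out_file)
stopifnot(identical(names(check), required_cols))
```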
The code should run with the following environment. Further information can be found in the .Rmd file.
"R version 4.4.1 (2024-06-14 ucrt)", "tidyverse 2.0.0", "data.table 1.17.0", "stringr 1.5.1", "DT 0.33", "gtsummary 2.2.0", "lcmm 1.9.4", "caret 7.0-1", "yardstick 1.3.2", "patchwork 1.3.0", "parallel (base with R 4.4.1)"
Some sections of the .Rmd script are not evaluated during knitting (rendering) due to their computational overhead. We provide intermediate files containing the necessary objects and byproducts of the code, including the final trajectory models, to facilitate reproducibility. By cloning this repo, you should be able to reproduce our results without re-fitting all the models, although you can re-fit them if you wish. To reproduce this work, you will need to run the code in an IDE, chunk by chunk.
Running the full script from scratch will overwrite some of the provided files and can take hours to days to complete.
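If you experiment with re-fitting, one way to avoid accidentally replacing stored .Rds objects is a small load-or-fit guard like the sketch below. The helper `load_or_fit` and its arguments are hypothetical and are not defined in functions.R; the demo uses a plain `lm` fit on the built-in `cars` data as a stand-in for a real trajectory model, and a temporary file rather than any file in this repo.

```r
# Hypothetical helper: return the stored object if the .Rds file exists,
# otherwise run the (possibly slow) fitting function and save the result.
load_or_fit <- function(path, fit_fun, refit = FALSE) {
  if (file.exists(path) && !refit) {
    return(readRDS(path))
  }
  model <- fit_fun()
  saveRDS(model, path)
  model
}

# Demo with a stand-in model written to a temporary directory.
demo_path <- file.path(tempdir(), "demo_model.Rds")

# First call fits and saves (or loads, if the file is already there).
first <- load_or_fit(demo_path, function() lm(dist ~ speed, data = cars))

# Second call must load from disk: the fitting function would error if run.
second <- load_or_fit(demo_path, function() stop("should not refit"))

stopifnot(isTRUE(all.equal(coef(first), coef(second))))
```

Setting `refit = TRUE` forces a new fit and overwrites the file, so that choice stays explicit rather than happening as a side effect of running a chunk.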