Model comparison for in-hospital mortality prediction and survival analysis based on MIMIC III

Introduction

This is the code repository for the BMI 6319 course project in Spring 2021.

The overarching goal of this research:

To compare the results of different models on different inputs and determine whether adding demographic information improves in-hospital mortality prediction and survival analysis for each type of model. The project aims to show which models work better on EHR data, specifically MIMIC III, for these two tasks, and whether demographic information has a clear effect on prediction accuracy. The findings are intended to inform model and input selection in future studies.

Subgoals are:

Descriptive analysis of the MIMIC III dataset with plots;
Use the MIMIC III dataset to compare a statistical model (Logistic Regression) with machine learning models (Light Gradient Boosting Machine and Recurrent Neural Network) on in-hospital mortality prediction based on (1) diagnosis, prescription and procedure, (2) diagnosis, prescription, procedure and demographics;
Use the MIMIC III dataset to compare machine learning models (Random Survival Forest and Recurrent Neural Network) for survival analysis based on (1) diagnosis, prescription and procedure, (2) diagnosis, prescription, procedure and demographics.
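
The two input settings above can be illustrated with a minimal sketch. It assumes each patient is encoded as a multi-hot vector over the codes seen in the data; the field names, the example codes and the helper functions `build_vocab` and `multi_hot` are hypothetical and are not taken from the repository's scripts.

```python
# Illustrative only: field names and records below are made up,
# not the repository's actual schema or MIMIC III data.

CLINICAL = ["diagnosis", "prescription", "procedure"]   # input setting (1)
WITH_DEMOGRAPHICS = CLINICAL + ["demographics"]         # input setting (2)


def build_vocab(records, fields):
    """Assign a column index to every distinct (field, value) pair."""
    vocab = {}
    for rec in records:
        for field in fields:
            for value in rec.get(field, []):
                vocab.setdefault((field, value), len(vocab))
    return vocab


def multi_hot(rec, vocab, fields):
    """Encode one record as a 0/1 vector over the vocabulary."""
    vec = [0] * len(vocab)
    for field in fields:
        for value in rec.get(field, []):
            idx = vocab.get((field, value))
            if idx is not None:
                vec[idx] = 1
    return vec


records = [
    {"diagnosis": ["4019"], "prescription": ["Aspirin"],
     "procedure": ["3893"], "demographics": ["gender=F"]},
    {"diagnosis": ["4019", "25000"], "prescription": [],
     "procedure": [], "demographics": ["gender=M"]},
]

vocab_dp = build_vocab(records, CLINICAL)
vocab_dpd = build_vocab(records, WITH_DEMOGRAPHICS)

# Same patients, two feature matrices: without and with demographics.
X_dp = [multi_hot(r, vocab_dp, CLINICAL) for r in records]
X_dpd = [multi_hot(r, vocab_dpd, WITH_DEMOGRAPHICS) for r in records]
```

Training the same model once on `X_dp` and once on `X_dpd` is one way to isolate the effect of the demographic columns.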

The dataset used:

MIMIC III, using the ADMISSIONS, DIAGNOSES_ICD, ICUSTAYS, PATIENTS, PRESCRIPTIONS and PROCEDURES tables.

I do not provide the MIMIC III data itself; you need to acquire it yourself from https://mimic.physionet.org/.
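
As a rough illustration of how an in-hospital mortality label can be derived from these tables, the sketch below reads ADMISSIONS-style rows and separates patients by the HOSPITAL_EXPIRE_FLAG column of the MIMIC III ADMISSIONS table. Whether DataPreprocess.py uses this column is an assumption; the rows are synthetic and the function `split_case_control` is hypothetical.

```python
import csv
import io

# Synthetic rows in the shape of the ADMISSIONS table (not real MIMIC III data).
SAMPLE_ADMISSIONS = """SUBJECT_ID,HADM_ID,HOSPITAL_EXPIRE_FLAG
1,100,0
2,200,0
2,201,1
3,300,0
"""


def split_case_control(rows):
    """Return (case_ids, control_ids): patients with any in-hospital death vs. the rest."""
    died = {row["SUBJECT_ID"] for row in rows if row["HOSPITAL_EXPIRE_FLAG"] == "1"}
    everyone = {row["SUBJECT_ID"] for row in rows}
    return died, everyone - died


rows = list(csv.DictReader(io.StringIO(SAMPLE_ADMISSIONS)))
case_ids, control_ids = split_case_control(rows)
```

With the real data, the `io.StringIO` object would be replaced by an open file handle on the downloaded ADMISSIONS CSV.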

Steps to run this project

You may start directly from the Model part using the already pre-processed data files, or start from Data pre-process with the raw data.

Data pre-process

DescriptiveAnalysis.ipynb is the draft of the descriptive analysis of the raw MIMIC III dataset.
DataPreprocess.py is the first part of the data pre-processing. It outputs two types of CSV files for in-hospital mortality prediction and survival analysis: case and control files containing diagnoses, procedures and prescriptions; and case and control files containing diagnoses, prescriptions, procedures and demographics. Each type has two files: the case file contains information on patients who died, and the control file contains information on all other patients.
preprocessing.py is the second part of the data pre-processing. It outputs three types of files (train, valid and test), each containing a list of lists. Each inner list represents one patient and holds the patient's ID followed by further lists, one per visit; each time stamp carries the diagnoses, prescriptions, etc.
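
The nested structure and the train/valid/test split described above might look like the following minimal sketch. The exact layout written by preprocessing.py is not documented here, so the `[patient_id, visits]` shape, the code prefixes, the split fractions and the function `split_patients` are all assumptions made for illustration.

```python
import random


def split_patients(patients, valid_frac=0.1, test_frac=0.2, seed=0):
    """Shuffle patients reproducibly and cut them into train/valid/test lists.

    The fractions are illustrative defaults, not the repository's settings.
    """
    shuffled = patients[:]                 # copy so the input list is untouched
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_valid = int(len(shuffled) * valid_frac)
    test = shuffled[:n_test]
    valid = shuffled[n_test:n_test + n_valid]
    train = shuffled[n_test + n_valid:]
    return train, valid, test


# One inner list per patient: [patient_id, [visit_1_codes, visit_2_codes, ...]].
# The codes are made-up placeholders (D_ = diagnosis, M_ = prescription, P_ = procedure).
patients = [
    [pid, [["D_4019", "M_Aspirin"], ["D_25000", "P_3893"]]]
    for pid in range(10)
]

train, valid, test = split_patients(patients)
```

Splitting at the patient level, as here, keeps all visits of one patient in the same file, so no patient appears in both training and test data.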

Model

For the in-hospital mortality task, you can run Mortality_dp.ipynb, which contains the three models for in-hospital mortality prediction with diagnoses, procedures and prescriptions information, and Mortality_dpd.ipynb, which contains the three models with diagnoses, prescriptions, procedures and demographics information. The data used in these notebooks can be found in the data folder.
For the survival analysis task, you can run Survival_dp.ipynb, which contains the two models for survival analysis with diagnoses, procedures and prescriptions information, and Survival_dpd.ipynb, which contains the two models with diagnoses, prescriptions, procedures and demographics information. The data used in these notebooks can be found in the data folder.
The script model.py contains all the functions required for the RNN model.

BingyuMao/model_comparison_mimic

Folders and files

Latest commit

History

Repository files navigation

Model comparison for in hospital mortality prediction and survival analysis based on MIMIC III

Introduction

The overarching goal of this research:

The dataset used:

Steps to run this project

Data pre-process

Model

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages