This project aims to predict ICU patient mortality using the PhysioNet 2012 Challenge dataset. We develop machine learning models based on irregularly sampled multivariate time-series data, capturing patient vitals and static attributes from the first 48 hours of ICU stay. The goal is to predict whether a patient survives or dies in the ICU.
For details, refer to our 📄 Project Report.
You can also find the 📝 Project Handout.
We achieved a grade of 5.9/6 for this project.
- Source: PhysioNet 2012 Challenge
- Data: First 48 hours of ICU stay
- 37 dynamic variables: Vital signs, lab test results, etc.
- 4 static variables: Age, Gender, Height, Weight
- Target: Binary classification (Discharged Alive = 0, Deceased = 1)
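As a rough illustration of the data layout described above, the sketch below splits one patient record into static attributes and irregularly sampled time series. It assumes the long `Time,Parameter,Value` text layout of the challenge's record files. The sample values, the `STATIC_VARS` set, and the `split_record` helper are invented for illustration and are not taken from this repository's parsing code.

```python
import csv
import io

# The four static attributes listed above; any other parameter is treated as dynamic.
STATIC_VARS = {"Age", "Gender", "Height", "Weight"}

# Hypothetical sample in the long Time,Parameter,Value layout (values invented).
SAMPLE_RECORD = """Time,Parameter,Value
00:00,Age,54
00:00,Gender,0
00:00,Height,170.2
00:00,Weight,80.5
00:07,HR,73
00:37,HR,77
01:37,Temp,35.1
"""


def to_minutes(timestamp):
    """Convert an 'HH:MM' offset since ICU admission into minutes."""
    hours, minutes = timestamp.split(":")
    return int(hours) * 60 + int(minutes)


def split_record(text, max_hours=48):
    """Return (static, dynamic); dynamic maps variable -> [(minute, value), ...]."""
    static, dynamic = {}, {}
    for row in csv.DictReader(io.StringIO(text)):
        name, value = row["Parameter"], float(row["Value"])
        minute = to_minutes(row["Time"])
        if minute > max_hours * 60:
            continue  # keep only the first 48 hours of the stay
        if name in STATIC_VARS:
            static.setdefault(name, value)  # keep the first reading only
        else:
            dynamic.setdefault(name, []).append((minute, value))
    return static, dynamic


static, dynamic = split_record(SAMPLE_RECORD)
print(static)
print(dynamic)
```

Each dynamic variable keeps its own list of timestamps, which reflects the irregular sampling: heart rate and temperature are measured at different times and different rates.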
- Clone the Repository:

      git clone https://github.com/Thosam1/PhysioNet-ICU-Mortality-Prediction.git
      cd PhysioNet-ICU-Mortality-Prediction

- Set Up the Environment:
  - Have Python 3.8 or higher installed.
  - Create a virtual environment:

        python -m venv venv
        source venv/bin/activate  # On Windows: venv\Scripts\activate

  - Install dependencies:

        pip install -r requirements.txt

- Prepare the Dataset:
  - Download the dataset from PhysioNet 2012 Challenge.
  - Unzip the dataset and place its contents in the data/ folder.
- Run the Notebooks:
  - Navigate to the notebooks/ folder.
  - Execute the Jupyter notebooks in order:
    - 1_data_parsing.ipynb
    - 2_data_preprocessing.ipynb
    - 3_model_training.ipynb
    - 4_evaluation_and_visualization.ipynb
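The execution order follows the numeric prefix of each notebook name. A minimal sketch of deriving that order programmatically is shown here. The `ordered_notebooks` helper is hypothetical and not part of the repository, and the demo runs on a temporary folder that only mimics notebooks/.

```python
import re
import tempfile
from pathlib import Path


def ordered_notebooks(folder):
    """Return the .ipynb files in `folder`, sorted by their leading integer prefix."""
    numbered = []
    for path in Path(folder).glob("*.ipynb"):
        match = re.match(r"(\d+)_", path.name)
        if match:  # skip notebooks without a numeric prefix
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


# Demo on a throwaway folder containing empty files with the notebook names.
with tempfile.TemporaryDirectory() as tmp:
    for name in [
        "3_model_training.ipynb",
        "1_data_parsing.ipynb",
        "4_evaluation_and_visualization.ipynb",
        "2_data_preprocessing.ipynb",
    ]:
        (Path(tmp) / name).touch()
    demo_order = [p.name for p in ordered_notebooks(tmp)]

print(demo_order)
```

Sorting on the parsed integer rather than the raw filename keeps the order correct even if a notebook numbered 10 or higher is added later.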
Some cells in the notebooks generate files required by subsequent notebooks. If you need to recompute the dataset for model training, uncomment the section in the first notebook labeled with the comment:
# Only need to run once
This regenerates the .parquet files used in the later notebooks. Alternatively, you can use the pre-generated .parquet files provided in the repository. The same applies to the .pkl files containing the precomputed embeddings. To use the pre-generated files, unzip outcomes.zip, set.zip, and embeddings.zip.
Please note that the non_agg_{dataset}_embeddings.zip files were sent separately from the rest of the submission because of the maximum submission file size. We included them because the non-aggregated embeddings take a very long time to compute. To use these files, unzip them into the root directory of the project.
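The unzip step can also be scripted. The sketch below uses Python's standard `zipfile` module and assumes the archives sit in, and are extracted into, a single folder. The README only states the project root as the destination for the non_agg archives, so treat the location of the other archives as an assumption. The `extract_archives` helper and the placeholder file in the demo are hypothetical.

```python
import tempfile
import zipfile
from pathlib import Path

# Archive names mentioned above; their location in the project is an assumption.
ARCHIVES = ["outcomes.zip", "set.zip", "embeddings.zip"]


def extract_archives(root, names=ARCHIVES):
    """Extract each archive found in `root` into `root`; return the names extracted."""
    root = Path(root)
    extracted = []
    for name in names:
        archive = root / name
        if not archive.is_file():
            continue  # archive not present, nothing to extract
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
        extracted.append(name)
    return extracted


# Demo on a temporary folder with one small placeholder archive.
with tempfile.TemporaryDirectory() as tmp:
    with zipfile.ZipFile(Path(tmp) / "outcomes.zip", "w") as zf:
        zf.writestr("placeholder.txt", "not real data")
    done = extract_archives(tmp)
    restored = (Path(tmp) / "placeholder.txt").exists()

print(done, restored)
```

Missing archives are skipped rather than treated as errors, so the same call works whether or not the separately delivered files have been added.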
PhysioNet-ICU-Mortality-Prediction/
├── data/ # Raw data from challenge .zip
├── data_parsing/ # Scripts for parsing raw data
├── data_preprocessing/ # Scripts for cleaning and preprocessing data
├── models/ # Machine learning and deep learning models
├── notebooks/ # Jupyter notebooks for all experiments and analysis
├── utils/ # Utility functions
├── visualization/ # Scripts for generating plots and visualizations
├── requirements.txt # Python dependencies
└── README.md # Project documentation