This repository is for students working on the Heart Failure Survival Analysis data science project, led by Sina Bonakdar and Terry Zhang.

MichiganDataScienceTeam/W26-MDST-Project_Heart-Failure-Survival-Analysis

Heart Failure Survival Analysis, Winter 2026

Survival analysis of heart failure patients to identify key factors that distinguish survival from death. Students will learn visualization, statistical analysis, and machine learning techniques to predict patient outcomes.

Key Finding: Two features, serum creatinine and ejection fraction, are sufficient to distinguish survival from death across several different classifiers.

View Project Website

Structure

Below is a high-level overview of the main components of this project.

Dataset {heart_failure_clinical_records_dataset.csv}
Heart failure clinical records from 299 patients with 13 features including age, ejection fraction, serum creatinine, and follow-up time. Binary outcome: survival or death.
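As a first look at the data, a minimal sketch along these lines may help. It assumes the CSV sits in the working directory and that its header uses the column names `serum_creatinine`, `ejection_fraction`, and `DEATH_EVENT`. Those names, and the helper functions `load_records` and `summarize_by_outcome`, are assumptions for illustration, so check them against the actual file.

```python
import pandas as pd

# Assumed column names (not stated in this README); verify against the CSV header.
TARGET = "DEATH_EVENT"
KEY_FEATURES = ["serum_creatinine", "ejection_fraction"]


def load_records(path="heart_failure_clinical_records_dataset.csv"):
    """Read the clinical records and fail early if an expected column is absent."""
    df = pd.read_csv(path)
    missing = [c for c in KEY_FEATURES + [TARGET] if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found in {path}: {missing}")
    return df


def summarize_by_outcome(df):
    """Median of the two key features, grouped by the binary outcome."""
    return df.groupby(TARGET)[KEY_FEATURES].median()
```

Calling `summarize_by_outcome(load_records())` returns one row per outcome group, which gives a quick sense of whether the two features differ between groups before any formal testing.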

Week 1: Data Exploration {Week1.ipynb}
Introduction to the dataset, exploratory data analysis, and visualization techniques using Pandas, Seaborn, and Matplotlib.

Week 2: Statistical Analysis {Week2.ipynb}
Hypothesis testing (t-test, Mann-Whitney U), correlation analysis, multiple testing correction using the false discovery rate (FDR), feature variance analysis, and the Variance Inflation Factor (VIF) for detecting multicollinearity.
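To make the FDR step concrete, here is a small pure-Python sketch of the Benjamini-Hochberg step-up procedure, one common way to control the false discovery rate. The README does not say which FDR method the notebook uses, so treat this as an illustration. The function name `benjamini_hochberg` and the p-values in the example are made up for the demonstration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True if that hypothesis is rejected.

    Step-up rule: sort the p-values, find the largest rank k (1-based) with
    p_(k) <= (k / m) * alpha, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank  # keep the largest rank that passes
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Illustrative p-values only, not results from this dataset.
example_p = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(example_p))
```

In practice, one p-value per feature (for example from a Mann-Whitney U test comparing the two outcome groups) would be passed in, and only the features flagged `True` would be reported as significant.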

Week 3: Unsupervised Learning {Week3.ipynb}
Dimensionality reduction with PCA, clustering techniques, and visualizing high-dimensional data.
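The PCA step can be sketched with NumPy alone, as below. This is a minimal illustration rather than the notebook's implementation: it standardizes each column, takes the singular value decomposition, and projects onto the leading components. The function name `pca_scores` is hypothetical, the input is random stand-in data, and the sketch assumes no column is constant (a constant column would cause a division by zero during standardization).

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project standardized data onto its first principal components.

    Returns (scores, explained_variance_ratio) for the kept components.
    """
    X = np.asarray(X, dtype=float)
    # Standardize so that features on large scales do not dominate.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    scores = Z @ components.T
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Random stand-in data, used only to show the call; not the clinical records.
rng = np.random.default_rng(0)
demo = rng.normal(size=(50, 4))
demo_scores, demo_ratio = pca_scores(demo, n_components=2)
print(demo_scores.shape, demo_ratio)
```

The two score columns can then be used as the x and y axes of a scatter plot, colored by outcome or by cluster label.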

Schedule

| Week | Topic | Links |
|------|-------|-------|
| 1 | Data Exploration | Notebook, Seaborn Docs, Pandas Docs |
| 2 | Statistical Analysis | Notebook, Slides, Scipy Stats, Statsmodels VIF |
| 3 | Unsupervised Learning | Notebook, PCA Guide, Clustering |

Research Background

This project is based on the paper by Chicco & Jurman (2020):

Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making, 20, 16.

Key Findings from the Paper:

  • Applied several ML classifiers (Random Forest, Gradient Boosting, SVM, etc.) to predict survival
  • Discovered that serum creatinine and ejection fraction alone achieve strong predictive performance
  • Random Forest achieved the best results, with a Matthews correlation coefficient (MCC) of 0.418
  • Feature ranking analysis revealed time, serum creatinine, and ejection fraction as top predictors
  • Demonstrated that complex models with all 13 features do not significantly outperform simpler 2-feature models
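Since the findings above are reported as MCC, a short sketch of how that metric is computed from a binary confusion matrix may be useful. The function name `mcc_from_counts` is hypothetical. Returning 0.0 when the denominator is zero is a common convention, not something specified by this README.

```python
import math


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (tp*tn - fp*fn) / sqrt((tp+fp)(tp+fn)(tn+fp)(tn+fn)),
    ranging from -1 (total disagreement) to +1 (perfect prediction).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: a degenerate matrix (an empty row or column) scores 0.
    return numerator / denominator if denominator else 0.0
```

Unlike accuracy, MCC stays near zero for a classifier that always predicts the majority class, which matters here because the survival and death groups are not the same size.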

Clinical Relevance:

  • Serum creatinine indicates kidney function, often impaired in heart failure patients
  • Ejection fraction measures heart pumping efficiency, a direct indicator of cardiac health
  • These two biomarkers are routinely measured and can guide clinical decision-making

Read the full paper

Resources

Getting Started

git clone https://github.com/MichiganDataScienceTeam/W26-MDST-Project_Heart-Failure-Survival-Analysis.git
cd W26-MDST-Project_Heart-Failure-Survival-Analysis
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
jupyter notebook

Acknowledgements

Leads
Sina Bonakdar
Terry Zhang

License

This project is licensed under the MIT License - see the LICENSE file for details.
