This project analyzes survival in heart failure patients to identify the key factors that distinguish survival from death. Students learn visualization, statistical analysis, and machine learning techniques for predicting patient outcomes.
Key Finding: Two features, serum creatinine and ejection fraction, are sufficient to distinguish survival from death across several classifiers.
Below is a high-level overview of the main components of this project.
Dataset {heart_failure_clinical_records_dataset.csv}
Heart failure clinical records from 299 patients with 13 features including age, ejection fraction, serum creatinine, and follow-up time. Binary outcome: survival or death.
Week 1: Data Exploration {Week1.ipynb}
Introduction to the dataset, exploratory data analysis, and visualization techniques using Pandas, Seaborn, and Matplotlib.
Week 2: Statistical Analysis {Week2.ipynb}
Hypothesis testing (T-test, Mann-Whitney U), correlation analysis, multiple testing correction (FDR), feature variance analysis, and Variance Inflation Factor (VIF) for detecting multicollinearity.
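The Week 2 steps can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it runs on synthetic data, the feature names are hypothetical, and the Benjamini-Hochberg and VIF helpers are written out by hand in NumPy so that each calculation is visible. The notebook itself may rely on library routines instead.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
died = rng.integers(0, 2, size=n).astype(bool)  # synthetic outcome labels

# Synthetic stand-ins for clinical features (NOT real patient data).
features = {
    "feature_shifted": rng.normal(0.0, 1.0, n) + 1.0 * died,  # differs by group
    "feature_noise_a": rng.normal(0.0, 1.0, n),
    "feature_noise_b": rng.normal(0.0, 1.0, n),
}

# One Mann-Whitney U test per feature: died vs. survived.
p_values = np.array([
    stats.mannwhitneyu(x[died], x[~died], alternative="two-sided").pvalue
    for x in features.values()
])


def benjamini_hochberg(p):
    """Return Benjamini-Hochberg adjusted p-values (FDR correction)."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(adjusted_sorted, 0.0, 1.0)
    return adjusted


def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the rest."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)


q_values = benjamini_hochberg(p_values)
vif_values = vif(np.column_stack(list(features.values())))

for name, p, q, v in zip(features, p_values, q_values, vif_values):
    print(f"{name}: p={p:.4g}, q={q:.4g}, VIF={v:.2f}")
```

A VIF close to 1 suggests a feature is not linearly explained by the others. A common rule of thumb treats values above roughly 5 to 10 as a sign of multicollinearity.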
Week 3: Unsupervised Learning {Week3.ipynb}
Dimensionality reduction with PCA, clustering techniques, and visualizing high-dimensional data.
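As a rough sketch of the Week 3 workflow, the snippet below standardizes features, projects them onto two principal components, and clusters the projection with k-means. The data are synthetic (two well-separated groups), and the choice of k-means with two clusters is an assumption for illustration. The notebook may use other clustering methods or settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic groups in 6 dimensions (NOT real patient data).
group_a = rng.normal(0.0, 1.0, size=(100, 6))
group_b = rng.normal(4.0, 1.0, size=(100, 6))
X = np.vstack([group_a, group_b])

# PCA is scale-sensitive, so standardize each feature first.
X_scaled = StandardScaler().fit_transform(X)

# Project onto the two directions of greatest variance.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Cluster in the reduced space.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_2d)
print("Cluster sizes:", np.bincount(labels))
```

The two-dimensional projection `X_2d` can be passed directly to a Matplotlib or Seaborn scatter plot, colored by `labels`, to inspect the clusters visually.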
| Week | Topic | Links |
|---|---|---|
| 1 | Data Exploration | Notebook, Seaborn Docs, Pandas Docs |
| 2 | Statistical Analysis | Notebook, Slides, Scipy Stats, Statsmodels VIF |
| 3 | Unsupervised Learning | Notebook, PCA Guide, Clustering |
This project is based on the paper by Chicco & Jurman (2020):
Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making, 20, 16.
Key Findings from the Paper:
- Applied several ML classifiers (Random Forest, Gradient Boosting, SVM, etc.) to predict survival
- Discovered that serum creatinine and ejection fraction alone achieve strong predictive performance
- Random Forest achieved the best results, with a Matthews Correlation Coefficient (MCC) of 0.418
- Feature ranking analysis revealed time, serum creatinine, and ejection fraction as top predictors
- Demonstrated that complex models with all 13 features do not significantly outperform simpler 2-feature models
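The comparison between a two-feature model and a full-feature model can be sketched as below. This is an illustrative setup on synthetic data in which only the first two columns carry signal. It does not reproduce the paper's data, protocol, or scores, and the helper name `holdout_mcc`, the single hold-out split, and the Random Forest settings are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 300

# Synthetic predictors (NOT real patient data); only columns 0 and 1 are informative.
X = rng.normal(size=(n, 12))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)


def holdout_mcc(columns):
    """Fit a Random Forest on the given columns and return MCC on the test split."""
    model = RandomForestClassifier(n_estimators=200, random_state=0)
    model.fit(X_train[:, columns], y_train)
    predictions = model.predict(X_test[:, columns])
    return matthews_corrcoef(y_test, predictions)


mcc_two = holdout_mcc([0, 1])                     # two informative features only
mcc_all = holdout_mcc(list(range(X.shape[1])))    # every feature

print(f"MCC, two features: {mcc_two:.3f}")
print(f"MCC, all features: {mcc_all:.3f}")
```

MCC ranges from -1 to 1 and accounts for all four cells of the confusion matrix, which makes it more informative than accuracy on an imbalanced outcome. On the real dataset, the two columns would be serum creatinine and ejection fraction. A single hold-out split is noisy with a few hundred rows, so repeated splits or cross-validation would give a steadier comparison.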
Clinical Relevance:
- Serum creatinine indicates kidney function, often impaired in heart failure patients
- Ejection fraction measures heart pumping efficiency, a direct indicator of cardiac health
- These two biomarkers are routinely measured and can guide clinical decision-making
git clone https://github.com/MichiganDataScienceTeam/W26-MDST-Project_Heart-Failure-Survival-Analysis.git
cd W26-MDST-Project_Heart-Failure-Survival-Analysis
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
jupyter notebook
Leads
Sina Bonakdar
Terry Zhang
This project is licensed under the MIT License - see the LICENSE file for details.