Heart Failure Classification

About

In this analysis, we explored several classification models to predict whether a patient is at risk of heart failure based on clinical data and lifestyle factors. After evaluating the candidate models through cross-validation, we selected Logistic Regression as our final model because it performed best overall across the classification metrics. The model showed promising results on the unseen test set, with an accuracy of 86% and F1-scores of 0.88 for the positive class (at risk) and 0.84 for the negative class (not at risk). Of the 276 observations in the test set, the model correctly identified 144 cases at risk and 97 cases not at risk, with 23 false positives (cases predicted as at risk when there is no risk) and 12 false negatives (cases predicted as not at risk when there is risk). Although these scores are encouraging for a first iteration, the hyperparameters and the model's decision threshold could be tuned further to reduce false negatives, which are the most critical errors in medical applications. Overall, this model shows potential to support clinical professionals in assessing patients during screening.
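As a rough illustration of why the decision threshold matters for false negatives, the sketch below counts confusion-matrix cells at two thresholds. The labels and probabilities are made up for this example and are not taken from the project's results; `confusion_counts` is a hypothetical helper, not a function from this repository.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn), predicting positive when prob >= threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Made-up labels (1 = at risk) and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.92, 0.71, 0.45, 0.38, 0.55, 0.40, 0.22, 0.10]

for threshold in (0.5, 0.35):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    recall = tp / (tp + fn)
    print(f"threshold={threshold}: TP={tp} FP={fp} TN={tn} FN={fn} recall={recall:.2f}")
```

In this toy data, lowering the threshold removes the false negatives at the cost of one extra false positive, which is the trade-off a screening tool would need to weigh.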

The dataset used in this project is pulled from a repository of the University of Minho, Portugal. It was created by Federico Soriano Palacios (2021) and combines five heart-related datasets over 11 common features that can be used to predict possible heart disease. The five datasets are part of the “Heart Disease” dataset (Janosi et al., 1989) in the UCI Machine Learning Repository, which was originally sourced from the Hungarian Institute of Cardiology, the University Hospital of Zurich, the University Hospital of Basel, the V.A. Medical Center of Long Beach, and the Cleveland Clinic Foundation. Each row of the dataset contains 11 attributes describing the patient (age, sex, chest pain type, resting blood pressure, serum cholesterol, fasting blood sugar, resting ECG result, maximum heart rate achieved, exercise-induced angina, ST depression induced by exercise relative to rest, and slope of the peak exercise ST segment), plus the target: the presence or absence of heart disease.
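A minimal sketch of how the 11 features and the target could be checked against a CSV header. The column names below are an assumption (this README does not list them), the sample row is invented, and `check_header` is a hypothetical helper rather than part of the project's validation script.

```python
import csv
import io

# Assumed column names (not stated in this README): 11 features plus the target.
FEATURES = [
    "Age", "Sex", "ChestPainType", "RestingBP", "Cholesterol", "FastingBS",
    "RestingECG", "MaxHR", "ExerciseAngina", "Oldpeak", "ST_Slope",
]
TARGET = "HeartDisease"


def check_header(csv_text):
    """Return the sorted list of expected columns missing from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return sorted(set(FEATURES + [TARGET]) - set(header))


# Invented row, for illustration only.
sample = ",".join(FEATURES + [TARGET]) + "\n52,F,NAP,130,210,0,Normal,150,N,0.5,Flat,0\n"

print(check_header(sample))            # no missing columns
print(check_header("Age,Sex\n52,F\n"))  # lists the columns that are absent
```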

Report

The final report can be found in the reports directory, where it is rendered from reports/heart_disease_analysis.qmd.

Dependencies

All library dependencies are specified in environment.yml for conda or conda-lock.yml for reproducible builds.

Usage

Clone repository

git clone https://github.com/EricYangg/Heart-Failure-Classification.git
cd Heart-Failure-Classification

Environment Setup

To reproduce the analysis you can use one of the three following options:

Option 1: Use conda-lock

Create the environment from the lock file by running the following command from the root of this repository:

conda-lock install --name heart-failure-classification conda-lock.yml

Switch to the project's environment by running the following line from the terminal:

conda activate heart-failure-classification

Option 2: Use environment.yml

conda env create -f environment.yml
conda activate heart-failure-classification

Option 3: Use docker

Install and launch Docker on your computer.
Run the Docker container: Navigate to the root of this project on your computer using the command line and enter the following command to build and start the container:

docker-compose up

Access Jupyter Lab: In the terminal, look for a URL that starts with http://127.0.0.1:8888/lab?token= and copy and paste it into your browser to open Jupyter Lab.

Running the analysis

Option 1: Use Makefile

Clean the project: Open a terminal, navigate to the root of this project, and run:

make clean

This removes any previously generated files to start with a clean environment.

Run the analysis:

make all

This builds the project and runs the entire analysis workflow. The Makefile defines the complete analysis pipeline and the execution order of all scripts. Users can review it to understand how each stage of the workflow (data processing, modeling, and reporting) fits together.

Option 2: Run scripts manually

To run the analysis manually, open a terminal and run the following commands:

python scripts/01_download_data.py \
--url="https://epl.di.uminho.pt/~jcr/AULAS/ATP2021/datasets/heart.csv" \
--write_to=data/raw

python scripts/02_validate_n_split.py \
--logs-to=logs \
--raw-data=data/raw/heart.csv \
--data-to=data/validated \
--seed=123

python scripts/03_eda_validate.py \
--training-data=data/validated/heart_train.csv \
--test-data=data/validated/heart_test.csv \
--plot-to=results/figures \
--data-to=data/validated

python scripts/04_preprocessor.py \
--training-data=data/validated/heart_train.csv \
--preprocessor-to=results/models \
--seed=123

python scripts/05_fit_heart_disease_model.py \
--x-train-data=data/validated/X_train.csv \
--y-train-data=data/validated/y_train.csv \
--x-test-data=data/validated/X_test.csv \
--y-test-data=data/validated/y_test.csv \
--preprocessor=results/models/heart_preprocessor.pickle \
--pipeline-to=results/models \
--results-to=results/tables \
--figures-to=results/figures \
--seed=123 \
--cv-folds=5

quarto render reports/heart_disease_analysis.qmd --to html
quarto render reports/heart_disease_analysis.qmd --to pdf
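The manual steps above can also be driven from one small Python script. This is only a sketch, not part of the repository: `STEPS` and `run_pipeline` are hypothetical names, `sys.executable` stands in for the `python` command, and the default `dry_run=True` only prints each command without executing it.

```python
import subprocess
import sys

PY = sys.executable  # stands in for `python` in the commands above

# One argument list per command, in the order given in the manual steps.
STEPS = [
    [PY, "scripts/01_download_data.py",
     "--url=https://epl.di.uminho.pt/~jcr/AULAS/ATP2021/datasets/heart.csv",
     "--write_to=data/raw"],
    [PY, "scripts/02_validate_n_split.py",
     "--logs-to=logs", "--raw-data=data/raw/heart.csv",
     "--data-to=data/validated", "--seed=123"],
    [PY, "scripts/03_eda_validate.py",
     "--training-data=data/validated/heart_train.csv",
     "--test-data=data/validated/heart_test.csv",
     "--plot-to=results/figures", "--data-to=data/validated"],
    [PY, "scripts/04_preprocessor.py",
     "--training-data=data/validated/heart_train.csv",
     "--preprocessor-to=results/models", "--seed=123"],
    [PY, "scripts/05_fit_heart_disease_model.py",
     "--x-train-data=data/validated/X_train.csv",
     "--y-train-data=data/validated/y_train.csv",
     "--x-test-data=data/validated/X_test.csv",
     "--y-test-data=data/validated/y_test.csv",
     "--preprocessor=results/models/heart_preprocessor.pickle",
     "--pipeline-to=results/models", "--results-to=results/tables",
     "--figures-to=results/figures", "--seed=123", "--cv-folds=5"],
    ["quarto", "render", "reports/heart_disease_analysis.qmd", "--to", "html"],
    ["quarto", "render", "reports/heart_disease_analysis.qmd", "--to", "pdf"],
]


def run_pipeline(steps, dry_run=True):
    """Print each step in order; execute it unless dry_run is set.

    Returns the number of steps processed. When executing, the first
    failing command raises subprocess.CalledProcessError and stops the run.
    """
    for step in steps:
        print(" ".join(step))
        if not dry_run:
            subprocess.run(step, check=True)
    return len(steps)


if __name__ == "__main__":
    run_pipeline(STEPS, dry_run=True)  # pass dry_run=False to actually run
```

Stopping at the first failure mirrors what running the commands one by one would do; the Makefile remains the reference for the actual execution order.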

Running the function tests

To verify that each function works as intended, function tests are written as Python scripts. To run these tests, go to the root project directory in the terminal and run the following command:

pytest tests/
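For readers unfamiliar with pytest, a function test has roughly the shape sketched below. The helper `split_features_target` is hypothetical and written only for this example; it is not one of the project's functions, and the actual tests in tests/ may be organized differently.

```python
def split_features_target(rows, target):
    """Hypothetical helper: split a list of dict rows into features and target."""
    if not rows:
        raise ValueError("rows must not be empty")
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y


# pytest collects functions whose names start with `test_`.
def test_split_features_target_separates_target():
    rows = [{"Age": 52, "HeartDisease": 0}, {"Age": 61, "HeartDisease": 1}]
    X, y = split_features_target(rows, "HeartDisease")
    assert X == [{"Age": 52}, {"Age": 61}]
    assert y == [0, 1]


def test_split_features_target_rejects_empty_input():
    # Plain try/except keeps the sketch free of a pytest import.
    try:
        split_features_target([], "HeartDisease")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for empty input")
```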

Clean up

To shut down the Docker container and clean up its resources, press Ctrl + C in the terminal where the container is running, and then run:

docker compose rm

Contributors

Omar Ramos

Affiliation: University of British Columbia
Email: omar.ramos19@gmail.com
GitHub: @mayitoxix

Mara Sánchez

Affiliation: University of British Columbia
Email: marasanchezrom@gmail.com
GitHub: @mara-sanchez1

Eric Yang

Affiliation: University of British Columbia
Email: eric99yang@gmail.com
GitHub: @EricYangg

Acknowledgements

The reproducible data science workflow implemented in this project was greatly inspired by Dr. Tiffany Timbers's DSCI 522 course.

License

The Heart Failure Classification project is licensed under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License. If re-using or re-mixing, please provide attribution and link to this webpage. The software code contained within this repository is licensed under the MIT license. Check out the license file for more information.

References

Dua, D., & Graff, C. (2017). UCI Machine Learning Repository. University of California, Irvine, School of Information and Computer Sciences. http://archive.ics.uci.edu/ml

Janosi, A., Steinbrunn, W., Pfisterer, M., & Detrano, R. (1989). Heart Disease [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C52P4X.

Savarese, G., Lund, L. H., & Becher, P. M. (2023). Global burden of heart failure: A comprehensive and updated review of epidemiology. Cardiovascular Research, 118(17), 3272–3287. https://doi.org/10.1093/cvr/cvac013 (PubMed: https://pubmed.ncbi.nlm.nih.gov/35150240/)

Barnett, M. P., Koppes, L. L. J., & … [et al.]. (2020). Cardiovascular risk factors: It’s time to focus on variability! Frontiers in Cardiovascular Medicine, 7, Article 80. https://doi.org/10.3389/fcvm.2020.00080 (PMC published version: https://pmc.ncbi.nlm.nih.gov/articles/PMC7379092/)
