This project analyzes the causes of mortality in the United States from 2005 to 2015 using the CDC’s National Vital Statistics System dataset. It focuses on seven leading causes of death: cancer, heart disease, respiratory illness, stroke, accidents, diabetes, and old age (with Alzheimer’s and Parkinson’s).
The dataset consists of 22 files containing demographic and cause-of-death data for 2005 to 2015, obtained from the CDC’s National Vital Statistics System.
The project reframes the dataset as a regression problem so that mortality can be forecast as a time series using XGBoost and other models. The models are compared on several evaluation measures to identify the best-performing one.
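One common way to turn a time series into a regression dataset is to use lagged values as features and the current value as the target. The sketch below illustrates that idea only; it is not taken from mortality_analysis.py. The helper `make_lagged_frame`, the column names, the two-lag window, and the synthetic yearly counts are all assumptions made for illustration, and the error measures shown (MAE and RMSE against a naive baseline) are examples rather than the project's actual evaluation measures.

```python
import pandas as pd


def make_lagged_frame(series: pd.Series, n_lags: int = 2) -> pd.DataFrame:
    """Build a supervised table: previous values as features, current value as target.

    Hypothetical helper for illustration; not part of the project code.
    """
    frame = pd.DataFrame({"target": series})
    for lag in range(1, n_lags + 1):
        # lag_k holds the value observed k years earlier
        frame[f"lag_{lag}"] = series.shift(lag)
    # The first n_lags rows lack a complete history, so drop them
    return frame.dropna()


# Synthetic placeholder counts for 2005-2015, NOT real CDC figures
years = list(range(2005, 2016))
deaths = pd.Series(
    [100.0 + 2.0 * i for i in range(len(years))], index=years, name="deaths"
)

table = make_lagged_frame(deaths, n_lags=2)

# Chronological split: fit on earlier years, hold out the last two.
# Shuffling would leak future information into training.
train, test = table.iloc[:-2], table.iloc[-2:]
X_train, y_train = train.drop(columns="target"), train["target"]
X_test, y_test = test.drop(columns="target"), test["target"]

# Naive baseline: predict that this year equals last year
baseline_pred = X_test["lag_1"]
mae = (y_test - baseline_pred).abs().mean()
rmse = ((y_test - baseline_pred) ** 2).mean() ** 0.5
print(f"baseline MAE={mae:.2f}, RMSE={rmse:.2f}")
```

With a table in this shape, `X_train` and `y_train` could be passed to any regressor with a scikit-learn style interface, for example `xgboost.XGBRegressor().fit(X_train, y_train)`, and its predictions on `X_test` scored with the same measures as the baseline.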
The project provides insight into the causes of mortality in the United States. Time series forecasts of mortality can help improve public health outcomes, encourage innovation, and support evidence-based decision-making.
To run this project, clone this repository and install the dependencies listed in the requirements.txt file. Then run the mortality_analysis.py script to perform the analysis.
- Python 3.6 or higher
- Pandas
- Scikit-learn
- XGBoost
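
As a quick sanity check before running the analysis, the requirements above can be verified from Python. This is a minimal sketch using only the standard library; the function name `missing_requirements` is hypothetical, and the import names (`pandas`, `sklearn`, `xgboost`) are the standard ones for these packages rather than something read from requirements.txt.

```python
import importlib.util
import sys

# Display name -> import name (Scikit-learn is imported as "sklearn")
REQUIRED_MODULES = {
    "Pandas": "pandas",
    "Scikit-learn": "sklearn",
    "XGBoost": "xgboost",
}
MIN_PYTHON = (3, 6)


def missing_requirements(modules=REQUIRED_MODULES, min_python=MIN_PYTHON):
    """Return the display names of requirements that are not satisfied."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python {}.{} or higher".format(*min_python))
    for label, module in modules.items():
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(module) is None:
            problems.append(label)
    return problems


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All requirements found.")
```

The check only confirms that each package can be located; it does not compare installed versions against those pinned in requirements.txt.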
This project is licensed under the MIT License - see the LICENSE.md file for details.