Data science and health care are becoming increasingly intertwined. This project offers a glimpse of what machine learning models can do in predicting stroke risk. The files in this repo contain the work, modules, and report that walk through the data science pipeline, resulting in a classification model that predicts stroke risk with 74% recall.
Goals
The project aims to create a model that identifies individuals at high risk of stroke based on stroke data.
Initial Questions
What does stroke look like in the dataset?
Is there a relationship between stroke and age?
Is there a relationship between stroke and gender?
Is there a relationship between stroke and blood sugar level?
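One way to start on questions like these is to compare the share of stroke cases across groups. The snippet below is a minimal sketch using only the standard library. The `stroke` column name (1 for stroke, 0 otherwise), the helper `stroke_rate_by`, the age cutoff of 60, and the sample rows are all assumptions made for illustration. None of them are taken from the project's data or code.

```python
from collections import defaultdict


def stroke_rate_by(rows, key):
    """Return {group value: share of rows in that group with stroke == 1}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [stroke cases, total rows]
    for row in rows:
        group = key(row)
        counts[group][0] += row["stroke"]
        counts[group][1] += 1
    return {group: cases / total for group, (cases, total) in counts.items()}


# Made-up toy records for illustration only, NOT rows from the project's dataset.
# The "stroke" target column is an assumed name.
rows = [
    {"gender": "Male", "age": 67, "avg_glucose_level": 228.7, "stroke": 1},
    {"gender": "Female", "age": 61, "avg_glucose_level": 202.2, "stroke": 1},
    {"gender": "Male", "age": 30, "avg_glucose_level": 85.0, "stroke": 0},
    {"gender": "Female", "age": 45, "avg_glucose_level": 95.1, "stroke": 0},
    {"gender": "Female", "age": 72, "avg_glucose_level": 110.5, "stroke": 0},
    {"gender": "Male", "age": 25, "avg_glucose_level": 78.3, "stroke": 0},
]

# Stroke rate per gender and per (arbitrary) age band.
by_gender = stroke_rate_by(rows, lambda r: r["gender"])
by_age_band = stroke_rate_by(rows, lambda r: "60+" if r["age"] >= 60 else "under 60")

print(by_gender)
print(by_age_band)
```

A difference in rates between groups is only a starting point. Whether it reflects a real relationship would still need a statistical test on the full dataset.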
Plan
Acquire data
Prepare, clean, & split data
Explore the data to find drivers and answer initial questions
Create a model
Evaluate
Conclude with recommendations and next steps
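Two of the steps above, splitting the data and evaluating with recall, can be sketched in plain Python. This is an illustration under stated assumptions, not the project's actual code: the helper names `stratified_split` and `recall`, the 20% test fraction, the seed, and the toy labels are all hypothetical. Stratifying the split is a common choice when the positive class is rare, and recall measures the share of true stroke cases that the model flags.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Split row indices into train/test while preserving the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        indices = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(indices)
        # Keep at least one example of each class in the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


def recall(y_true, y_pred):
    """Recall = true positives / (true positives + false negatives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels with a 10% positive rate (1 = stroke), for illustration only.
labels = [1] * 10 + [0] * 90
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))

# Made-up predictions: 3 of the 4 true stroke cases are caught.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0]
print(recall(y_true, y_pred))  # 0.75
```

Recall ignores false positives, so a model tuned for recall may flag many people who are not at risk. That trade-off can be acceptable in a screening setting, where a missed case is costlier than a false alarm.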
Data Dictionary
| Feature | Definition |
| --- | --- |
| id | unique identifier |
| gender | "Male", "Female" or "Other" |
| age | age of the patient |
| hypertension | 0 if the patient doesn't have hypertension, 1 if the patient has hypertension |
| heart_disease | 0 if the patient doesn't have any heart disease, 1 if the patient has a heart disease |
| ever_married | "No" or "Yes" |
| work_type | "children", "Govt_job", "Never_worked", "Private" or "Self-employed" |
| Residence_type | "Rural" or "Urban" |
| avg_glucose_level | average glucose level in blood |
| bmi | body mass index |
| smoking_status | "formerly smoked", "never smoked", "smokes" or "Unknown" |
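The dictionary above can also drive a simple preparation step. The sketch below validates each categorical value against its allowed set and one-hot encodes it, leaving the numeric columns as floats. It is a hedged illustration, not the repo's preparation code: `CATEGORIES`, `NUMERIC`, `encode_row`, and the sample row are hypothetical, the government work type is assumed to be spelled `Govt_job` in the data, and dropping `id` is an assumption based on it being only an identifier.

```python
# Allowed values per categorical column, following the data dictionary.
# "Govt_job" is an assumed spelling of the government work type.
CATEGORIES = {
    "gender": ["Male", "Female", "Other"],
    "ever_married": ["No", "Yes"],
    "work_type": ["children", "Govt_job", "Never_worked", "Private", "Self-employed"],
    "Residence_type": ["Rural", "Urban"],
    "smoking_status": ["formerly smoked", "never smoked", "smokes", "Unknown"],
}

# Columns that are already numeric (hypertension and heart_disease are 0/1 flags).
NUMERIC = ["age", "hypertension", "heart_disease", "avg_glucose_level", "bmi"]


def encode_row(row):
    """Turn one record into a flat dict of numeric features (id is dropped)."""
    features = {name: float(row[name]) for name in NUMERIC}
    for column, allowed in CATEGORIES.items():
        value = row[column]
        if value not in allowed:
            raise ValueError(f"unexpected value {value!r} in column {column!r}")
        # One-hot encoding: one 0/1 feature per allowed value.
        for option in allowed:
            features[f"{column}_{option}"] = 1.0 if value == option else 0.0
    return features


# Made-up record for illustration only, NOT a row from the project's dataset.
sample = {
    "id": 1,
    "gender": "Female",
    "age": 61,
    "hypertension": 0,
    "heart_disease": 0,
    "ever_married": "Yes",
    "work_type": "Self-employed",
    "Residence_type": "Rural",
    "avg_glucose_level": 202.21,
    "bmi": 28.1,
    "smoking_status": "never smoked",
}

encoded = encode_row(sample)
print(len(encoded))
```

Rejecting unexpected values early makes spelling differences between the dictionary and the raw data visible before they reach a model.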