This project implements logistic regression from scratch using NumPy to predict diabetes and 10-year risk of heart disease based on patient health data.
- File: `diabData.csv`
- Columns include:
  - `male` – Gender (0 = Female, 1 = Male)
  - `age` – Age in years
  - `education` – Education level
  - `currentSmoker` – Whether currently smoking (0/1)
  - `cigsPerDay` – Number of cigarettes per day
  - `BPMeds` – On blood pressure medication (0/1)
  - `prevalentStroke` – History of stroke (0/1)
  - `prevalentHyp` – History of hypertension (0/1)
  - `diabetes` – Diabetes status (0/1)
  - `totChol` – Total cholesterol
  - `sysBP` – Systolic blood pressure
  - `diaBP` – Diastolic blood pressure
  - `BMI` – Body mass index
  - `heartRate` – Heart rate
  - `glucose` – Glucose level
  - `TenYearCHD` – 10-year risk of coronary heart disease (0/1)
- Implements gradient descent manually to train the logistic regression model.
- Computes the weights and bias without using sklearn's `LogisticRegression`.
- Includes:
  - Sigmoid function
  - Gradient calculation
  - Accuracy computation
  - Prediction for new user input
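
The components listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names (`sigmoid`, `compute_gradients`, `train`, `predict`, `accuracy`), the learning rate, the iteration count, and the toy data are all assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def compute_gradients(X, y, w, b):
    """Gradients of the mean log loss with respect to weights and bias."""
    error = sigmoid(X @ w + b) - y      # predicted probability minus label
    dw = X.T @ error / X.shape[0]       # one entry per feature
    db = error.mean()
    return dw, db

def train(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent starting from zero weights (assumed settings)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iters):
        dw, db = compute_gradients(X, y, w, b)
        w -= lr * dw
        b -= lr * db
    return w, b

def predict(X, w, b, threshold=0.5):
    """Return 0/1 labels by thresholding the predicted probability."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return float(np.mean(y_true == y_pred))

# Toy, made-up data (not taken from diabData.csv): two features, and the
# label is 1 when their sum is positive.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

w, b = train(X, y)
train_acc = accuracy(y, predict(X, w, b))
print(f"training accuracy on toy data: {train_acc:.2f}")
```

The gradient of the mean log loss reduces to `X.T @ (p - y) / m`, which is why no explicit loss function is needed inside the training loop.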
- Clone or download this repository.
- Open `diabetes.ipynb` in Jupyter Notebook.
- Run all cells in order:
  - Load and scale data
  - Train the model
  - Test accuracy
  - Predict new input
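
For orientation, the four steps above might look roughly like the sketch below. It uses made-up synthetic rows in place of `diabData.csv`, and the train/test split, standardization, learning rate, and iteration count are assumptions rather than the notebook's confirmed settings. The point it illustrates is that a new input has to be scaled with the same mean and standard deviation as the training data before prediction.

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-in for the loaded CSV: 400 made-up rows with two features
# (loosely "glucose" and "BMI") and a 0/1 label. Not real patient data.
n = 400
X = np.column_stack([rng.normal(100, 25, n), rng.normal(27, 4, n)])
y = (X[:, 0] + rng.normal(0, 10, n) > 110).astype(float)

# 1. Split, then scale using statistics from the training rows only.
idx = rng.permutation(n)
train_idx, test_idx = idx[:300], idx[300:]
mean = X[train_idx].mean(axis=0)
std = X[train_idx].std(axis=0)
X_train = (X[train_idx] - mean) / std
X_test = (X[test_idx] - mean) / std
y_train, y_test = y[train_idx], y[test_idx]

# 2. Train with batch gradient descent on the mean log loss.
w = np.zeros(X_train.shape[1])
b = 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X_train @ w + b)))
    err = p - y_train
    w -= 0.1 * (X_train.T @ err) / len(y_train)
    b -= 0.1 * err.mean()

# 3. Test accuracy on the held-out rows.
p_test = 1.0 / (1.0 + np.exp(-(X_test @ w + b)))
test_acc = float(np.mean((p_test >= 0.5).astype(float) == y_test))

# 4. Predict for a new (hypothetical) input, reusing the training mean/std.
new_input = np.array([150.0, 31.0])
new_scaled = (new_input - mean) / std
prob = float(1.0 / (1.0 + np.exp(-(new_scaled @ w + b))))

print(f"test accuracy: {test_acc:.2f}, predicted probability: {prob:.2f}")
```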