The model takes diagnostic results (such as hormone level and blood pressure) and patient characteristics (such as age, BMI, and lineage factor) as input features and predicts whether the patient has polydipsia. The model is trained using the Gaussian Naive Bayes classifier algorithm.
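
As a rough illustration of the algorithm named above, here is a minimal from-scratch sketch of a Gaussian Naive Bayes classifier. It is not the notebook's actual class: the class name `GaussianNaiveBayes`, the method names, the small variance-smoothing constant, and the tiny two-feature dataset are all assumptions made for this example. It uses only the standard library so it can be run anywhere.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Illustrative Gaussian Naive Bayes classifier (hypothetical, not the notebook's class)."""

    def fit(self, X, y):
        # Group training rows by class label.
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)

        n_samples = len(X)
        self.priors = {}  # label -> P(label)
        self.stats = {}   # label -> list of (mean, variance), one pair per feature
        for label, rows in rows_by_class.items():
            self.priors[label] = len(rows) / n_samples
            feature_stats = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                # Population variance; the added 1e-9 is an assumed smoothing
                # term that avoids division by zero for constant features.
                var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
                feature_stats.append((mean, var))
            self.stats[label] = feature_stats
        return self

    def _log_posterior(self, row, label):
        # log P(label) + sum over features of log N(value | mean, var).
        total = math.log(self.priors[label])
        for value, (mean, var) in zip(row, self.stats[label]):
            total += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return total

    def predict(self, X):
        # Pick the label with the highest (unnormalized) log posterior for each row.
        return [
            max(self.priors, key=lambda label: self._log_posterior(row, label))
            for row in X
        ]


# Tiny synthetic dataset with made-up numbers, for illustration only.
X_train = [[1.0, 20.0], [1.2, 22.0], [0.8, 19.0], [3.0, 35.0], [3.2, 37.0], [2.9, 33.0]]
y_train = [0, 0, 0, 1, 1, 1]

model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[1.1, 21.0], [3.1, 36.0]])
print(predictions)
```

Working in log space keeps the product of many small densities from underflowing, which is a common design choice for Naive Bayes implementations.
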
The dataset contains 768 rows and 8 columns. It can be found here
Features are:
'Pregnancies', 'BloodPressure', 'HormoneLevel', 'HumulinLevel', 'BMI', 'Age', 'LineageFactor', 'Prediction'
- Importing libraries
- Reading dataset
- Data analysis
- Checking for null values
- Correlation matrix
- Distribution of features
- Model building
- Class for `Gaussian Naive Bayes Classifier` with functions for training and testing
- Class for k-fold validation
- Training and testing of model
- Confusion matrix and score calculation
- Feature transformation using various functions
- Skewness identification
- log(1+x) transformation
- Quadratic transformation
- Training and testing of model on transformed data
- Comparing with `sklearn` library function
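
The k-fold validation and feature transformation steps in the outline above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `skewness`, `log1p_transform`, and `k_fold_indices` are hypothetical, the skewness formula is the population Fisher-Pearson coefficient (the notebook may use a different estimator), the folds are contiguous and unshuffled, and the sample values are made up.

```python
import math


def skewness(values):
    """Population Fisher-Pearson skewness: m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5


def log1p_transform(values):
    """Apply log(1 + x) to each value; defined for x > -1, so zeros are safe."""
    return [math.log1p(v) for v in values]


def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for k contiguous, unshuffled folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# Made-up right-skewed feature values, for illustration only.
right_skewed = [1, 1, 2, 2, 3, 4, 8, 20]
before = skewness(right_skewed)
after = skewness(log1p_transform(right_skewed))
print(f"skewness before: {before:.3f}, after log(1+x): {after:.3f}")

# Split 10 sample indices into 3 folds.
folds = list(k_fold_indices(10, 3))
for train_idx, test_idx in folds:
    print("test fold:", test_idx)
```

In each fold the model would be trained on the `train` indices and scored on the `test` indices, with the scores averaged across folds.
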
- Install Jupyter Notebook or use Google Colab.
- Open the file `Gaussian Naive Bayes.ipynb` in Jupyter Notebook or Google Colab.
- Run all the cells to perform data analysis, cleaning, model building and testing.
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-learn (to compare our model with the library implementation)
- math
A detailed report can be found here