GitHub - RC-15-coder/Diabetes-Predictor

Overview:

The Diabetes Prediction Web Application is a data-driven tool that estimates the likelihood of diabetes from user-supplied health indicators such as glucose level, blood pressure, and BMI. It combines data preprocessing with a trained machine learning model to generate predictions, and presents them through an interactive, user-friendly web interface.

The goal of this project was to build a complete end-to-end solution for predicting diabetes, including a robust backend for data processing and storage, a trained ML model, and a frontend for user interaction.

Why This Project?

Diabetes is a growing health concern worldwide, and early detection plays a crucial role in its management. This project:

Provides an easy-to-use platform for individuals to assess their risk of diabetes.
Showcases the power of machine learning in solving real-world problems.

Workflow of the Diabetes Prediction Web Application:

User Interaction

User Registration/Login:
- Users start by registering an account or logging in if they already have one.
- Once logged in, users access the dashboard and the prediction form.
Input Health Metrics:
- On the prediction page, users provide their health metrics, such as:
  - Gender, Glucose levels, Blood Pressure, Skin Thickness, Insulin levels, BMI (Body Mass Index).
  - Diabetes Pedigree Function (a measure of genetic influence), Age.
- These inputs are collected via an HTML form and sent to the backend for processing.
Data Preprocessing

Standardization:
- User inputs are standardized with the scaler saved during training (scaler.pkl).
- The scaler applies the per-feature mean and standard deviation learned from the training data, so every input ends up on the same scale the model was trained on.
- This matters because the model was trained on scaled data; feeding it raw values would produce unreliable predictions.
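As a rough illustration of what the saved scaler does: a fitted scikit-learn StandardScaler transforms each feature as z = (x - mean) / scale, using the mean_ and scale_ values it learned from the training data. The plain-Python sketch below reproduces that arithmetic; the function name and the example statistics are hypothetical and are not the values stored in this project's scaler.pkl.

```python
def standardize(row, means, scales):
    """Apply z = (x - mean) / scale to each feature, as StandardScaler.transform does."""
    if not (len(row) == len(means) == len(scales)):
        raise ValueError("row, means and scales must have the same length")
    return [(x - m) / s for x, m, s in zip(row, means, scales)]


# Hypothetical training statistics for two features (e.g. Glucose and BMI).
means = [120.0, 32.0]
scales = [30.0, 8.0]

print(standardize([150.0, 28.0], means, scales))  # [1.0, -0.5]
```

In the running application, the loaded scaler's transform() method would perform this calculation for all features at once.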
Pregnancy Adjustment:
- If the user is male, the Pregnancies feature is automatically set to 0, as it’s not applicable.
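Assembling the form values into a single feature row, with the pregnancy adjustment applied, might look like the sketch below. It assumes the model expects the eight columns in the order of the standard Pima Indians diabetes.csv (Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age); FEATURE_ORDER, build_feature_row, and the "male" gender value are hypothetical and should be checked against views.py and the training code.

```python
# Assumed column order, matching the standard Pima Indians diabetes.csv layout.
FEATURE_ORDER = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]


def build_feature_row(inputs, gender):
    """Return feature values in model order, with Pregnancies forced to 0 for male users.

    `inputs` maps feature names to numbers; `gender` is a string such as "male" or
    "female" (hypothetical values, not taken from the project's form).
    """
    values = dict(inputs)
    if gender.strip().lower() == "male":
        values["Pregnancies"] = 0  # not applicable, so fixed at 0
    return [float(values[name]) for name in FEATURE_ORDER]


# Illustrative input values only.
row = build_feature_row(
    {"Glucose": 148, "BloodPressure": 72, "SkinThickness": 35, "Insulin": 0,
     "BMI": 33.6, "DiabetesPedigreeFunction": 0.627, "Age": 50},
    gender="Male",
)
print(row)  # [0.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0]
```

The row would then be scaled before being passed to the model.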
Model Prediction

Loading the Model:
- The pre-trained LightGBM Classifier (best_lgb_model.pkl) is loaded into memory.
Prediction Process:
- The preprocessed inputs are fed into the model.
- The model predicts a probability of the user being diabetic.
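A minimal sketch of the loading and prediction steps, assuming best_lgb_model.pkl holds a pickled LightGBM LGBMClassifier (which exposes the scikit-learn style predict_proba method) and that the input row has already been scaled. If the file was written with joblib rather than pickle, joblib.load would take the place of pickle.load. The helper names load_model and predict_probability are hypothetical.

```python
import pickle


def load_model(path="best_lgb_model.pkl"):
    """Load the pickled classifier; ideally done once at startup, not per request."""
    with open(path, "rb") as f:
        return pickle.load(f)


def predict_probability(model, scaled_row):
    """Return P(diabetic) for one already-scaled feature row.

    predict_proba returns one row per sample with columns [P(class 0), P(class 1)];
    class 1 is assumed to be the diabetic label, as in the Outcome column of diabetes.csv.
    """
    proba = model.predict_proba([scaled_row])
    return float(proba[0][1])
```

Usage would be along the lines of `model = load_model()` followed by `p = predict_probability(model, scaled_row)`.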
Threshold Application:
- The predicted probability is compared against the optimal threshold (e.g., 0.16).
- If the probability exceeds the threshold, the user is classified as "Diabetic", otherwise "Non-Diabetic".
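- This threshold is chosen to balance sensitivity (the share of diabetic cases correctly identified) and specificity (the share of non-diabetic cases correctly identified, i.e., avoiding false positives). A threshold well below the conventional 0.5 favors sensitivity, accepting more false positives so that fewer diabetic cases are missed.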
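The threshold rule can be written as a small function. This sketch uses the 0.16 example value from this README and the "exceeds" comparison described above; the name classify is hypothetical.

```python
OPTIMAL_THRESHOLD = 0.16  # example threshold quoted in this README


def classify(probability, threshold=OPTIMAL_THRESHOLD):
    """Label a predicted probability: 'Diabetic' only if it exceeds the threshold."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return "Diabetic" if probability > threshold else "Non-Diabetic"


print(classify(0.30))                 # Diabetic
print(classify(0.10))                 # Non-Diabetic
print(classify(0.30, threshold=0.5))  # Non-Diabetic
```

The last line shows the effect of the low threshold: a 30% predicted probability is flagged as "Diabetic" at 0.16 but would not be at a 0.5 cutoff.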
Prediction Storage

Storing Results in SQLite:
- Each prediction result is stored in the backend SQLite database.
- The logged-in user's ID is associated with the prediction, ensuring personal tracking of history.
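The project stores predictions through Django with SQLite; in Django this would normally be a model class whose table the ORM creates and queries. To show the shape of the stored data without a Django setup, the sketch below swaps in Python's built-in sqlite3 module. The table name, columns, and the init_db, save_prediction, and prediction_history helpers are hypothetical, not taken from the project's models; prediction_history illustrates how a dashboard could read a user's history back.

```python
import sqlite3
from datetime import datetime, timezone


def init_db(conn):
    """Create a hypothetical predictions table linked to a user ID."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS prediction (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id INTEGER NOT NULL,
            result TEXT NOT NULL CHECK (result IN ('Diabetic', 'Non-Diabetic')),
            created_at TEXT NOT NULL
        )
        """
    )


def save_prediction(conn, user_id, result):
    """Store one prediction for the logged-in user with a UTC timestamp."""
    conn.execute(
        "INSERT INTO prediction (user_id, result, created_at) VALUES (?, ?, ?)",
        (user_id, result, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def prediction_history(conn, user_id):
    """Return (result, created_at) rows for one user, newest first."""
    return conn.execute(
        "SELECT result, created_at FROM prediction WHERE user_id = ? ORDER BY id DESC",
        (user_id,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_db(conn)
    save_prediction(conn, user_id=1, result="Diabetic")
    save_prediction(conn, user_id=1, result="Non-Diabetic")
    print([result for result, _ in prediction_history(conn, user_id=1)])
    # ['Non-Diabetic', 'Diabetic']
```

Filtering by user_id is what keeps each user's history private to their own dashboard.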
Prediction Dashboard

Viewing Prediction History:
- Users can view their past predictions on the dashboard page.
- The dashboard displays:
  - Prediction Result: Indicates whether the user was classified as "Diabetic" or "Non-Diabetic".
  - Prediction Date: The exact date on which the prediction was made.
Error Handling
- If inputs are invalid (e.g., missing values, non-numeric inputs), an error message is displayed.
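A minimal sketch of the input checks described above (missing and non-numeric values), assuming the form arrives as a mapping of field names to submitted strings, as Django's request.POST does. REQUIRED_FIELDS and validate_inputs are hypothetical names; Pregnancies is left out because it depends on the selected gender, and the non-finite check (rejecting inputs such as "nan") is an added assumption.

```python
import math

# Hypothetical field names; Pregnancies is handled separately because it depends on gender.
REQUIRED_FIELDS = [
    "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age",
]


def validate_inputs(form):
    """Parse each required field as a float and collect user-facing error messages.

    `form` is any mapping of field name -> submitted value (for example request.POST).
    Returns (values, errors); values only contains fields that parsed successfully.
    """
    values, errors = {}, []
    for name in REQUIRED_FIELDS:
        raw = form.get(name)
        text = "" if raw is None else str(raw).strip()
        if not text:
            errors.append(f"{name} is required.")
            continue
        try:
            number = float(text)
        except ValueError:
            errors.append(f"{name} must be a number.")
            continue
        if not math.isfinite(number):
            errors.append(f"{name} must be a finite number.")
            continue
        values[name] = number
    return values, errors
```

A view could call `values, errors = validate_inputs(request.POST)` and re-render the form with the messages whenever `errors` is non-empty, running the model only when the list is empty.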

Backend Workflow Summary

The user inputs data through the frontend form.
The data is sent to the backend via a POST request.
The backend:
- Preprocesses the data (scaling and feature adjustment).
- Uses the LightGBM model to predict diabetes probability.
- Applies the optimal threshold to determine the final result.
- Saves the result to the SQLite database for logged-in users.
The prediction result is displayed on the result page and stored for future reference.

Technologies Used

Languages and Libraries

Backend: Python, Django, SQLite
Frontend: HTML, CSS, Bootstrap
Machine Learning: LightGBM, Scikit-learn, Pandas, NumPy
Data Preprocessing: Local Outlier Factor, StandardScaler
Tools: PythonAnywhere (for hosting)

Key Files

views.py: Handles backend logic, including data preprocessing and prediction.
model_training.py: Trains the LightGBM model and saves it as a .pkl file.
data_cleaning.py: Cleans and preprocesses the dataset (diabetes.csv).
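This README does not state how the 0.16 threshold was chosen. One common criterion for a cutoff that balances sensitivity and specificity is to maximize Youden's J statistic (sensitivity + specificity - 1) over candidate thresholds on held-out data such as the test split. The plain-Python sketch below illustrates that criterion only; it is not a description of what model_training.py actually does, and the function names and toy data are hypothetical.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Return (sensitivity, specificity) when probabilities above `threshold` are labeled 1."""
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob > threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


def best_threshold_youden(y_true, y_prob, candidates=None):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1."""
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]  # 0.01 ... 0.99
    best_t, best_j = None, float("-inf")
    for t in candidates:
        sens, spec = sensitivity_specificity(y_true, y_prob, t)
        j = sens + spec - 1
        if j > best_j:  # ties keep the earlier candidate
            best_t, best_j = t, j
    return best_t, best_j


# Toy labels and predicted probabilities, not project data.
y_true = [0, 0, 0, 1, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.30, 0.60, 0.90]
print(best_threshold_youden(y_true, y_prob, candidates=[0.1, 0.2, 0.5]))  # (0.2, 1.0)
```

Other criteria, such as fixing a minimum sensitivity and then maximizing specificity, would pick a different cutoff from the same probabilities.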

Steps to Set Up and Run the Backend

Clone the Repository:

```
git clone https://github.com/RC-15-coder/CINS-490.git
cd CINS-490
```

Create a Virtual Environment
To avoid dependency conflicts, create a virtual environment:

```
python3 -m venv venv
source venv/bin/activate  # On Linux/macOS
venv\Scripts\activate     # On Windows
```

Install Dependencies
Install the required packages listed in requirements.txt:
```
pip install -r requirements.txt
```
Run the Website Locally (if the PythonAnywhere deployment is down or not working):
Start the Django development server:
```
python manage.py runserver
```
This will output a URL on the console like:
```
Starting development server at http://127.0.0.1:8000/
```
Optional Steps to Run Data Cleaning and Model Training Scripts:

If you want to regenerate the scaler.pkl and best_lgb_model.pkl files:
- Data Cleaning:
```
python data_cleaning.py
```
- Model Training:
```
python model_training.py
```
These scripts will generate the required preprocessed data and model files in the appropriate directories.
Accessing the Website:
- After running python manage.py runserver, open the following URL in your web browser:
```
http://127.0.0.1:8000/
```

To see the live demo:

https://raghavchandna.pythonanywhere.com/

Testing the Website:

The following accounts have already been used for testing and can be used to log in:

Username: Jay
Password: jay@123456
Username: Rachel
Password: rachel@123456

Repository Contents

diabetes_predictor_website
predictor
.gitignore
README.md
X_test.csv
X_train.csv
data_cleaning.py
db.sqlite3
diabetes.csv
manage.py
model_training.py
requirements.txt
y_test.csv
y_train.csv
