🩺 Diabetes Prediction: Research & Deployment System

Python Streamlit Scikit-Learn Status

📌 Overview

This repository serves a dual purpose, bridging the gap between academic research replication and practical software deployment:

  1. Research Phase (.ipynb): A strict replication of a specific medical study on diabetes prediction using Random Forest, SMOTE, and Global Scaling.
  2. Deployment Phase (.py): A modular web application built with Streamlit that serves the trained model to end-users in a user-friendly interface.

⚠️ NOTE: This is the Academic Replication version which intentionally contains data leakage to reproduce a paper's results. For the industry-standard, robust version (Leakage-Free & Fine-Tuned), please visit: 👉 Diabetes Prediction Fine-Tuned Project


⚠️ Academic Methodology Disclaimer

Please Read Before Reviewing the Research Notebook:

To strictly adhere to the cited reference paper's methodology and reproduce their reported metrics, the Jupyter Notebook (ml-prediction-diabetic-code.ipynb) follows a specific preprocessing workflow:

  • Global Scaling: MinMaxScaler is applied to the entire dataset before splitting.
  • Global SMOTE: Oversampling is applied to the entire dataset before splitting.

Methodology Note: I am fully aware that applying these techniques before the Train-Test split introduces Data Leakage and is not standard industry practice. However, this was done intentionally to reproduce the exact results reported in the academic paper.
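
The same reasoning applies to oversampling. The sketch below uses a simplified interpolation oversampler as a stand-in for SMOTE (real SMOTE interpolates towards one of the k nearest neighbours; this version picks any other minority row) and records which original rows each synthetic row came from. The function name, the toy rows, and the held-out indices are all assumptions made for illustration.

```python
import random


def interpolate_oversample(minority, n_new, rng):
    """Create n_new synthetic points, each lying between two distinct minority rows.

    Returns (point, parent_indices) pairs so the parents can be traced later.
    """
    synthetic = []
    for _ in range(n_new):
        i, j = rng.sample(range(len(minority)), 2)
        t = rng.random()
        point = [a + t * (b - a) for a, b in zip(minority[i], minority[j])]
        synthetic.append((point, {i, j}))
    return synthetic


rng = random.Random(0)

# Toy minority-class rows (made-up values); rows 3 and 4 are held out as test data.
minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [2.5, 2.1], [3.0, 2.6]]
test_idx = {3, 4}

# Leaky order: oversample everything, then split. Any synthetic row with a
# held-out parent that lands in the training set carries test information.
synthetic = interpolate_oversample(minority, 20, rng)
contaminated = [point for point, parents in synthetic if parents & test_idx]

# Leakage-free order: hold out the test rows first, then oversample the rest.
train_rows = [row for k, row in enumerate(minority) if k not in test_idx]
clean_synthetic = interpolate_oversample(train_rows, 20, rng)

print(f"{len(contaminated)} of {len(synthetic)} synthetic rows have a test-row parent")
```

In the leakage-free order, every synthetic row is built only from training rows, so the test set stays untouched until evaluation.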


📂 Project Architecture

This project moves beyond a simple notebook by implementing a Modular Architecture for deployment. The logic is separated into distinct responsibilities:

├── app/                                   # 💻 APPLICATION SOURCE CODE
│   ├── app.py                             # Main Streamlit application
│   ├── model.py                           # Backend logic & inference
│   └── preprocess.py                      # Utils for input formatting
├── assets/                                # 🖼️ STATIC ASSETS
│   └── diabetes_app_ui.png                # App Screenshot
├── models/                                # 📦 ARTIFACTS (Serialized Objects)
│   ├── scaler.joblib                      # Saved MinMaxScaler
│   └── model_rf.joblib                    # Saved Random Forest Model
├── notebooks/                             # 🔬 RESEARCH & EXPERIMENTATION
│   └── ml-prediction-diabetic-code.ipynb  # Replicated Research Study
└── requirements.txt                       # Dependency list
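
As a rough illustration of how the inference layer could tie the two saved artifacts together, here is a minimal sketch. It is not the contents of `model.py`: the function names `load_artifacts` and `predict_one` are hypothetical, and the number and order of input features are assumed to match whatever the scaler and model were trained on.

```python
from pathlib import Path


def load_artifacts(models_dir="models"):
    """Load the saved scaler and Random Forest model from the models/ folder."""
    import joblib  # imported lazily so the rest of this sketch runs without it

    models_dir = Path(models_dir)
    scaler = joblib.load(models_dir / "scaler.joblib")
    model = joblib.load(models_dir / "model_rf.joblib")
    return scaler, model


def predict_one(features, scaler, model):
    """Scale one row of raw inputs and return the predicted class label.

    `features` must list the values in the same column order used in training
    (an assumption here; the order is not documented in this README).
    """
    row = [list(features)]          # scikit-learn expects a 2-D array-like
    scaled = scaler.transform(row)  # reuse the saved scaler, never refit it
    return model.predict(scaled)[0]
```

A call would then look like `scaler, model = load_artifacts()` followed by `predict_one(values, scaler, model)`. If the scaler was fitted on a DataFrame, scikit-learn may warn about missing feature names when given a plain list.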

🛠️ Tech Stack

  • Frontend Framework: Streamlit
  • Machine Learning Core: Scikit-Learn (Random Forest Classifier)
  • Data Handling: Pandas, NumPy, Joblib
  • Imbalanced Data: Imbalanced-learn (SMOTE)
  • Environment: Python 3.9+

🚀 How to Run Locally

1. Clone the Repository

git clone https://github.com/viochris/Diabetes-prediction-project.git
cd Diabetes-prediction-project

2. Install Dependencies

Make sure you have the required libraries installed:

pip install -r requirements.txt

3. Run the Streamlit App

Execute the main application file from the root directory:

streamlit run app/app.py

4. Access the Dashboard

Open your browser and navigate to: http://localhost:8501

📊 Research Results

Based on the replication study conducted in the notebook (ml-prediction-diabetic-code.ipynb) using Random Forest with Global SMOTE:

  • Algorithm: Random Forest Classifier
  • Accuracy: ~83.67%
  • Precision: ~84%
  • Recall: ~84%

(Metrics are inflated due to the intentional data leakage required for paper replication)
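
For readers who want to recompute these three metrics by hand, here is a minimal sketch in plain Python. The labels are made up and unrelated to the notebook's data, and the function reports positive-class precision and recall, whereas the notebook may report averaged values.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many were found
    return accuracy, precision, recall


# Toy labels for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```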

📷 Screenshots

Web Application Interface

User-friendly interface built with Streamlit allowing for real-time patient data input and instant prediction:

Diabetes App UI


Author: Silvio Christian, Joe

"Bridging the gap between Academic Research and Practical Deployment."
