didarbilgin/Unsupervised-Retail-Segmentation-API-DBSCAN

Unsupervised Retail Segmentation API (DBSCAN)

This project provides a modular FastAPI-based machine learning service that performs unsupervised clustering on retail data (Northwind-style) for products, suppliers, and customers using the DBSCAN algorithm. It includes a complete pipeline for data collection, preprocessing, model training, visualization, and API-based interaction.


🚀 Features

  • Automated clustering using DBSCAN for multiple entities:
    • Products (Problem 2)
    • Suppliers (Problem 3)
    • Customers by country (Problem 4)
  • Dynamic preprocessing pipeline for feature scaling and data cleaning
  • Automatic selection of eps and min_samples via silhouette score and knee detection on the k-distance curve (KneeLocator from kneed)
  • Visualization support for clusters and k-distance plots
  • FastAPI endpoints for model training and CSV download
  • Persistent model storage with joblib
  • Extensible modular design — easy to adapt for other datasets or clustering methods

🧩 Tech Stack

  • Backend: FastAPI (Python 3.10+)
  • Machine Learning: scikit-learn (DBSCAN, StandardScaler, silhouette analysis)
  • Data Handling: pandas, SQLAlchemy, PostgreSQL
  • Visualization: Matplotlib, kneed
  • Environment & Tools: joblib, python-dotenv, Docker-ready structure

⚙️ Project Structure

.
├── app.py                    # FastAPI application with endpoints (/train, /download)
├── database.py               # PostgreSQL connection and data collection via SQLAlchemy
├── db_connection.py          # Environment-based DB config (dotenv)
├── preprocessing.py          # Data preprocessing and feature engineering
├── training.py               # DBSCAN training and parameter optimization
├── visualization.py          # Cluster and eps visualization
├── models/                   # Trained model storage (pkl files)
├── outputs/                  # Cluster results and generated plots
└── .env                      # Database credentials

🧠 API Endpoints

1️⃣ Train Models

Train and cluster entities using DBSCAN.

POST /train/problem_2 → Product clustering
POST /train/problem_3 → Supplier clustering
POST /train/problem_4 → Customer-country clustering

Each endpoint returns a success message and the generated CSV path.

2️⃣ Download Cluster Results

GET /download/problem_2
GET /download/problem_3
GET /download/problem_4

Each route provides a downloadable CSV file of the clustered output.
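
Once a CSV is downloaded, a quick summary of cluster sizes helps judge the result. The sketch below assumes the label column is named `cluster`, which this README does not confirm, so adjust `label_column` to match the real file. It relies on the scikit-learn convention that DBSCAN labels noise points as -1.

```python
import csv
import io
from collections import Counter


def summarize_clusters(csv_text: str, label_column: str = "cluster") -> dict:
    """Count rows per cluster label and report noise (-1) separately.

    `label_column` is an assumed column name, not one documented by the project.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(int(row[label_column]) for row in reader)
    noise = counts.pop(-1, 0)  # DBSCAN in scikit-learn uses -1 for noise
    return {"clusters": dict(sorted(counts.items())), "noise": noise}


# Tiny made-up sample, only to show the expected shape of the input.
sample = "product_id,cluster\n1,0\n2,0\n3,1\n4,-1\n"
summary = summarize_clusters(sample)
```

A large noise count relative to the cluster sizes usually suggests that eps is too small or min_samples too large for the data.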


🔧 How It Works

  1. Data Collection – Fetches tables from a PostgreSQL database using SQLAlchemy.
  2. Preprocessing – Cleans the data (missing values) and standardizes features with StandardScaler.
  3. Training – Runs DBSCAN with tuned parameters (eps and min_samples) selected via silhouette score and knee detection on the k-distance curve.
  4. Visualization – Saves k-distance and cluster distribution plots under the outputs/ directory.
  5. API Interaction – Exposes FastAPI endpoints to trigger training and download the clustered results.
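
The preprocessing and training steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the code in training.py: the function names and the `min_samples_grid` values are hypothetical, and `knee_index` is a simplified chord-distance heuristic standing in for kneed's KneeLocator.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def k_distance_curve(X, k):
    """Ascending distances from each point to its k-th nearest neighbour."""
    # Ask for k + 1 neighbours: the query point itself comes back at distance 0.
    distances, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    return np.sort(distances[:, -1])


def knee_index(curve):
    """Simplified stand-in for KneeLocator: the point farthest below the chord."""
    span = curve[-1] - curve[0]
    if span == 0:
        return len(curve) - 1
    x = np.linspace(0.0, 1.0, len(curve))
    y = (curve - curve[0]) / span
    return int(np.argmax(x - y))


def tune_dbscan(features, min_samples_grid=(3, 5, 8)):
    """Pick eps per min_samples from the knee, keep the best silhouette."""
    X = StandardScaler().fit_transform(features)
    best = None
    for min_samples in min_samples_grid:
        curve = k_distance_curve(X, min_samples)
        eps = float(curve[knee_index(curve)])
        if eps <= 0:
            continue  # DBSCAN requires eps > 0
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
        clustered = labels != -1  # scikit-learn marks noise points with -1
        n_clusters = len(set(labels[clustered]))
        if n_clusters < 2:
            continue  # the silhouette score needs at least two clusters
        score = silhouette_score(X[clustered], labels[clustered])
        if best is None or score > best["silhouette"]:
            best = {
                "eps": eps,
                "min_samples": min_samples,
                "n_clusters": n_clusters,
                "silhouette": float(score),
                "labels": labels,
            }
    return best
```

Noise points are excluded before scoring so that the silhouette reflects only the clusters DBSCAN actually formed; whether the project scores noise the same way is not stated here.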

🧪 Example Usage

Train a clustering model

curl -X POST http://127.0.0.1:8000/train/problem_2

Response:

{
  "message": "Problem 2 model trained successfully.",
  "file": "outputs/dbscan_clustered_products_problem_2.csv"
}

Download the result

curl -O http://127.0.0.1:8000/download/problem_2
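
The same two calls can be made from Python with only the standard library. This is a small client sketch; the helper names and timeout values are illustrative choices, and the base URL assumes the default local uvicorn address used in the curl examples.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # local uvicorn address from the curl examples


def build_train_request(problem: str) -> urllib.request.Request:
    """Build the POST request for /train/<problem> without sending it."""
    return urllib.request.Request(f"{BASE_URL}/train/{problem}", method="POST")


def train(problem: str, timeout: float = 300.0) -> dict:
    """Trigger training and return the decoded JSON response."""
    with urllib.request.urlopen(build_train_request(problem), timeout=timeout) as resp:
        return json.load(resp)


def download(problem: str, dest: str, timeout: float = 60.0) -> str:
    """Save the clustered CSV for <problem> to `dest` and return the path."""
    url = f"{BASE_URL}/download/{problem}"
    with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as fh:
        fh.write(resp.read())
    return dest
```

With the API running, `train("problem_2")` followed by `download("problem_2", "products.csv")` mirrors the two curl commands.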

📊 Visualization Samples

During training, the pipeline automatically generates the following plots:

  • *_eps_plot.png → k-distance (elbow) curve used to choose eps
  • *_clusters_plot.png → visual cluster separation for the selected entity

All visualizations are saved under the outputs/ folder.
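
A k-distance plot of this kind can be produced with a few lines of Matplotlib. The function below is a hedged sketch, not the code in visualization.py: its name, arguments, and styling are assumptions, and it expects the k-distances to be sorted in ascending order already.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend, suitable for a server process
import matplotlib.pyplot as plt


def save_eps_plot(k_distances, eps, path):
    """Plot a sorted k-distance curve with the chosen eps and save it to `path`."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(range(len(k_distances)), k_distances, label="k-distance")
    # Horizontal line at the selected eps so the knee is easy to check by eye.
    ax.axhline(eps, linestyle="--", color="red", label=f"eps = {eps:.3f}")
    ax.set_xlabel("Points sorted by k-distance")
    ax.set_ylabel("Distance to k-th nearest neighbour")
    ax.legend()
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)  # free the figure; matters when training runs repeatedly
    return path
```

Closing the figure after saving avoids accumulating open figures when the training endpoints are called many times in one process.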


🧰 Setup & Run

1️⃣ Clone the repository

git clone https://github.com/didarbilgin/Unsupervised-Retail-Segmentation-API-DBSCAN.git
cd Unsupervised-Retail-Segmentation-API-DBSCAN

2️⃣ Install dependencies

pip install -r requirements.txt

3️⃣ Configure environment

Create a .env file with your database credentials:

DB_USER=your_username
DB_PASSWORD=your_password
DB_HOST=localhost
DB_PORT=5432
DB_NAME=northwind
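
These variables are typically combined into a SQLAlchemy connection URL. The helper below is a minimal sketch of that step, not the contents of db_connection.py: the function name and the localhost/5432 fallbacks are assumptions. In the application, python-dotenv's `load_dotenv()` would populate `os.environ` first, and the resulting string would be passed to SQLAlchemy's `create_engine`.

```python
import os
from urllib.parse import quote_plus


def build_db_url(env=os.environ) -> str:
    """Assemble a PostgreSQL URL for SQLAlchemy from the DB_* variables."""
    # quote_plus protects credentials containing characters such as '@' or ':'.
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])
    host = env.get("DB_HOST", "localhost")  # assumed default
    port = env.get("DB_PORT", "5432")       # assumed default
    name = env["DB_NAME"]
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{name}"


# Example with made-up credentials; a real run would read them from .env.
example_url = build_db_url(
    {"DB_USER": "your_username", "DB_PASSWORD": "p@ss", "DB_NAME": "northwind"}
)
```

URL-encoding the credentials avoids a common connection failure when the password contains reserved characters.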

4️⃣ Run the API

uvicorn app:app --reload

Access it at: 👉 http://127.0.0.1:8000/docs


🧩 Requirements

fastapi
uvicorn
pandas
numpy
scikit-learn
sqlalchemy
matplotlib
kneed
python-dotenv
joblib
psycopg2
