This project provides a modular FastAPI-based machine learning service that performs unsupervised clustering on retail data (Northwind-style) for products, suppliers, and customers using the DBSCAN algorithm. It includes a complete pipeline for data collection, preprocessing, model training, visualization, and API-based interaction.
- Automated clustering using DBSCAN for multiple entities:
  - Products (Problem 2)
  - Suppliers (Problem 3)
  - Customers by country (Problem 4)
- Dynamic preprocessing pipeline for feature scaling and data cleaning
- Automatic parameter optimization via silhouette score and elbow method (KneeLocator)
- Visualization support for clusters and k-distance plots
- FastAPI endpoints for model training and CSV download
- Persistent model storage with joblib
- Extensible, modular design that is easy to adapt to other datasets or clustering methods
- Backend: FastAPI (Python 3.10+)
- Machine Learning: scikit-learn (DBSCAN, StandardScaler, silhouette analysis)
- Data Handling: pandas, SQLAlchemy, PostgreSQL
- Visualization: Matplotlib, kneed
- Environment & Tools: joblib, python-dotenv, Docker-ready structure
```
.
├── app.py            # FastAPI application with endpoints (/train, /download)
├── database.py       # PostgreSQL connection and data collection via SQLAlchemy
├── db_connection.py  # Environment-based DB config (dotenv)
├── preprocessing.py  # Data preprocessing and feature engineering
├── training.py       # DBSCAN training and parameter optimization
├── visualization.py  # Cluster and eps visualization
├── models/           # Trained model storage (pkl files)
├── outputs/          # Cluster results and generated plots
└── .env              # Database credentials
```
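As an illustration of how `db_connection.py` might assemble the connection settings from the `.env` file, here is a minimal sketch. The function name `build_db_url` and the default values are assumptions for the example, not the repo's actual code; the variable names match the `.env` keys shown in the Installation section.

```python
# Hypothetical sketch of db_connection.py: build a SQLAlchemy-style
# PostgreSQL URL from environment variables loaded via python-dotenv.
import os

def build_db_url() -> str:
    """Assemble a postgresql:// URL from environment variables.

    Defaults below are illustrative only; in the real project they
    would come from the .env file loaded with python-dotenv.
    """
    user = os.environ.get("DB_USER", "postgres")
    password = os.environ.get("DB_PASSWORD", "")
    host = os.environ.get("DB_HOST", "localhost")
    port = os.environ.get("DB_PORT", "5432")
    name = os.environ.get("DB_NAME", "northwind")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```

The resulting URL would then typically be passed to `sqlalchemy.create_engine()` in `database.py`.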
Train and cluster entities using DBSCAN.
- `POST /train/problem_2` → Product clustering
- `POST /train/problem_3` → Supplier clustering
- `POST /train/problem_4` → Customer-country clustering
Each endpoint returns a success message and the generated CSV path.
- `GET /download/problem_2`
- `GET /download/problem_3`
- `GET /download/problem_4`
Each route provides a downloadable CSV file of the clustered output.
- Data Collection – Fetches tables from a PostgreSQL database using SQLAlchemy.
- Preprocessing – Handles missing values, normalization, and feature scaling via StandardScaler.
- Training – Runs DBSCAN with optimized parameters (epsilon and min_samples) determined via silhouette score and knee detection.
- Visualization – Saves k-distance and cluster distribution plots under the outputs/ directory.
- API Interaction – Uses FastAPI to trigger clustering and download results.
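The training step above can be sketched end to end. Note the hedges: the project uses kneed's `KneeLocator` for knee detection, while this self-contained sketch approximates the knee with a simple second-difference heuristic so it needs only scikit-learn and NumPy, and the blob data stands in for the real Northwind features.

```python
# Sketch of the eps/min_samples search: k-distance curve, knee detection,
# DBSCAN fit, and silhouette check. Data and heuristics are illustrative.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
X = StandardScaler().fit_transform(X)

min_samples = 5  # a common starting point; the real pipeline may tune this

# k-distance curve: distance to the min_samples-th nearest neighbour, sorted
dists, _ = NearestNeighbors(n_neighbors=min_samples).fit(X).kneighbors(X)
k_dist = np.sort(dists[:, -1])

# Crude knee: index of the largest second difference.
# KneeLocator from the kneed package does this more robustly.
knee_idx = int(np.argmax(np.diff(k_dist, 2))) + 1
eps = float(k_dist[knee_idx])

labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)

# Silhouette is computed on non-noise points only (label -1 is noise).
mask = labels != -1
score = (
    silhouette_score(X[mask], labels[mask])
    if len(set(labels[mask])) > 1
    else -1.0
)
```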
```
curl -X POST http://127.0.0.1:8000/train/problem_2
```

Response:

```
{
  "message": "Problem 2 model trained successfully.",
  "file": "outputs/dbscan_clustered_products_problem_2.csv"
}
```

To download the resulting CSV:

```
curl -O http://127.0.0.1:8000/download/problem_2
```

During training, the pipeline automatically generates the following plots:
- `*_eps_plot.png` → elbow curve used to pick the optimal epsilon
- `*_clusters_plot.png` → visual cluster separation for the selected entity
All visualizations are saved under the outputs/ folder.
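The k-distance ("eps") plot itself takes only a few lines of Matplotlib. This sketch uses synthetic distances and an illustrative file name rather than the repo's actual `visualization.py` code.

```python
# Sketch of saving a k-distance plot under outputs/, as visualization.py might.
import os

import matplotlib
matplotlib.use("Agg")  # headless backend: works without a display
import matplotlib.pyplot as plt
import numpy as np

os.makedirs("outputs", exist_ok=True)

# Stand-in for the sorted k-th nearest-neighbour distances.
k_dist = np.sort(np.random.default_rng(0).gamma(2.0, 0.1, size=200))

plt.figure()
plt.plot(k_dist)
plt.xlabel("points sorted by distance")
plt.ylabel("k-th nearest neighbour distance")
plt.title("k-distance (eps) curve")

out_path = os.path.join("outputs", "example_eps_plot.png")
plt.savefig(out_path)
plt.close()
```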
```
git clone https://github.com/yourusername/retail-segmentation-api.git
cd retail-segmentation-api
pip install -r requirements.txt
```

Create a `.env` file with your database credentials:
```
DB_USER=your_username
DB_PASSWORD=your_password
DB_HOST=localhost
DB_PORT=5432
DB_NAME=northwind
```
```
uvicorn app:app --reload
```

Access the interactive docs at: 👉 http://127.0.0.1:8000/docs
```
fastapi
uvicorn
pandas
numpy
scikit-learn
sqlalchemy
matplotlib
kneed
python-dotenv
joblib
psycopg2
```