Advanced Bank Customer Segmentation Analysis

Overview

This project segments bank customers with clustering models (K-means, DBSCAN, and Gaussian mixture models) and presents the results through interactive visualizations. The resulting segment profiles are intended to support targeted marketing, product development, and customer relationship management.

Key Features

Robust Data Validation & Preprocessing
- Comprehensive data quality checks
- Automated data validation pipelines
- Sophisticated outlier detection and handling
- Advanced feature engineering

Advanced Analytics
- Multiple clustering algorithms (K-means, DBSCAN, GMM)
- Automated optimal cluster selection
- Silhouette analysis for cluster validation
- Interactive visualizations using Plotly and Matplotlib

Feature Engineering
- Automated date-based feature extraction
- Financial ratio calculations
- Advanced customer metrics
- Automated outlier handling

Interactive Visualizations
- Dynamic cluster analysis plots
- Interactive segment comparison tools
- Financial metrics dashboards
- Customer distribution analysis

Business Intelligence
- Automated segment profiling
- Actionable business recommendations
- Customer behavior analysis
- Detailed segment characteristics
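
The outlier handling listed above is not specified in detail here. As a minimal sketch, assuming an interquartile-range (IQR) rule is acceptable, numeric columns could be clipped as follows. The function name `clip_outliers_iqr` and the column `account_balance` are hypothetical, and the notebook may use a different method.

```python
import pandas as pd


def clip_outliers_iqr(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Clip each listed column to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    out = df.copy()  # leave the caller's frame untouched
    for col in columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Hypothetical column name and values, for illustration only.
sample = pd.DataFrame({"account_balance": [100.0, 120.0, 130.0, 150.0, 10_000.0]})
clipped = clip_outliers_iqr(sample, ["account_balance"])
```

Clipping keeps every customer in the dataset while limiting the influence of extreme values on distance-based clustering; dropping the rows instead is an equally valid choice when the extremes are data errors.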

Generated Features & Insights

Financial Metrics

Credit utilization ratio
Debt to income ratio
Total financial assets
Savings ratio
Customer value score
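
The formulas behind these metrics are not given here. The sketch below shows one common way to derive four of them with pandas. All column names (`credit_balance`, `credit_limit`, `total_debt`, `annual_income`, `savings_balance`, `checking_balance`) and the exact definitions are assumptions, and the customer value score is left out because its definition is not stated.

```python
import numpy as np
import pandas as pd


def add_financial_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Add ratio features; all column names are hypothetical."""
    out = df.copy()
    # Replace zero denominators with NaN so division never yields inf.
    limit = out["credit_limit"].replace(0, np.nan)
    income = out["annual_income"].replace(0, np.nan)

    out["credit_utilization_ratio"] = out["credit_balance"] / limit
    out["debt_to_income_ratio"] = out["total_debt"] / income
    out["total_financial_assets"] = out["savings_balance"] + out["checking_balance"]
    out["savings_ratio"] = out["savings_balance"] / income
    return out


# Two made-up customers; the second has no credit line.
customers = pd.DataFrame({
    "credit_balance": [500.0, 0.0],
    "credit_limit": [2000.0, 0.0],
    "total_debt": [10_000.0, 5_000.0],
    "annual_income": [50_000.0, 40_000.0],
    "savings_balance": [5_000.0, 8_000.0],
    "checking_balance": [1_000.0, 2_000.0],
})
enriched = add_financial_ratios(customers)
```

A customer with no credit line gets a missing credit utilization ratio rather than zero, which keeps "no credit product" distinguishable from "credit product, unused".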

Temporal Features

Customer lifecycle metrics
Relationship duration analysis
Transaction patterns
Seasonal behaviors
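
As an illustration of date-based feature extraction, the sketch below derives calendar parts and relationship duration from a single date column. The column name `join_date`, the function name, and the reference date are assumptions and may not match the notebook.

```python
import pandas as pd


def add_temporal_features(df: pd.DataFrame, date_col: str = "join_date",
                          as_of=None) -> pd.DataFrame:
    """Add calendar parts and relationship duration from one date column."""
    out = df.copy()
    dates = pd.to_datetime(out[date_col])
    # Use a fixed reference date when given, otherwise today's date.
    ref = pd.Timestamp(as_of) if as_of is not None else pd.Timestamp.today().normalize()

    out["join_year"] = dates.dt.year
    out["join_month"] = dates.dt.month
    out["join_quarter"] = dates.dt.quarter  # coarse proxy for seasonality
    out["relationship_days"] = (ref - dates).dt.days
    out["relationship_years"] = out["relationship_days"] / 365.25
    return out


# Hypothetical data with a fixed reference date so results are reproducible.
accounts = pd.DataFrame({"join_date": ["2020-01-15", "2023-06-30"]})
features = add_temporal_features(accounts, as_of="2024-01-15")
```

Passing an explicit `as_of` date keeps the derived durations stable between runs, which matters when a saved model is later applied to newly computed features.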

Behavioral Segments

Banking relationship patterns
Product usage profiles
Risk profiles
Investment behaviors

Data Distributions

Customer distribution by income bands
Geographic distribution analysis
Banking relationship patterns
Gender-based customer profiles
Relationship duration analysis
Financial behavior patterns

Technical Stack

Core Dependencies
- Python 3.9+ (pandas >= 2.1.0 and matplotlib >= 3.8.0 do not support Python 3.8)
- pandas >= 2.1.0
- numpy >= 1.24.0
- scikit-learn >= 1.3.0
- plotly >= 5.18.0
- matplotlib >= 3.8.0
- seaborn >= 0.13.0
Additional Libraries
- ipython >= 8.0.0
- numpy-financial >= 1.0.0
- openpyxl >= 3.1.0
- joblib >= 1.3.0
- streamlit (required to run streamlit_app.py)

Project Structure

CustomerSegmentation/
├── bank_customer_segmentation.ipynb    # Main analysis notebook
├── streamlit_app.py                    # Interactive Streamlit app
├── requirements.txt                    # Project dependencies
├── models/                             # Saved ML models
│   ├── customer_segmentation_kmeans.joblib
│   └── feature_scaler.joblib
├── reports/                            # Generated reports
│   └── segment_analysis.json
├── data/                               # Input data
│   ├── customer_data.xlsx
│   └── location.data.xlsx
└── README.md                           # Project documentation

Installation & Setup

Clone the repository:

git clone https://github.com/James-Muguro/CustomerSegmentation.git
cd CustomerSegmentation

Create and activate virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Interactive App

An interactive Streamlit app is included at streamlit_app.py to explore clustering results and interactive charts from the notebook.

How to run locally:

# (optional) create and activate a virtual environment if not already active
python -m venv .venv
source .venv/bin/activate

# install dependencies
pip install -r requirements.txt

# run the Streamlit app
streamlit run streamlit_app.py

Notes:

The app will attempt to load data from data/customer_data.xlsx. If that file is not present it will generate a small synthetic sample dataset so you can explore the UI.
Use the sidebar to select features and the number of clusters, then click "Run clustering" to compute and display interactive charts.
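
The fallback behavior described in the notes can be sketched roughly as below. This is not the code in streamlit_app.py: the function name `load_customer_data`, the synthetic column names, and the distributions are all assumptions chosen for illustration.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def load_customer_data(path: str = "data/customer_data.xlsx",
                       n_synthetic: int = 200, seed: int = 42) -> pd.DataFrame:
    """Load the Excel file if it exists, otherwise build a synthetic sample."""
    file_path = Path(path)
    if file_path.exists():
        # Reading .xlsx files requires openpyxl to be installed.
        return pd.read_excel(file_path)

    # Hypothetical columns and distributions, for exploring the UI only.
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "age": rng.integers(18, 80, size=n_synthetic),
        "annual_income": rng.normal(50_000, 15_000, size=n_synthetic).round(2),
        "account_balance": rng.gamma(2.0, 2_500.0, size=n_synthetic).round(2),
    })


customers = load_customer_data()
```

Seeding the random generator makes the synthetic sample identical on every run, so charts stay comparable between sessions.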

Model Details

The project implements three clustering approaches:

K-means Clustering
- Optimal cluster selection via silhouette analysis
- Robust feature scaling
- Automated model persistence
Gaussian Mixture Models
- Probabilistic clustering
- Flexible cluster shapes
- Component analysis
DBSCAN
- Density-based clustering
- Automatic noise detection
- Non-parametric approach
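
To make the three approaches concrete, here is a minimal scikit-learn sketch on synthetic data. The blob data, the candidate range for k, and the DBSCAN parameters `eps` and `min_samples` are illustrative assumptions, not values taken from the notebook.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled customer features.
X_raw, _ = make_blobs(n_samples=300, centers=[[-5, -5], [0, 5], [5, -5]],
                      cluster_std=0.8, random_state=0)
X = StandardScaler().fit_transform(X_raw)


def best_k_by_silhouette(X, k_values=range(2, 7), random_state=0):
    """Fit K-means for each k and return the k with the highest silhouette."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


best_k, scores = best_k_by_silhouette(X)
kmeans_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)

# GMM yields a membership probability per component for every sample.
gmm = GaussianMixture(n_components=best_k, random_state=0).fit(X)
gmm_probs = gmm.predict_proba(X)

# DBSCAN assigns the label -1 to points it treats as noise.
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
n_noise = int((dbscan_labels == -1).sum())
```

K-means and GMM need the number of clusters up front, which is why the silhouette search runs first. DBSCAN infers the cluster count from density, so its `eps` value usually needs tuning to the scale of the standardized features.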

Results & Insights

Detailed customer segment profiles
Interactive visualization dashboards
Actionable business recommendations
Automated reporting system
Model persistence for production use
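
Model persistence and reporting could look roughly like the sketch below, which reuses the file names from the project structure. The random data is a stand-in for customer features, the temporary directory is a stand-in for the project root, and the JSON layout with a `segment_sizes` key is an assumed schema rather than the notebook's actual report format.

```python
import json
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Random stand-in for the engineered customer features.
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(100, 3))

scaler = StandardScaler().fit(X_raw)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaler.transform(X_raw))

# A temporary directory stands in for the project root.
root = Path(tempfile.mkdtemp())
(root / "models").mkdir()
(root / "reports").mkdir()
scaler_path = root / "models" / "feature_scaler.joblib"
model_path = root / "models" / "customer_segmentation_kmeans.joblib"
joblib.dump(scaler, scaler_path)
joblib.dump(kmeans, model_path)

# Assumed report schema: number of customers per segment.
labels, counts = np.unique(kmeans.labels_, return_counts=True)
sizes = {str(label): int(count) for label, count in zip(labels, counts)}
report_path = root / "reports" / "segment_analysis.json"
report_path.write_text(json.dumps({"segment_sizes": sizes}, indent=2))

# Later, for example in a scoring job: reload both artifacts and assign segments.
loaded_scaler = joblib.load(scaler_path)
loaded_kmeans = joblib.load(model_path)
new_labels = loaded_kmeans.predict(loaded_scaler.transform(X_raw[:5]))
```

The scaler is saved alongside the model because new customers must be transformed with the same means and variances that were used during training.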

How to Contribute

We highly encourage contributions to this project! If you have ideas for improvements, new features, or bug fixes, please follow these steps:

Fork the repository
Create a new branch (git checkout -b feature/improvement)
Make your changes
Commit your changes (git commit -am 'Add new feature')
Push to the branch (git push origin feature/improvement)
Create a new Pull Request

Licensing

This project is licensed under the MIT License.

Credits and Acknowledgements

We extend our deepest gratitude to all who have made this project possible:

OpenAI ChatGPT model and Microsoft Copilot for their instrumental roles in debugging and refining the project's code and documentation
Our peers for their insightful feedback and constructive critiques throughout the project's development
The Python community worldwide for their intellectual and technical contributions that made this project possible
Open source libraries and tools used in this project

Acknowledgements

We extend our deepest gratitude to all who have made this project possible. Our special thanks go to the OpenAI ChatGPT model and Microsoft Copilot, whose instrumental roles in debugging and refining the project’s code and documentation significantly contributed to the development of our project.

Our peers deserve our sincere appreciation for their insightful feedback and constructive critiques throughout the project's development. Their unique perspectives and experiences have been instrumental in steering our project towards its successful completion.

Lastly, we acknowledge that this project would not have been achievable without the intellectual and technical contributions of the Python community worldwide. Their groundbreaking work has opened up new possibilities, and for that, we are profoundly grateful.
