This project segments banking customers using clustering (K-means, DBSCAN, and Gaussian mixture models). It produces segment profiles and interactive visualizations to support targeted marketing, product development, and customer relationship management.
- Robust Data Validation & Preprocessing
  - Comprehensive data quality checks
  - Automated data validation pipelines
  - Sophisticated outlier detection and handling
  - Advanced feature engineering
- Advanced Analytics
  - Multiple clustering algorithms (K-means, DBSCAN, GMM)
  - Automated optimal cluster selection
  - Silhouette analysis for cluster validation
  - Interactive visualizations using Plotly and Matplotlib
- Feature Engineering
  - Automated date-based feature extraction
  - Financial ratio calculations
  - Advanced customer metrics
  - Automated outlier handling
- Interactive Visualizations
  - Dynamic cluster analysis plots
  - Interactive segment comparison tools
  - Financial metrics dashboards
  - Customer distribution analysis
- Business Intelligence
  - Automated segment profiling
  - Actionable business recommendations
  - Customer behavior analysis
  - Detailed segment characteristics
- Credit utilization ratio
- Debt to income ratio
- Total financial assets
- Savings ratio
- Customer value score
- Customer lifecycle metrics
- Relationship duration analysis
- Transaction patterns
- Seasonal behaviors
- Banking relationship patterns
- Product usage profiles
- Risk profiles
- Investment behaviors
- Customer distribution by income bands
- Geographic distribution analysis
- Banking relationship patterns
- Gender-based customer profiles
- Relationship duration analysis
- Financial behavior patterns
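The financial ratios listed above (credit utilization, debt to income, total financial assets, savings ratio) could be derived along the following lines. This is a minimal sketch, not the notebook's actual code: the column names (`credit_balance`, `credit_limit`, `total_debt`, `annual_income`, `savings_balance`, `investment_balance`), the function name `add_financial_ratios`, and the definition of total financial assets as savings plus investments are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def add_financial_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with ratio features added (hypothetical column names)."""
    out = df.copy()
    # Turn zero denominators into NaN so a ratio is missing rather than infinite.
    limit = out["credit_limit"].replace(0, np.nan)
    income = out["annual_income"].replace(0, np.nan)

    out["credit_utilization_ratio"] = out["credit_balance"] / limit
    out["debt_to_income_ratio"] = out["total_debt"] / income
    # Assumed definition: savings plus investments.
    out["total_financial_assets"] = out["savings_balance"] + out["investment_balance"]
    out["savings_ratio"] = out["savings_balance"] / income
    return out


# Two made-up customers; the second has no credit line at all.
sample = pd.DataFrame(
    {
        "credit_balance": [500.0, 0.0],
        "credit_limit": [2000.0, 0.0],
        "total_debt": [10000.0, 5000.0],
        "annual_income": [50000.0, 25000.0],
        "savings_balance": [5000.0, 1000.0],
        "investment_balance": [15000.0, 0.0],
    }
)
features = add_financial_ratios(sample)
print(features.filter(like="ratio"))
```

Leaving undefined ratios as NaN keeps the choice of imputation or row removal explicit in a later preprocessing step.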
- Core Dependencies
  - Python 3.9+ (pandas 2.1 and matplotlib 3.8 do not support Python 3.8)
  - pandas >= 2.1.0
  - numpy >= 1.24.0
  - scikit-learn >= 1.3.0
  - plotly >= 5.18.0
  - matplotlib >= 3.8.0
  - seaborn >= 0.13.0
- Additional Libraries
  - ipython >= 8.0.0
  - numpy-financial >= 1.0.0
  - openpyxl >= 3.1.0
  - joblib >= 1.3.0

CustomerSegmentation/
├── bank_customer_segmentation.ipynb # Main analysis notebook
├── requirements.txt # Project dependencies
├── models/ # Saved ML models
│ ├── customer_segmentation_kmeans.joblib
│ └── feature_scaler.joblib
├── reports/ # Generated reports
│ └── segment_analysis.json
├── data/ # Input data
│ ├── customer_data.xlsx
│ └── location.data.xlsx
└── README.md # Project documentation
- Clone the repository:
git clone https://github.com/James-Muguro/CustomerSegmentation.git
cd CustomerSegmentation
- Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt

An interactive Streamlit app is included at streamlit_app.py to explore clustering results and interactive charts from the notebook.
How to run locally:
# (optional) create and activate a virtual environment if not already active
python -m venv .venv
source .venv/bin/activate
# install dependencies
pip install -r requirements.txt
# run the Streamlit app
streamlit run streamlit_app.py

Notes:
- The app will attempt to load data from data/customer_data.xlsx. If that file is not present, it will generate a small synthetic sample dataset so you can explore the UI.
- Use the sidebar to select features and the number of clusters, then click "Run clustering" to compute and display interactive charts.
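The load-or-generate behavior described in the notes could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in streamlit_app.py: the function name `load_customer_data`, the synthetic column names, and the distributions used to generate them are all hypothetical.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def load_customer_data(path="data/customer_data.xlsx", n_synthetic=200, seed=42):
    """Load the Excel file if it exists, otherwise build a small synthetic sample."""
    file_path = Path(path)
    if file_path.exists():
        # Reading .xlsx files relies on openpyxl being installed.
        return pd.read_excel(file_path)

    # Fallback: reproducible made-up customers (columns are assumptions).
    rng = np.random.default_rng(seed)
    return pd.DataFrame(
        {
            "age": rng.integers(18, 80, size=n_synthetic),
            "annual_income": rng.normal(50000, 15000, size=n_synthetic).round(2),
            "account_balance": rng.gamma(2.0, 5000.0, size=n_synthetic).round(2),
        }
    )


# A path that does not exist forces the synthetic fallback.
df = load_customer_data("data/does_not_exist.xlsx")
print(df.shape)
```

Fixing the seed keeps the synthetic sample identical between runs, so charts do not change on every reload of the UI.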
The project implements three clustering approaches:
- K-means Clustering
  - Optimal cluster selection via silhouette analysis
  - Robust feature scaling
  - Automated model persistence
- Gaussian Mixture Models
  - Probabilistic clustering
  - Flexible cluster shapes
  - Component analysis
- DBSCAN
  - Density-based clustering
  - Automatic noise detection
  - Non-parametric approach

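To make the three approaches concrete, the sketch below scales a toy dataset, picks the number of K-means clusters by silhouette score, and then fits a Gaussian mixture and DBSCAN on the same data. It is not taken from the notebook: the synthetic blobs stand in for customer features, and the helper name `best_k_by_silhouette`, the candidate range of k, and the DBSCAN parameters `eps=0.3` and `min_samples=5` are assumptions chosen for illustration.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Toy data: three well-separated groups standing in for customer features.
X_raw, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=0.6,
    random_state=0,
)
X = StandardScaler().fit_transform(X_raw)


def best_k_by_silhouette(X, k_values=range(2, 7), random_state=0):
    """Fit K-means for each k and return the k with the highest silhouette score."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get), scores


best_k, scores = best_k_by_silhouette(X)

kmeans_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X)
gmm_labels = GaussianMixture(n_components=best_k, random_state=0).fit_predict(X)
# DBSCAN needs no cluster count; points it cannot assign get the label -1 (noise).
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

print("selected k:", best_k)
print("DBSCAN noise points:", int((dbscan_labels == -1).sum()))
```

On real customer data the silhouette curve is usually less clear-cut than on these blobs, so it is worth inspecting the full `scores` dictionary rather than only the maximum.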
- Detailed customer segment profiles
- Interactive visualization dashboards
- Actionable business recommendations
- Automated reporting system
- Model persistence for production use
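Model persistence with joblib might be sketched as follows. The file names match those shown under models/ in the project tree, but everything else is assumed for illustration: the random training data, the choice of three clusters, and the use of a temporary directory (so the example does not overwrite real artifacts).

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up training data: 100 customers with 3 numeric features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

scaler = StandardScaler().fit(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaler.transform(X))

# Save the scaler together with the model: new data must be scaled identically.
model_dir = Path(tempfile.mkdtemp()) / "models"
model_dir.mkdir(parents=True, exist_ok=True)
joblib.dump(kmeans, model_dir / "customer_segmentation_kmeans.joblib")
joblib.dump(scaler, model_dir / "feature_scaler.joblib")

# Later (for example in a serving process): reload both and assign segments.
loaded_kmeans = joblib.load(model_dir / "customer_segmentation_kmeans.joblib")
loaded_scaler = joblib.load(model_dir / "feature_scaler.joblib")

new_customers = rng.normal(size=(5, 3))
segments = loaded_kmeans.predict(loaded_scaler.transform(new_customers))
print(segments)
```

Joblib files are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that produced them.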
We highly encourage contributions to this project! If you have ideas for improvements, new features, or bug fixes, please follow these steps:
- Fork the repository
- Create a new branch (git checkout -b feature/improvement)
- Make your changes
- Commit your changes (git commit -am 'Add new feature')
- Push to the branch (git push origin feature/improvement)
- Create a new Pull Request
This project is licensed under the MIT License.
We extend our deepest gratitude to all who have made this project possible:
- OpenAI ChatGPT model and Microsoft Copilot for their instrumental roles in debugging and refining the project's code and documentation
- Our peers for their insightful feedback and constructive critiques throughout the project's development
- The Python community worldwide for their intellectual and technical contributions that made this project possible
- Open source libraries and tools used in this project