This repository contains a data analysis and customer segmentation study of Kenyan bank customers. It aims to understand customer behaviors and patterns, shows how banks can segment customers by demographics, behavior, and needs, and supports data-driven strategies for targeting the right audiences with the most relevant products and services.

James-Muguro/CustomerSegmentation

Advanced Bank Customer Segmentation Analysis

Overview

This project implements a customer segmentation analysis for banking data using clustering (K-means, DBSCAN, and Gaussian mixture models), supported by data validation, feature engineering, and interactive visualizations. The resulting segments are intended to inform targeted marketing, product development, and customer relationship management.

Key Features

  • Robust Data Validation & Preprocessing

    • Comprehensive data quality checks
    • Automated data validation pipelines
    • Sophisticated outlier detection and handling
    • Advanced feature engineering
  • Advanced Analytics

    • Multiple clustering algorithms (K-means, DBSCAN, GMM)
    • Automated optimal cluster selection
    • Silhouette analysis for cluster validation
    • Interactive visualizations using Plotly and Matplotlib
  • Feature Engineering

    • Automated date-based feature extraction
    • Financial ratio calculations
    • Advanced customer metrics
    • Automated outlier handling
  • Interactive Visualizations

    • Dynamic cluster analysis plots
    • Interactive segment comparison tools
    • Financial metrics dashboards
    • Customer distribution analysis
  • Business Intelligence

    • Automated segment profiling
    • Actionable business recommendations
    • Customer behavior analysis
    • Detailed segment characteristics
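
The outlier handling listed above is not specified in detail here. One common approach is clipping values to the interquartile-range (IQR) fences, sketched below. The function name `clip_outliers_iqr`, the `income` column, and the multiplier `k=1.5` are assumptions for illustration, not the notebook's actual implementation.

```python
import pandas as pd


def clip_outliers_iqr(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Clip each listed column to [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper)."""
    out = df.copy()
    for col in columns:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Values beyond the fences are pulled back to the fence, not dropped
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out


# Hypothetical sample with one extreme income value
sample = pd.DataFrame({"income": [40_000.0, 52_000.0, 48_000.0, 55_000.0, 2_000_000.0]})
cleaned = clip_outliers_iqr(sample, ["income"])
```

Clipping keeps every customer in the dataset, which matters for segmentation because dropping rows would leave some customers without a segment.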

Generated Features & Insights

Financial Metrics

  • Credit utilization ratio
  • Debt to income ratio
  • Total financial assets
  • Savings ratio
  • Customer value score
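
As a rough sketch of how such ratios could be derived with pandas, the example below computes four of the metrics above. All column names (`credit_balance`, `credit_limit`, `total_debt`, `annual_income`, `savings_balance`, `investment_balance`) are assumed for illustration and may not match the dataset. The customer value score is left out because its formula is not described here.

```python
import numpy as np
import pandas as pd


def add_financial_ratios(df: pd.DataFrame) -> pd.DataFrame:
    """Add ratio features; input column names are assumptions, not the real schema."""
    out = df.copy()
    # Replace zero denominators with NaN so ratios become NaN instead of inf
    limit = out["credit_limit"].replace(0, np.nan)
    income = out["annual_income"].replace(0, np.nan)
    out["credit_utilization_ratio"] = out["credit_balance"] / limit
    out["debt_to_income_ratio"] = out["total_debt"] / income
    out["total_financial_assets"] = out["savings_balance"] + out["investment_balance"]
    out["savings_ratio"] = out["savings_balance"] / income
    return out


# Hypothetical two-customer sample; the second has no credit line
customers = pd.DataFrame({
    "credit_balance": [500.0, 0.0],
    "credit_limit": [2000.0, 0.0],
    "total_debt": [10000.0, 0.0],
    "annual_income": [40000.0, 50000.0],
    "savings_balance": [8000.0, 5000.0],
    "investment_balance": [2000.0, 0.0],
})
enriched = add_financial_ratios(customers)
```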

Temporal Features

  • Customer lifecycle metrics
  • Relationship duration analysis
  • Transaction patterns
  • Seasonal behaviors
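
Date-based features like these can be extracted with the pandas datetime accessor. The sketch below assumes a hypothetical `account_open_date` column and a fixed reference date; neither is confirmed by the dataset.

```python
import pandas as pd


def add_temporal_features(df: pd.DataFrame,
                          date_col: str = "account_open_date",
                          as_of: str = "2024-01-01") -> pd.DataFrame:
    """Derive relationship duration and calendar features (column name is assumed)."""
    out = df.copy()
    opened = pd.to_datetime(out[date_col])
    as_of_ts = pd.Timestamp(as_of)
    # Relationship duration relative to the reference date
    out["relationship_days"] = (as_of_ts - opened).dt.days
    out["relationship_years"] = out["relationship_days"] / 365.25
    # Calendar features that can expose seasonal patterns
    out["open_month"] = opened.dt.month
    out["open_quarter"] = opened.dt.quarter
    return out


accounts = pd.DataFrame({"account_open_date": ["2023-01-01", "2020-06-15"]})
features = add_temporal_features(accounts)
```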

Behavioral Segments

  • Banking relationship patterns
  • Product usage profiles
  • Risk profiles
  • Investment behaviors

Data Distributions

  • Customer distribution by income bands
  • Geographic distribution analysis
  • Banking relationship patterns
  • Gender-based customer profiles
  • Relationship duration analysis
  • Financial behavior patterns

Technical Stack

  • Core Dependencies

    • Python 3.8+
    • pandas >= 2.1.0
    • numpy >= 1.24.0
    • scikit-learn >= 1.3.0
    • plotly >= 5.18.0
    • matplotlib >= 3.8.0
    • seaborn >= 0.13.0
  • Additional Libraries

    • ipython >= 8.0.0
    • numpy-financial >= 1.0.0
    • openpyxl >= 3.1.0
    • joblib >= 1.3.0

Project Structure

CustomerSegmentation/
├── bank_customer_segmentation.ipynb    # Main analysis notebook
├── requirements.txt                    # Project dependencies
├── models/                            # Saved ML models
│   ├── customer_segmentation_kmeans.joblib
│   └── feature_scaler.joblib
├── reports/                           # Generated reports
│   └── segment_analysis.json
├── data/                             # Input data
│   ├── customer_data.xlsx
│   └── location.data.xlsx
└── README.md                         # Project documentation

Installation & Setup

  1. Clone the repository:
git clone https://github.com/James-Muguro/CustomerSegmentation.git
cd CustomerSegmentation
  2. Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Interactive App

An interactive Streamlit app is included at streamlit_app.py to explore clustering results and interactive charts from the notebook.

How to run locally:

# (optional) create and activate a virtual environment if not already active
python -m venv .venv
source .venv/bin/activate

# install dependencies
pip install -r requirements.txt

# run the Streamlit app
streamlit run streamlit_app.py

Notes:

  • The app attempts to load data from data/customer_data.xlsx. If that file is not present, it generates a small synthetic sample dataset so you can still explore the UI.
  • Use the sidebar to select features and the number of clusters, then click "Run clustering" to compute and display interactive charts.
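
The load-or-fallback behavior described in the notes could look roughly like the sketch below. This is not the code from streamlit_app.py: the function name `load_or_synthesize` and the synthetic columns (`age`, `annual_income`, `account_balance`) are assumptions chosen for illustration.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def load_or_synthesize(path: str = "data/customer_data.xlsx",
                       n_rows: int = 200,
                       seed: int = 42) -> pd.DataFrame:
    """Return the Excel data if the file exists, else a small synthetic sample."""
    file = Path(path)
    if file.exists():
        return pd.read_excel(file)  # reading .xlsx requires openpyxl
    # Fallback: reproducible synthetic data with assumed column names
    rng = np.random.default_rng(seed)
    return pd.DataFrame({
        "age": rng.integers(18, 80, size=n_rows),
        "annual_income": rng.normal(60_000, 15_000, size=n_rows).round(2),
        "account_balance": rng.gamma(2.0, 5_000.0, size=n_rows).round(2),
    })


# A path that does not exist triggers the synthetic fallback
demo = load_or_synthesize(path="data/does_not_exist.xlsx", n_rows=50)
```

Seeding the generator keeps the fallback dataset identical between runs, so charts in the UI stay stable while exploring.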

Model Details

The project implements three clustering approaches:

  1. K-means Clustering

    • Optimal cluster selection via silhouette analysis
    • Robust feature scaling
    • Automated model persistence
  2. Gaussian Mixture Models

    • Probabilistic clustering
    • Flexible cluster shapes
    • Component analysis
  3. DBSCAN

    • Density-based clustering
    • Automatic noise detection
    • Non-parametric approach
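
The three approaches above can be sketched with scikit-learn as follows. The example runs on synthetic blobs rather than the bank data, and the candidate range for k, the DBSCAN `eps` and `min_samples` values, and all variable names are assumptions, so treat it as an outline rather than the notebook's pipeline.

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for scaled customer features
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

# K-means: choose k with the highest mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)
best_k = max(scores, key=scores.get)
kmeans_labels = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit_predict(X_scaled)

# Gaussian mixture: probabilistic assignment with the same number of components
gmm = GaussianMixture(n_components=best_k, random_state=0).fit(X_scaled)
gmm_labels = gmm.predict(X_scaled)
gmm_probs = gmm.predict_proba(X_scaled)  # per-customer membership probabilities

# DBSCAN: density-based; label -1 marks points treated as noise
dbscan_labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X_scaled)
n_noise = int(np.sum(dbscan_labels == -1))
```

K-means and the Gaussian mixture both need the number of clusters up front, which is why the silhouette search comes first. DBSCAN infers the number of clusters from density, but its result depends heavily on `eps`.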

Results & Insights

  • Detailed customer segment profiles
  • Interactive visualization dashboards
  • Actionable business recommendations
  • Automated reporting system
  • Model persistence for production use
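
As an illustration of segment profiling, the sketch below summarizes labelled customers per segment and serializes the result to JSON. The schema of reports/segment_analysis.json is not documented here, so the structure, the function name `profile_segments`, and the column names are all assumptions.

```python
import json

import pandas as pd


def profile_segments(df: pd.DataFrame, label_col: str = "segment") -> dict:
    """Return per-segment means and customer counts (hypothetical report shape)."""
    grouped = df.groupby(label_col)
    means = grouped.mean(numeric_only=True).round(2)
    sizes = grouped.size()
    report = {}
    for seg, row in means.iterrows():
        # Cast to plain Python types so the result is JSON-serializable
        entry = {name: float(value) for name, value in row.items()}
        entry["n_customers"] = int(sizes.loc[seg])
        report[str(seg)] = entry
    return report


# Hypothetical labelled customers (segment labels as produced by a clustering step)
labelled = pd.DataFrame({
    "segment": [0, 0, 1, 1],
    "annual_income": [40000.0, 60000.0, 90000.0, 110000.0],
    "account_balance": [1000.0, 3000.0, 20000.0, 30000.0],
})
report = profile_segments(labelled)
report_json = json.dumps(report, indent=2)
```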

How to Contribute

We highly encourage contributions to this project! If you have ideas for improvements, new features, or bug fixes, please follow these steps:

  1. Fork the repository
  2. Create a new branch (git checkout -b feature/improvement)
  3. Make your changes
  4. Commit your changes (git commit -am 'Add new feature')
  5. Push to the branch (git push origin feature/improvement)
  6. Create a new Pull Request

Licensing

This project is licensed under the MIT License.

Credits and Acknowledgements

We extend our deepest gratitude to all who have made this project possible:

  • OpenAI ChatGPT model and Microsoft Copilot for their instrumental roles in debugging and refining the project's code and documentation
  • Our peers for their insightful feedback and constructive critiques throughout the project's development
  • The Python community worldwide for their intellectual and technical contributions that made this project possible
  • Open source libraries and tools used in this project
