Customer Segmentation using K-means Clustering

Project Overview

In this project, I used the K-means Clustering machine learning algorithm to divide a retail store's customers into different groups (segments) based on their purchasing behavior. This helps businesses better understand their customers and create tailored strategies for them.

Goal

My main goal was to identify the natural customer groups within the Mall_Customers.csv dataset, allowing the store to optimize its marketing and sales efforts.

Dataset Used

For this project, I used the Mall_Customers.csv dataset. This dataset contains information such as customers' Annual Income (k$) and Spending Score (1-100).

Tools and Libraries Used

Python: The programming language used.
Pandas: For loading and managing data.
Scikit-learn: For the K-means clustering algorithm and data scaling.
Matplotlib: For creating plots and graphs.
Seaborn: For making attractive and informative data visualizations.

Key Steps Performed

Data Loading: The Mall_Customers.csv file was loaded into a Pandas DataFrame.
Data Preprocessing & Feature Selection:
- The Annual Income (k$) and Spending Score (1-100) columns were selected for clustering.
- These features were standardized using StandardScaler to bring all features to a similar scale (mean ~ 0, standard deviation ~ 1). This is crucial for distance-based algorithms like K-means.
Optimal K Value Finding (Elbow Method):
- The Elbow Method was employed to find the best number of clusters (K) for the K-means algorithm.
- WCSS (Within-Cluster Sum of Squares) was calculated for different K values (from 1 to 10), and a graph was plotted. The "elbow" point on the graph indicates the optimal K value (in this case, K=5 was chosen).
Applying K-means Clustering:
- The KMeans algorithm was applied to the scaled data using the identified optimal_k (5 clusters).
- Each customer was assigned their respective cluster label (from 0 to 4).
- The centroids (average income and spending score) for each cluster were also calculated, representing the "center" of each group.
Visualization of Clusters:
- A scatter plot was created with Annual Income on the X-axis and Spending Score on the Y-axis.
- Each customer is represented by a dot, and different colors are used to denote their assigned cluster.
- The centroids of each cluster are marked with large red 'X's, clearly showing the average behavior of each group.

Understanding the Output Graph

The graph provides a visual summary of your customer base:

X-axis (Annual Income (k$)): Shows the customer's annual income.
Y-axis (Spending Score (1-100)): Indicates how much a customer spends at the store (higher score = more spending).
Colored Dots: Each dot is a customer. Dots of the same color belong to the same customer group.
Red 'X' (Centroids): This is the average point for each group. It helps you understand the typical characteristics (average income and spending) of customers within that group.

By looking at the graph, you can easily identify different customer segments, such as:

High Income, High Spending Customers
Low Income, High Spending Customers
High Income, Low Spending Customers
Low Income, Low Spending Customers
And customers with average income/spending.

Real-World Applications & Benefits

Customer segmentation offers significant advantages for businesses:

Targeted Marketing: Businesses can create specific marketing campaigns for each customer group. For example, exclusive offers for high-value customers, or budget-friendly discounts for price-sensitive shoppers.
Personalized Product Recommendations: Customers can receive product suggestions tailored to their preferences, leading to increased sales.
Improved Customer Service: Understanding the needs of each segment allows for better and more relevant customer support.
Resource Optimization: Focusing marketing budgets on the right segments can lead to a better ROI (Return on Investment).
New Product Development: Insights into segment needs can guide the development of new products or services.

How to Run the Code

Clone this repository:

git clone [Your_GitHub_Repo_URL_Here]
cd [Your_Repo_Name]

Install the required libraries:

pip install pandas scikit-learn matplotlib seaborn

Download the dataset:
- Make sure the Mall_Customers.csv file is in the same directory as your Python script.
Run the Python script:
```
python your_script_name.py
```
(Replace your_script_name.py with the actual name of your Python file, e.g., kmeans_segmentation.py)

This will execute the clustering algorithm and display the Elbow Method graph followed by the final customer segmentation plot.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
Customer_Segmentation.ipynb		Customer_Segmentation.ipynb
Mall_Customers.csv		Mall_Customers.csv
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Customer Segmentation using K-means Clustering

Project Overview

Goal

Dataset Used

Tools and Libraries Used

Key Steps Performed

Understanding the Output Graph

Real-World Applications & Benefits

How to Run the Code

About

Uh oh!

Releases

Packages

Languages

ravicoder01/Customer_Segmentation

Folders and files

Latest commit

History

Repository files navigation

Customer Segmentation using K-means Clustering

Project Overview

Goal

Dataset Used

Tools and Libraries Used

Key Steps Performed

Understanding the Output Graph

Real-World Applications & Benefits

How to Run the Code

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages