In this project, I used the K-means Clustering machine learning algorithm to divide a retail store's customers into different groups (segments) based on their purchasing behavior. This helps businesses better understand their customers and create tailored strategies for them.
My main goal was to identify the natural customer groups within the Mall_Customers.csv dataset, allowing the store to optimize its marketing and sales efforts.
For this project, I used the Mall_Customers.csv dataset. This dataset contains information such as customers' Annual Income (k$) and Spending Score (1-100).
- Python: The programming language used.
- Pandas: For loading and managing data.
- Scikit-learn: For the K-means clustering algorithm and data scaling.
- Matplotlib: For creating plots and graphs.
- Seaborn: For making attractive and informative data visualizations.
- Data Loading: The `Mall_Customers.csv` file was loaded into a Pandas DataFrame.
- Data Preprocessing & Feature Selection:
  - The `Annual Income (k$)` and `Spending Score (1-100)` columns were selected for clustering.
  - These features were standardized using `StandardScaler` to bring all features to a similar scale (mean ~ 0, standard deviation ~ 1). This is crucial for distance-based algorithms like K-means.
- Optimal K Value Finding (Elbow Method):
  - The Elbow Method was employed to find the best number of clusters (`K`) for the K-means algorithm.
  - WCSS (Within-Cluster Sum of Squares) was calculated for different `K` values (from 1 to 10), and a graph was plotted. The "elbow" point on the graph indicates the optimal `K` value (in this case, `K=5` was chosen).
- Applying K-means Clustering:
  - The `KMeans` algorithm was applied to the scaled data using the identified `optimal_k` (5 clusters).
  - Each customer was assigned their respective cluster label (from `0` to `4`).
  - The centroids (average income and spending score) for each cluster were also calculated, representing the "center" of each group.
- Visualization of Clusters:
  - A scatter plot was created with `Annual Income` on the X-axis and `Spending Score` on the Y-axis.
  - Each customer is represented by a dot, and different colors are used to denote their assigned cluster.
  - The centroids of each cluster are marked with large red 'X's, clearly showing the average behavior of each group.
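The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the helper names (`make_demo_customers`, `elbow_wcss`, `segment_customers`), the `n_init` and `random_state` settings, and the synthetic stand-in data are all assumptions made so the example runs without the CSV file. Mapping the centroids back to original units with `inverse_transform` is also an addition for readability.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

FEATURES = ["Annual Income (k$)", "Spending Score (1-100)"]


def make_demo_customers(seed=42):
    """Synthetic stand-in for Mall_Customers.csv: five blobs of 40 customers."""
    rng = np.random.default_rng(seed)
    centers = [(25, 20), (25, 80), (55, 50), (88, 18), (88, 82)]
    points = np.vstack([rng.normal(c, 4.0, size=(40, 2)) for c in centers])
    return pd.DataFrame(points, columns=FEATURES)


def elbow_wcss(X_scaled, k_values=range(1, 11)):
    """Return {k: WCSS}; scikit-learn exposes WCSS as `inertia_`."""
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled).inertia_
        for k in k_values
    }


def segment_customers(df, optimal_k=5):
    """Scale the two features, fit K-means, return labels and centroids."""
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(df[FEATURES])
    model = KMeans(n_clusters=optimal_k, n_init=10, random_state=42)
    labels = model.fit_predict(X_scaled)
    # Centroids live in scaled space; map them back to k$ and score units.
    centroids = pd.DataFrame(
        scaler.inverse_transform(model.cluster_centers_), columns=FEATURES
    )
    return labels, centroids, X_scaled


# With the real data this would be: df = pd.read_csv("Mall_Customers.csv")
df = make_demo_customers()
labels, centroids, X_scaled = segment_customers(df)
wcss = elbow_wcss(X_scaled)
df["Cluster"] = labels
print(centroids.round(1))
```

Plotting the `wcss` values against `K` gives the elbow curve. On real data the bend is usually less sharp than on these artificial blobs, so the choice of `K` remains a judgment call.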
The graph provides a visual summary of your customer base:
- X-axis (`Annual Income (k$)`): Shows the customer's annual income.
- Y-axis (`Spending Score (1-100)`): Indicates how much a customer spends at the store (higher score = more spending).
- Colored Dots: Each dot is a customer. Dots of the same color belong to the same customer group.
- Red 'X' (Centroids): This is the average point for each group. It helps you understand the typical characteristics (average income and spending) of customers within that group.
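A plot like the one described could be produced along these lines. This is a sketch using plain Matplotlib; the function name `plot_segments`, the colormap, the marker sizes, and the six made-up customers are illustrative assumptions rather than the project's actual plotting code.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_segments(income, score, labels, centroids):
    """Scatter customers colored by cluster and mark centroids with red X's."""
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.scatter(income, score, c=labels, cmap="viridis", s=40, alpha=0.8)
    ax.scatter(centroids[:, 0], centroids[:, 1], c="red", marker="X", s=250,
               label="Centroids")
    ax.set_xlabel("Annual Income (k$)")
    ax.set_ylabel("Spending Score (1-100)")
    ax.set_title("Customer Segments")
    ax.legend()
    return fig, ax


# Tiny made-up example: six customers in two clusters.
income = np.array([15, 18, 20, 85, 88, 90])
score = np.array([80, 75, 82, 15, 20, 12])
labels = np.array([0, 0, 0, 1, 1, 1])
# Each centroid is simply the mean income and mean score of its cluster.
centroids = np.array([
    [income[labels == k].mean(), score[labels == k].mean()] for k in (0, 1)
])
fig, ax = plot_segments(income, score, labels, centroids)
# plt.show()  # uncomment to open the plot window
```

Returning `fig` and `ax` leaves it to the caller whether to display the figure with `plt.show()` or write it to disk with `fig.savefig(...)`.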
By looking at the graph, you can easily identify different customer segments, such as:
- High Income, High Spending Customers
- Low Income, High Spending Customers
- High Income, Low Spending Customers
- Low Income, Low Spending Customers
- Average Income, Average Spending Customers
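One way to attach names like these to clusters is to bucket each centroid's income and spending score. The sketch below assumes arbitrary cut-offs (40/70 for income, 40/60 for score) and made-up centroid values. The function `describe_segment` is hypothetical, and suitable thresholds would have to be chosen from the actual data.

```python
def describe_segment(income, score, income_cut=(40, 70), score_cut=(40, 60)):
    """Map a centroid to a coarse label; the cut-offs are illustrative only."""

    def level(value, cuts):
        low, high = cuts
        if value < low:
            return "Low"
        if value > high:
            return "High"
        return "Average"

    return f"{level(income, income_cut)} Income, {level(score, score_cut)} Spending"


# Made-up centroids in original units: (annual income in k$, spending score).
example_centroids = [(88, 82), (25, 80), (88, 18), (25, 20), (55, 50)]
segment_names = [describe_segment(inc, sc) for inc, sc in example_centroids]
for name in segment_names:
    print(name)
```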
Customer segmentation offers significant advantages for businesses:
- Targeted Marketing: Businesses can create specific marketing campaigns for each customer group. For example, exclusive offers for high-value customers, or budget-friendly discounts for price-sensitive shoppers.
- Personalized Product Recommendations: Customers can receive product suggestions tailored to their preferences, leading to increased sales.
- Improved Customer Service: Understanding the needs of each segment allows for better and more relevant customer support.
- Resource Optimization: Focusing marketing budgets on the right segments can lead to a better ROI (Return on Investment).
- New Product Development: Insights into segment needs can guide the development of new products or services.
- Clone this repository:

      git clone [Your_GitHub_Repo_URL_Here]
      cd [Your_Repo_Name]

- Install the required libraries:

      pip install pandas scikit-learn matplotlib seaborn

- Download the dataset:
  - Make sure the `Mall_Customers.csv` file is in the same directory as your Python script.
- Run the Python script:

      python your_script_name.py

  (Replace `your_script_name.py` with the actual name of your Python file, e.g., `kmeans_segmentation.py`)
This will execute the clustering algorithm and display the Elbow Method graph followed by the final customer segmentation plot.