This project is an end-to-end data analysis of an e-commerce dataset to segment customers using an RFM model. The goal is to identify high-value and at-risk customer groups and provide actionable marketing strategies, with the final output being a dynamic, interactive BI dashboard.
A mid-sized global e-commerce retailer was experiencing a noticeable increase in customer churn. The company lacked a data-driven method to understand the purchasing behavior of different customer groups, making it difficult to target marketing efforts effectively and prevent valuable customers from leaving. This project aims to solve that problem by creating a robust customer segmentation model.
- Cloud Data Warehouse: Google BigQuery
- Language: SQL
- BI & Visualization: Google Looker Studio
- Collaboration: Google Slides, Google Docs
The dataset used for this analysis is the "Online Retail II" dataset, publicly available from the UCI Machine Learning Repository. You can find the dataset on Kaggle.
The project followed the complete data analytics lifecycle:
-
Data Cleaning & Transformation (ETL): Ingested over 540,000 raw transaction records into Google BigQuery. Used SQL to develop a cleaning pipeline to handle over 140,000 records with data integrity issues (
NULL
s, returns, etc.), resulting in a clean analytical dataset. -
RFM Model Engineering: Used advanced SQL to engineer a Recency, Frequency, and Monetary (RFM) model from scratch, calculating each customer's last purchase date, transaction frequency, and total monetary value.
-
Customer Scoring & Segmentation: Leveraged SQL window functions (
NTILE
) to score each customer on a 1-4 scale for each RFM dimension. Implemented aCASE
statement to group all 4,338 customers into nine distinct, business-relevant segments. -
Dashboard Development & Validation: Built an interactive, dynamic dashboard in Looker Studio. During development, I identified and corrected a critical data validation error, ensuring the final dashboard was 100% accurate.
The final analysis provided several critical insights:
- High-Value Segment Identified: The 'Champions' segment, while comprising only 11% of the customer base (489 customers), generates nearly 50% of the company's total revenue ($4.45M).
- Actionable Insights Delivered: The marketing team is now empowered with a tool to visualize and filter customer segments in real-time for targeted campaigns.
- Hibernating Customers: The largest group by count is 'Hibernating' customers (1,521 total), representing a significant opportunity for re-engagement.
A dynamic dashboard was developed in Google Looker Studio to serve as a strategic tool for the marketing team. It translates the complex RFM model into an intuitive visual interface, allowing stakeholders to instantly identify and analyze the value of all nine customer segments. The dashboard is equipped with an interactive filter to enable data-driven decisions for targeted retention and loyalty campaigns.