In this project, I explored industry best practices in clustering, using RFM (Recency, Frequency, Monetary) analysis for marketing segmentation.
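
Here is a minimal sketch of how RFM features can be derived from a transaction log. It assumes a hypothetical table with `customer_id`, `invoice_date` and `amount` columns and a chosen snapshot date. The column names in the actual Kaggle dataset may differ, and the toy data below is synthetic. Recency is the number of days since a customer's last purchase, frequency is the number of transactions, and monetary is total spend.

```python
import pandas as pd


def compute_rfm(transactions: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Aggregate a transaction log into one RFM row per customer.

    Assumes hypothetical columns 'customer_id', 'invoice_date' and 'amount'.
    """
    return transactions.groupby("customer_id").agg(
        # Days between the snapshot and the customer's most recent purchase
        recency=("invoice_date", lambda d: (snapshot_date - d.max()).days),
        # Number of transactions (rows) per customer
        frequency=("invoice_date", "count"),
        # Total amount spent
        monetary=("amount", "sum"),
    )


# Tiny synthetic log, for illustration only
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "invoice_date": pd.to_datetime(["2011-01-05", "2011-03-01", "2010-12-10",
                                    "2011-02-20", "2011-03-05", "2011-03-08"]),
    "amount": [20.0, 35.0, 100.0, 5.0, 15.0, 10.0],
})
rfm = compute_rfm(tx, snapshot_date=pd.Timestamp("2011-03-10"))
print(rfm)
```
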
I compared three clustering algorithms:
- Hierarchical clustering
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
- K-means
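
The three algorithms can be compared side by side with scikit-learn. This sketch runs on synthetic RFM-like data and is not the project's actual pipeline. Several settings are illustrative assumptions that would need tuning on real data: the log transform plus standardization, `n_clusters=4`, `eps=0.5` and `min_samples=5`.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic stand-in for an RFM table: recency (days), frequency (count), monetary (sum)
rfm = np.column_stack([
    rng.integers(1, 365, size=300),
    rng.poisson(5, size=300) + 1,
    rng.gamma(2.0, 50.0, size=300),
])

# RFM features are skewed and on different scales: log-transform, then standardize
X = StandardScaler().fit_transform(np.log1p(rfm))

labels = {
    # Agglomerative (bottom-up) hierarchical clustering with Ward linkage
    "hierarchical": AgglomerativeClustering(n_clusters=4, linkage="ward").fit_predict(X),
    # Density-based; the label -1 marks noise points
    "dbscan": DBSCAN(eps=0.5, min_samples=5).fit_predict(X),
    "kmeans": KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X),
}
for name, lab in labels.items():
    n_clusters = len(set(lab) - {-1})
    print(name, n_clusters, "clusters,", int(np.sum(lab == -1)), "noise points")
```

DBSCAN does not take a cluster count. It labels low-density points as noise (`-1`), so it is usually tuned through `eps` and `min_samples` rather than through a chosen k.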
To choose the number of clusters, I used:
- Gap statistic
- Silhouette score
- Elbow method
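
Here is a sketch of the three criteria for k-means, run on synthetic blobs so that it works standalone. The elbow method inspects how inertia (the within-cluster sum of squares) falls as k grows, and the silhouette score is maximized over k. The gap statistic follows Tibshirani, Walther and Hastie: it compares $\log W_k$ with its expectation under a uniform reference over the data's bounding box, then picks the smallest k with $\text{Gap}(k) \ge \text{Gap}(k+1) - s_{k+1}$. The helper names `within_dispersion` and `gap_statistic`, along with `k_max=8` and `n_refs=10`, are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def within_dispersion(X, k, seed=0):
    """W_k: within-cluster sum of squared distances to the centroids (KMeans inertia)."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).inertia_


def gap_statistic(X, k_max=8, n_refs=10, seed=0):
    """Gap(k) = E*[log W_k] - log W_k, using a uniform reference over the bounding box."""
    rng = np.random.default_rng(seed)
    lo, hi = X.min(axis=0), X.max(axis=0)
    ks = np.arange(1, k_max + 1)
    gaps, s = np.empty(len(ks)), np.empty(len(ks))
    for i, k in enumerate(ks):
        log_wk = np.log(within_dispersion(X, k, seed))
        # log W_k for n_refs uniform reference datasets of the same shape as X
        ref_log_wk = np.array([
            np.log(within_dispersion(rng.uniform(lo, hi, size=X.shape), k, seed))
            for _ in range(n_refs)
        ])
        gaps[i] = ref_log_wk.mean() - log_wk
        # Simulation error: s_k = sd_k * sqrt(1 + 1/B)
        s[i] = ref_log_wk.std() * np.sqrt(1 + 1 / n_refs)
    # Smallest k such that Gap(k) >= Gap(k+1) - s_{k+1}
    for i in range(len(ks) - 1):
        if gaps[i] >= gaps[i + 1] - s[i + 1]:
            return ks[i], gaps
    return ks[-1], gaps


X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=0)

# Elbow method: plot inertia against k and look for the bend
inertias = {k: within_dispersion(X, k) for k in range(1, 9)}
# Silhouette score is undefined for k = 1, so start at 2
silhouettes = {
    k: silhouette_score(X, KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X))
    for k in range(2, 9)
}
k_gap, gaps = gap_statistic(X)
print("best k by silhouette:", max(silhouettes, key=silhouettes.get))
print("best k by gap statistic:", k_gap)
```
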
Dataset: https://www.kaggle.com/datasets/yasserh/customer-segmentation-dataset/data
Links:
- Robert Tibshirani, Guenther Walther and Trevor Hastie, "Estimating the number of clusters in a data set via the gap statistic" (2000): https://hastie.su.domains/Papers/gap.pdf
- CleverTap blog, RFM analysis: https://clevertap.com/blog/rfm-analysis/
- Cross Validated, relation between the pairwise distance sum and the sum of distances to the mean (gap statistic): https://stats.stackexchange.com/questions/398635/relation-between-pairwise-distance-sum-and-sum-of-distance-to-mean-gap-statisti
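
The Cross Validated thread above concerns an identity used by the gap statistic. It lets the pairwise-distance definition of $W_k$ be computed as an ordinary within-cluster sum of squares. This holds when $d_{ii'} = \lVert x_i - x_{i'} \rVert^2$ is the squared Euclidean distance, $C_r$ is cluster $r$ with $n_r$ points and mean $\bar{x}_r$, and the sum runs over ordered pairs. Expanding each term around $\bar{x}_r$, the cross term vanishes because $\sum_{i \in C_r} (x_i - \bar{x}_r) = 0$:

$$
D_r = \sum_{i, i' \in C_r} \lVert x_i - x_{i'} \rVert^2 = 2 n_r \sum_{i \in C_r} \lVert x_i - \bar{x}_r \rVert^2,
\qquad
W_k = \sum_{r=1}^{k} \frac{D_r}{2 n_r} = \sum_{r=1}^{k} \sum_{i \in C_r} \lVert x_i - \bar{x}_r \rVert^2 .
$$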