Exploration of clustering algorithms (K-Means, DBSCAN) on normally distributed data. Includes EDA, visualization, and a discussion on why clustering may fail when no natural groups exist.

aakritrajput/Clustering-on-Synthetic-or-Normal-Distribution-Data

Clustering on Normally Distributed Data

Objective

The objective of this project was to apply clustering algorithms (K-Means and DBSCAN) to a dataset that is normally distributed around a single mean, and to analyze the results.

Methodology

  • Performed Exploratory Data Analysis (EDA) with heatmaps, boxplots, and distribution plots.
  • Applied K-Means clustering and plotted distortion against the number of clusters (elbow curve).
  • Applied DBSCAN clustering and tuned parameters (eps, min_samples).
  • Compared results from both algorithms.
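The K-Means and DBSCAN steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is a synthetic stand-in (500 points drawn from a 2-D standard normal), and the helper names `elbow_inertias` and `dbscan_summary`, the range of `k`, and the `eps` / `min_samples` values are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN

# Assumption: synthetic stand-in for the project's dataset,
# 500 points from a single 2-D standard normal distribution.
rng = np.random.default_rng(0)
X = rng.normal(loc=0.0, scale=1.0, size=(500, 2))


def elbow_inertias(X, k_values):
    """Fit K-Means for each k and return the inertia (distortion) values."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
        inertias.append(model.inertia_)
    return inertias


def dbscan_summary(X, eps, min_samples):
    """Run DBSCAN and return (number of clusters, number of noise points)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels) - {-1})  # label -1 marks noise/outliers
    n_noise = int(np.sum(labels == -1))
    return n_clusters, n_noise


# Elbow curve data: inertia for k = 1..8 (hypothetical range).
k_values = list(range(1, 9))
inertias = elbow_inertias(X, k_values)
for k, inertia in zip(k_values, inertias):
    print(f"k={k}: inertia={inertia:.1f}")

# DBSCAN parameter sweep (hypothetical eps values).
for eps in (0.3, 0.5, 0.8):
    n_clusters, n_noise = dbscan_summary(X, eps=eps, min_samples=5)
    print(f"eps={eps}: clusters={n_clusters}, outliers={n_noise}")
```

Plotting `inertias` against `k_values` (for example with Matplotlib) gives the elbow curve. On data like this the curve tends to decrease smoothly without a sharp bend, which is one hint that no particular number of clusters is preferred.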

Key Observations

  • The dataset resembled a single spherical distribution with no distinct natural clusters.
  • K-Means still split the data into the requested number of clusters, producing "pizza slice"-shaped regions rather than separate groups.
  • DBSCAN mostly found a single cluster, with the number of outliers varying with the parameters (eps, min_samples).
  • Conclusion: The dataset has no inherent cluster structure, so clustering it is not meaningful.
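One way to check the "pizza slice" observation numerically is to compare the K-Means labels with a purely angular assignment, where each point goes to the centroid whose direction from the data mean is closest. The sketch below is illustrative only: it uses synthetic stand-in data, an assumed `k = 4`, and hypothetical variable names, and the agreement it reports will vary with the data and seed.

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumption: synthetic stand-in data, a single 2-D standard normal blob.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))

k = 4  # hypothetical choice of cluster count
model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
centers = model.cluster_centers_
labels = model.labels_

# Measure directions relative to the overall data mean.
origin = X.mean(axis=0)
point_angles = np.arctan2(X[:, 1] - origin[1], X[:, 0] - origin[0])
center_angles = np.arctan2(centers[:, 1] - origin[1], centers[:, 0] - origin[0])

# Circular distance between each point's angle and each centroid's angle,
# wrapped into [0, pi] via the complex exponential.
diff = point_angles[:, None] - center_angles[None, :]
angular_distance = np.abs(np.angle(np.exp(1j * diff)))

# "Wedge" assignment: the centroid with the closest direction.
wedge_labels = angular_distance.argmin(axis=1)

# Fraction of points where K-Means and the wedge assignment agree.
agreement = float(np.mean(wedge_labels == labels))
cluster_sizes = np.bincount(labels, minlength=k)

print("cluster sizes:", cluster_sizes)
print(f"agreement with angular wedges: {agreement:.2f}")
```

A high agreement would suggest that K-Means is carving the blob by direction from the center, which matches the wedge-shaped regions seen in the cluster plot, rather than recovering distinct groups.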

Visualizations

Heatmap

[Figure: Heatmap]

Boxplot

[Figure: Boxplot]

K-Means Elbow Curve

[Figure: Elbow Curve for whole data]

Cluster Visualization

[Figure: Clusters through KMeans]

Conclusion

This project highlights that not all datasets are suitable for clustering.
Recognizing when clustering fails is just as important as recognizing when it succeeds.

Tech Stack

  • Python, NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn
  • Google Colab
