weim-mkt/kmeans-teaching-demo

K-Means Algorithm Teaching Demo

Check out the live demo

An Interactive Shiny App for Understanding How K-Means Clustering Works

📚 Overview

This interactive Shiny application is designed for teaching and learning the K-means clustering algorithm. It visualizes every step of the algorithm—from initial random centroid placement through iterative assignments and updates until convergence—helping students develop an intuitive understanding of how K-means works.

Perfect for courses in:

  • Machine Learning
  • Data Mining
  • Business Analytics
  • Marketing Analytics (Customer Segmentation)
  • Statistics

🎯 Learning Objectives

Students using this app will:

  1. Visualize the iterative process: See how cluster assignments and centroids change at each step
  2. Understand convergence: Observe when and why the algorithm stops iterating
  3. Explore parameter effects: Experiment with different values of k, random seeds, and initialization
  4. Handle edge cases: See how the algorithm handles empty clusters through reinitialization
  5. Apply to real data: Test K-means on custom datasets relevant to their domain

✨ Key Features

  • Step-by-Step Iteration: Use the slider to move forward/backward through each algorithm iteration
  • Visual Clarity:
    • Data points colored by cluster assignment
    • Large yellow triangular centroids with colored borders
    • Optional centroid path traces to see movement over iterations
  • Default Demo Data: Includes simulated customer segmentation data (income vs. spending) to get started immediately
  • Custom Data Upload: Upload your own .csv, .tsv, .txt, or .rds files
  • Interactive Controls:
    • Choose which variables to cluster on
    • Select which two variables to visualize
    • Adjust number of clusters (k)
    • Set maximum iterations and random seed for reproducibility
  • Summary Statistics: View centroid coordinates and cluster sizes at each iteration
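
The custom upload feature above accepts several file types. A minimal sketch of how a reader could dispatch on file extension follows. The function names `read_uploaded_data` and `numeric_columns` are hypothetical and not taken from the app's source. The sketch uses base R readers to stay dependency-free (the app itself lists readr among its packages) and assumes that `.txt` files are tab-delimited.

```r
# Hypothetical helper: pick a reader based on the file extension.
read_uploaded_data <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = utils::read.csv(path),
    tsv = utils::read.delim(path),
    txt = utils::read.delim(path),   # assumption: .txt files are tab-delimited
    rds = readRDS(path),
    stop("Unsupported file type: ", ext)
  )
}

# Hypothetical helper: K-means needs numeric variables, so keep only those.
numeric_columns <- function(df) {
  df[vapply(df, is.numeric, logical(1))]
}
```

Filtering to numeric columns before clustering avoids errors when an uploaded file contains IDs or text labels.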

🚀 Quick Start

Option 1: Run Locally

  1. Clone this repository:

    git clone https://github.com/weim-mkt/kmeans-teaching-demo.git
    cd kmeans-teaching-demo
  2. Install required packages:

    install.packages(c("shiny", "ggplot2", "dplyr", "purrr", "readr"))
  3. Launch the app:

    shiny::runApp()
  4. Explore: The app loads with default customer segmentation data. Click "Run K-means" and use the slider to step through iterations!

Option 2: Use with renv (Reproducible Environment)

For consistent package versions across different machines:

    # Install renv if needed
    install.packages("renv")

    # Restore project library
    renv::restore()

    # Launch app
    shiny::runApp()

📖 How to Use for Teaching

In the Classroom

  1. Introduction (5 min): Explain K-means conceptually—assignment step, update step, convergence
  2. Live Demo (10-15 min):
    • Show iteration 0 (random initialization)
    • Step through first 2-3 iterations manually to illustrate the process
    • Show convergence at the final iteration
    • Demonstrate effect of different k values
    • Show impact of different random seeds
  3. Hands-On Exercise (15-20 min): Have students run the app and explore:
    • What happens with k=2 vs k=3 vs k=5?
    • How many iterations until convergence?
    • Does changing the seed matter?
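
For instructors who want to preview the seed question outside the app, here is a small sketch using base R's `stats::kmeans()` rather than the app's own implementation. The synthetic data and the choice of five seeds are assumptions made purely for illustration.

```r
# Synthetic two-blob data (illustrative only, not the app's demo data)
set.seed(1)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 4), ncol = 2))

# Run Lloyd's algorithm with k = 3 under five different seeds
results <- do.call(rbind, lapply(1:5, function(seed) {
  set.seed(seed)
  fit <- kmeans(x, centers = 3, algorithm = "Lloyd", iter.max = 50)
  data.frame(seed = seed,
             iterations = fit$iter,            # iterations used by this run
             tot_withinss = fit$tot.withinss)  # total within-cluster sum of squares
}))
print(results)
```

Different seeds pick different starting centroids, so both the iteration count and the final within-cluster sum of squares can vary from run to run.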

Student Assignments

  • Explore convergence: "Run K-means with k=3 five times using different seeds. Report how many iterations it takes each time and explain why the counts differ."
  • Optimal k: "Using the demo data, determine the most appropriate number of clusters and justify your choice."
  • Custom data: "Upload a dataset relevant to your project and identify natural segments."
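
One common way to approach the optimal-k assignment is an elbow plot of total within-cluster sum of squares against k. The sketch below uses `stats::kmeans()` on synthetic data. It is not part of the app, and the range of k values and `nstart = 10` are arbitrary choices.

```r
# Synthetic data with two well-separated groups (illustrative only)
set.seed(2)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 6), ncol = 2))

# Total within-cluster sum of squares for k = 1..6
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

# Look for the "elbow" where adding clusters stops helping much
plot(ks, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

The elbow is a heuristic rather than a rule, so students should still justify their choice in terms of the segments' interpretability.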

🎓 Course Context: MSIN0094 Week 4

This app is part of MSIN0094 Week 4: How K-Means Algorithm Works, demonstrating:

  • Iterative optimization algorithms
  • Distance-based clustering
  • Centroid calculation
  • Convergence criteria
  • Practical customer segmentation applications
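
To make the distance and centroid concepts above concrete, here is a tiny worked example of a single assignment step followed by a single update step. The four points and two starting centroids are made up for illustration.

```r
# Four points and two deliberately poor starting centroids (illustrative values)
pts <- rbind(c(1, 1), c(2, 1), c(8, 8), c(9, 8))
centroids <- rbind(c(1, 1), c(2, 1))

# Distance-based clustering: Euclidean distance from each point to each centroid
dists <- as.matrix(dist(rbind(centroids, pts)))[-(1:2), 1:2]
cluster <- apply(dists, 1, which.min)   # assignments: 1, 2, 2, 2

# Centroid calculation: mean of the points assigned to each cluster
new_centroids <- t(sapply(1:2, function(j) {
  colMeans(pts[cluster == j, , drop = FALSE])
}))
print(new_centroids)   # row 1 stays at (1, 1); row 2 moves to (19/3, 17/3)
```

After one step the second centroid has been pulled toward the distant points, which is the movement the app's centroid paths make visible.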

📊 Understanding the Visualization

  • Colored circles: Data points, colored by their current cluster assignment
  • Yellow triangles: Cluster centroids with colored borders matching their cluster
  • Gray points (iteration 0): Unassigned data before first assignment step
  • Centroid paths: Optional traces showing how centroids move across iterations
  • Iteration slider: Each position shows the state after an assignment or update step
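
A plot in the style described above can be approximated with ggplot2 as follows. This is a sketch on synthetic data using `stats::kmeans()`, not the app's plotting code. The sizes and stroke width are guesses.

```r
library(ggplot2)

# Synthetic data and a quick clustering (illustrative only)
set.seed(3)
points <- data.frame(x = c(rnorm(30, 0), rnorm(30, 4)),
                     y = c(rnorm(30, 0), rnorm(30, 4)))
fit <- kmeans(points, centers = 2)
points$cluster <- factor(fit$cluster)
centroids <- data.frame(fit$centers,
                        cluster = factor(seq_len(nrow(fit$centers))))

p <- ggplot(points, aes(x, y)) +
  # Data points colored by their cluster assignment
  geom_point(aes(colour = cluster), size = 2) +
  # Shape 24 is a filled triangle: yellow fill, border colored by cluster
  geom_point(data = centroids, aes(colour = cluster),
             shape = 24, fill = "yellow", size = 5, stroke = 1.5)
print(p)
```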

🛠️ Technical Details

Default Dataset

The app includes simulated customer data with 200 observations:

  • Segment 1: Lower income (~$35k), moderate spending
  • Segment 2: Higher income (~$75k), higher spending
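
Data of this shape can be simulated in a few lines. In the sketch below, only the 200 observations and the approximate income means come from the description above. The even split between segments, the standard deviations, the spending values, and the column names are all assumptions.

```r
set.seed(123)          # arbitrary seed, for reproducibility only
n_per_segment <- 100   # assumption: the 200 observations are split evenly

demo_customers <- data.frame(
  income = c(rnorm(n_per_segment, mean = 35000, sd = 5000),   # Segment 1
             rnorm(n_per_segment, mean = 75000, sd = 8000)),  # Segment 2
  spending = c(rnorm(n_per_segment, mean = 40, sd = 8),       # assumed "moderate"
               rnorm(n_per_segment, mean = 70, sd = 8))       # assumed "higher"
)

# Put both variables on a comparable scale before computing distances
demo_scaled <- scale(demo_customers)
```

With raw values, income (tens of thousands) would dominate Euclidean distance over spending (tens), so standardizing with `scale()` is a common preprocessing step. Whether the app does this is not stated here.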

Algorithm Implementation

  • Uses Euclidean distance for assignments
  • Handles empty clusters by reinitializing to a random data point
  • Convergence threshold: iteration stops once centroids move less than 1e-6 between updates, or when the maximum number of iterations is reached
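
The three points above can be combined into a compact base R sketch. This is an illustration of the described behavior, not the app's actual source. The function name `simple_kmeans`, its arguments, and the choice to initialize from randomly sampled data points are assumptions.

```r
# Minimal k-means sketch (base R only); all names are illustrative.
simple_kmeans <- function(x, k, max_iter = 20, tol = 1e-6, seed = 1) {
  set.seed(seed)
  x <- as.matrix(x)
  # Iteration 0: use k randomly chosen data points as initial centroids
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance to every centroid
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centroids[j, ])^2))
    cluster <- max.col(-d, ties.method = "first")   # index of nearest centroid
    # Update step: each centroid becomes the mean of its members
    new_centroids <- centroids
    for (j in seq_len(k)) {
      members <- x[cluster == j, , drop = FALSE]
      new_centroids[j, ] <- if (nrow(members) == 0) {
        x[sample(nrow(x), 1), ]   # empty cluster: reinitialize to a random point
      } else {
        colMeans(members)
      }
    }
    # Convergence check: largest centroid movement in this iteration
    shift <- max(sqrt(rowSums((new_centroids - centroids)^2)))
    centroids <- new_centroids
    if (shift < tol) break
  }
  list(centroids = centroids, cluster = cluster, iterations = iter)
}
```

For example, `simple_kmeans(iris[, 1:4], k = 3)` returns the final centroids, the cluster assignment of each row, and the number of iterations used.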

Deployment

This app can be deployed to:

Use the included renv.lock for consistent package versions across environments.

🤝 Contributing

This is an educational resource! Contributions welcome:

  • Bug fixes
  • Feature enhancements
  • Additional example datasets
  • Documentation improvements

📄 License

This project is open source and available for educational use.

👤 Author

Created for teaching machine learning and customer analytics.


Questions or suggestions? Open an issue or submit a pull request!
