K-Means Algorithm Teaching Demo

Check out the live demo

An Interactive Shiny App for Understanding How K-Means Clustering Works

📚 Overview

This interactive Shiny application is designed for teaching and learning the K-means clustering algorithm. It visualizes every step of the algorithm—from initial random centroid placement through iterative assignments and updates until convergence—helping students develop an intuitive understanding of how K-means works.

Perfect for courses in:

Machine Learning
Data Mining
Business Analytics
Marketing Analytics (Customer Segmentation)
Statistics

🎯 Learning Objectives

Students using this app will:

Visualize the iterative process: See how cluster assignments and centroids change at each step
Understand convergence: Observe when and why the algorithm stops iterating
Explore parameter effects: Experiment with different values of k, random seeds, and initialization
Handle edge cases: See how the algorithm handles empty clusters through reinitialization
Apply to real data: Test K-means on custom datasets relevant to their domain

✨ Key Features

Step-by-Step Iteration: Use the slider to move forward/backward through each algorithm iteration
Visual Clarity:
- Data points colored by cluster assignment
- Large yellow triangular centroids with colored borders
- Optional centroid path traces to see movement over iterations
Default Demo Data: Includes simulated customer segmentation data (income vs. spending) to get started immediately
Custom Data Upload: Upload your own .csv, .tsv, .txt, or .rds files
Interactive Controls:
- Choose which variables to cluster on
- Select which two variables to visualize
- Adjust number of clusters (k)
- Set maximum iterations and random seed for reproducibility
Summary Statistics: View centroid coordinates and cluster sizes at each iteration
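
As a rough illustration of how the four upload formats could be handled, the sketch below dispatches on the file extension using base R only. `read_upload()` and `numeric_columns()` are hypothetical helper names, not functions taken from app.R, and treating .txt files as whitespace-delimited with a header row is an assumption.

```r
# Hypothetical helper: read an uploaded file based on its extension.
# This is a sketch, not the loader used in app.R.
read_upload <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = read.csv(path, stringsAsFactors = FALSE),
    tsv = read.delim(path, stringsAsFactors = FALSE),
    # Assumption: .txt files are whitespace-delimited with a header row.
    txt = read.table(path, header = TRUE, stringsAsFactors = FALSE),
    rds = readRDS(path),
    stop("Unsupported file type: ", ext)
  )
}

# Hypothetical helper: keep only numeric columns, since K-means with
# Euclidean distance needs numeric inputs.
numeric_columns <- function(df) {
  df[vapply(df, is.numeric, logical(1))]
}
```

Restricting the choice of clustering variables to numeric columns avoids distance calculations on text or factor columns.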

🚀 Quick Start

Option 1: Run Locally

Clone this repository:

```
git clone https://github.com/weim-mkt/kmeans-teaching-demo.git
cd kmeans-teaching-demo
```

Install required packages:

```
install.packages(c("shiny", "ggplot2", "dplyr", "purrr", "readr"))
```

Launch the app:
```
shiny::runApp()
```
Explore: The app loads with default customer segmentation data. Click "Run K-means" and use the slider to step through iterations!

Option 2: Use with `renv` (Reproducible Environment)

For consistent package versions across different machines:

```
# Install renv if needed
install.packages("renv")

# Restore project library
renv::restore()

# Launch app
shiny::runApp()
```

📖 How to Use for Teaching

In the Classroom

Introduction (5 min): Explain K-means conceptually—assignment step, update step, convergence
Live Demo (10-15 min):
- Show iteration 0 (random initialization)
- Step through first 2-3 iterations manually to illustrate the process
- Show convergence at the final iteration
- Demonstrate effect of different k values
- Show impact of different random seeds
Hands-On Exercise (15-20 min): Have students run the app and explore:
- What happens with k=2 vs k=3 vs k=5?
- How many iterations until convergence?
- Does changing the seed matter?
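
For instructors who want reference numbers outside the app, the k comparison can be approximated with base R's `stats::kmeans()`. This is only a sketch: the data are simulated here with assumed parameters (income in thousands, made-up spending values), so the figures will not match the app's demo data.

```r
# Simulated stand-in for the demo data (all parameters are assumptions).
set.seed(1)
demo <- cbind(
  income   = c(rnorm(100, 35, 5), rnorm(100, 75, 8)),
  spending = c(rnorm(100, 40, 8), rnorm(100, 70, 8))
)

# Fit K-means for each k, keeping the best of 10 random starts.
ks <- c(2, 3, 5)
fits <- lapply(ks, function(k) kmeans(demo, centers = k, nstart = 10))

# Total within-cluster sum of squares and smallest cluster size per k.
comparison <- data.frame(
  k = ks,
  tot_withinss = sapply(fits, function(f) f$tot.withinss),
  smallest_cluster = sapply(fits, function(f) min(f$size))
)
print(comparison)
```

The total within-cluster sum of squares typically keeps falling as k grows, so students should look for where the improvement levels off rather than for the smallest value.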

Student Assignments

Explore convergence: "Run K-means with k=3 five times using different seeds. Report how many iterations each run takes and explain why the counts differ."
Optimal k: "Using the demo data, determine the most appropriate number of clusters and justify your choice."
Custom data: "Upload a dataset relevant to your project and identify natural segments."
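
To check answers to the convergence assignment outside the app, one option is to count iterations with `stats::kmeans()` using `algorithm = "Lloyd"`, which follows the same assignment/update scheme. The data and the five seeds below are assumptions chosen for illustration, so the counts will differ from those shown in the app.

```r
# Simulated stand-in for the demo data (all parameters are assumptions).
set.seed(2024)
demo <- cbind(
  income   = c(rnorm(100, 35, 5), rnorm(100, 75, 8)),
  spending = c(rnorm(100, 40, 8), rnorm(100, 70, 8))
)

# Arbitrary seeds; each one gives a different random initialization.
seeds <- c(1, 7, 42, 123, 2024)

# Number of Lloyd iterations needed with k = 3 under each seed.
iters <- sapply(seeds, function(s) {
  set.seed(s)
  kmeans(demo, centers = 3, algorithm = "Lloyd", iter.max = 100)$iter
})

print(data.frame(seed = seeds, iterations = iters))
```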

🎓 Course Context: MSIN0094 Week 4

This app is part of MSIN0094 Week 4: How K-Means Algorithm Works, demonstrating:

Iterative optimization algorithms
Distance-based clustering
Centroid calculation
Convergence criteria
Practical customer segmentation applications

📊 Understanding the Visualization

Colored circles: Data points, colored by their current cluster assignment
Yellow triangles: Cluster centroids with colored borders matching their cluster
Gray points (iteration 0): Unassigned data before first assignment step
Centroid paths: Optional traces showing how centroids move across iterations
Iteration slider: Each position shows the state after an assignment or update step
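
A static plot in the same visual style can be sketched with ggplot2, one of the packages the app depends on. This is not the plotting code from app.R: the data are simulated with assumed parameters, and the clustering comes from `stats::kmeans()` rather than the app's step-by-step implementation.

```r
library(ggplot2)

# Simulated two-segment data (all parameters are assumptions).
set.seed(1)
pts <- data.frame(
  income   = c(rnorm(100, 35, 5), rnorm(100, 75, 8)),
  spending = c(rnorm(100, 40, 8), rnorm(100, 70, 8))
)

fit <- kmeans(pts, centers = 2)
pts$cluster <- factor(fit$cluster)

# One row per centroid, labelled with the cluster it belongs to.
cents <- data.frame(fit$centers, cluster = factor(seq_len(nrow(fit$centers))))

p <- ggplot(pts, aes(income, spending, colour = cluster)) +
  # Data points coloured by cluster assignment.
  geom_point(size = 2) +
  # Shape 24 is a filled triangle: yellow fill, border coloured by cluster.
  geom_point(data = cents, shape = 24, fill = "yellow", size = 5, stroke = 1.5)

print(p)
```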

🛠️ Technical Details

Default Dataset

The app includes simulated customer data with 200 observations:

Segment 1: Lower income (~$35k), moderate spending
Segment 2: Higher income (~$75k), higher spending
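
Data of this shape can be simulated in a few lines. The sketch below is not the generator used in app.R: the equal 100/100 split, the standard deviations, the spending scale, and the seed are all assumptions. Only the total of 200 observations and the approximate income means come from the description above.

```r
set.seed(123)          # arbitrary seed, chosen for reproducibility
n_per_segment <- 100   # assumption: the 200 observations split evenly

customers <- data.frame(
  segment = rep(c("Segment 1", "Segment 2"), each = n_per_segment),
  # Income means follow the description; standard deviations are assumed.
  income = c(rnorm(n_per_segment, mean = 35000, sd = 5000),
             rnorm(n_per_segment, mean = 75000, sd = 8000)),
  # Spending values are on an assumed 0-100 style score.
  spending = c(rnorm(n_per_segment, mean = 50, sd = 10),
               rnorm(n_per_segment, mean = 75, sd = 10))
)

# Mean income and spending per segment.
print(aggregate(cbind(income, spending) ~ segment, data = customers, FUN = mean))
```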

Algorithm Implementation

Uses Euclidean distance for assignments
Handles empty clusters by reinitializing the affected centroid to a randomly chosen data point
Convergence threshold: iteration stops once centroids move less than 1e-6 between consecutive iterations
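
The three points above can be sketched as a short base R function. This is an illustrative reconstruction, not the code in app.R: the function name `simple_kmeans()`, its arguments, and the choice to measure movement as the largest Euclidean shift of any centroid are assumptions.

```r
# Minimal K-means sketch (illustrative only; not the implementation in app.R).
simple_kmeans <- function(x, k, max_iter = 50, tol = 1e-6, seed = 1) {
  x <- as.matrix(x)
  set.seed(seed)
  # Initialize centroids at k distinct, randomly chosen data points.
  centroids <- x[sample(nrow(x), k), , drop = FALSE]
  converged <- FALSE
  iter <- 0
  while (iter < max_iter && !converged) {
    iter <- iter + 1
    # Assignment step: squared Euclidean distance from each point to each centroid.
    d2 <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centroids[j, ])^2))
    cluster <- max.col(-d2, ties.method = "first")  # index of the nearest centroid
    # Update step: move each centroid to the mean of its assigned points.
    new_centroids <- centroids
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) {
        new_centroids[j, ] <- colMeans(x[members, , drop = FALSE])
      } else {
        # Empty cluster: reinitialize its centroid to a random data point.
        new_centroids[j, ] <- x[sample(nrow(x), 1), ]
      }
    }
    # Converged when the largest centroid movement falls below the tolerance.
    shift <- max(sqrt(rowSums((new_centroids - centroids)^2)))
    centroids <- new_centroids
    converged <- shift < tol
  }
  list(centroids = centroids, cluster = cluster,
       iterations = iter, converged = converged)
}
```

Calling `simple_kmeans(x, k = 2)` on a numeric matrix or data frame returns the final centroids, the cluster labels, the number of iterations used, and whether the tolerance was reached before `max_iter`.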

Deployment

This app can be deployed to:

shinyapps.io
Posit Connect
Any Shiny Server

Use the included renv.lock for consistent package versions across environments.

🤝 Contributing

This is an educational resource! Contributions welcome:

Bug fixes
Feature enhancements
Additional example datasets
Documentation improvements

📄 License

This project is open source and available for educational use.

👤 Author

Created for teaching machine learning and customer analytics.

Questions or suggestions? Open an issue or submit a pull request!
