Check out the live demo
An Interactive Shiny App for Understanding How K-Means Clustering Works
This interactive Shiny application is designed for teaching and learning the K-means clustering algorithm. It visualizes every step of the algorithm—from initial random centroid placement through iterative assignments and updates until convergence—helping students develop an intuitive understanding of how K-means works.
Perfect for courses in:
- Machine Learning
- Data Mining
- Business Analytics
- Marketing Analytics (Customer Segmentation)
- Statistics
Students using this app will:
- Visualize the iterative process: See how cluster assignments and centroids change at each step
- Understand convergence: Observe when and why the algorithm stops iterating
- Explore parameter effects: Experiment with different values of k, random seeds, and initialization
- Handle edge cases: See how the algorithm handles empty clusters through reinitialization
- Apply to real data: Test K-means on custom datasets relevant to their domain
- Step-by-Step Iteration: Use the slider to move forward/backward through each algorithm iteration
- Visual Clarity:
  - Data points colored by cluster assignment
  - Large yellow triangular centroids with colored borders
  - Optional centroid path traces to see movement over iterations
- Default Demo Data: Includes simulated customer segmentation data (income vs. spending) to get started immediately
- Custom Data Upload: Upload your own .csv, .tsv, .txt, or .rds files
- Interactive Controls:
  - Choose which variables to cluster on
  - Select which two variables to visualize
  - Adjust number of clusters (k)
  - Set maximum iterations and random seed for reproducibility
- Summary Statistics: View centroid coordinates and cluster sizes at each iteration
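To illustrate the upload feature listed above, here is a minimal sketch of how a file could be read based on its extension. It is not the app's actual loader: `read_uploaded` and `numeric_columns` are hypothetical helper names, base R readers are used instead of readr to keep the example dependency-free, and treating .txt as whitespace-delimited with a header row is an assumption.

```r
# Hypothetical loader: dispatch on the file extension using base R readers.
read_uploaded <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = read.csv(path, stringsAsFactors = FALSE),
    tsv = read.delim(path, stringsAsFactors = FALSE),
    # Assumption: .txt files are whitespace-delimited with a header row.
    txt = read.table(path, header = TRUE, stringsAsFactors = FALSE),
    rds = readRDS(path),
    stop("Unsupported file type: ", ext)
  )
}

# K-means needs numeric input, so keep only the numeric columns
# as candidates for clustering.
numeric_columns <- function(df) {
  df[vapply(df, is.numeric, logical(1))]
}
```

A file with mixed column types would then offer only its numeric columns as clustering variables.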
- Clone this repository:
    git clone https://github.com/weim-mkt/kmeans-teaching-demo.git
    cd kmeans-teaching-demo
- Install required packages:
    install.packages(c("shiny", "ggplot2", "dplyr", "purrr", "readr"))
- Launch the app:
    shiny::runApp()
- Explore: The app loads with default customer segmentation data. Click "Run K-means" and use the slider to step through iterations!
For consistent package versions across different machines:
# Install renv if needed
install.packages("renv")
# Restore project library
renv::restore()
# Launch app
shiny::runApp()
- Introduction (5 min): Explain K-means conceptually: assignment step, update step, convergence
- Live Demo (10-15 min):
  - Show iteration 0 (random initialization)
  - Step through first 2-3 iterations manually to illustrate the process
  - Show convergence at the final iteration
  - Demonstrate effect of different k values
  - Show impact of different random seeds
- Hands-On Exercise (15-20 min): Have students run the app and explore:
  - What happens with k=2 vs k=3 vs k=5?
  - How many iterations until convergence?
  - Does changing the seed matter?
- Explore convergence: "Run K-means with k=3 five times using different seeds. Report how many iterations it takes each time and explain why."
- Optimal k: "Using the demo data, determine the most appropriate number of clusters and justify your choice."
- Custom data: "Upload a dataset relevant to your project and identify natural segments."
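For the convergence exercise, students who want to cross-check their observations outside the app could run a sketch like the one below. It uses base R's `kmeans()` rather than the app's own implementation, so iteration counts may differ from what the slider shows. The data is simulated purely for illustration, and the seed values are arbitrary.

```r
# Illustrative data (an assumption, not the app's demo dataset):
# two overlapping groups of 100 points each in two dimensions.
set.seed(1)
x <- rbind(
  matrix(rnorm(200, mean = 0), ncol = 2),
  matrix(rnorm(200, mean = 4), ncol = 2)
)

# Run K-means with k = 3 under five different seeds and record
# the iteration count and total within-cluster sum of squares.
seeds <- c(1, 7, 42, 99, 2024)
runs <- do.call(rbind, lapply(seeds, function(s) {
  set.seed(s)
  fit <- kmeans(x, centers = 3, iter.max = 50, algorithm = "Lloyd")
  data.frame(seed = s, iterations = fit$iter, total_withinss = fit$tot.withinss)
}))
print(runs)
```

Different seeds pick different starting centroids, which is why both the iteration count and the final within-cluster sum of squares can vary from run to run.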
This app is part of MSIN0094 Week 4: How K-Means Algorithm Works, demonstrating:
- Iterative optimization algorithms
- Distance-based clustering
- Centroid calculation
- Convergence criteria
- Practical customer segmentation applications
- Colored circles: Data points, colored by their current cluster assignment
- Yellow triangles: Cluster centroids with colored borders matching their cluster
- Gray points (iteration 0): Unassigned data before first assignment step
- Centroid paths: Optional traces showing how centroids move across iterations
- Iteration slider: Each position shows the state after an assignment or update step
The app includes simulated customer data with 200 observations:
- Segment 1: Lower income (~$35k), moderate spending
- Segment 2: Higher income (~$75k), higher spending
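A dataset of this shape could be simulated roughly as follows. This is a sketch, not the app's generator: only the 200 observations and the approximate income levels come from the description above, while the function name `simulate_customers`, the even 100/100 split, the spending scale, and all standard deviations are assumptions.

```r
# Hypothetical reconstruction of the two-segment demo data.
# Spending values, spreads, and the equal split are assumptions.
simulate_customers <- function(n = 200, seed = 123) {
  set.seed(seed)
  n1 <- n %/% 2
  n2 <- n - n1
  data.frame(
    # Segment 1 around $35k, segment 2 around $75k.
    income = c(rnorm(n1, mean = 35000, sd = 5000),
               rnorm(n2, mean = 75000, sd = 8000)),
    # Spending on an assumed 0-100 style score.
    spending = c(rnorm(n1, mean = 40, sd = 8),
                 rnorm(n2, mean = 70, sd = 8)),
    segment = rep(c(1L, 2L), times = c(n1, n2))
  )
}

customers <- simulate_customers()
```

Because income is measured in dollars and spending on a much smaller scale, income would dominate Euclidean distances unless the columns are standardized first, for example with `scale()`.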
- Uses Euclidean distance for assignments
- Handles empty clusters by reinitializing to a random data point
- Convergence threshold: centroids move less than 1e-6
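The three details above can be sketched in a few lines of R. This is an illustrative version, not the app's source: `kmeans_trace` is a hypothetical name, starting from k randomly chosen data points is an assumption, and so is measuring convergence by the largest distance any centroid moves.

```r
# Minimal K-means sketch that records the state after every iteration.
kmeans_trace <- function(x, k, max_iter = 20, tol = 1e-6, seed = 1) {
  set.seed(seed)
  x <- as.matrix(x)
  n <- nrow(x)
  # Iteration 0: k distinct data points serve as the initial centroids
  # (assumed initialization); no point has a cluster yet.
  centroids <- x[sample(n, k), , drop = FALSE]
  history <- list(list(centroids = centroids, cluster = rep(NA_integer_, n)))
  converged <- FALSE

  for (iter in seq_len(max_iter)) {
    # Assignment step: squared Euclidean distance to each centroid,
    # then pick the nearest one for every point.
    d2 <- sapply(seq_len(k), function(j) {
      rowSums(sweep(x, 2, centroids[j, ])^2)
    })
    cluster <- max.col(-d2, ties.method = "first")

    # Update step: move each centroid to the mean of its members.
    new_centroids <- centroids
    for (j in seq_len(k)) {
      members <- cluster == j
      if (any(members)) {
        new_centroids[j, ] <- colMeans(x[members, , drop = FALSE])
      } else {
        # Empty cluster: reinitialize to a random data point.
        new_centroids[j, ] <- x[sample(n, 1), ]
      }
    }

    # Convergence check: largest centroid movement below the threshold.
    shift <- max(sqrt(rowSums((new_centroids - centroids)^2)))
    centroids <- new_centroids
    history[[iter + 1]] <- list(centroids = centroids, cluster = cluster)
    if (shift < tol) {
      converged <- TRUE
      break
    }
  }
  list(history = history, converged = converged,
       iterations = length(history) - 1)
}
```

Here `history` holds one entry per full iteration. A step-level trace, where each slider position shows the state after a single assignment or update step, would store those two states separately.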
This app can be deployed to:
- shinyapps.io
- Posit Connect
- Any Shiny Server
Use the included renv.lock for consistent package versions across environments.
This is an educational resource! Contributions welcome:
- Bug fixes
- Feature enhancements
- Additional example datasets
- Documentation improvements
This project is open source and available for educational use.
Created for teaching machine learning and customer analytics.
Questions or suggestions? Open an issue or submit a pull request!