A parallelized K-Means clustering algorithm implemented for High Performance Computing coursework at Indiana University, comparing sequential and parallel designs for clustering large datasets.
This project was run on Indiana University's Big Red 200 supercomputer, with scaling experiments on up to 120 compute nodes.
- Course: High Performance Computing (HPC)
- Institution: Indiana University (Graduate-level)
- Purpose: Compare the performance of sequential and parallel implementations of K-Means clustering using C++, analyzing speedup and scalability.
- Infrastructure: IU Big Red 200 supercomputer (Cray Shasta), up to 120 nodes.
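The sequential baseline follows standard Lloyd-style K-Means: assign each point to its nearest centroid, then recompute each centroid as the mean of its assigned points. A minimal 1-D sketch of one such iteration is below; the function name `kmeans_step` is illustrative and the repository's own code may structure this differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One Lloyd iteration (1-D points, for brevity): assign each point to its
// nearest centroid, then recompute each centroid as the mean of its members.
// Hypothetical helper; sequential-k-means.cpp may differ in detail.
void kmeans_step(const std::vector<double>& points,
                 std::vector<double>& centroids,
                 std::vector<int>& labels) {
    const std::size_t k = centroids.size();
    std::vector<double> sums(k, 0.0);
    std::vector<std::size_t> counts(k, 0);

    for (std::size_t i = 0; i < points.size(); ++i) {
        double best = std::numeric_limits<double>::max();
        int best_c = 0;
        for (std::size_t c = 0; c < k; ++c) {
            double d = std::fabs(points[i] - centroids[c]);
            if (d < best) { best = d; best_c = static_cast<int>(c); }
        }
        labels[i] = best_c;
        sums[best_c] += points[i];
        counts[best_c] += 1;
    }
    // Update step: mean of each cluster's members (skip empty clusters).
    for (std::size_t c = 0; c < k; ++c)
        if (counts[c] > 0) centroids[c] = sums[c] / counts[c];
}
```

The assignment loop dominates the runtime (O(n·k) per iteration), which is why it is the natural target for parallelization in the parallel version.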
| Implementation Type | File | Parallel Approach |
|---|---|---|
| Sequential Baseline | `sequential-k-means.cpp` | Single-threaded C++ |
| Parallel + Multithreading | `parallel-k-means.cpp` | OpenMP / Pthreads (depending on setup) |
| Data Generation | `generateData.cpp` | Creates synthetic datasets (e.g., `10000.txt`, `100000.txt`) |
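In the OpenMP approach, the assignment phase parallelizes cleanly because each point's nearest-centroid search is independent. The sketch below shows the general pattern, assuming a hypothetical `assign_labels` helper; `parallel-k-means.cpp` may partition work differently (e.g., with Pthreads).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Parallel assignment phase: iterations are independent, so the outer loop
// can be split across threads with OpenMP. Without -fopenmp the pragma is
// ignored and the code runs serially, producing identical results.
void assign_labels(const std::vector<double>& points,
                   const std::vector<double>& centroids,
                   std::vector<int>& labels) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(points.size());
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::max();
        int best_c = 0;
        for (std::size_t c = 0; c < centroids.size(); ++c) {
            double d = std::fabs(points[i] - centroids[c]);
            if (d < best) { best = d; best_c = static_cast<int>(c); }
        }
        labels[i] = best_c;  // each thread writes a disjoint index: no race
    }
}
```

The centroid-update phase needs per-thread partial sums (or an OpenMP reduction) to avoid races, which is where the parallel implementation's design choices matter most.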
- Compile:

  ```bash
  g++ sequential-k-means.cpp -o seq_kmeans
  g++ parallel-k-means.cpp -fopenmp -o par_kmeans  # if using OpenMP
  ```
- Generate Data:

  ```bash
  g++ generateData.cpp -o gen_data
  ./gen_data 1000000  # generates a 1-million-point dataset
  ```
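A synthetic generator of this kind typically draws points around a handful of well-separated cluster centers so that the clustering result is easy to verify. The sketch below illustrates the idea with a hypothetical `make_points` function; the actual distribution and output format of `generateData.cpp` may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Sketch of a synthetic-dataset generator: n points drawn from k Gaussian
// clusters with evenly spaced means (spacing and noise level are assumptions,
// not taken from generateData.cpp).
std::vector<double> make_points(std::size_t n, std::size_t k, unsigned seed) {
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, 1.0);
    std::vector<double> pts;
    pts.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        double center = 10.0 * static_cast<double>(i % k);  // cluster mean
        pts.push_back(center + noise(rng));
    }
    return pts;
}
```

Writing one point per line to a file named after the point count (e.g., `100000.txt`) would reproduce the dataset naming shown in the table above.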
- Execute Locally:

  ```bash
  ./seq_kmeans data.txt
  ./par_kmeans data.txt
  ```
- Execute on Big Red 200 (SLURM example):

  ```bash
  sbatch -N 60 -n 120 run_kmeans.sh
  ```