Parallel Programming, K-Means

This project implements the K-means clustering algorithm in C++ in two versions: a sequential one and a parallel one that uses OpenMP to run on a multi-core CPU.

CPU of the machine used for testing:

11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (Mobile)
Threads per core: 2
Cores per socket: 8

Speed-up results for the clustering algorithm:

Speed-up plots for all cluster sizes tested:

Speed-up plots for all datasets used:

Speed-up results for the K-means initialization:

Speed-up plots for all cluster sizes tested:

Speed-up plots for all datasets used:

How to use

First Step (Folder setup and generate datasets):

In the main project folder, create the following directories:

input/
output/clusters/
output/centroids/
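
If you prefer not to create the folders by hand, a minimal sketch like the one below can do it. It is not part of the repository; the helper name `create_project_dirs` and the `PROJECT_DIRS` list are hypothetical, and the script assumes it is run from the main project folder.

```python
from pathlib import Path

# Directories listed in this README, relative to the main project folder.
PROJECT_DIRS = ["input", "output/clusters", "output/centroids"]


def create_project_dirs(root="."):
    """Create the expected directories under `root` and return their paths.

    Hypothetical helper: existing directories are left untouched.
    """
    created = []
    for rel in PROJECT_DIRS:
        path = Path(root) / rel
        # parents=True also creates the intermediate 'output' directory.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in create_project_dirs():
        print(f"ready: {path}")
```

Running it a second time is harmless, because `exist_ok=True` skips folders that already exist.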

Then execute setup_dataset.py, which will generate datasets in the 'input' directory. This script creates datasets for N=100, 1,000, 10,000, 100,000, 1,000,000, and 10,000,000. For each dataset, it also prepares versions for cluster sizes K=3, 5, 10, 15, 20, 25, 30, 40, and 50.
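
To give an idea of the test grid, the sketch below enumerates every (N, K) pair listed above and shows one possible way to generate points around K centers. It is only an illustration, not the contents of setup_dataset.py: the function names, the 2-D points, the Gaussian spread and the bounding box are all assumptions.

```python
import itertools
import random

# Dataset sizes and cluster counts listed in this README.
N_VALUES = [100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000]
K_VALUES = [3, 5, 10, 15, 20, 25, 30, 40, 50]


def dataset_grid():
    """Return every (N, K) combination mentioned in the README."""
    return list(itertools.product(N_VALUES, K_VALUES))


def make_blobs(n, k, seed=0, spread=1.0, box=100.0):
    """Generate n 2-D points scattered around k random centers.

    Assumption: the real script may use a different dimensionality,
    distribution, or file format.
    """
    rng = random.Random(seed)
    centers = [(rng.uniform(0, box), rng.uniform(0, box)) for _ in range(k)]
    points = []
    for i in range(n):
        cx, cy = centers[i % k]  # assign points to centers round-robin
        points.append((rng.gauss(cx, spread), rng.gauss(cy, spread)))
    return points


if __name__ == "__main__":
    print(f"{len(dataset_grid())} (N, K) combinations")
    print(make_blobs(5, 3))
```

With six dataset sizes and nine cluster counts, the grid contains 54 combinations, which helps explain why the full sequential run is so long.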

Second Step (Run the algorithm):

Compile and run main-parallel.cpp for the parallel version of the algorithm and main-sequential.cpp for the sequential version. The parallel version must be compiled with OpenMP enabled (for example, the -fopenmp flag with GCC or Clang). Note that the sequential version takes a long time to run all the tests, depending on your hardware. In my tests, it took approximately 50-60 hours.

At the end of the execution, all timing data will be saved in the 'Times' directory.

Third Step (Speed-Up plot):

Run the speed_up.py script to generate the speed-up plots. These plots are saved in the 'SpeedUp' folder.
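
For reference, speed-up is conventionally defined as the sequential time divided by the parallel time for the same test case. The sketch below shows that calculation on in-memory data; it is not the code of speed_up.py, and the function name and the `(N, K)` dictionary keys are assumptions, since the layout of the files in 'Times' is not described here.

```python
def speed_up(seq_times, par_times):
    """Return T_sequential / T_parallel for every test case present in both inputs.

    Both arguments are dicts mapping a test-case key, for example an
    (N, K) tuple, to a running time in seconds (hypothetical layout).
    """
    result = {}
    for key, t_seq in seq_times.items():
        t_par = par_times.get(key)
        if t_par is None or t_par <= 0:
            continue  # skip cases without a usable parallel timing
        result[key] = t_seq / t_par
    return result
```

A value above 1 means the parallel version was faster for that test case; test cases that have no parallel timing are left out of the result.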
