Parallel Programming, K-Means

This project implements the K-Means clustering algorithm in C++ in two versions: a sequential one and a parallel one that uses OpenMP to run on multiple CPU cores.
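As a rough illustration of where OpenMP helps in K-Means, the sketch below parallelizes the assignment step, in which every point is matched to its nearest centroid. It is not the project's actual code: the Point struct, the assign_clusters function and the 2-D layout are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2-D point type; the project's real data layout may differ.
struct Point {
    double x;
    double y;
};

// Assign each point to its nearest centroid (squared Euclidean distance).
// Each iteration writes only labels[i], so iterations are independent
// and the outer loop can be split across threads without locking.
std::vector<int> assign_clusters(const std::vector<Point>& points,
                                 const std::vector<Point>& centroids) {
    assert(!centroids.empty());
    std::vector<int> labels(points.size(), -1);
    const long long n = static_cast<long long>(points.size());

    // Without OpenMP enabled at compile time this pragma is ignored
    // and the loop simply runs sequentially.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t k = 0; k < centroids.size(); ++k) {
            const double dx = points[i].x - centroids[k].x;
            const double dy = points[i].y - centroids[k].y;
            const double d = dx * dx + dy * dy;
            if (d < best) {
                best = d;
                labels[i] = static_cast<int>(k);
            }
        }
    }
    return labels;
}
```

The centroid update step is harder to parallelize because several threads accumulate into the same per-cluster sums; a common approach is a reduction or per-thread partial sums that are merged afterwards.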

CPU of the machine used for testing:

  • 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (Mobile)
  • Threads per core: 2
  • Cores per socket: 8

Speed-up results for the clustering algorithm:

Speed-up plots for all cluster sizes tested:

Speed-up plots for all datasets used:

Speed-up results for the kmeans initialization:

Speed-up plots for all cluster sizes tested:

Speed-up plots for all datasets used:

How to use

First Step (Folder setup and generate datasets):

In the main project folder, create the following directories:

input/
output/clusters/
output/centroids/

Then run setup_dataset.py, which generates the datasets in the 'input' directory. The script creates datasets of size N=100, 1,000, 10,000, 100,000, 1,000,000, and 10,000,000. For each dataset, it also prepares a version for each number of clusters K=3, 5, 10, 15, 20, 25, 30, 40, and 50.

Second Step (Run the algorithm):

Next, compile and run main-parallel.cpp for the parallel version and main-sequential.cpp for the sequential version. The parallel build needs OpenMP enabled (for example, the -fopenmp flag with GCC). Note that the sequential version takes a long time to run all tests, depending on your hardware; in my tests, it took approximately 50-60 hours.

When execution finishes, all timing data is saved in the 'Times' directory.
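For reference, timings like these are usually wall-clock measurements. The helper below is a minimal sketch of one way to take them with std::chrono; the name time_seconds is hypothetical, and the project may measure time differently (for example with OpenMP's own timer).

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Run `work` once and return the elapsed wall-clock time in seconds.
// steady_clock is monotonic, so the result is not affected by
// system clock adjustments during the run.
double time_seconds(const std::function<void()>& work) {
    assert(static_cast<bool>(work));
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wall-clock time is the relevant quantity here: processor time, as reported by std::clock, can add up the work of all threads and so would hide the benefit of parallel execution.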

Third Step (Speed-Up plot):

Run the speed_up.py script to generate the speed-up plots, which are saved in the 'SpeedUp' folder.
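Speed-up is conventionally defined as the sequential run time divided by the parallel run time for the same input. The sketch below shows that calculation, plus the related parallel efficiency; it is written in C++ for consistency with the rest of the project and is not taken from speed_up.py, so the function names are assumptions.

```cpp
#include <cassert>

// Speed-up: sequential wall time divided by parallel wall time.
// A value of 4.0 means the parallel run was four times faster.
double speed_up(double sequential_seconds, double parallel_seconds) {
    assert(parallel_seconds > 0.0);
    return sequential_seconds / parallel_seconds;
}

// Parallel efficiency: speed-up divided by the number of threads used.
// 1.0 would be ideal linear scaling.
double efficiency(double sequential_seconds, double parallel_seconds,
                  int threads) {
    assert(threads > 0);
    return speed_up(sequential_seconds, parallel_seconds) / threads;
}
```

For example, a sequential time of 12 s and a parallel time of 3 s on 8 threads gives a speed-up of 4 and an efficiency of 0.5.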