MirkoBicchierai/Parallel_Programming_K-Means

Parallel Programming, K-Means

This project implements the K-Means clustering algorithm in C++, in both a sequential version and a parallel version that runs on a multi-core CPU using OpenMP.
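As a rough illustration of where OpenMP fits into K-Means, the sketch below parallelizes the assignment step, in which each point is labeled with its nearest centroid. It is not taken from the repository: the `Point` struct, the 2-D data layout, and the function names `squared_distance` and `assign_clusters` are assumptions made for this example. Without OpenMP enabled at compile time, the pragma is ignored and the loop runs sequentially.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical 2-D point type; the repository's own data layout may differ.
struct Point {
    double x;
    double y;
};

// Squared Euclidean distance. The square root is skipped because it does
// not change which centroid is closest.
double squared_distance(const Point& a, const Point& b) {
    const double dx = a.x - b.x;
    const double dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// Assignment step: label every point with the index of its nearest centroid.
// Each iteration writes only labels[i], so the iterations are independent
// and can be divided among threads without locks.
std::vector<int> assign_clusters(const std::vector<Point>& points,
                                 const std::vector<Point>& centroids) {
    std::vector<int> labels(points.size(), -1);
    const long long n = static_cast<long long>(points.size());

    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < n; ++i) {
        double best = std::numeric_limits<double>::max();
        int best_k = -1;
        for (std::size_t k = 0; k < centroids.size(); ++k) {
            const double d = squared_distance(points[i], centroids[k]);
            if (d < best) {
                best = d;
                best_k = static_cast<int>(k);
            }
        }
        labels[i] = best_k;
    }
    return labels;
}
```

The centroid update step needs more care, since several threads would accumulate into the same per-cluster sums; a reduction or per-thread partial sums is one common way to handle that.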

CPU of the machine used for testing:

  • 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (Mobile)
  • Threads per core: 2
  • Cores per socket: 8

Speed-up results for the clustering algorithm:

Speed-up plots for all cluster sizes tested:

Speed-up plots for all datasets used:

Speed-up results for the K-Means initialization:

Speed-up plots for all cluster sizes tested:

Speed-up plots for all datasets used:

How to use

First Step (Set up folders and generate datasets):

In the main project folder, create the following directories:

input/
output/clusters/
output/centroids/
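If you prefer to create these folders programmatically, the following is a minimal sketch using the C++17 `std::filesystem` library. The helper name `create_project_dirs` is hypothetical and is not part of the repository; creating the folders by hand works just as well.

```cpp
#include <filesystem>

// Hypothetical helper: creates the three directories listed above under
// `root` and returns true if all of them exist afterwards.
bool create_project_dirs(const std::filesystem::path& root) {
    const char* dirs[] = {"input", "output/clusters", "output/centroids"};
    bool all_present = true;
    for (const char* dir : dirs) {
        // create_directories also creates missing parents (here "output")
        // and does nothing if the directory already exists.
        std::filesystem::create_directories(root / dir);
        all_present = all_present && std::filesystem::is_directory(root / dir);
    }
    return all_present;
}
```

Calling `create_project_dirs(".")` from the main project folder would produce the layout shown above.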

Then run setup_dataset.py, which generates the datasets in the 'input' directory. The script creates datasets for N=100, 1,000, 10,000, 100,000, 1,000,000, and 10,000,000 points. For each dataset, it also prepares versions for K=3, 5, 10, 15, 20, 25, 30, 40, and 50 clusters.

Second Step (Run the algorithm):

Compile and run main-parallel.cpp for the parallel version of the algorithm and main-sequential.cpp for the sequential version. Note that the sequential version takes a long time to run all tests, depending on your hardware. In my tests, it took approximately 50-60 hours.

At the end of the execution, all timing data will be saved in the 'Times' directory.
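How the repository measures and stores its timings is not described here, so the following is only a sketch of one common approach: wrapping a run in a wall-clock timer built on `std::chrono::steady_clock`. The helper name `time_seconds` is an assumption for this example.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds.
double time_seconds(const std::function<void()>& work) {
    // steady_clock is monotonic, so the measurement is not affected
    // by system clock adjustments during a long run.
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Wall-clock time is the relevant quantity for speed-up measurements, because CPU time summed over all threads would hide the benefit of running in parallel.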

Third Step (Speed-Up plot):

Run the speed_up.py script to generate the speed-up plots. These plots will be saved in the 'SpeedUp' folder.
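For reference, speed-up is conventionally defined as the sequential execution time divided by the parallel execution time, and parallel efficiency as the speed-up divided by the number of threads. The sketch below shows that arithmetic under those standard definitions; it is not a port of speed_up.py, and the function names are assumptions.

```cpp
#include <stdexcept>

// Speed-up: how many times faster the parallel run is than the sequential one.
double speed_up(double sequential_seconds, double parallel_seconds) {
    if (parallel_seconds <= 0.0) {
        throw std::invalid_argument("parallel time must be positive");
    }
    return sequential_seconds / parallel_seconds;
}

// Efficiency: speed-up per thread, where 1.0 means perfect linear scaling.
double efficiency(double speedup, int threads) {
    if (threads <= 0) {
        throw std::invalid_argument("thread count must be positive");
    }
    return speedup / threads;
}
```

For example, a sequential time of 12 s against a parallel time of 3 s gives a speed-up of 4; on 8 threads that corresponds to an efficiency of 0.5.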

About

University project for the "Parallel Programming" exam: a CPU parallelization of the K-Means algorithm using OpenMP.
