Parallel Programming, K-Means

This project implements the K-Means clustering algorithm in C++. It provides both a sequential version and a parallel version that runs on a multi-core CPU using OpenMP.
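For readers unfamiliar with the algorithm, the following is a minimal Python sketch of the standard K-Means (Lloyd) iteration. It is not the repository's C++ code: the function names, the naive "first k points" initialization, and the stopping rule are assumptions made for illustration only.

```python
from math import dist


def assign_points(points, centroids):
    """Assignment step: index of the nearest centroid for each point."""
    return [
        min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
        for p in points
    ]


def update_centroids(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    k = len(centroids)
    dims = len(centroids[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p, label in zip(points, labels):
        counts[label] += 1
        for d in range(dims):
            sums[label][d] += p[d]
    # An empty cluster keeps its previous centroid.
    return [
        tuple(s / counts[c] for s in sums[c]) if counts[c] else tuple(centroids[c])
        for c in range(k)
    ]


def kmeans(points, k, max_iters=100):
    """Run K-Means until the labels stop changing or max_iters is reached."""
    # Assumption: naive initialization with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iters):
        new_labels = assign_points(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids
```

In the assignment step each point is processed independently of the others, which is why that loop is the usual candidate for parallelization across threads.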

CPU of the machine used for testing:

11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz (Mobile)
Threads per core: 2
Cores per socket: 8

Speed-up results for the clustering algorithm:

Speed-up plots for all cluster sizes tested:

Speed-up plots for all datasets used:

Speed-up results for the K-Means initialization:

Speed-up plots for all cluster sizes tested:

Speed-up plots for all datasets used:

How to use

First Step (Folder setup and generate datasets):

In the main project folder, create the following directories:

input/
output/clusters/
output/centroids/
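The three directories above can also be created from a script. The following is a minimal sketch using only the Python standard library; the function name `create_project_dirs` and the `root` parameter are hypothetical and are not part of the repository.

```python
from pathlib import Path

# Directory names taken from the list above.
REQUIRED_DIRS = ["input", "output/clusters", "output/centroids"]


def create_project_dirs(root="."):
    """Create the required directories under root and return their paths."""
    created = []
    for relative in REQUIRED_DIRS:
        path = Path(root) / relative
        # parents=True also creates 'output/'; exist_ok=True makes reruns safe.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_project_dirs()` from the main project folder creates the directories there. Running it again does nothing if they already exist.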

Then execute setup_dataset.py, which will generate datasets in the 'input' directory. This script creates datasets for N=100, 1,000, 10,000, 100,000, 1,000,000, and 10,000,000. For each dataset, it also prepares versions for cluster sizes K=3, 5, 10, 15, 20, 25, 30, 40, and 50.
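To give a sense of the size of the test grid, the sketch below enumerates every (N, K) combination listed above, 6 dataset sizes times 9 cluster counts. It does not reproduce setup_dataset.py: the file format that script writes is not described here, and the `random_points` helper, with its 2-D uniform points and fixed seed, is purely an assumption for illustration.

```python
import random
from itertools import product

# Values taken from the description above.
N_VALUES = [100, 1_000, 10_000, 100_000, 1_000_000, 10_000_000]
K_VALUES = [3, 5, 10, 15, 20, 25, 30, 40, 50]


def configurations():
    """Return every (N, K) pair in the test grid."""
    return list(product(N_VALUES, K_VALUES))


def random_points(n, dims=2, seed=0):
    """Hypothetical generator: n points drawn uniformly from [0, 1]^dims."""
    rng = random.Random(seed)
    return [tuple(rng.uniform(0.0, 1.0) for _ in range(dims)) for _ in range(n)]
```

The grid contains 54 configurations in total, which helps explain why running the full sequential test suite takes so long.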

Second Step (Run the algorithm):

Compile and run main-parallel.cpp for the parallel version of the algorithm and main-sequential.cpp for the sequential version. Note that the sequential version takes a long time to complete all tests, depending on your hardware. In my tests, it took approximately 50-60 hours.

At the end of the execution, all timing data will be saved in the 'Times' directory.

Third Step (Speed-Up plot):

Execute the speed_up.py script to generate the speed-up plots. These plots will be saved in the 'SpeedUp' directory.
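Speed-up is conventionally defined as the sequential time divided by the parallel time for the same workload. The sketch below shows that calculation together with parallel efficiency. It is not the code in speed_up.py: the function names are hypothetical, and how the timings are read from the 'Times' directory is not covered because the file format is not described here.

```python
def speed_up(sequential_seconds, parallel_seconds):
    """Speed-up = sequential time / parallel time for the same workload."""
    if sequential_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("timings must be positive")
    return sequential_seconds / parallel_seconds


def efficiency(sequential_seconds, parallel_seconds, threads):
    """Parallel efficiency = speed-up / number of threads used."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    return speed_up(sequential_seconds, parallel_seconds) / threads
```

For example, a run that takes 12 s sequentially and 3 s in parallel has a speed-up of 4. On the test machine above (8 cores with 2 threads each, so 16 hardware threads), that would correspond to an efficiency of 0.25 if all 16 threads were used.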

