This project aims to implement a multi-threaded version of the PageRank algorithm. The algorithm is used to rank web pages in search engines, and it is based on the idea that a page is important if it is linked by other important pages. The algorithm is iterative and computes the PageRank values of the nodes in a graph. See the report for firther details.
The project is divided into the following files:
-
main.cpp: The main file that reads the input graph and calls the PageRank function. It takes arguments for the path to a list of directed graph edges and the maximum number of threads to use. It then measures the execution time from 1 to the maximum number of threads and saves the results in a
.csvfile inside thestatsfolder. -
main_perf.cpp: Another main file specifically created for analysis using the Perf tool. It takes arguments for the path to a list of directed graph edges, the algorithm to use (a or b), and whether it's sequential or parallel (s or p). It simply calls the required PageRank function.
-
main_scorep.cpp: Another main file specifically created for analysis using the Score-P tool. It takes arguments for the path to a list of directed graph edges and performs parallel PageRank using all available threads.
-
utility.h and .cpp: A file containing utility functions such as the function to read the input graph and the function to calculate statistics.
-
graph/graph.h and .cpp: A file containing the graph class that stores a column-wise graph and the functions to calculate PageRank.
-
graph/graph_by_row.h and .cpp: A file containing the graph class that stores a row-wise graph and the functions to calculate PageRank.
-
speedup_graphs.py: A Python script that reads the
.csvfiles inside thestatsfolder and generates speedup graphs. -
graph_generator.py: A Python script that generates a list of random directed graph edges with a given number of nodes and edges.
To compile the project, run the following commands:
g++ -std=c++20 main.cpp utility.cpp graph/graph.cpp graph/graph_by_row.cpp -o main -fopenmp -O3To run the project, use the following command:
./main <path_to_graph_edges> <max_threads>
Example (Run main on p2p_Gnutella31 up to 26 threads):
./main ./graphs/p2p_Gnutella31.txt 26To compile and run the project for analysis using the Perf tool, run the following commands:
g++ -std=c++20 main_perf.cpp utility.cpp graph/graph.cpp graph/graph_by_row.cpp -o main_perf -fopenmp -O3To run the project for analysis using the Perf tool, use the following command:
perf record -g ./main_perf <path_to_graph_edges> <algorithm> <mode>
Example (Run Algorithm 1 Parallel on p2p_Gnutella31):
perf stat -d ./main_perf ./graphs/p2p_Gnutella31.txt a pTo compile and run the project for analysis using the Score-P tool (must be installed before), run the following commands:
mkdir scorep
cd scorep
scorep-g++ -std=c++20 ../main_scorep.cpp ../utility.cpp ../graph/graph.cpp ../graph/graph_by_row.cpp -o main_scorep -fopenmpTo run the project for analysis using the Score-P tool, use the following command:
mpirun -np 1 ./main_scorep <path_to_graph_edges>
This will produce a folder named scorep-<timestamp> containing the analysis results ready to be visualized using the Vampir tool.
Example (Run main_scorep on p2p_Gnutella31):
mpirun -np 1 ./main_scorep ../graphs/p2p_Gnutella31.txt