This repository provides a comprehensive implementation of the PageRank algorithm using the Power Method. It is organized into four deliverables that cover problem definition, mathematical formulation, code implementation, and experimental validation.
pagerank-power-method/
├── 01. Deliverable 1 - Problem Definition & Background/
│ └── 01. Deliverable 1 - Problem Definition & Background.pdf
├── 02. Deliverable 2 - Mathematical Formulation & Algorithm/
│ └── 02. Deliverable 2 - Mathematical Formulation & Algorithm.pdf
├── 03. Deliverable 3 - Code Implementation & Documentation/
│ └── pagerank-power-method/ # Code modules
├── 04. Deliverable 4 - Experiments & Validation/
│ ├── run_toy_experiments.py
│ ├── run_real_experiments.py
│ ├── run_sensitivity.py
│ └── plots/ # Generated figures
├── LICENSE # MIT License
├── requirements.txt # Python dependencies
└── venv/ # Python virtual environment (excluded from version control)
-
Deliverable 1 – Problem Definition & Background
- Describes the PageRank algorithm from a discrete mathematics perspective.
- Includes problem statement, motivation, literature review, and theoretical foundations (graph theory and Markov chains).
-
Deliverable 2 – Mathematical Formulation & Algorithm
- Formalizes PageRank as an eigenvector problem of the Google matrix.
- Defines the link matrix, teleportation adjustment, and damping factor.
- Presents the Power Method pseudocode, convergence criteria, and complexity analysis.
-
Deliverable 3 – Code Implementation & Documentation
-
Implements a modular Python codebase for PageRank computation.
-
Core modules:
graph_loader.py– Load graphs from edge-list or adjacency-list formats.matrix_builder.py– Build the sparse transition matrix and teleportation vector.power_method.py– Compute PageRank via iterative power iterations.utils.py– Helper routines (vector normalization, residual computation, plotting).run_pagerank.py– Command-line script to run PageRank on a given graph file.
-
-
Deliverable 4 – Experiments & Validation
- Evaluates convergence on toy graphs and a real-world network (Zachary’s Karate Club).
- Performs sensitivity analysis with respect to the damping factor (α).
- Generates tables and plots illustrating residuals, iteration counts, and runtime.
-
Clone the repository
git clone https://github.com/amr-yasser226/pagerank-power-method.git cd pagerank-power-method -
Create and activate a Python virtual environment
python3 -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate
-
Install dependencies
pip install -r requirements.txt
Compute PageRank on an edge-list or adjacency-list file:
python run_pagerank.py <input_path> [--format edgelist|adjlist] [--alpha ALPHA] [--tol TOL] [--max_iter MAX_ITER]<input_path>: Path to the graph file.--format(-f): File format (edgelistoradjlist, default:edgelist).--alpha(-a): Damping factor (default: 0.85).--tol(-t): Convergence tolerance on L1 residual (default: 1e-6).--max_iter(-m): Maximum number of iterations (default: 100).
# Compute PageRank on an edge-list
python run_pagerank.py data/web_graph.txt --alpha 0.85 --tol 1e-8Deliverable 4 includes three scripts under 04. Deliverable 4 - Experiments & Validation/:
run_toy_experiments.py– Convergence on predefined toy graphs.run_real_experiments.py– Validation on Zachary’s Karate Club network.run_sensitivity.py– Sensitivity analysis for different α values.
Run each script directly:
python "04. Deliverable 4 - Experiments & Validation/run_toy_experiments.py"
python "04. Deliverable 4 - Experiments & Validation/run_real_experiments.py"
python "04. Deliverable 4 - Experiments & Validation/run_sensitivity.py"Plots will be saved under 04. Deliverable 4 - Experiments & Validation/plots/.
-
Functions:
load_edge_list(path: str, directed: bool = True) -> List[Tuple[int, int]]load_adjacency_list(path: str) -> Dict[int, List[int]]graph_to_edge_list(graph: nx.Graph) -> List[Tuple[int, int]]
-
Behavior: Reads graph definitions, supports comments/blank lines, and validates node IDs.
- Function:
build_matrix(edges: List[Tuple[int, int]], alpha: float = 0.85) -> Tuple[csr_matrix, np.ndarray] - Behavior: Constructs a column-stochastic sparse transition matrix (handling dangling nodes) and a uniform teleportation vector.
- Function:
compute_pagerank(P: csr_matrix, v: np.ndarray, alpha: float = 0.85, tol: float = 1e-6, max_iter: int = 100) -> Tuple[np.ndarray, List[float], int, float] - Behavior: Iteratively applies the Google operator, normalizes the rank vector, and tracks L1 residuals until convergence or max iterations.
-
Functions:
normalize_vector(x: np.ndarray) -> np.ndarraycompute_residual(prev: np.ndarray, curr: np.ndarray) -> floatplot_residuals(history: List[float], title: str, path: Optional[str] = None) -> None
-
Behavior: Provides vector normalization, residual computation, and residual-plotting utilities.
This project is licensed under the MIT License. See the LICENSE file for details.
For questions or contributions, please open an issue or pull request on GitHub.