pagerank-dynamic/02-related-work.tex at main · merferry/pagerank-dynamic · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
\ignore{Some of the early work on dynamic graph algorithms in the sequential setting include the seminal sparsification method of Eppstein et al. \cite{graph-eppstein97} and the bounded incremental computation idea of Ramalingam \cite{incr-ramalingam96}. The latter advocates measuring the work done as part of the update in proportion to the effect the update has on the computation.}

A number of approaches have been proposed for performing incremental computation (updating PageRank values in a dynamic / evolving graph) of approximate PageRank. Chien et al. \cite{rank-chien01} identify a tiny region of the graph near the updated vertices and model the remainder of the graph as a single vertex in a new, much smaller graph. PageRanks are computed for the small graph and then transferred to the original graph. Chen et al. \cite{chen2004local} propose a number of methods to estimate the PageRank score of a particular web page using only a small subgraph of the entire web, by expanding backwards from the target node following reverse hyperlinks. Bahmani et al. \cite{bahmani2010fast} analyze the efficiency of Monte Carlo methods for incremental computation of PageRank. Zhan et al. \cite{zhan2019fast} propose a Monte Carlo based algorithm for PageRank tracking on dynamic networks, by maintaining $R$ random walks starting from each node. Pashikanti et al. \cite{rank-pashikanti22} also follow a similar approach for updating PageRank scores on vertex and edge insertion/deletion.

A few approaches have been proposed for updating exact PageRank scores on dynamic graphs. Zhang \cite{rank-zhang17} presents a simple incremental Pagerank computation system for dynamic graphs on hybrid CPU and GPU platforms that incorporates the Update-Gather-Apply-Scatter (UGAS) computation model. A common approach used for Dynamic PageRank algorithm, given a small change to the input graph, is to find the affected region in the preprocessing step with Breadth-First Search (BFS) or Depth-First Search (DFS) traversal from the vertices connecting the edges that were inserted or deleted, and computing PageRanks only for that region \cite{rank-desikan05, kim2015incremental, rank-giri20, sahu2022dynamic}. This approach was originally proposed by Desikan et al. \cite{rank-desikan05}. Kim and Choi \cite{kim2015incremental} use this approach with an asynchronous implementation of PageRank. Giri et al. \cite{rank-giri20} use this approach with collaborative executions on muti-core CPUs and massively parallel GPUs. Sahu et al. \cite{sahu2022dynamic} use this approach on a Strongly Connected Component (SCC) based decomposition of the graph to limit the computation to SCCs that are reachable from updated vertices, on multi-core CPUs and GPUs (separately). Ohsaka et al. \cite{ohsaka2015efficient} propose an approach for locally updating PageRank using the Gauss-Southwell method, where the vertex with the greatest residual is updated first --- however, their algorithm is inherently sequential.
%% Ohsaka et al. use L1-norm like many others for error.

% \ignore{PageRank algorithm is a \textbf{live algorithm} which means that an ongoing computation can be paused during graph update, and simply be resumed afterwards (instead of restarting it).} Dynamic PageRank algorithms aim to handle changes in the input graph efficiently.\ignore{A number of techniques have been proposed for online analysis of link evolution, i.e., updating of PageRank values in a dynamic/evolving graph.}

Further, Bahmani et al. \cite{rank-bahmani12} propose an algorithm to selectively crawl a small portion of the web to provide an estimate of true PageRank of the graph at that moment, while Berberich et al. \cite{rank-berberich07} present a method to compute normalized PageRank scores that are robust to non-local changes in the graph. Their approaches are orthogonal to our \textit{Dynamic Frontier} approach which focuses on the computation of the PageRank vector itself, not on the process of crawling the web or maintaining normalized scores.
%% Add other interesting variations?

%% We use an asynchronous approach:
% Real-Time PageRank on Dynamic Graphs (2023): In this paper, Sallinen et al. \cite{sallinen2023real} compute PageRank asynchronously for real-time, on demand PageRank computation with arbitrary granularity. They model PageRank as a flow problem, where mass is absorbed by the page, and the rest is distributed to neighbors. This is done by sending delta values of probability mass depending on edge deletion or insertions by adjustment upon earlier values. Sink/dangling vertices (dead ends) are handled as usual (teleport).

%% Interesting approach:
% PageRank Algorithm Based on Dynamic Damping Factor (2023): Existing methods often set the damping factor empirically, overlooking the relevance of web visitors’ topics. HaoLin et al. \cite{haolin2023pagerank} propose an adaptive dynamic damping factor based on the web browsing context, and demonstrate that it effectively mitigates the impact of noisy web pages on query results and improves the convergence speed.

%% Sliding window approach.
% Time-Aware Ranking in Dynamic Citation Networks (2011): In this paper, Ghosh et al. \cite{ghosh2011time} consider the temporal order of links and chains of links connecting to a node with some temporal decay factors, and show that it is more appropriate for predicting future citations and PageRank scores with regard to new citations.

%% Other interesting approach:
% A Dynamical System for PageRank with Time-Dependent Teleportation (2014): In this paper, Gleich and Rossi \cite{gleich2014dynamical} propose a time-dependent teleportation to the PageRank score. The magnitude of the deviation from a static PageRank vector is given by a PageRank problem with complex-valued teleportation parameters. They demonstrate the utility of dynamic teleportation on both the article graph of Wikipedia, where the external interest information is given by the number of hourly visitors to each page, and the Twitter social network, where external interest is the number of tweets per month. They show that using information from the dynamical system helps improve a prediction task and identify trends in the data.

%% Other interesting approach:
% Temporal PageRank (2016): In this paper, Rozenshtein and Gionis \cite{rozenshtein2016temporal} propose an extension of static PageRank to temporal PageRank so that only temporal walks are considered instead of all possible walks. In order to compute temporal PageRank we need to process the sequence of interactions and calculate the weighted number of temporal walks. Their algorithm counts explicitly the weighted number of temporal walks.

%% Similar to STIC-D:
% Divide and conquer approach for efficient pagerank computation (2006): In this paper, Desikan et al. \cite{desikan2006divide} propose a graph-partitioning technique for PageRank, on which computation can be performed independently.

%% Similar to STIC-D:
% A componentwise PageRank algorithm (2015): In this paper, Engstrom et al. \cite{engstrom2015componentwise} propose two PageRank algorithms, one similar to the Lumping algorithm proposed by Qing et al. which handles certain types of vertices faster, and last, another PageRank algorithm which can handle more types of vertices as well as strongly connected components more effectively. This is similar to the work of Garg et al.

%% Streaming PageRank:
% Estimating PageRank on graph streams (2011): In this paper, Sarma et al. \cite{rank-sarma11} study the streaming model for PageRank, which uses a small amount of memory (preferably sub-linear in the number of nodes n). They compute approximate PageRank values in Õ(nM−1/4) space and Õ(M3/4) passes. They also give another approach to approximate the PageRank values in just Õ(1) passes although this requires Õ(nM) space.

%% Applications of PageRank:
% PageRank Tracker: From Ranking to Tracking (2013): PageRank has been used by Gong et al. \cite{gong2013pagerank} in video object tracking to improve its robustness, i.e., to address difficulties with adaptation to environmental or target change. Determining the target is equivalent to finding the unlabeled sample that is the most associated with the existing labeled set.

%% Applications of PageRank:
% Abstracting PageRank To Dynamic Asset Valuation (2006): In this paper, Sawilla \cite{sawilla2006abstracting} uses (weighted) PageRank to quickly and dynamically calculate a relative value for all assets in an organization in any context in which dependencies may be specified. Their scheme works in general and will provide asset valuation in any context, be it confidentiality, integrity, availability, or even political capital.

%% For introduction, also a bit here:
% Adaptive Implementation to Update Page Rank on Dynamic Networks (2021): In this oral presentation, Srinivasan \cite{srinivasan2021adaptive} talk about the fact that There are a lot of attempts made to parallelize the page rank algorithm for static networks, however, there are only very few attempts made to compute page rank on dynamic networks. As the networks change with time, computing page rank or updating is an expensive operation, the previous attempts have only approximated the metric to avoid recomputation. In this paper, we introduce a framework where we try to update the page rank of the vertices which embraces change as the network changes. The proposed framework is implemented on a shared memory system and experiments on real-world and synthetic networks show good scalability. The framework proposed gets an input set of networks, initial page rank values for all the vertices, and a set of batches that has the changeset. As the batches are processed in parallel, affected vertices are identified and marked for an update, once the batch is processed the vertices affected or identified their page rank values are computed. The main contribution of this paper is the proposed framework avoids recomputation of all vertices, and only recomputes few vertices, and avoids approximation to provide accurate values.